How Easie helped UCLA digitize 287 years of Japanese historical records using AI

Why submitting full pages to an LLM failed and what we built instead

We are excited to share that Easie contributed to public policy research at UCLA on using artificial intelligence to recover historical social conflict data. The project is part of Easie's ongoing collaboration with UCLA dating back to 2020.

A recent paper published by the UCLA Luskin School of Public Affairs acknowledges Easie for AI, cloud, and Python support and builds a framework for digitizing historical sources spanning Latin America since 1492, Imperial Russia, and Tokugawa Japan.

Easie partnered with UCLA on the most technically demanding part of the project: accurately extracting data from the Japanese source, a chronology of unrest from 1590 to 1877. The source is presented as scanned, vertically oriented tables of Kanji in which each event is a column rather than a row. Submitting full pages directly to an LLM produced poor results, so Easie built a pipeline that crops each event into its own image, automatically splits it into ten substrips matching the ten event variables, and applies deterministic enhancements including bilateral filtering, Lanczos interpolation, adaptive histogram equalization, and binarization. The enhanced substrips were then processed through EasieOps, which extracted structured JSON from each image prior to LLM-based translation.

The tables are vertically oriented, so each column is an event and each row within that column is a variable. The ten variables are year, month, day, province, region, name, area, cause/demand, form, and source (1).
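
As a rough illustration of that layout, the sketch below models one event as a record with the ten variables and computes pixel bounds for ten substrips of a cropped event column. The names EventRecord and substrip_bounds are hypothetical, and the equal-height split is an assumption made to keep the example short; rows in the real scans may well vary in height and need detected boundaries instead.

```python
from dataclasses import dataclass, fields


@dataclass
class EventRecord:
    """One event column; values are kept as raw text prior to translation."""
    year: str
    month: str
    day: str
    province: str
    region: str
    name: str
    area: str
    cause_demand: str
    form: str
    source: str


# Field order mirrors the top-to-bottom row order described in the text.
FIELD_NAMES = tuple(f.name for f in fields(EventRecord))


def substrip_bounds(
    column_height: int, n_rows: int = len(FIELD_NAMES)
) -> list[tuple[int, int]]:
    """Return (top, bottom) pixel bounds for n_rows equal-height bands.

    Equal heights are a simplifying assumption for this sketch.
    """
    if column_height < n_rows:
        raise ValueError("column is too short to split into that many rows")
    edges = [round(i * column_height / n_rows) for i in range(n_rows + 1)]
    return list(zip(edges[:-1], edges[1:]))
```

Pairing the two, dict(zip(FIELD_NAMES, substrip_bounds(height))) gives a variable-to-band mapping, so each substrip image can be labeled with the variable it is expected to contain before extraction.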

This project demonstrates how computer vision and deterministic preprocessing can dramatically improve the results of AI-based extraction, especially on sources where end-to-end LLM approaches break down.

The paper's cost analysis shows the AI workflow beats traditional research assistant coding for sources longer than roughly 75 pages, making decades of previously inaccessible historical data economically viable to digitize, even from source material that resisted off-the-shelf tooling.
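
One way to reason about a page-count threshold like this is a simple break-even calculation. The sketch below assumes a cost model that is ours, not the paper's: the AI workflow has a one-time setup cost plus a per-page cost, while research assistant coding has only a per-page cost. The function name break_even_pages and its parameters are hypothetical.

```python
def break_even_pages(
    ai_fixed_cost: float,
    ai_cost_per_page: float,
    ra_cost_per_page: float,
) -> float:
    """Page count above which the AI workflow becomes the cheaper option.

    Assumed model (not taken from the paper):
        AI cost(n) = ai_fixed_cost + ai_cost_per_page * n
        RA cost(n) = ra_cost_per_page * n
    Setting the two equal and solving for n gives the break-even point.
    """
    savings_per_page = ra_cost_per_page - ai_cost_per_page
    if savings_per_page <= 0:
        raise ValueError("AI per-page cost must be below the RA per-page cost")
    return ai_fixed_cost / savings_per_page
```

Under this model, a higher setup cost pushes the threshold up, while a wider gap between the per-page costs pulls it down.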

We are proud of the team and look forward to sharing more about the other work we have underway with UCLA.

Need help building advanced data extraction systems?

Easie helps teams design and ship AI-powered extraction infrastructure. If you are working through use cases similar to those in this article and want hands-on guidance, please reach out.

