AI Digitization

FEMC — Historical Records Digitization

Turning decades of New York State's ecological paper archives into structured, searchable data.

LLM extraction, OCR, Data Modeling, NY State

Unlocking decades of paper ecology data

The Forest Ecosystem Monitoring Cooperative (FEMC) holds a deep archive of ecological field documents from New York State: surveys, monitoring reports, and study results going back decades. The catch is that most of it exists only as scanned PDFs and paper documents, effectively invisible to modern analysis.

This project is an AI-assisted digitization effort that turns those static archives into structured, queryable data — without losing the fidelity or context of the original observations.

What I'm building

  • OCR pipelines tuned for the typography and quality variance of historical documents
  • LLM-driven structured extraction — pulling species, locations, measurements, dates, and methodology from prose
  • A normalized data schema that lines up with FEMC's existing ecological datasets
  • Quality-control workflows so a human reviewer can verify extractions before they hit the dataset
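
To make the extraction and review steps above concrete, here is a minimal sketch of what a single extracted record and its review gate could look like. Everything in it is an assumption for illustration: the field names, the `ReviewStatus` values, and the `publishable` helper are hypothetical and do not reflect FEMC's actual schema or the project's real code.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class ReviewStatus(Enum):
    """Hypothetical review states for an extracted record."""
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ExtractedObservation:
    """One observation pulled from a scanned document (illustrative schema)."""
    source_document: str           # archive file the record came from
    page: int                      # page within that file, kept for traceability
    species: str
    location: str
    observed_on: Optional[date]    # None when the document gives no usable date
    measurement: Optional[float]
    unit: Optional[str]
    methodology: str
    status: ReviewStatus = ReviewStatus.PENDING


def review(obs: ExtractedObservation, approved: bool) -> ExtractedObservation:
    """Record a human reviewer's decision on one extraction."""
    obs.status = ReviewStatus.APPROVED if approved else ReviewStatus.REJECTED
    return obs


def publishable(records: List[ExtractedObservation]) -> List[ExtractedObservation]:
    """Only reviewer-approved records are allowed into the dataset."""
    return [r for r in records if r.status is ReviewStatus.APPROVED]


# Made-up example values, not real archive data.
records = [
    ExtractedObservation("report_a.pdf", 3, "Acer saccharum", "Plot 12",
                         date(1987, 6, 14), 41.5, "cm", "DBH tape"),
    ExtractedObservation("report_a.pdf", 4, "Fagus grandifolia", "Plot 12",
                         None, None, None, "visual survey"),
]
review(records[0], approved=True)
ready = publishable(records)
```

Keeping the source document and page on every record is one way to preserve the context of the original observation: a reviewer can always trace a value back to the scan it came from.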

Outcome

NY State's historical ecological record becomes searchable and joinable with current monitoring data — opening up long-baseline analyses that weren't practical before.
