Unlocking decades of paper ecology data
The Forest Ecosystem Monitoring Cooperative (FEMC) holds a deep archive of ecological field documents from New York State: surveys, monitoring reports, and study results going back decades. The catch is that most of it exists only as scanned PDFs and paper documents, effectively invisible to modern analysis.
This project is an AI-assisted digitization effort that turns those static archives into structured, queryable data — without losing the fidelity or context of the original observations.
What I'm building
- OCR pipelines tuned for the typography and varying quality of historical documents
- LLM-driven structured extraction — pulling species, locations, measurements, dates, and methodology from prose
- A normalized data schema that lines up with FEMC's existing ecological datasets
- Quality-control workflows so a human reviewer can verify extractions before they hit the dataset
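As a rough illustration of how the schema and review steps above could fit together, here is a minimal sketch. Everything in it is hypothetical: the `Observation` fields, the `ReviewStatus` values, and the `validate` / `promote_reviewed` helpers are assumptions made for illustration, not FEMC's actual schema or this project's code. The idea shown is that every extracted record keeps its verbatim source text for provenance and starts out pending, so nothing reaches the dataset until a reviewer approves it and it passes a few automatic sanity checks.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    # Hypothetical review states for an extracted record.
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Observation:
    # Hypothetical normalized record for one observation pulled from a document.
    source_document: str          # identifier of the scanned PDF
    page: int                     # page the observation was extracted from
    species: str
    site: str
    observed_on: Optional[date]   # None when the source gives no usable date
    measurement: Optional[float]
    unit: Optional[str]
    method: Optional[str]
    source_text: str              # verbatim OCR span, kept so reviewers can check it
    status: ReviewStatus = ReviewStatus.PENDING  # every record starts unreviewed


def validate(obs: Observation) -> list[str]:
    """Return automatic QC problems for a record (an empty list means none found)."""
    problems = []
    if not obs.species.strip():
        problems.append("missing species")
    if obs.measurement is not None and obs.unit is None:
        problems.append("measurement without unit")
    if obs.observed_on is not None and obs.observed_on > date.today():
        problems.append("date in the future")
    if not obs.source_text.strip():
        problems.append("no source text for provenance")
    return problems


def promote_reviewed(records: list[Observation]) -> list[Observation]:
    """Keep only records a human approved that also pass the automatic checks."""
    return [
        r for r in records
        if r.status is ReviewStatus.APPROVED and not validate(r)
    ]
```

In this sketch the human decision and the automatic checks are deliberately separate gates: a reviewer's approval does not override a structural problem such as a measurement with no unit, and a record that passes the checks still waits for a reviewer.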
Outcome
New York State's historical ecological record becomes searchable and joinable with current monitoring data, opening up long-baseline analyses that weren't practical before.