AI Document Processing Blog & Case Studies

Featured

Introducing Extend UI: open source UI kit for modern document apps

Andrew Luo, Jing Reyhan

June 9, 2026

Engineering

LongArray-Extract: A Benchmark for Complete Structured Extraction at Scale

LongArray-Extract tests whether extraction systems can complete structured array outputs across hundreds or thousands of rows.

Jing Reyhan, Joseph Bajor, Cindy Hao

Engineering

RealDoc-Bench: A Real-World Benchmark for Document Agents

RealDoc-Bench evaluates whether parsers preserve the structure agents need across real-world document workflows.

Joon Kim, Ameya Joshi, Cindy Hao, Jing Reyhan

Engineering

Why Layout Matters

How Extend rebuilt its layout model for Parse 2.0, why layout detection drives parsing accuracy, and how stronger document structure improves deterministic pipelines, model routing, cost, and latency.

Jing Reyhan, Eli Badgio

Product

Introducing: Parse 2.0 and RealDoc-Bench

Today we're launching Parse 2.0, our SOTA layout-first document parsing API for agents, alongside RealDoc-Bench, an applied benchmark measuring parsing performance on the real-world documents agents actually encounter in production.

Jing Reyhan

Customers

How Flatiron Health scaled document extraction to 100M+ pages with Extend

How Flatiron Health replicated 6 months of in-house NGS extraction work in 2 weeks with Extend, scaling biomarker data across 5 million people with cancer.

Cindy Hao

Engineering

PoliTax Split: Extend's Document Splitting Benchmark

PoliTax Split evaluates document splitting on long public-sector tax packets with subtle document boundaries.

Joe Bajor, Cindy Hao

Customers

How Nuvocargo hit 99% Document Accuracy for US-Mexico Cross-Border Freight

How Nuvocargo used Extend to hit 97-99% accuracy across document intake, classification, extraction, and shipment attribution, with near-zero human involvement.

Cindy Hao

Customers

How Mercury shipped low-latency document processing for 300K+ users

How Mercury uses Extend to power real-time document validation in onboarding, handling dozens of languages and formats with sub-7 second latency.

Cindy Hao

Engineering

How We Built Composer, Our Schema Optimization Agent

Why we abandoned workflows for agents, how we learned to starve our context window, and why our optimization agent accidentally became a data quality tool.

Gus Eggert, Richard Li, Cindy Hao
