HeadlinesBriefing.com

Parsewise Launches API for Cross-Document Reasoning and Structured Data Extraction

Hacker News

Parsewise emerged from YC's P25 batch with an API that processes hundreds or thousands of documents to extract structured data while maintaining full lineage tracking. The platform handles PDFs, spreadsheets, and other file types, outputting schema-compliant results where every value traces back to specific word-level citations across multiple documents.
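To make "schema-compliant results with word-level citations" concrete, here is a minimal sketch of what such a record could look like. It is not Parsewise's actual response format: the `Citation` and `ExtractedValue` types, their fields, and the `check_record` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Citation:
    """Hypothetical word-level pointer into one source document."""
    document: str    # file name, e.g. "policy_017.pdf"
    page: int        # 1-based page number
    word_start: int  # index of the first cited word on the page
    word_end: int    # index one past the last cited word


@dataclass
class ExtractedValue:
    """One schema field plus every citation that supports it."""
    name: str
    value: Any
    citations: list[Citation] = field(default_factory=list)


def check_record(record: list[ExtractedValue], schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    seen = {item.name for item in record}
    # Every field the schema names must be present.
    for name in schema:
        if name not in seen:
            problems.append(f"missing field: {name}")
    # Every value must have the expected type and at least one citation.
    for item in record:
        expected = schema.get(item.name)
        if expected is None:
            problems.append(f"unexpected field: {item.name}")
        elif not isinstance(item.value, expected):
            problems.append(f"wrong type for {item.name}")
        if not item.citations:
            problems.append(f"no citation for {item.name}")
    return problems
```

In this sketch a value may carry citations from several documents, and a value with no citation is treated as a failure, which mirrors the article's point that every extracted value should be traceable to its sources.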

Founders Greg and Max bring complementary experience to this problem. Greg's background includes building ETL systems and AI workflows at Palantir, while Max tackled complex financial data analysis at Bain. Their target users need to extract information from insurance policies, call transcripts, and email archives without manual point-by-point extraction.

Unlike traditional RAG approaches, which retrieve only a top-ranked subset of passages, Parsewise exhaustively finds all relevant values for each query. The system achieved state-of-the-art results on the Databricks OfficeQA benchmark using Gemini models for visual reasoning. It employs vision-language models (VLMs) for parsing combined with smaller models for large-scale search, then uses larger models for resolution decisions and uncertainty detection.
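The difference between retrieving a subset and scanning everything can be illustrated with a toy sketch. Nothing here reflects Parsewise's real pipeline: the corpus is invented, keyword matching stands in for the smaller search models, and a regex plus a majority vote stands in for the larger resolution model.

```python
import re
from collections import Counter

# Toy corpus standing in for parsed documents (hypothetical data).
DOCS = {
    "policy.pdf": "The deductible is 500 USD per claim.",
    "email.txt": "Reminder: deductible is 500 USD as agreed.",
    "call.txt": "Agent said the deductible is 750 USD this year.",
    "memo.txt": "Lunch is at noon.",
}


def top_k_retrieve(query_terms, docs, k=2):
    """RAG-style step: keep only the k best-scoring documents."""
    ranked = sorted(
        docs,
        key=lambda name: sum(t in docs[name].lower() for t in query_terms),
        reverse=True,
    )
    return ranked[:k]


def exhaustive_scan(query_terms, docs):
    """Cheap pass over every document; stands in for a small model."""
    return [
        name for name, text in docs.items()
        if all(t in text.lower() for t in query_terms)
    ]


def resolve(names, docs):
    """Stand-in for the larger model: pick a value and flag disagreement."""
    values = {}
    for name in names:
        match = re.search(r"(\d+) USD", docs[name])
        if match:
            values[name] = int(match.group(1))
    if not values:
        return {"value": None, "uncertain": True, "sources": values}
    counts = Counter(values.values())
    best, _ = counts.most_common(1)[0]
    # More than one distinct value means the sources conflict.
    return {"value": best, "uncertain": len(counts) > 1, "sources": values}


terms = ["deductible", "usd"]
subset_result = resolve(top_k_retrieve(terms, DOCS), DOCS)
full_result = resolve(exhaustive_scan(terms, DOCS), DOCS)
```

With this data, the top-2 retrieval never sees `call.txt`, so `subset_result` reports no uncertainty, while `full_result` surfaces the conflicting 750 USD figure and marks the value as uncertain.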

The technology supports multiple cloud providers and private network deployments. Business teams can validate extractions instantly through built-in tools, while technical teams integrate the API into their own applications. This focus on verifiability addresses the trust gap that often blocks adoption of AI-powered data extraction tools.