Changelog

Stay up to date with the latest changes and improvements to Tensorlake

Fixed: Citation filtering now respects page classification limits

Fixed a bug where citations ignored page classification filtering; citations now only reference pages you're actually extracting from.

  • Citations now correctly respect page classification boundaries
  • Cleaner results with no citations pointing to irrelevant page content
  • Better RAG pipeline accuracy with properly scoped citations
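The fix can be pictured as a post-filter over citation results. A minimal sketch in plain Python, where the citation shape and the `allowed_pages` set are illustrative assumptions, not the Tensorlake SDK's actual types:

```python
# Illustrative sketch: keep only citations whose page survived
# page classification (data shapes are hypothetical, not the SDK's).
def filter_citations(citations, allowed_pages):
    """Drop citations that point at pages excluded by classification."""
    return [c for c in citations if c["page"] in allowed_pages]

citations = [
    {"page": 1, "text": "Total revenue: $4.2M"},
    {"page": 7, "text": "Appendix boilerplate"},
]
# Pages 1-3 were classified as relevant; page 7 was filtered out.
print(filter_citations(citations, allowed_pages={1, 2, 3}))
```

In a RAG pipeline this keeps every citation scoped to the pages that were actually extracted, rather than the whole document.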

Fixed token limit issues with large CSV/Excel tables

Fixed token limit issues with large, dense CSV and Excel tables through automatic splitting and intelligent result merging.

  • Handles 500+ row spreadsheets and extensive financial reports that previously failed
  • Automatic table splitting preserves relationships and maintains extraction accuracy
  • Transparent processing: no configuration changes or manual preprocessing required
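The split-then-merge approach can be sketched in plain Python; the chunk size and flat-list merge here are assumptions for illustration, not the service's actual internals:

```python
# Illustrative sketch of automatic table splitting and result merging
# (chunk size and merge strategy are assumptions, not SDK internals).
def split_rows(rows, chunk_size):
    """Yield row chunks small enough to fit a model's token budget."""
    for i in range(0, len(rows), chunk_size):
        yield rows[i:i + chunk_size]

def merge_results(chunks):
    """Reassemble per-chunk extraction results in the original row order."""
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged

rows = list(range(500))               # stands in for a 500-row spreadsheet
chunks = list(split_rows(rows, 100))  # 5 chunks of 100 rows each
print(len(chunks), merge_results(chunks) == rows)  # → 5 True
```

Because merging preserves row order, relationships across chunk boundaries survive the round trip.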

Page classification now includes reasoning explanations

Page classification results now include the model's reasoning for each decision to help with debugging and prompt engineering.

  • Detailed explanations for why pages received specific classifications
  • Helps identify prompt engineering opportunities and debug classification errors
  • Automatically included in all classification results with no performance impact

Page classification now defaults to multi-label (multiple classes per page)

Page classification now defaults to multi-label mode, allowing pages to receive multiple classification labels simultaneously.

  • Single pages can be classified as multiple page types (e.g., account_info AND transactions)
  • Better handling of complex documents like bank statements and legal docs
  • Backward compatible: multi-class mode still available via configuration
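The difference is easiest to see in the shape of the results. A minimal sketch using hypothetical result data, not the SDK's actual response format:

```python
# Illustrative sketch: multi-label results let one page carry
# several classifications (data is hypothetical, not the SDK's).
page_labels = {
    1: ["account_info", "transactions"],  # multi-label: both apply
    2: ["transactions"],
    3: ["legal_disclosures"],
}

def pages_with(label, labels_by_page):
    """All pages carrying a given label, in page order."""
    return [p for p, labels in sorted(labels_by_page.items()) if label in labels]

print(pages_with("transactions", page_labels))  # → [1, 2]
```

Under the previous multi-class default, page 1 would have been forced into a single label; multi-label lets downstream extraction see it in both buckets.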

Page summaries now include optional full-page image context

Optionally reference the full-page image during figure and table summarization to preserve spatial context in complex layouts.

  • Full-page image context for better spatial relationship understanding
  • Reduces hallucinations in multi-column and form-based documents
  • Optional setting: maintains existing fragment-level behavior as default

Document Ingestion now supports XML, DOC, and Markdown files

Document ingestion now supports XML, legacy DOC, and Markdown files with the same parsing capabilities as existing formats.

  • Native XML parsing for config files and structured data exports
  • Legacy DOC file support for older document repositories
  • Markdown processing for documentation and technical specs

Table Recognition now parses ~1,500-cell tables (with structure preserved)

A new model is live, reliably extracting very large, dense tables from PDFs (including scans) while preserving header hierarchy, row/column spans, and cell boundaries, with fast HTML/CSV export and bounding boxes for citations.

  • Robust on ~1,500-cell tables; resilient to complex layouts and scanned documents.
  • Preserves header hierarchy and row/column spans; faithful HTML outputs.
  • Improved cell boundary detection and multi-row/multi-col header parsing.
Major Release v2.0

DocumentAI API v2

V2 of the DocumentAI API is fully in production in the Python SDK and on the Playground, offering unified document processing with advanced structured extraction, page classification, and enrichment capabilities.

  • Unified Parse and Jobs API
  • Advanced Structured Extraction with JSON Schema
  • Page Classification and Signature Detection
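Structured extraction is driven by a standard JSON Schema describing the target shape. The invoice-style fields below are an invented example for illustration, not a Tensorlake-defined schema:

```python
import json

# An example JSON Schema for an invoice-style extraction target.
# The field names are illustrative; any valid JSON Schema works the same way.
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}
print(json.dumps(schema, indent=2))
```

The extractor returns a JSON document conforming to this schema, so downstream code can rely on the declared types and required fields.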
Minor Release v2.2.0

Advanced Schema Extraction

Extract structured data from any document using Pydantic schemas, with improved accuracy and multi-format support.

  • Research paper metadata extraction
  • Pydantic schema support
  • Multi-format document support
