
New: Vision Language Models for Document Processing

Tensorlake now uses Vision Language Models (VLMs) across multiple features including page classification, figure/table summarization, and structured extraction, enabling faster and more intelligent document understanding.

Key Highlights

  • VLM-powered page classification for efficient large document processing
  • Direct visual understanding for figures, tables, and structured data extraction
  • Skip OCR entirely with VLM-based extraction for more accurate results from harder-to-parse documents

What's New

We've expanded our use of Vision Language Models (VLMs) across multiple DocumentAI features for faster, more accurate processing of documents with hundreds of pages:

  • Page Classification: Identify relevant pages in large documents
  • Figure and Table Summarization: Extract insights from visual elements
  • Structured Extraction (with `skip_ocr`): Direct visual understanding for more accurate extraction on harder-to-parse documents (e.g. scanned documents, engineering diagrams, or documents with confusing reading order)

This changelog demonstrates the enhanced page classification capabilities. With VLM support, you can process large documents quickly by identifying and extracting from only the relevant pages.

Key Improvements

Scale & Performance

  • Handle Large Documents: Classify documents with hundreds of pages without performance degradation
  • VLM-Powered Classification: Replaced OCR with Vision Language Models for faster, more accurate classification
  • Selective Processing: Only parse the pages that matter, reducing processing time and costs

The recommended workflow:

  1. Classify First: Use the classify endpoint to identify relevant pages based on your criteria
  2. Parse Selectively: Set page_range to process only the classified relevant pages
  3. Extract Efficiently: Apply structured extraction only to pages containing the information you need
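
The "Parse Selectively" step comes down to turning the classifier's page numbers into the comma-separated page_range string that parsing expects. A minimal helper for that conversion (hypothetical, not part of the Tensorlake SDK) might look like:

```python
def build_page_range(page_numbers):
    """Turn classified page numbers into a comma-separated
    page_range string (e.g. "12,13,17").

    Hypothetical helper -- not part of the Tensorlake SDK.
    """
    # Deduplicate and sort so the range is stable regardless of
    # the order in which the classifier returned pages.
    return ",".join(str(p) for p in sorted(set(page_numbers)))


print(build_page_range([17, 12, 13, 12]))  # prints "12,13,17"
```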

Use Case Example: SEC Filings Analysis

This approach is particularly powerful for extracting specific information from lengthy documents like SEC filings. For example, when analyzing cryptocurrency holdings across multiple companies' 10-K and 10-Q reports:

  • Challenge: Each filing can be 100-200+ pages, but crypto-related information might only appear on 10-20 pages
  • Solution: First classify pages containing "digital assets holdings", then extract structured data only from those pages
  • Result: 80-90% reduction in processing time and more focused, accurate extractions
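
The quoted savings follow directly from the page counts; taking midpoints of the ranges above:

```python
# Back-of-the-envelope check of the savings claim, using the
# midpoints of the page ranges quoted above.
total_pages = 150      # a 100-200+ page filing
relevant_pages = 15    # 10-20 crypto-related pages
reduction = 1 - relevant_pages / total_pages
print(f"pages skipped: {reduction:.0%}")  # prints "pages skipped: 90%"
```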

Code Example

```python
from tensorlake.documentai import DocumentAI, PageClassConfig

doc_ai = DocumentAI()

# Step 1: Classify pages
page_classifications = [
    PageClassConfig(
        name="digital_assets_holdings",
        description="Pages showing cryptocurrency holdings on balance sheet..."
    )
]

parse_id = doc_ai.classify(
    file_url=filing_url,
    page_classifications=page_classifications
)

result = doc_ai.wait_for_completion(parse_id=parse_id)

# Step 2: Parse only relevant pages
relevant_pages = result.page_classes[0].page_numbers
page_range = ",".join(str(i) for i in relevant_pages)

final_result = doc_ai.parse_and_wait(
    file=filing_url,
    page_range=page_range,
    structured_extraction_options=[...]
)
```
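
If you supply several PageClassConfig entries, the result carries one page_classes entry per class; to parse everything that matched any class, merge the page numbers first. A small sketch, using a lightweight stand-in object that mirrors only the `name` and `page_numbers` fields used in the example above (the real SDK object may differ):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageClass:
    # Stand-in mirroring the two fields the example above reads;
    # the real SDK result object may carry more attributes.
    name: str
    page_numbers: List[int]


def merge_page_numbers(page_classes):
    """Sorted union of all matched pages across classes."""
    pages = set()
    for pc in page_classes:
        pages.update(pc.page_numbers)
    return sorted(pages)


classes = [
    PageClass("digital_assets_holdings", [12, 13, 45]),
    PageClass("risk_factors", [13, 88]),
]
print(merge_page_numbers(classes))  # prints "[12, 13, 45, 88]"
```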

Benefits

  • Cost Efficiency: Process only what you need
  • Speed: Reduce processing time by focusing on relevant content
  • Accuracy: VLM classification provides better understanding of page content
  • Scalability: Handle large document sets without compromising performance

Try It Out

Check out our example notebook demonstrating how to extract cryptocurrency metrics from SEC filings using the new classification approach.

Getting Started

Update to the latest version of Tensorlake:

```shell
pip install --upgrade tensorlake
```

Then start classifying, summarizing, and extracting with improved efficiency!
