Gemini 3 OCR - Quick Findings

Gemini 3 has capable OCR but delivers unstructured output; Tensorlake adds precise page slicing and well-organized JSON without manual cleanup.

TL;DR

Gemini 3's OCR is good, but its output is unstructured and it offers little page-level control. Tensorlake adds precise page slicing and well-structured JSON output that needs no manual cleanup.

Gemini 3 Pro Integration vs. Direct Usage

Gemini 3 Pro brings strong OCR capabilities, but its raw output still needs significant structuring before it is usable in downstream document workflows. While integrating Gemini 3 into our Document AI pipeline, I captured a few quick observations comparing direct Gemini 3 usage with running it through Tensorlake's unified extraction layer.

PDF Handling

Gemini 3 accepts PDFs directly but has no built-in page slicing. To parse only a subset of pages, you have to split the PDF manually, send each piece separately, and stitch the results back together.
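To make the manual workaround concrete, here is a minimal sketch of the bookkeeping it involves: turning a 1-based range string into zero-based page indices, and joining per-page results back in page order. The helper names `parse_page_range` and `stitch_results` are hypothetical, the comma-separated syntax is an assumption (only the "1-3" form appears in this post), and the actual PDF splitting and Gemini calls are left out.

```python
def parse_page_range(spec: str, total_pages: int) -> list[int]:
    """Turn a 1-based range string such as "1-3" into zero-based page indices.

    Comma-separated parts ("2, 5-6") are an assumed extension for illustration.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start
        if start < 1 or end < start or end > total_pages:
            raise ValueError(
                f"invalid page range {part!r} for a {total_pages}-page document"
            )
        indices.extend(range(start - 1, end))
    # Drop duplicates and return the indices in document order.
    return sorted(set(indices))


def stitch_results(per_page: dict[int, str]) -> str:
    """Join per-page OCR output back together in page order."""
    return "\n".join(per_page[i] for i in sorted(per_page))
```

With this in place, `parse_page_range("1-3", 10)` yields `[0, 1, 2]`, which you would then use to cut the PDF with a PDF library of your choice before sending each page to the model.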

Tensorlake supports precise page slicing out of the box:

# doc_ai, file_url, and parsing_options are assumed to be defined earlier.
parse_id = doc_ai.read(
    file_url=file_url,
    page_range="1-3",  # Parse only pages 1-3
    parsing_options=parsing_options,
)

Result: Users can extract specific pages or ranges without processing the entire document.

Output Structure

Gemini 3 can generate HTML, but the structure is not well organized for downstream use:

  • sections are not clearly separated
  • layout elements aren't grouped
  • users must manually reorganize the structure

Gemini 3 generated HTML example:

<!-- Page 1 -->
<div class="page-container">
  <div class="header">
    <div class="company-info">
      <h1>ARK GLOSS CLOTHING</h1>
      <p>123 SAN SEBASTIAN ST.</p>
      <p>LOS ANGELES, CA 90015 (US)</p>
      <p>(123) 555-1234</p>
      <p>info@arkglossclothing.com</p>
      <p style="margin-top: 10px;">Sales Rep. :</p>
    </div>
 
    <div class="invoice-title">
      <h1>I N V O I C E</h1>
      <h2>INV-20212</h2>
 
      <div class="invoice-details">
        <table>
          <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
          <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
          <tr><td>PO NUMBER</td><td></td></tr>
          <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
        </table>
      </div>
    </div>
  </div>
</div>

Tensorlake unified JSON output (Gemini 3 plugged in):

{
  "page_number": 1,
  "page_fragments": [
    {
      "fragment_type": "title",
      "content": {
        "content": "INVOICE"
      },
      "reading_order": 1
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "INV-20212"
      },
      "reading_order": 2
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "ARK GLOSS CLOTHING\n\n123 SAN SEBASTIAN ST.\nLOS ANGELES, CA 90015 (US)\n(123) 555-1234\ninfo@arkglossclothing.com"
      },
      "reading_order": 3
    },
    {
      "fragment_type": "table",
      "content": {
        "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTOREPO NUMBERSHIP DATE01/26/2024",
        "html": "<table><tbody><tr><td>INVOICE DATE</td><td>01/23/2024</td></tr><tr><td>CUSTOMER TYPE</td><td>STORE</td></tr><tr><td>PO NUMBER</td><td></td></tr><tr><td>SHIP DATE</td><td>01/26/2024</td></tr></tbody></table>",
        "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |\n| PO NUMBER |  |\n| SHIP DATE | 01/26/2024 |"
      },
      "reading_order": 4
    }
  ]
}
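Because every fragment carries a `fragment_type` and a `reading_order`, downstream code can consume a page with a few lines of standard-library Python. The sketch below uses a trimmed, reshuffled copy of the JSON above as sample data; the helper names `fragments_in_order` and `table_markdown` are hypothetical, and it assumes every fragment has the fields shown in this example.

```python
import json

# Trimmed sample in the shape shown above, deliberately out of order.
page_json = """
{
  "page_number": 1,
  "page_fragments": [
    {"fragment_type": "text", "content": {"content": "INV-20212"}, "reading_order": 2},
    {"fragment_type": "title", "content": {"content": "INVOICE"}, "reading_order": 1},
    {"fragment_type": "table",
     "content": {"content": "INVOICE DATE01/23/2024",
                 "markdown": "| INVOICE DATE | 01/23/2024 |"},
     "reading_order": 3}
  ]
}
"""


def fragments_in_order(page: dict) -> list[dict]:
    """Return the page's fragments sorted by their reading_order field."""
    return sorted(page["page_fragments"], key=lambda f: f["reading_order"])


def table_markdown(page: dict) -> list[str]:
    """Collect the markdown rendering of every table fragment, in reading order."""
    return [
        f["content"]["markdown"]
        for f in fragments_in_order(page)
        if f["fragment_type"] == "table" and "markdown" in f["content"]
    ]


page = json.loads(page_json)
ordered_types = [f["fragment_type"] for f in fragments_in_order(page)]
tables = table_markdown(page)
```

Here `ordered_types` comes out as title, text, table, and `tables` holds the single markdown table, with no HTML parsing involved.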

How Tensorlake Differs

Tensorlake's integration produces clean, well-organized structured output, including:

  • clear layout groups
  • well-defined document sections
  • table structures represented cleanly in both HTML and markdown
  • consistent fragment types that work across all OCR/VLM backends

Result: Developers receive a clean, predictable document structure without custom parsing or prompt iteration.

Advanced Usage vs. Simple Usage

Advanced Gemini users can approximate similar structure with multiple prompt iterations and custom post-processing. With Tensorlake, users get a clean, structured result with a single API call.
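As a rough illustration of what that custom post-processing can look like, the sketch below pulls the invoice-details table out of Gemini-style HTML and renders it as markdown using only Python's standard library. The class `TableRows` and the function `rows_to_markdown` are hypothetical names, and the code assumes simple, non-nested tables like the one in the example above; real output would need more defensive handling.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_to_markdown(rows):
    """Render rows as pipe-delimited markdown lines (no header separator)."""
    return "\n".join("| " + " | ".join(row) + " |" for row in rows)


gemini_html = """
<table>
  <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
  <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
  <tr><td>PO NUMBER</td><td></td></tr>
  <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
</table>
"""

parser = TableRows()
parser.feed(gemini_html)
parser.close()
markdown = rows_to_markdown(parser.rows)
```

This covers one table on one page. Reading order, section grouping, and non-table layout elements would each need their own handling, which is the work the unified JSON output is meant to remove.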

Written by: Tensorlake Team, Engineering