Gemini 3 OCR - Quick Findings

TL;DR

Gemini 3 has strong OCR, but its raw output is unstructured and its page-level control is limited. Tensorlake adds precise page slicing and well-structured JSON output that needs no cleanup.

Gemini 3 Pro Integration vs. Direct Usage

Gemini 3 Pro brings strong OCR capabilities, but its raw output still needs significant structuring before it is usable in downstream document workflows. While integrating Gemini 3 into our Document AI pipeline, I captured a few quick observations comparing direct Gemini 3 usage with running it through Tensorlake's unified extraction layer.

PDF Handling

Gemini 3 accepts PDFs directly but offers no page slicing. To parse only a subset of pages, you have to split the PDF manually and stitch the results back together.
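The bookkeeping behind that manual workaround can be sketched as follows. This is only an illustration under stated assumptions: the helper names `parse_page_range` and `stitch_results` are hypothetical, the comma-separated range syntax is an assumption, and the PDF splitting and the Gemini call themselves are left out.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a spec such as "1-3" or "2-3,7" into 1-based page numbers."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.append(page)
    return pages


def stitch_results(original_pages: list[int], split_results: list[str]) -> dict[int, str]:
    """Map outputs from the split PDF back to original page numbers.

    split_results[i] is the model output for page i + 1 of the split file,
    which corresponds to original_pages[i] in the source document.
    """
    if len(original_pages) != len(split_results):
        raise ValueError("expected one result per selected page")
    return dict(zip(original_pages, split_results))


# Hypothetical usage: pages 2, 3 and 7 were split out and processed one by one.
pages = parse_page_range("2-3,7")
stitched = stitch_results(pages, ["<p>a</p>", "<p>b</p>", "<p>c</p>"])
```

Keeping the original page numbers as dictionary keys means downstream code never sees the renumbered pages of the temporary split file.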

Tensorlake supports precise page slicing out of the box:

# doc_ai, file_url and parsing_options are assumed to be set up earlier
parse_id = doc_ai.read(
    file_url=file_url,
    page_range="1-3",  # Parse only pages 1–3
    parsing_options=parsing_options,
)

Result: Users can extract specific pages or ranges without processing the entire document.

Output Structure

Gemini 3 can generate HTML, but the structure is not well organized for downstream use:

sections are not clearly separated
layout elements aren't grouped
users must manually reorganize the structure

HTML generated by Gemini 3:

<!-- Page 1 -->
<div class="page-container">
  <div class="header">
    <div class="company-info">
      <h1>ARK GLOSS CLOTHING</h1>
      <p>123 SAN SEBASTIAN ST.</p>
      <p>LOS ANGELES, CA 90015 (US)</p>
      <p>(123) 555-1234</p>
      <p>info@arkglossclothing.com</p>
      <p style="margin-top: 10px;">Sales Rep. :</p>
    </div>
 
    <div class="invoice-title">
      <h1>I N V O I C E</h1>
      <h2>INV-20212</h2>
 
      <div class="invoice-details">
        <table>
          <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
          <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
          <tr><td>PO NUMBER</td><td></td></tr>
          <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
        </table>
      </div>
    </div>
  </div>
</div>
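To give a sense of the manual reorganization this HTML requires, here is a minimal sketch that pulls the invoice details table into key/value pairs using only Python's standard `html.parser` module. The names `TableRowParser` and `extract_invoice_fields` are hypothetical, and the sketch assumes every row holds exactly one label cell and one value cell, as in the sample above.

```python
from html.parser import HTMLParser
from typing import Optional


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None   # cells of the row being read
        self._cell: Optional[list[str]] = None  # text pieces of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_invoice_fields(html: str) -> dict[str, str]:
    """Return label -> value for every two-cell table row in the HTML."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


sample = """
<div class="invoice-details">
  <table>
    <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
    <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
    <tr><td>PO NUMBER</td><td></td></tr>
    <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
  </table>
</div>
"""
fields = extract_invoice_fields(sample)
```

This handles one table on one invoice layout. Headers, addresses, and reading order would each need their own handling, and the generated markup may differ from run to run.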

Tensorlake unified JSON output (with Gemini 3 plugged in):

{
  "page_number": 1,
  "page_fragments": [
    {
      "fragment_type": "title",
      "content": {
        "content": "INVOICE"
      },
      "reading_order": 1
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "INV-20212"
      },
      "reading_order": 2
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "ARK GLOSS CLOTHING\n\n123 SAN SEBASTIAN ST.\nLOS ANGELES, CA 90015 (US)\n(123) 555-1234\ninfo@arkglossclothing.com"
      },
      "reading_order": 3
    },
    {
      "fragment_type": "table",
      "content": {
        "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTOREPO NUMBERSHIP DATE01/26/2024",
        "html": "<table><tbody><tr><td>INVOICE DATE</td><td>01/23/2024</td></tr><tr><td>CUSTOMER TYPE</td><td>STORE</td></tr><tr><td>PO NUMBER</td><td></td></tr><tr><td>SHIP DATE</td><td>01/26/2024</td></tr></tbody></table>",
        "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |\n| PO NUMBER |  |\n| SHIP DATE | 01/26/2024 |"
      },
      "reading_order": 4
    }
  ]
}
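A page in this shape can be consumed with very little code. The sketch below assumes only the fields visible in the sample (`page_fragments`, `fragment_type`, `content`, `reading_order`). The helper names are hypothetical, and treating titles as Markdown headings is an illustrative choice, not Tensorlake behavior.

```python
def fragments_in_reading_order(page: dict) -> list[dict]:
    """Return the page fragments sorted by their reading_order field."""
    return sorted(page["page_fragments"], key=lambda f: f["reading_order"])


def render_page(page: dict) -> str:
    """Render a page as Markdown text, one block per fragment."""
    blocks = []
    for fragment in fragments_in_reading_order(page):
        content = fragment["content"]
        kind = fragment["fragment_type"]
        if kind == "title":
            blocks.append("# " + content["content"])
        elif kind == "table":
            # Prefer the Markdown form; fall back to the flat text if absent.
            blocks.append(content.get("markdown", content["content"]))
        else:
            blocks.append(content["content"])
    return "\n\n".join(blocks)


# Abbreviated version of the sample page, deliberately listed out of order.
page = {
    "page_number": 1,
    "page_fragments": [
        {"fragment_type": "text", "content": {"content": "INV-20212"}, "reading_order": 2},
        {"fragment_type": "title", "content": {"content": "INVOICE"}, "reading_order": 1},
        {
            "fragment_type": "table",
            "content": {
                "content": "INVOICE DATE01/23/2024SHIP DATE01/26/2024",
                "markdown": "| INVOICE DATE | 01/23/2024 |\n| SHIP DATE | 01/26/2024 |",
            },
            "reading_order": 3,
        },
    ],
}
rendered = render_page(page)
```

Because every fragment carries an explicit type and reading order, the consumer needs no layout heuristics of its own.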

How Tensorlake Differs

Tensorlake's integration produces clean, well-organized structured output, including:

clear layout groups
well-defined document sections
table structures represented cleanly in both HTML and markdown
consistent fragment types that work across all OCR/VLM backends

Result: Developers receive a clean, predictable document structure without custom parsing or prompt iteration.
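As one illustration of what the table representation allows, the `markdown` field of a table fragment can be turned into key/value pairs in a few lines. This is a sketch that assumes a two-column table like the invoice details in the sample. The function name `markdown_table_to_pairs` is hypothetical.

```python
def markdown_table_to_pairs(markdown: str) -> dict[str, str]:
    """Convert a two-column Markdown table into a label -> value dict."""
    pairs: dict[str, str] = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip separator rows such as "| --- | --- |" and rows that are
        # not exactly label + value.
        is_separator = not (set(cells[0]) - set("-: "))
        if len(cells) == 2 and not is_separator:
            pairs[cells[0]] = cells[1]
    return pairs


table_markdown = (
    "| INVOICE DATE | 01/23/2024 |\n"
    "| CUSTOMER TYPE | STORE |\n"
    "| PO NUMBER |  |\n"
    "| SHIP DATE | 01/26/2024 |"
)
invoice_fields = markdown_table_to_pairs(table_markdown)
```

Empty cells such as the PO number are preserved as empty strings rather than dropped, so missing values stay visible to downstream validation.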

Advanced Usage vs. Simple Usage

Advanced Gemini users can approximate similar structure with multiple prompt iterations and custom post-processing. With Tensorlake, users get a clean, structured result with a single API call.

Written by: Tensorlake Team, Engineering
