TL;DR
Gemini 3 has strong OCR, but its output is loosely structured and its page-level control is limited. Tensorlake adds precise page slicing and well-structured JSON output that needs no cleanup.
Gemini 3 Pro Integration vs. Direct Usage
Gemini 3 Pro brings strong OCR capabilities, but its raw output still needs significant structuring before it is usable in downstream document workflows. While integrating Gemini 3 into our Document AI pipeline, I noted a few observations comparing direct Gemini 3 usage with running it through Tensorlake's unified extraction layer.
PDF Handling
Gemini 3 accepts PDFs directly but offers no page slicing. To parse only a subset of pages, you have to split the PDF manually, send each piece separately, and stitch the results back together.
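The stitching half of that manual workaround can be sketched as follows. This is a minimal illustration of the bookkeeping only: it assumes each piece of a split PDF has already been parsed separately, and the chunk shape (`first_page`, `pages`, a chunk-local `page_number`) is hypothetical, not a Gemini response format. Splitting the PDF itself would need a PDF library and is not shown.

```python
from typing import List


def stitch_chunks(chunks: List[dict]) -> List[dict]:
    """Merge per-chunk page results back into the original page numbering.

    Each chunk is a hypothetical dict of the form
    {"first_page": <1-based page in the original PDF>, "pages": [...]},
    where every entry in "pages" carries a chunk-local, 1-based "page_number".
    """
    stitched = []
    for chunk in chunks:
        offset = chunk["first_page"] - 1
        for page in chunk["pages"]:
            # Copy so the caller's data is not mutated.
            remapped = dict(page)
            remapped["page_number"] = page["page_number"] + offset
            stitched.append(remapped)
    # Chunks may arrive out of order (e.g. parallel requests), so sort at the end.
    return sorted(stitched, key=lambda p: p["page_number"])


chunks = [
    {"first_page": 4, "pages": [{"page_number": 1, "text": "page four"}]},
    {"first_page": 1, "pages": [{"page_number": 1, "text": "page one"},
                                {"page_number": 2, "text": "page two"}]},
]
merged = stitch_chunks(chunks)
print([p["page_number"] for p in merged])
```

Even in this small form, the caller has to track where every chunk started, which is the overhead that built-in page slicing removes.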
Tensorlake supports precise page slicing out of the box:
parse_id = doc_ai.read(
    file_url=file_url,
    page_range="1-3",  # Parse only pages 1–3
    parsing_options=parsing_options,
)

Result: Users can extract specific pages or ranges without processing the entire document.
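To make the page-range idea concrete, here is a small sketch of how a range string such as `"1-3"` could be expanded into page numbers. The helper `expand_page_range` and the comma-separated form (`"1-3,5"`) are assumptions for illustration. Only `"1-3"` appears in the call above, and this is not Tensorlake's implementation.

```python
from typing import List


def expand_page_range(spec: str) -> List[int]:
    """Expand a spec like "1-3,5" into sorted, de-duplicated 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(expand_page_range("1-3"))
print(expand_page_range("1-3,5"))
```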
Output Structure
Gemini 3 can generate HTML, but the structure is not well organized for downstream use:
- sections are not clearly separated
- layout elements aren't grouped
- users must manually reorganize the structure
Example of HTML generated by Gemini 3:
<!-- Page 1 -->
<div class="page-container">
  <div class="header">
    <div class="company-info">
      <h1>ARK GLOSS CLOTHING</h1>
      <p>123 SAN SEBASTIAN ST.</p>
      <p>LOS ANGELES, CA 90015 (US)</p>
      <p>(123) 555-1234</p>
      <p>info@arkglossclothing.com</p>
      <p style="margin-top: 10px;">Sales Rep. :</p>
    </div>
    <div class="invoice-title">
      <h1>I N V O I C E</h1>
      <h2>INV-20212</h2>
      <div class="invoice-details">
        <table>
          <tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
          <tr><td>CUSTOMER TYPE</td><td>STORE</td></tr>
          <tr><td>PO NUMBER</td><td></td></tr>
          <tr><td>SHIP DATE</td><td>01/26/2024</td></tr>
        </table>
      </div>
    </div>
  </div>
</div>

Tensorlake unified JSON output (Gemini-3 plugged in):
{
  "page_number": 1,
  "page_fragments": [
    {
      "fragment_type": "title",
      "content": {
        "content": "INVOICE"
      },
      "reading_order": 1
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "INV-20212"
      },
      "reading_order": 2
    },
    {
      "fragment_type": "text",
      "content": {
        "content": "ARK GLOSS CLOTHING\n\n123 SAN SEBASTIAN ST.\nLOS ANGELES, CA 90015 (US)\n(123) 555-1234\ninfo@arkglossclothing.com"
      },
      "reading_order": 3
    },
    {
      "fragment_type": "table",
      "content": {
        "content": "INVOICE DATE01/23/2024CUSTOMER TYPESTOREPO NUMBERSHIP DATE01/26/2024",
        "html": "<table><tbody><tr><td>INVOICE DATE</td><td>01/23/2024</td></tr><tr><td>CUSTOMER TYPE</td><td>STORE</td></tr><tr><td>PO NUMBER</td><td></td></tr><tr><td>SHIP DATE</td><td>01/26/2024</td></tr></tbody></table>",
        "markdown": "| INVOICE DATE | 01/23/2024 |\n| CUSTOMER TYPE | STORE |\n| PO NUMBER | |\n| SHIP DATE | 01/26/2024 |"
      },
      "reading_order": 4
    }
  ]
}

How Tensorlake Differs
Tensorlake's integration produces clean, well-organized structured output, including:
- clear layout groups
- well-defined document sections
- table structures represented cleanly in both HTML and markdown
- consistent fragment types that work across all OCR/VLM backends
Result: Developers receive a clean, predictable document structure without custom parsing or prompt iteration.
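As a rough sketch of what consuming this structure might look like, the snippet below walks a page object shaped like the JSON example, orders its fragments by `reading_order`, and pulls the markdown rendering of each table. The field names come from that example. The helper names and the abbreviated sample data are hypothetical.

```python
import json

# Abbreviated sample in the same shape as the JSON example; fragments are
# deliberately listed out of order to show the sort.
page_json = """
{
  "page_number": 1,
  "page_fragments": [
    {"fragment_type": "text", "content": {"content": "INV-20212"}, "reading_order": 2},
    {"fragment_type": "title", "content": {"content": "INVOICE"}, "reading_order": 1},
    {"fragment_type": "table",
     "content": {"content": "INVOICE DATE01/23/2024",
                 "markdown": "| INVOICE DATE | 01/23/2024 |"},
     "reading_order": 3}
  ]
}
"""


def fragments_in_order(page: dict) -> list:
    """Return the page's fragments sorted by their reading_order field."""
    return sorted(page["page_fragments"], key=lambda f: f["reading_order"])


def table_markdown(page: dict) -> list:
    """Return the markdown rendering of every table fragment, in reading order."""
    return [
        f["content"]["markdown"]
        for f in fragments_in_order(page)
        if f["fragment_type"] == "table" and "markdown" in f["content"]
    ]


page = json.loads(page_json)
for fragment in fragments_in_order(page):
    print(fragment["reading_order"], fragment["fragment_type"], fragment["content"]["content"])
print(table_markdown(page))
```

Because every fragment carries the same keys, the same two helpers would apply regardless of which OCR/VLM backend produced the page.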
Advanced Usage vs. Simple Usage
Advanced Gemini users can approximate similar structure with multiple prompt iterations and custom post-processing. With Tensorlake, users get a clean, structured result with a single API call.
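For a sense of what that custom post-processing involves, here is a minimal sketch that uses Python's standard `html.parser` to pull table rows out of HTML like the Gemini example and wrap them in a fragment-like dict. The class name `TableRowCollector` and the output shape are illustrative assumptions, and real model output would likely need more handling than this.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


sample_html = """
<table>
<tr><td>INVOICE DATE</td><td>01/23/2024</td></tr>
<tr><td>PO NUMBER</td><td></td></tr>
</table>
"""

collector = TableRowCollector()
collector.feed(sample_html)
fragment = {"fragment_type": "table", "rows": collector.rows}
print(fragment)
```

This covers only one element type. Headers, address blocks, and reading order would each need similar hand-written handling.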