What's new
Fixed a bug where structured extraction with citations enabled ignored page classification filtering. Previously, when you limited extraction to specific page classes (e.g., only `transactions` pages), citations could still reference content from pages outside those classes. Citations now respect page classification boundaries.
Why it matters
- Accurate citations - citations now only reference the pages you're actually extracting from
- Cleaner results - no more citations pointing to irrelevant page content
- Expected behavior - page filtering works consistently whether citations are on or off
- Better RAG pipelines - citations align with your intended extraction scope
The bug
When using both page classification filtering AND citations:
```python
# This configuration should only extract from "transactions" pages
structured_extraction=StructuredExtractionConfig(
    schema=transaction_schema,
    page_classes=["transactions"],  # Only extract from transaction pages
    enable_citations=True
)
```
Before (bug): Citations could reference content from `account_info` or `summary` pages
After (fixed): Citations only reference content from `transactions` pages
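The difference can be illustrated with a small, self-contained sketch. Everything in it is hypothetical and chosen for illustration only: the `Citation` record, the `citations_outside_scope` helper, and the page-number-to-class mapping are assumptions, not the library's actual types or response format. It shows what "citations respect page classes" means in practice: every citation should point to a page whose class is in `page_classes`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record: an extracted field and the page it cites."""
    field: str
    page_number: int


def citations_outside_scope(citations, page_class_by_number, allowed_classes):
    """Return citations whose page is not classified as one of allowed_classes.

    Pages missing from page_class_by_number are treated as out of scope.
    """
    allowed = set(allowed_classes)
    return [
        c for c in citations
        if page_class_by_number.get(c.page_number) not in allowed
    ]


# Hypothetical 3-page document: page number -> assigned page class.
page_class_by_number = {1: "account_info", 2: "transactions", 3: "summary"}

# Citations as they might have looked before the fix.
before_fix = [
    Citation("amount", 2),
    Citation("amount", 3),  # cites a summary page
    Citation("payee", 1),   # cites an account_info page
]

# Citations as expected after the fix: all on transactions pages.
after_fix = [Citation("amount", 2), Citation("payee", 2)]

stray_before = citations_outside_scope(before_fix, page_class_by_number, ["transactions"])
stray_after = citations_outside_scope(after_fix, page_class_by_number, ["transactions"])
```

With the sample data, `stray_before` holds the two citations that point at `summary` and `account_info` pages, and `stray_after` is empty. A check along these lines, adapted to the real response shape, could serve as a regression test in a pipeline that depends on citation scope.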
How to use
No code changes needed. Existing configurations now work as expected.
Impact
This fix ensures consistent behavior across all extraction features and improves the reliability of citation-based RAG systems.
Status
✅ Fixed and live. No configuration changes required.