
OCR Innovation: from TextBridge to AI-powered Archive Optimization | EasyData

OCR Intelligence for Archive Optimization

The highest achievable accuracy, cost-saving recognition for your documents, and 100% GDPR compliance.
That’s what modern OCR brings you, securely in the Dutch cloud.

From chaos to control,
OCR that truly understands what you need…

Old & New: The Story of OCR

OCR (Optical Character Recognition) has been the key to opening up digital archives since the early 1990s. It started with solutions like TextBridge and OmniPage, which converted paper documents into searchable files at the cost of a lot of manual work. Almost every archive employee remembers the days of ‘counting dots and spots’. Around the turn of the millennium, ABBYY FineReader brought the first truly reliable OCR solution: it merged dots into recognizable letters using its own ‘spot database’, and so set the modern standard on which later OCR development built.


What distinguished FineReader was the combination of image recognition with linguistic context. Letters were not only seen as pixels; they were directly interpreted as words, with continuous correction through linguistic information and dictionaries.
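The idea of correcting recognized characters against a dictionary can be illustrated with a minimal sketch. This is not FineReader's actual method, which is not described here; it simply uses Python's standard `difflib` module to snap a misread word to the closest entry in a small, hypothetical lexicon. The names `LEXICON`, `correct_word` and `correct_line` are assumptions made for illustration.

```python
import difflib

# Hypothetical mini-lexicon; a real system would use a full dictionary
# plus domain-specific terms.
LEXICON = ["session", "ordinaire", "archive", "document", "senate"]


def correct_word(word: str, lexicon: list[str], cutoff: float = 0.8) -> str:
    """Return the closest lexicon entry, or the word unchanged if none is close enough."""
    if word.lower() in lexicon:
        return word
    matches = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_line(line: str, lexicon: list[str]) -> str:
    """Correct every whitespace-separated word in a recognized line."""
    return " ".join(correct_word(w, lexicon) for w in line.split())


# "0" read instead of "o", and "l" read instead of "i"
print(correct_line("sessi0n ordinalre", LEXICON))
```

The `cutoff` value controls how aggressive the correction is: a lower value fixes more misreads but also risks replacing correct words that happen to be missing from the lexicon.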

  • TextBridge: first mass-used OCR, but mediocre with deviating layouts
  • OmniPage: strong in standard fonts, difficulty with complex layout and tables
  • ABBYY FineReader: pioneer in OCR technology, contextual correction and layout analysis

EasyData has been working on practical solutions since 1999: not just good recognition, but also the right mapping of language characteristics per industry and even organization. Think of specific legal terms, clause structures and formal language patterns used in the legal sector.

In healthcare, meanwhile, it is about medical terminology, patient record structures and specific documentation standards. In tax matters, unique form layouts, fiscal concepts and legal classifications make the difference. This is how, years ago, EasyData already developed custom modules for tax archives, healthcare records and legal files of the kind we now call LLMs. This approach makes EasyData’s solutions much more accurate than generic OCR systems, so fewer manual corrections are needed.

AI & Large Language Models: OCR Reinvented

Before 2020, OCR was mainly a contest over who got the most characters in the right place, and correcting afterwards was always the norm. With the rise of AI and the first Large Language Models (LLMs), everything changed rapidly. In 2020, EasyData was the first Dutch party to switch completely to LLM-driven OCR.

OCR with LLMs at EasyData
  • LLM application: recognizes semantics (meaning), not just letters
  • Archive material can be re-OCR’d; thousands of pages at once, much faster and more reliable
  • Correction work and transcription hours drop by 85%
  • Data stays safe in the Netherlands through local cloud processing

Customer example: The Belgian Senate had all its old scans re-recognized with new AI-OCR in 2024. For a poorly scanned archive, the error rate dropped from 75% to less than 2%. Tables are now automatically exported as Excel files, and difficult-to-read minutes are recognized correctly in context.

Why Are Archives Re-Recognizing Text Now?

The facts of innovative text recognition:
  • Up to 99% accuracy on old and poor scans
  • Complete re-recognition of millions of pages in weeks, not months
  • Files are delivered as directly searchable / bookmarked PDFs
  • Columns, tables and PDF text layers are now recognized as well, all interactive and linked to your database
  • Cost reduction up to 70% compared to manual control and old OCR modules

Example: An organization had 14 million files re-read by EasyData with new OCR techniques. The export of structured data to traceable PDFs and Excel documents delivered a direct saving of €50,000 per year due to less time loss and error corrections.

Bookmarked PDF via OCR

We Recognize: “SESSION ORDINAIRE 1920-1921.”

🔹 Basic Cloud OCR

€0.0055* per A4 page
  • Fast 1st-line support per ticket
  • Automatic platform updates
  • All EasyData Technology
  • Monthly SLA report
  • OCR process without surprises
  • Secure NextCloud server
  • PDF/A export
  • Grafana online Dashboard
Most Popular

🌟 Professional Cloud OCR

€0.0099* per A4 page
  • All options from Basic Cloud OCR
  • Separate extraction of tables
  • ALTO XML export
  • Smart layout analysis
  • Personal contact person
  • Custom metadata export

🏆 Enterprise Support

On Request
  • Options from the packages above
  • Custom OCR recognition
  • Your own trained LLMs
  • 2 million+ pages in 24 hours
  • EasyVerify for online analysis
  • EasyData Security Guarantee

* No startup costs for volumes of 250,000 pages per year or more.
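As a rough illustration of the per-page prices above, the following sketch computes the yearly recognition cost for a given volume. It assumes the listed price is the only cost component, which holds at most for volumes where no startup costs apply; the function name `yearly_page_cost` and the package keys are hypothetical. `Decimal` is used to avoid floating-point rounding on such small unit prices.

```python
from decimal import Decimal

# Per-page prices as listed for the two cloud packages (in euros).
PRICE_PER_PAGE = {
    "basic": Decimal("0.0055"),
    "professional": Decimal("0.0099"),
}


def yearly_page_cost(pages_per_year: int, package: str) -> Decimal:
    """Per-page cost only; startup costs and custom work are not modelled."""
    return PRICE_PER_PAGE[package] * pages_per_year


for package in PRICE_PER_PAGE:
    print(package, yearly_page_cost(250_000, package))
```

At 250,000 pages per year this comes to €1,375 for Basic and €2,475 for Professional.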

Innovation: Structure, Tables and Layout Fully Automated

Modern OCR is more than just perfect recognition. EasyData introduces advanced page analysis:

Column & Table Recognition

  • Multiple columns are automatically captured as separate text fields
  • Tables are saved as separate spreadsheets, preserving line breaks and cell structure
  • Output goes directly to Excel, CSV or a database, with traceable location information
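What "traceable location information" could look like in a CSV export is sketched below. The `Cell` structure, its field names and the pixel units are assumptions made for illustration, not EasyData's actual export schema; the point is only that each exported cell carries the page and position it came from.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Cell:
    page: int   # page number in the source scan
    row: int    # row index within the table
    col: int    # column index within the table
    text: str   # recognized cell content
    x: int      # left edge of the cell on the page (assumed: pixels)
    y: int      # top edge of the cell on the page (assumed: pixels)


def cells_to_csv(cells: list[Cell]) -> str:
    """Write cells in reading order, one CSV row per cell, with its location."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["page", "row", "col", "text", "x", "y"])
    for c in sorted(cells, key=lambda c: (c.page, c.row, c.col)):
        writer.writerow([c.page, c.row, c.col, c.text, c.x, c.y])
    return buf.getvalue()


cells = [
    Cell(page=1, row=0, col=1, text="1921", x=300, y=120),
    Cell(page=1, row=0, col=0, text="Year", x=100, y=120),
]
print(cells_to_csv(cells))
```

Keeping one row per cell, rather than one row per table line, makes it straightforward to jump from any value back to its position in the original scan.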

ALTO/Metadata & Archive Enrichment

  • Each text unit (paragraph, footnote, heading) gets a unique location code and context tag
  • Batch export to your existing archive software is possible
  • Database fields are filled automatically with relevant parameters
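To show how location codes in an ALTO export can be read back, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes ALTO version 4 (namespace `http://www.loc.gov/standards/alto/ns-v4#`) and a tiny hand-written sample; the IDs and coordinates are made up, and real exports contain far more detail.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace (assumed version; older files use ns-v2# or ns-v3#).
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hand-written sample with illustrative IDs and coordinates.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="80" WIDTH="600" HEIGHT="40">
          <TextLine ID="TL1" HPOS="100" VPOS="80" WIDTH="600" HEIGHT="40">
            <String CONTENT="SESSION" HPOS="100" VPOS="80" WIDTH="180" HEIGHT="40" WC="0.98"/>
            <SP/>
            <String CONTENT="ORDINAIRE" HPOS="300" VPOS="80" WIDTH="220" HEIGHT="40" WC="0.97"/>
            <SP/>
            <String CONTENT="1920-1921." HPOS="540" VPOS="80" WIDTH="160" HEIGHT="40" WC="0.95"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def text_blocks(alto_xml: str) -> list[dict]:
    """Return one record per TextBlock: its ID, position and joined text."""
    root = ET.fromstring(alto_xml)
    blocks = []
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        words = [s.get("CONTENT", "") for s in block.iterfind(".//alto:String", ALTO_NS)]
        blocks.append({
            "id": block.get("ID"),
            "hpos": float(block.get("HPOS", 0)),
            "vpos": float(block.get("VPOS", 0)),
            "text": " ".join(words),
        })
    return blocks


for b in text_blocks(SAMPLE):
    print(b)
```

Records like these can then be mapped onto database fields in the archive software, with the block ID and coordinates serving as the link back to the scan.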

Document Archive Benefits

  • Quick search in documents via bookmarks & search terms in PDF
  • Make healthcare record data searchable per patient, period and measurement value
  • Integrate tables in your financial workflow, with smart error detection

Data Extraction: From Simple OCR to Knowledge Unlocking

With LLMs and AI, OCR becomes a full-fledged instrument for progressively opening up the data in your archive:

  • Prompt-cascading: Each question automatically generates follow-up questions so that more and more hidden connections become visible.
  • Associative knowledge archiving: New patterns and relationships emerge because AI connects data in a context-sensitive way.
  • Dialogic data exploration: Researchers, archivists or IT professionals can literally ‘converse’ with the archive for deeper insights.

The Development of OCR Accuracy (2000-2030)

Development from ±70% to almost perfect AI-OCR.

Export & Archive Integration: Interactive and Maximally Usable

New OCR Exports (2024):

  • Fully searchable, bookmarked PDF — ideal for colleagues and external clients
  • ALTO/XML: direct connection to archive software with automatic metadata mapping
  • Excel/CSV: tables and datasets directly reusable in analyses or financial systems
Example: A municipal archive received millions of old building files back as new PDFs with bookmarks and extractions. Employees now search by name, street or year without browsing.


Discover What AI-OCR Means for Your Archive

Personal analysis of your documents, concrete results within 48 hours. Free, no obligations.

💶

Direct Price Advice

Independent ROI calculation based on your current document processing

📊

Live Demo on Your Data

Personal analysis of 500-1000 sample documents from your archive

🔒

100% Dutch Cloud

GDPR-compliant, ISO27001 certified, your data stays in the Netherlands

25+ years expertise
99% accuracy
500+ satisfied organizations

Still available this week: Free proof-of-concept for archives from 10,000 documents

“EasyData’s OCR demo on our medical records was immediately convincing. From 75% to 99% accuracy meant €50,000 savings per year.”
– IT Manager, Dutch Healthcare Institution

Extensive FAQ About OCR & AI Innovation

How much better is modern AI-OCR than classic OCR tools like ABBYY FineReader?
New AI-OCR consistently achieves more than 99% accuracy, even on old or mediocre scans, where classic OCR like ABBYY FineReader reached around 85-90%. This makes correction work virtually nil, and error percentages drop by 85-95%. Moreover, AI-OCR understands the context and semantics of documents, so unclear text is also interpreted correctly.
Can I have re-OCR done on existing scanned material?
That’s exactly one of the biggest advantages: complete archives can be re-recognized with the latest AI engine. Even material scanned 10-20 years ago now yields dramatically better results. You gain in usability, searchability and the value of the archive rises directly. Many customers see this as a ‘no-brainer’ investment that pays for itself within months.
How does automatic table export to Excel work exactly?
AI-OCR automatically recognizes table structures in documents and exports them as full-fledged Excel files. Column names, cells, formulas and data remain intact — including location references to the original document. This means no more manual copying, and tables are directly usable for analyses, reports or further data processing. Even complex tables with merged cells are correctly interpreted.
What file formats can I expect as output?
EasyData delivers various outputs: searchable PDFs with bookmarks for easy navigation, ALTO/XML for archive software integration, Excel/CSV for tables and datasets, and DOCX for word processing. All formats maintain the link to the original document and contain metadata for tracking and compliance. You choose which format best suits your workflow.
How fast does AI-OCR process large volumes of documents?
Thanks to cloud parallelization, EasyData processes thousands of pages per hour. An archive of 1 million pages is typically fully recognized and structured within 1-2 weeks — including table extraction and metadata enrichment. For urgent projects, accelerated processing is possible. The big advantage: all processing happens in the Dutch cloud, so no data export abroad.
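A quick calculation shows how the quoted turnaround relates to hourly throughput. This sketch assumes round-the-clock processing, which the text does not state; the function name `pages_per_hour` is hypothetical.

```python
def pages_per_hour(total_pages: int, days: float, hours_per_day: float = 24.0) -> float:
    """Average throughput needed to finish total_pages within the given number of days."""
    return total_pages / (days * hours_per_day)


# 1 million pages in two weeks versus one week
print(round(pages_per_hour(1_000_000, 14)))
print(round(pages_per_hour(1_000_000, 7)))
```

Under that assumption, 1 million pages in 1-2 weeks corresponds to an average of roughly 3,000 to 6,000 pages per hour, in line with the "thousands of pages per hour" mentioned above.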
Is everything secure and 100% Dutch? What does this mean for GDPR compliance?
All processing runs on ISO 27001-certified, Dutch cloud servers. 100% European data sovereignty, fully NIS2-compliant and GDPR-compliant, no vendor lock-in. Your documents never leave Dutch/EU borders and are processed according to the strictest privacy standards. EasyData acts as a data processor under Dutch/EU legislation, with transparent DPAs (Data Processing Agreements) and regular compliance audits.
Who has access to my documents during processing?
Documents are processed completely automatically without human intervention. Only authorized EasyData technicians have access in exceptional cases (troubleshooting), and then only under strict logging and supervision. All employees are screened (VGB) and bound by confidentiality agreements. Optionally, you can choose on-premise processing or dedicated cloud instances for extra sensitive documents.
What are the concrete cost savings of AI-OCR?
Customers report an average of 70-85% cost savings on manual document processing. A typical example: 40 hours of manual document checking per week is reduced to 6 hours. At €35/hour this saves €1,190 per week, or €61,880 per year. In addition, data quality rises dramatically, so fewer errors occur and less follow-up work is needed. The investment usually pays for itself within 3-6 months.
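The savings example can be reproduced with a few lines of arithmetic. The sketch assumes 52 working weeks per year, which is what the quoted yearly figure implies; the function name `labour_savings` is hypothetical.

```python
def labour_savings(hours_before: float, hours_after: float,
                   hourly_rate: float, weeks_per_year: int = 52) -> dict:
    """Weekly and yearly savings from reduced manual checking, plus the reduction share."""
    hours_saved = hours_before - hours_after
    weekly = hours_saved * hourly_rate
    return {
        "hours_saved_per_week": hours_saved,
        "reduction_pct": 100 * hours_saved / hours_before,
        "weekly_eur": weekly,
        "yearly_eur": weekly * weeks_per_year,
    }


print(labour_savings(hours_before=40, hours_after=6, hourly_rate=35))
```

Going from 40 to 6 hours saves 34 hours per week, an 85% reduction, which gives €1,190 per week and €61,880 per year at €35/hour.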
How does OCR integrate with existing archive systems?
EasyData has standard connections with all common archive systems (SharePoint, Documentum, Alfresco, OpenText, etc.). Via REST APIs and standard export formats (ALTO/XML, CSV, JSON) OCR integrates seamlessly into your existing workflow. Metadata is automatically mapped to your database fields, and bulk import of thousands of documents happens without workflow interruption. For custom connections we offer dedicated development hours.
What does “dialogic data exploration” mean in practice?
This is a groundbreaking development: instead of only searching for keywords, you can literally ‘converse’ with your archive. Ask questions like “Show all contracts from 2019 with extension clauses” or “Which patient records contain medication changes after surgery?” The AI understands context and not only gives answers, but also suggests follow-up questions that can yield new insights. This way your archive becomes an active knowledge source instead of a passive database.
How accurate is handwriting recognition with AI-OCR?
Handwriting recognition has improved significantly thanks to AI. Depending on document quality, printed text reaches up to 99%+ accuracy and neat handwriting 75-95%, and even difficult-to-read handwriting is now often recognized acceptably. For handwriting-intensive archives (such as medical records or historical documents) we use specialized AI models trained on specific writing styles and terminology. Combined with context analysis, this leads to surprisingly good results.
Which languages does EasyData’s AI-OCR solution support?
Dutch documents are processed most accurately (99%+ accuracy), but the system supports 100+ languages including English, German, French, Spanish, and many other European languages. For multilingual documents (e.g. EU reports) the correct language is automatically detected per text block. Specialized models are available for technical terminology, legal texts, and medical documents in different languages.
How do I start with a pilot project for my organization?
We always start with a free proof-of-concept on a representative part of your archive (500-2000 documents). You get concrete results within 1 week: accuracy scores, export examples, and a cost estimate for the complete project. After approval we plan a phased rollout: first non-critical documents, then expansion to the complete archive. This way we minimize risks and maximize what you learn along the way.
What happens if AI-OCR makes errors in critical documents?
For critical documents we use a multi-layer approach: AI-OCR with 99%+ accuracy, plus optional human verification of key fields, plus confidence scoring per extracted field. Documents below a certain confidence threshold are automatically offered for review. Moreover, the original document always remains available with a direct link to the OCR output, so verification is simple. For extra certainty we offer SLAs with guaranteed accuracy levels.
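The threshold-based review step can be sketched as follows. The `ExtractedField` structure, the 0.90 threshold and the sample values are assumptions for illustration; actual thresholds would be agreed per project.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 (no confidence) to 1.0 (certain)


def split_for_review(fields: list[ExtractedField], threshold: float = 0.90):
    """Separate fields that can be accepted automatically from those needing human review."""
    accepted = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return accepted, review


# Hypothetical extraction result for one document
fields = [
    ExtractedField("invoice_number", "2024-0815", 0.99),
    ExtractedField("total_amount", "1.190,00", 0.97),
    ExtractedField("iban", "NL00 XXXX 0000 0000 00", 0.72),
]

accepted, review = split_for_review(fields)
print("accepted:", [f.name for f in accepted])
print("review:  ", [f.name for f in review])
```

Routing per field rather than per document keeps the human workload small: only the uncertain values are shown to a reviewer, next to the original scan.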
Can we get on-premise implementation for extra sensitive data?
Yes, EasyData offers on-premise solutions for organizations with the highest security requirements (government, defense, health insurers). The complete AI-OCR stack can be installed locally, including the latest LLM models. Updates and new features are rolled out via secured channels. On-premise implementation does require higher hardware specifications and dedicated support, but offers absolute control over data flows and processing.

📝 About the Author


Rob Camerlink
CEO & Founder of EasyData

Pioneer in Dutch document automation for 25+ years | Expert in GDPR-compliant digital transformation | Expert in intelligent data solutions that have helped Dutch companies move forward since 1999. Registered under number FG001914 with the Dutch Data Protection Authority.

Ready to Go from Stacks of Paper to Smart Data?

Our AI-OCR delivers 99% accuracy, 85% less correction work and complete re-recognition of millions of pages. Join organizations in healthcare, the legal sector and government that have transformed their archives into searchable, intelligent knowledge sources.

Guaranteed Results with European Technology

✓ GDPR-compliant processing in Dutch datacenter
✓ 25+ years expertise in document automation
✓ No vendor lock-in, transparent Dutch pricing
✓ Free proof-of-concept on your own archive material