98% Accuracy Rate with ChatGPT-4o Vision OCR

Situation

A client reached out to us because they were interested in using AI to automatically extract data from paperwork. They had heard about ChatGPT-4o’s OCR capabilities, and wanted to see if it could be used to automate out the heavy amounts of paperwork processing their business was used to.

The challenge was that the paperwork was not standardized. Different fields had different spellings, formats, and were often located in different places on each document. What further complicated the matter was that some documents scanned in weren’t valid documents to process either.

They had 10 fields per document they wanted to extract from each valid document. 

Task

We were tasked with developing a tool that could extract all of the set fields in documents automatically. We were to first attempt this using ChatGPT-4o and aim to achieve a 90% accuracy rate with all extractions. If this rate wasn’t hit, we were to try different techniques and technologies until that rate was achieved.

Action

We first established a baseline for how well ChatGPT Vision API could extract the data from their paperwork on a first pass. We discovered that ChatGPT had a baseline accuracy rate between 80-90% for all documents with our first attempt. This indicated that it was a suitable base to build off of, and the rest of the work was handling the edge cases to bring the rate over 90%.

We then tried to collect all of the different formats and spellings for each field and passed in more description prompts to ChatGPT. Combined with other OCR technologies and techniques, we would take multiple scans through each document and compare results to capture all edge cases that the baseline Vision API could not capture.

Results

Our final delivered product managed to extract data 98% of the time, across hundreds of tested fields and documents. We delivered the initial prototype within 1 month, and spent the next month improving accuracy to get the rate up as high as possible. The client was pleased with the results and we are currently helping them onboard more users onto the tool.

Key Lessons

  • ChatGPT-4o Vision API performs extremely well and can capture most use-cases even in quick prototypes.
  • Smart prompting, and a combination of different OCR technologies together can help improve accuracy
  • The quality of the image is critical for higher extraction rates. We found that improving the initial image resolution, upscaling the image, and being more diligent on how the papers were scanned in were critical to improve the accuracy rate of the product.

Download RAG System Design Guide

“Complete the form below to access our guide on how to design RAG chatbots and optimize it for performance and accuracy.

    See our 2024 Capabilities Brochure

      Download Our Success Measurement Guide

      Complete the form below to access our guide on how we measure AI success. Learn about the KPIs we track and how we ensure measurable returns on your AI investments, so you can maximize the impact of your AI projects.

        Download Our Methodologies Deck

        Fill out the form below to get our Methodologies Deck, where we walk you through our systematic approach to delivering AI projects. Learn how we align our solutions with your business objectives and ensure a seamless project execution process.

          Download Our Capabilities Deck

          Please fill out the form below to access our Capabilities Deck and discover how our AI solutions can drive transformation in your organization. Gain insights into our services, real-world case studies, and how we can help your business leverage the power of AI.