Let Claude read your Gas Meter with this Amazing new Feature

Anthropic's Claude models now uniquely integrate the ability to read PDFs, processing both text and images simultaneously. This breakthrough allows for a more coherent understanding of content, enhancing accuracy in real-world applications, such as gas meter readings or passport photo verificatio...

Let Claude read your Gas Meter with this Amazing new Feature

While many large language models (LLMs) boast multimodal capabilities, allowing them to process text and images, none had the ability to read both simultaneously within the same context.

Here is the official announcement from Anthropic.

PDF support (beta) - Anthropic
The Claude 3.5 Sonnet models now support PDF input and understand both text and visual content within documents.

Traditionally, if you wanted to process a combination of text and images—as is common in many PDF files—you had to input them separately. This separation often led to a loss of context and a less efficient experience.

But Anthropic has recently announced an incredible update in this area:

Claude's enhanced PDF reading capabilities that seamlessly integrate text and images, processing them interactively as they appear in a PDF file.

This means Claude can maintain the context of images alongside the text, providing a more coherent and accurate understanding of the content.

Sounds ordinary?

It is not. When you consider the possible use cases, the potential is astounding.

Let me share a recent example from a client in Germany's energy industry. They require their customers to read gas meters according to specific instructions. Customers who have done it before find it easy the second time. However, for first-time users, it can be cumbersome to read through an entire manual or watch a lengthy video just to submit their meter readings to the provider.

But with this new feature from Claude, this challenge has become a thing of the past.

There are many tools that can help you understand images in PDF, including tables, etc. with powerful tools such as unstructured. But supporting this capability in ChagGPT and Cluade for everyday use cases (but also via API) takes it to a whole new level.

Now, all the customer needs to do is upload the PDF with the instructions and a picture of their gas meter. Claude does the rest, accurately interpreting the meter reading and even explaining how to read it for future reference. This not only simplifies the process but also empowers users to learn and become more proficient over time.

So, I conducted a test, and Claude did not disappoint...

The Gas Meter Experiment: A Real-World Demonstration

To illustrate Claude’s new capabilities, let’s dive into the experiment I conducted with the client.

Initial Attempt with Low-Resolution Images

The first trial involved a sample gas meter image and a PDF instruction manual:

Prompt:

“What is the gas meter reading according to the instructions?”

Provided Resources:

Instruction Manual:

Gas Meter Image:

Result:

Claude interpreted the reading as 024.435 m³. However, according to the instructions, the correct reading should omit the decimal, displaying only 024 m³ and 223 before the decimal.

The Problem:

The resolution of the image was below 600 pixels, making it difficult for Claude to accurately interpret the digits.

Second Attempt with High-Resolution Images

Determined to achieve accuracy, I provided a higher-resolution image:

High-Resolution Gas Meter Image:

Reusing the Same Prompt and Instructions:

“What is the gas meter reading according to the instructions?”

Result:

Claude successfully interpreted the reading as 028 m³ and 223, completely adhering to the manual’s guidelines.

Final Output:

Evauluation:

This experiment showcases Claude’s ability to accurately read and interpret gas meter readings by effectively processing both the image and the accompanying PDF instructions in context.

The initial hiccup with low-resolution images highlights the importance of quality inputs, but once addressed, Claude delivered precise results.

Beyond Gas Meters: Expanding the Possibilities

Now, let's abstract this use case. With Claude's new feature, you can match a set of instructions with images to a current state captured either in a document or even in the physical world (by simply taking a photo).

Sounds too abstract?

Here are a few more simple examples you might be familiar with

1. Passport Photo Verification

Scenario:

When applying for a passport, your photo must meet specific requirements regarding size, background color, head position, and more.

How Claude Helps:

  • Manual Reference: Access the official passport photo requirements PDF, complete with example images of compliant and non-compliant photos.
  • Photo Analysis: Simultaneously analyze your submitted photo against these guidelines.
  • Instant Feedback: Receive real-time feedback on whether your photo meets all necessary criteria, reducing the chances of rejection due to minor errors.

2. Furniture Assembly Assistance

Scenario:

Assembling furniture, especially intricate pieces from brands like IKEA, often involves following detailed instruction manuals with step-by-step images.

How Claude Helps:

  • Manual Integration: Access the IKEA assembly manual in PDF format, complete with illustrative diagrams for each step.
  • Progress Monitoring: Take photos of your current assembly stage.
  • Guided Assistance: Compare your progress photo with the corresponding manual diagrams, providing guidance on next steps or highlighting discrepancies if something isn’t aligned correctly.

And many more...

The Context Revolution: Why This Matters

Many LLMs can process text and images, but most lack the ability to simultaneously interpret both within the same context. This often requires users to input elements separately, leading to a fragmented understanding and increased chance of errors.

Claude’s Breakthrough:

Anthropic has introduced a groundbreaking update that enables Claude to process PDFs containing both text and images interactively, maintaining the context as they appear within the document. This means:

  • Integrated Processing: Claude can analyze a PDF’s text and embedded images together, understanding their relationship and context.
  • Seamless Context Retention: Unlike traditional systems where text and images are handled separately, Claude preserves the contextual link between them, enhancing accuracy and reliability.

Why It’s a Game-Changer:

Imagine trying to learn to read a gas meter for the first time. You wouldn't just read the manual and look at the meter separately; you'd naturally refer back and forth between them. Claude mimics this human approach, making interactions with AI more intuitive and effective.

Abstracting the Use Case in an phyiscal World

Claude’s ability to match instructions with images extends beyond specific examples. Here are some more applications I could think of:

  1. Vehicle Inspection Documentation:
  2. Manual: Diagrams of proper tire tread depth measurements.
  3. Inspection Target: High-resolution photos of actual tire treads.
  4. AI Task: Compare tread depth against manual specifications.
  5. Food Label Compliance:
  6. Manual: Regulatory guidelines for nutritional label formats.
  7. Inspection Target: Photos of food packaging labels.
  8. AI Task: Verify label compliance with formatting and nutritional information.
  9. Medical Device Calibration:
  10. Manual: Technical documentation for calibration standards.
  11. Inspection Target: Photos of device displays during calibration.
  12. AI Task: Ensure calibration readings match standards.

Final Thoughts

With this new capability, Claude opens up a whole new world of possibilities for using cases that are not just text-based, but heavily image-based. Please note, however, that even with these new powerful capabilities, there are certain limitations in terms of resolution and the amount of information that can be processed at one time. And the best way to find out how this technology can help you in your everyday life is to try it out.