Automation

Has Anthropic Claude just wiped out an entire industry? ( Part 2/3)

Discover the revolutionary "Computer Use" feature from Anthropic that can transform tedious tasks like booking doctor appointments into a seamless process. This blog explores a real-world example, showcasing how AI can autonomously navigate tasks with minimal human intervention. Join us as we del...

In the first part of this series, I wrote about an incredible feature called Computer Use that Anthropic released a few weeks ago. This could be a bombshell for the IT industry and many other sectors (I'm afraid the resonance is yet to come).

In the second part of the blog post, I will take you through a real-world scenario that demonstrates how Computer Use can automate the process of booking a doctor's appointment. We'll cover understanding the task, introducing Computer Use, setting up the application, executing the task, and summarizing the experience along with future implications.

Although we are still in the early days, this technology is, in my humble opinion, the first truly capable tool to perform a task from A to Z with very little human intervention. The autonomy and ease of use are amazing — something I've never seen before. And believe me, I've tried a lot! 😄

To go beyond the theory and the news, I decided to share with you a real-life scenario of one of the most boring tasks I have to do from time to time (thankfully rarely) — finding a doctor's appointment.

About the Task to Be Performed

In many parts of the world, getting an appointment with a doctor can be a nightmare if you're on government insurance, especially if it's urgent. In the past, we were used to doing this over the phone, but now many countries have started to offer online search services.

Recently, I was thinking about seeing a dermatologist for a minor inflammation, but as always, I have to admit that I was just too lazy to go through the hassle of finding an appointment (which is sometimes even more hassle than going to the doctor 😄).

Anyway, I decided to do it, but this time with the help of my new friend: Computer Use from Claude.

If you don't already know how it works, I suggest you read my previous articles on the subject. But here again, in a highly compressed form:

Computer Use is a new feature of the Claude model (like ChatGPT) that can understand user interfaces on your screen and issue commands to control your screen like any human would. As opposed to the old-school way of macro-style screen control, we now issue the goal (like booking an appointment at XYZ or searching for this and that) and let the model do the magic, completely autonomously. And it works. It really does.

And Here's How

The goal: Book an appointment at website XYZ. Period.

The model has some limitations in terms of privacy, but I have always managed to let it log in to various websites for testing purposes. However, I strongly recommend NOT doing this. Instead, I recommend logging in first and then giving it instructions to do the rest of the magic by itself. You can see in the screenshots below how to stop the chat and transfer the control to yourself (Human in the Loop) to log in. In this example, however, we will let Claude do the complete task without any manual intervention.

I use the appliance provided by Claude in a Docker container and then proceed as follows:

1. Quick Setup of Computer Use

First, start the appliance.

Next, configure the model provider to use. The options are Anthropic, Bedrock, and Vertex from Google. I use Bedrock because, unlike Anthropic Claude, it has fewer restrictions on token usage per minute and per day. On the other hand, it has no prompt caching, which leads to higher token usage costs.

The other arrow in the top right corner is very useful when you need to regain control and manage your screen yourself. This is very useful if the model gets stuck or if you need to enter a username and password.

You can also stop the application at any time by pressing the stop button here. This is actually a Streamlit app feature, not a Claude feature, but it is super helpful when using the device.

2. Let the Magic Begin

With the device configured, we can now let the magic begin.

Here is the initial and only prompt:

Book a general practice appointment any time this month after 4pm at doctap.co.uk. Jane Smith Date of Birth: 1990-04-25 Gender: Female Address Line 1: 456 Elm Street Address Line 2: Apt 12B City: Manchester Postcode: M1 2AB Contact Number: +44 7911 123456 Email Address: jane.smith@example.com

A few minutes later, coming back from a coffee break...

3. Evaluation

After 3-5 minutes or so and no intervention from my side, the results are just remarkable (the last step was interrupted because of a network error, but I have no doubt it would have completed as well).

Practical Usage Tips

The model performs pretty well in the default settings, but there are a few things you need to pay special attention to, to save money and time in lengthy tasks that go beyond a simple test.
Many websites have this annoying “cookie” dialog. If you expect Claude to navigate through multiple websites, I highly recommend you install “I Don’t Care About Cookies for Firefox” before it starts.
Login in advance: In the example above, we use random test data. However, if you use existing accounts, I highly recommend you log in to all required sites before you start to not only avoid sharing your credentials with Claude but also to speed up the process (especially if you have two-factor authentication).
The Docker container exposes different ports, among others 8080 which is the default, but you can also watch the session directly in the Streamlit app running on Port 8501. If you do only browser automation and limited window size, this might be the better option.
You can export the conversation anytime simply using the print feature as you can see below.

Reduce the history of the screenshots sent to Claude to 5 or so. This is in most cases sufficient for it to catch up from previous experiences. Remember, Claude still has access to the whole text conversation all the time, so it knows quite a bit as it progresses.

Let Claude save intermediate results if you have lengthy conversations, for example to a file. This will come in handy if the session is interrupted and you want it to simply continue from where it left off. In my experiences, this happened only for difficult tasks.

Limitations:

I talked in previous blog posts about general limitations of Claude in lengthy conversations, and similar limitations apply here as well:

If you seek 100% autonomous accomplishment of your task, make sure you provide Claude with all necessary data. In the example above, this was, for example, the user data (unless you are not logged in). Note that in case it comes across a step where it does not know what to do, it is likely to stop and ask you for input, which is better than guessing sometimes.
Anthropic has hard token limit sizes for minute and daily limits. This could be cumbersome after some time, especially if you transfer all 10 screenshots with each iteration. Bedrock, on the other hand, does not have such rigid limitations, which is why I always use it for Computer Use (though higher cost because of lack of prompt caching).
The virtualization solution provided in this appliance could take a significant amount of memory. Pay attention that you don’t hit a memory limit on your device in the middle of a session, with possible loss of the whole conversation.
Anthropic Claude has support for prompt caching, which can significantly reduce the input token usage cost (at the cost of daily limits). Bedrock does not have prompt caching yet, which leads to more cost over time. It’s up to you to decide what is more important for you (either try again when you hit the limit and pay less or just let it do its thing and pay more).
Security restirction might occur in some steps like registrations for obious reasons as mentioned earlier (If there were no restrictions, anyone could make as many fake accounts as they wanted, which isn't good for anyone.)

Other Use Cases

If you understand how Computer Use works and consider its limitations, the possibilities are endless. Here are a just few ideas from Claude/GPT that are worth trying.

Automating E-commerce Shopping: Automatically find the best deals for a product on multiple websites. Compare prices and available discounts, then add the item to the cart and checkout.
Scheduling Appointments: Automate the process of scheduling appointments for haircuts, fitness classes, or other personal services. Search for available times and book the slot that fits your schedule.
Email Management: Automate inbox cleanup by deleting unwanted emails, archiving old threads, or marking specific emails as important. Set up filters to automatically categorize incoming messages.
Social Media Content Scheduling: Use the technology to schedule posts on platforms like Instagram, Twitter, or Facebook. Automate responses to messages or comments based on specific triggers.
Form Filling: Automatically fill in online forms (job applications, surveys, or registration forms) using pre-filled data or templates.
Hotel or Flight Booking: Automate booking processes for travel (e.g., flights, hotels) based on a set of preferences (location, price range, time).
Data Extraction from Websites: Extract data from various websites (such as news sites, real estate listings, or research papers) and compile it into a structured format for analysis.
Online Research Tasks: Automate the process of gathering information about a specific topic (e.g., by scraping multiple resources) and summarizing it in a document.

Wrap Up:

This piece of technology is just the beginning, but it is already so powerful that it could literally save us an incredible amount of time on boring tasks like booking a doctor’s appointment, finding a flight, etc.

But with great power comes great responsibility and care. So even if it can do all the magic on its own:

Don’t share sensitive information.
Don’t let the AI do the actual booking just yet.
And don’t trust the AI blindly.