Has Anthropic just wiped out an entire industry?
If you have been following the news, you may have read about a new feature (or should I call it a product?) in the Claude API called Computer Use.
This technology is not just another nice feature of Claude, but could actually be the beginning of the end for many industries. But before we look at the implications, let's take a look at what Computer Use actually is.
Here is the official announcement from Anthropic regarding computer use:
https://www.anthropic.com/news/3-5-models-and-computer-use
I should mention, though, that controlling computers with AI is not really new. Many agent-based projects have already gone in this direction.
Some of the open source projects I have in my GitHub favourites:
https://github.com/e2b-dev/awesome-ai-agents
https://github.com/xlang-ai/OSWorld
These projects have a lot of potential to make this technology available to everyone, even on-premises using local models (the Claude solution is closed-source and currently relies heavily on Anthropic models).
However, as is often the case, the challenge is in the execution. The architecture of these agents may be similar, but it all comes down to the reasoning capabilities of the model.
If the model can reason well, then it can probably perform the tasks it is asked to perform with little or no user intervention.
Sounds good? Let's have a look:
How it works
The idea is simple:
- First, a client (such as a Python application) takes commands from the user and passes them to Claude along with a screenshot of the desktop environment.
- Then Claude interprets the command and reads the image with the desktop information to determine what action to take based on the current state of the desktop.
- Finally, the client interprets the control commands from Claude and actually executes those commands with simulated mouse movements and clicks, acting as a human would.
Note, however, that Claude does not (and cannot) actually execute these instructions; it simply returns them as output tokens to the caller (the Python application in this case), and the application is responsible for executing them on the host machine (Linux, Mac, whatever).
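The loop described above can be sketched in a few lines of Python. This is only a minimal sketch, not Anthropic's implementation: `ask_model`, `take_screenshot` and `execute` are hypothetical callables standing in for the real API call, the screen capture and the mouse/keyboard driver, and the `Action` shape is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One instruction returned by the model (hypothetical shape)."""
    kind: str                                   # e.g. "click", "type", "done"
    coordinate: Optional[Tuple[int, int]] = None
    text: Optional[str] = None


def run_agent(
    task: str,
    ask_model: Callable[[str, bytes, List[Action]], Action],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> model -> execute until the model says "done"."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # current desktop state
        action = ask_model(task, screenshot, history)  # the model only suggests
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                                # the client does the work
    return history


if __name__ == "__main__":
    # Scripted stand-in for the model so the sketch runs without an API key.
    script = iter([
        Action("click", coordinate=(100, 200)),
        Action("type", text="hello"),
        Action("done"),
    ])
    performed: List[Action] = []
    history = run_agent(
        "open the editor and type hello",
        ask_model=lambda task, shot, hist: next(script),
        take_screenshot=lambda: b"fake-png-bytes",
        execute=performed.append,
    )
    print([a.kind for a in history])   # ['click', 'type', 'done']
    print(len(performed))              # 2
```

The point of passing the three callables in is that the model never touches the machine: swap `execute` for a logger and you get a dry run, swap `ask_model` for a real API call and you get the actual agent.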
Here is a quickstart client application provided by Anthropic that uses a virtual desktop in Docker to demonstrate this behaviour. The screen is divided into two parts: on the left you can chat and give instructions, and on the right you can watch what the agent actually does.
If you want to try it out, it's super simple with Docker.
There are two things I suggest you consider when choosing a provider (the Anthropic API vs Claude on AWS Bedrock):
- The Anthropic API offers context caching (Anthropic calls it prompt caching), which is great for saving money on input tokens. However, it does have some rate limits, so be careful not to hit them once you go beyond a few simple test scenarios.
- AWS Bedrock is almost the exact opposite. The limits are much higher (I never actually hit them), but on the downside it currently has no caching, so you will pay more for the same task than with the Anthropic API.
https://github.com/anthropics/anthropic-quickstarts/blob/main/computer-use-demo/README.md
docker run \
-e API_PROVIDER=bedrock \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-e AWS_REGION=us-west-2 \
-v $HOME/.anthropic:/home/computeruse/.anthropic \
-p 5900:5900 \
-p 8501:8501 \
-p 6080:6080 \
-p 8080:8080 \
-it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Key Advancements
I have experimented a bit with RPA (robotic process automation) in the past, and more recently with Anthropic's Computer Use, and in my opinion there are two key differences:
First, unlike some RPA tools that are tailored to specific scenarios, Computer Use is generic: in principle it can work with any application in any environment, as long as there is a screen to capture and a client that can control the mouse and keyboard of the OS.
Secondly, previous generic solutions did not have sufficient reasoning capabilities to perform complex tasks involving many steps, and most importantly were not able to recover when something unexpected happened.
With Computer Use in Claude, this seems to have improved significantly.
I tried it with some complex scenarios and it was able to execute them quite well.
Limitations
Despite the huge potential of this technology, there are a few limitations we need to be aware of.
Token limits
Depending on the length of the task, you may hit a per-minute or daily token limit when using the Anthropic API. Keep in mind that the conversation is not just text: your client must continuously send screenshots of your desktop, which dramatically increases the number of tokens used.
Note that the limits are much higher if you are using Bedrock rather than the Anthropic API. More information on current limits can be found on the provider's website.
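To get a feel for how quickly screenshots eat into a token budget, here is a rough back-of-the-envelope sketch. It assumes the rule of thumb from Anthropic's vision documentation (roughly width × height / 750 tokens per image), which is only an approximation, and it assumes every request resends all earlier screenshots, which is how a stateless chat API behaves unless the client trims the history. The 40,000 tokens-per-minute figure is a placeholder, not a quoted limit.

```python
def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Rough token estimate for one screenshot (rule of thumb: w * h / 750)."""
    return round(width_px * height_px / 750)


def screenshots_per_minute(tokens_per_minute: int, width_px: int, height_px: int) -> int:
    """How many screenshots fit in a per-minute token budget, ignoring text."""
    return tokens_per_minute // estimate_image_tokens(width_px, height_px)


def cumulative_input_tokens(steps: int, tokens_per_shot: int) -> int:
    """Total image tokens sent over `steps` requests if each request
    resends every earlier screenshot (1 + 2 + ... + steps screenshots)."""
    return steps * (steps + 1) // 2 * tokens_per_shot


if __name__ == "__main__":
    per_shot = estimate_image_tokens(1024, 768)
    print(per_shot)                                    # 1049
    # 40_000 is a placeholder budget, not an official limit.
    print(screenshots_per_minute(40_000, 1024, 768))   # 38
    print(cumulative_input_tokens(10, per_shot))       # 57695
```

The last number is the interesting one: a ten-step task already sends the equivalent of 55 screenshots, which is why a client that keeps only the most recent few images in the conversation goes much further on the same budget.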
Cost
Depending on the task you want to perform, you will also need to consider the cost. As with token usage, it increases massively with the number of screenshots you send. Of course, this depends a lot on the model you use and how much caching you do.
Note that you can save a lot of money by using context caching, which is currently only available if you use the Anthropic API directly, not through AWS Bedrock. Cached input tokens can cost up to 90% less, so it is worth considering if the token limit is not an issue.
For more information on current pricing and context cache pricing, please visit the Anthropic website.
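As a rough illustration of what caching does to the bill, the sketch below compares the input cost of a single request with and without a cached history. The price of 3.0 dollars per million input tokens and the token counts are placeholders chosen for the arithmetic, and the sketch ignores any extra charge for writing to the cache in the first place, so check the current price list before relying on the numbers.

```python
def input_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost in dollars of `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_mtok


def cost_with_cache(
    cached_tokens: int,
    new_tokens: int,
    price_per_mtok: float,
    cached_discount: float = 0.90,   # the "up to 90%" saving on cached input
) -> float:
    """Input cost when `cached_tokens` are read from the cache."""
    cached_price = price_per_mtok * (1 - cached_discount)
    return input_cost(cached_tokens, cached_price) + input_cost(new_tokens, price_per_mtok)


if __name__ == "__main__":
    PRICE = 3.0                        # placeholder: dollars per million input tokens
    history, new_shot = 50_000, 1_049  # placeholder token counts
    print(round(input_cost(history + new_shot, PRICE), 6))       # 0.153147
    print(round(cost_with_cache(history, new_shot, PRICE), 6))   # 0.018147
```

Because a Computer Use conversation mostly consists of old screenshots being sent again, almost all of the input is cacheable, which is where the large saving comes from.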
Speed of execution
Computer Use requires a lot of back-and-forth with the Claude API, including the transfer of images. This has a significant impact on bandwidth, and also on how fast tasks execute. For recurring tasks that are highly structured and can be handled through an API, Computer Use is not the right solution.
Anthropic Guardrails
Anthropic has strict guardrails on its models, and Claude will not carry out every action the user requests. For example, it may refuse to log into a website even if you give it a username and password. It may also be reluctant to open certain applications or perform certain actions on some websites or in some applications. Because this behaviour is unpredictable (it might work one time and fail another), it can make or break your use case if you are relying on an entirely autonomous solution. For this reason, a human in the loop may be necessary at this stage of development.
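One way to put a human in the loop on the client side is a small approval gate in front of the code that executes actions. The sketch below is an assumption about how such a gate could look, not something the quickstart provides: `SENSITIVE_KINDS`, `guarded_execute`, and the `execute` and `approve` callables are all hypothetical names.

```python
from typing import Any, Callable

# Hypothetical policy: action kinds that can enter or submit data.
SENSITIVE_KINDS = {"type", "key"}


def guarded_execute(
    kind: str,
    payload: Any,
    execute: Callable[[str, Any], None],
    approve: Callable[[str, Any], bool],
) -> bool:
    """Run the action, asking a human first if it is sensitive.

    Returns True if the action was executed, False if it was rejected.
    """
    if kind in SENSITIVE_KINDS and not approve(kind, payload):
        return False
    execute(kind, payload)
    return True


if __name__ == "__main__":
    log = []
    run = lambda kind, payload: log.append((kind, payload))
    deny_all = lambda kind, payload: False   # stands in for a human saying "no"

    print(guarded_execute("click", (10, 20), run, deny_all))      # True
    print(guarded_execute("type", "my-password", run, deny_all))  # False
    print(log)                                                    # [('click', (10, 20))]
```

In a real client, `approve` could simply prompt on the console with `input()`, and the same hook is a natural place to hand control back to the user when the model declines to continue.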
Privacy and security concerns
Having someone do your dirty work for you is great, but it gives this piece of software an incredible amount of power. The amount of information it (and Anthropic as a company, of course) can monitor and the amount of control it has over your device is immense. This has to be taken into account, especially in scenarios where the computer is actually taking action on your behalf, such as submitting information (perhaps even sensitive information).
The impact on industry
As mentioned above, there are still many challenges to overcome before this technology can develop and begin to disrupt many parts of the industry. And, of course, it is not just Anthropic that is working on this, but OpenAI and many other companies.
Fully autonomous control of software has always been the holy grail of robotics and AI because of the immense potential for cost savings, both for businesses and for personal use.
Still, given the current pace of development and the ongoing efforts to make this technology cheaper and faster, the impact on the economy could be huge: many new technologies could emerge, and existing technologies, companies or even whole industries could be wiped out faster than you think.
I can think of a few products that would be obsolete, but let's see what GPT predicts:
For sure! The "computer use" feature introduced by Anthropic's Claude API represents a significant leap in AI capabilities, particularly in automating tasks that traditionally require human intervention. Here are a few examples of how this technology could impact different industries:
Administrative and back-office operations:
Impact: Automation of routine tasks such as data entry, scheduling and document management.
Result: Increased efficiency and reduced operating costs, potentially leading to a reduction in the need for administrative staff.
Customer service and support:
Impact: AI agents can handle customer queries, troubleshoot issues and perform account actions without human intervention.
Result: Improved customer experience with 24/7 support and faster resolution times.
Software testing and quality assurance:
Impact: Automate the testing process by simulating user interactions with applications to identify bugs and performance issues.
Result: Faster development cycles and improved software reliability.
Financial Services:
Impact: Automated transaction processing, compliance checks and risk assessment.
Result: More efficient financial operations and reduced risk of human error.
Healthcare administration:
Impact: Manage patient records, schedule appointments and process insurance claims automatically.
Result: Streamlined administrative processes, allowing healthcare professionals to focus more on patient care.
E-commerce and retail:
Impact: Automatically manage inventory, update product listings and process customer orders.
Result: Optimised supply chain and inventory management, resulting in cost savings and improved customer satisfaction.
Human Resources:
Impact: Automated recruitment processes, employee onboarding and payroll management.
Result: Improved efficiency in HR operations and a more seamless experience for employees.
Training and e-learning platforms:
Impact: Personalised learning experiences by automating content delivery based on student performance and interactions.
Result: Improved educational outcomes through customised learning paths.
Legal Services:
Impact: Automated document analysis, contract review and compliance monitoring.
Result: Reduced workload for legal professionals and faster turnaround times for clients.
Marketing and content creation:
Impact: Automated campaign management, social media interactions and content planning.
Result: A more consistent brand presence and the ability to scale marketing efforts without a proportional increase in staff.
Overall Impact:
Workforce transformation: There could be a significant shift in job requirements, with more demand for roles in AI oversight and fewer in routine task execution.
Cost Reduction: Businesses could see lower operating costs, with savings potentially passed on to consumers.
Increased productivity: Automation of mundane tasks could lead to increased productivity, allowing human workers to focus on strategic initiatives.
Security and ethical considerations: As AI performs more actions autonomously, ensuring data security and ethical use becomes paramount.
Accelerating innovation: Businesses can innovate faster, with the ability to quickly prototype and test new ideas using AI.
Potential challenges:
Job displacement: Some positions may become obsolete, requiring retraining and education programmes for affected workers.
Regulatory hurdles: Rapid adoption could outpace existing regulations, leading to legal and compliance challenges.
Dependence on AI providers: Heavy reliance on technology from companies such as Anthropic could lead to vendor lock-in scenarios.
Conclusion:
The 'computer use' function has the potential to revolutionise multiple industries by enabling AI to perform complex tasks autonomously. While there are many benefits in terms of efficiency and productivity, there are also challenges that need to be carefully addressed to maximise positive outcomes and mitigate risks.
Amazing and frightening at the same time.
Wrap It Up
This new piece of technology has, in my opinion, overcome a huge barrier and I believe it is only a matter of time before it unleashes its full potential and perhaps turns the economy on its head like ChatGPT did a few years ago.