Clean Any Text for LLM / GPT Processing (Mac)
Strip down lengthy web content with ease! This post reveals a handy shortcut that cleans your text by removing excess whitespace, URLs, and HTML tags, making it perfect for AI analysis. With just a few simple steps, streamline your data for summaries and Q&A. Say goodbye to messy inputs and hello...

Sometimes when I use GPT, I need to copy long text from various sources — like web pages — that can be in any format (such as JSON or Markdown) and are often very lengthy. In most cases, the specific format doesn’t matter much when GPT analyzes the text for tasks like summarization or answering questions. For example, consider this JSON text from a scrape of Twitter comments:
[{
"bookmark_count": 58,
"created_at": "Thu Mar 20 22:42:40 +0000 2025",
"conversation_id_str": "1902853325906710758",
"entities": {
"hashtags": [],
"media": [
{
"display_url": "pic.x.com/wQ6Y8SbphZ",
"expanded_url": "https://x.com/GoogleAI/status/1902853325906710758/photo/1",
"ext_alt_text": "We construct a lower bound instance "small": {
"faces": []
},
...
However, passing this text as-is to OpenAI or ChatGPT is inefficient: line breaks, indentation, and repeated spaces add length without adding meaning. If a script is only forwarding the text to the model rather than parsing the JSON itself, it's better to remove unnecessary elements such as extra newlines and repeated spaces first.
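To make the saving concrete, here is a minimal shell sketch that squeezes whitespace in a small pretty-printed JSON fragment and compares the character counts. The `sample` value is a hypothetical, shortened stand-in for the scrape above, and `squeeze_whitespace` is a name chosen only for this illustration.

```shell
# Hypothetical sample: a small pretty-printed JSON fragment.
sample='[{
    "bookmark_count": 58,
    "entities": {
        "hashtags": []
    }
}]'

# Squeeze every run of whitespace (spaces, tabs, newlines) into one space.
squeeze_whitespace() {
  tr -s '[:space:]' ' '
}

compact=$(printf '%s' "$sample" | squeeze_whitespace)

# Compare the length before and after, then show the compacted text.
printf 'before: %s characters\n' "${#sample}"
printf 'after:  %s characters\n' "${#compact}"
printf '%s\n' "$compact"
```

The structure of the JSON is untouched; only the layout whitespace goes away, which is usually all a model needs for summarization or Q&A.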
---
Tools to Clean Up Text
There are plenty of tools available for cleaning web content, such as the amazing https://jina.ai/ or various Chrome extensions that help you perform Q&A on websites.
But here’s an even simpler solution — not only for websites but for any text you want to process with AI.
---
An Easy Shortcut for Any Text
I’ve created a small shortcut that reads the clipboard and cleans the text: it collapses extra newlines and spaces, strips URLs and HTML tags, and drops a few filler words, so the content is more compact and better suited to GPT. All you need to do is create a new shortcut.

After creating the shortcut, paste the following AppleScript into it:
on run {input, parameters}
    -- Get input text, whether from clipboard or passed parameter
    set theText to input as string

    -- Use shell commands; "quoted form of" escapes special characters for the shell.
    -- printf '%s' is used rather than echo, because echo may interpret
    -- backslash sequences or a leading "-n" in the text, depending on the shell.

    -- Collapse runs of whitespace (spaces, tabs, newlines) into single spaces
    set cmd to "printf '%s' " & quoted form of theText & " | tr -s '[:space:]' ' '"
    set theText to do shell script cmd

    -- Remove URLs (http:// or https:// up to the next whitespace)
    set cmd to "printf '%s' " & quoted form of theText & " | sed 's/http[s]*:\\/\\/[^[:space:]]*//g'"
    set theText to do shell script cmd

    -- Remove HTML tags (kept simple to avoid regex issues)
    set cmd to "printf '%s' " & quoted form of theText & " | sed 's/<[^>]*>//g'"
    set theText to do shell script cmd

    -- Remove common filler words (lowercase, space-delimited matches only)
    set cmd to "printf '%s' " & quoted form of theText & " | sed 's/ just / /g; s/ very / /g; s/ actually / /g; s/ basically / /g'"
    set theText to do shell script cmd

    -- Trim leading/trailing spaces
    set cmd to "printf '%s' " & quoted form of theText & " | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'"
    set theText to do shell script cmd

    return theText
end run
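
If you want to try the cleaning steps outside the shortcut, the same `tr` and `sed` stages can be chained into one pipeline in Terminal. This is only a sketch: `clean_text` is a hypothetical function name, and it reads from standard input instead of the shortcut's input.

```shell
# Hypothetical helper: reads text on stdin, writes the cleaned text to stdout.
clean_text() {
  # 1. Squeeze whitespace runs into single spaces.
  # 2. Remove URLs, then HTML tags, then a few filler words.
  # 3. Trim leading and trailing whitespace.
  tr -s '[:space:]' ' ' |
    sed -e 's/http[s]*:\/\/[^[:space:]]*//g' \
        -e 's/<[^>]*>//g' \
        -e 's/ just / /g; s/ very / /g; s/ actually / /g; s/ basically / /g' \
        -e 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Example: clean a small HTML snippet.
printf '%s' '  <p>Hello   world</p>  ' | clean_text
```

On a Mac, `pbpaste | clean_text | pbcopy` would clean the clipboard in place. Because URLs are removed after the whitespace is squeezed, a URL in the middle of a sentence leaves two spaces behind; running `tr -s ' '` as a final stage would tidy that up if it matters.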

---
How to Use the Shortcut
Now, whenever you have long, messy text, copy it and run this shortcut to get the cleaned version.
