Extract and Analyze Your Chrome History Using AI

Ever wondered what your browsing history says about you? Whether you're curious about your online habits, want to reflect on your productivity, or need to journal your activities, analyzing your browsing history can offer valuable insights. In this post, we'll explore how to extract your Chrome browsing history and use AI to cluster and analyze it. We'll cover both a no-code method and a more technical approach for those comfortable with the command line.

Why Browsing History?

In previous posts, I discussed the value of journaling with tools like Logseq or Obsidian to reflect on past activities and plan future ones. AI tools, such as NotebookLM, can further enhance this process by identifying patterns and insights within your journal entries that might otherwise be missed.

The common thread across these methods is the need for intentional self-reflection and note-taking. While I've personally found this practice beneficial and encourage others to adopt it, I acknowledge that maintaining detailed records can be challenging, particularly when juggling multiple tasks or diving deep into complex topics. This is especially true during web browsing, where it's common to start with one query and quickly spiral into a series of related searches. Think of it like a call stack in programming: a function calls another, which calls another, and so on, eventually (hopefully) returning to the original point.

The challenge lies in capturing these meandering online journeys. For brief explorations, a simple note might suffice ("researched XYZ"). However, longer or more convoluted sessions, especially those involving unexpected detours, require more diligent logging to maintain context. The ultimate goal is to be able to reconstruct and explain your activities over a given period, whether it's a day, a week, or longer. In essence:

  • We spend a significant amount of time engaged in online activities.
  • These activities often lead to unplanned tangents, making comprehensive note-taking difficult.
  • Despite this, it's crucial to be able to recall and reflect on how we've spent our time, both for productivity and knowledge retention. For example, a quick search for "best noise-canceling headphones" might lead to an extended exploration of audio engineering principles. Similarly, looking up "how to cook pasta" could easily evolve into a broader study of Italian cuisine.

Ideally, this capture process would be automated to supplement intentional journaling. Emerging tools like Microsoft Recall and various open-source alternatives aim to address this need, although their comprehensive data collection raises significant privacy concerns.

A more readily available and less intrusive solution lies within our web browsers themselves. Browsers like Chrome automatically record a wealth of data, including browsing history, form entries, and more. This data, often overlooked, can be repurposed to provide valuable insights into our online activities over time, enabling us to reflect on past explorations and identify recurring patterns or areas of interest.

Convinced? Let’s see how it works…

Step 1: Get the History File

The No-Code Way

If you prefer not to use shell commands, you can easily download your Chrome history using a browser extension. One such extension is Export Chrome History. This is just an example, and while we haven't personally used it, it can serve as a starting point. Always ensure you trust the extensions you install, as they can access sensitive data.

The Code Way (for Mac Users)

For those comfortable with the command line, here's a step-by-step guide:

  1. Close Your Browser and Copy the History File

Before proceeding, make sure to close Google Chrome to prevent any file access issues. Then, open Terminal and run:

cp /Users/<username>/Library/Application\ Support/Google/Chrome/Default/History /tmp

Replace <username> with your actual macOS username. This command copies your Chrome history file to the /tmp directory.

  1. Install SQLite (If You Haven't Already)

SQLite is a lightweight database engine that Chrome uses to store your history. Install it using Homebrew:

brew install sqlite
  1. Explore the Database

To see what tables are in the database, run:

sqlite3 /tmp/History ".tables"

You'll see a list of tables, such as:

urls
visits
visit_source
downloads
keyword_search_terms
...

It's fascinating (and a bit concerning) to see how much data is stored.

  1. Export the History to a Readable Format

Let's extract the browsing history into a CSV file:

sqlite3 -header -csv /tmp/History "SELECT
    datetime(last_visit_time/1000000-11644473600, 'unixepoch', 'localtime') as visit_time,
    title,
    url,
    visit_count
FROM urls
ORDER BY last_visit_time DESC
LIMIT 100000;" > recent_history.csv

This command selects the visit time, title, URL, and visit count from the urls table, orders them by the most recent visits, and saves the last 100,000 entries to recent_history.csv.

Step 2: Clean Up the Data

Now that you have your history in CSV format, it's time to clean it up. Since URLs may appear multiple times, we'll aggregate them to see how frequently you visit each domain.

You can use Excel or any spreadsheet software to manipulate the data. However, if you prefer using the command line, here's how you can do it:

awk -F'","' '{
    split($3,u,"//");
    d=(length(u)>1?u[2]:u[1]);
    split(d,h,"/");
    if(h[1]!="") print substr($1,2,7), h[1]
}' recent_history.csv | sort | uniq -c | sort -rn

This command does the following:

  • Uses awk to process the CSV.
  • Extracts the domain from each URL.
  • Prints the date and domain.
  • Sorts and counts the unique occurrences.
  • Sorts the output in descending order based on visit count.

Here's an example of what the output might look like:

189 2024-11 chat.openai.com
82 2024-10 docs.aws.amazon.com
45 2024-09 github.com
28 2024-11 portal.azure.com
20 2024-12 eu-west-1.console.aws.amazon.com
15 2024-11 status.digitalocean.com
12 2024-12 analytics.google.com
9 2024-10 stripe.com/dashboard
8 2024-11 console.firebase.google.com
7 2024-11 app.netlify.com
6 2024-10 docs.microsoft.com
5 2024-10 developer.mozilla.org
4 2024-12 support.google.com
4 2024-11 app.hubspot.com
3 2024-11 stackoverflow.com
3 2024-11 sentry.io
3 2024-10 app.datadog.com
2 2024-12 grafana.com

As you can see, certain websites like chat.openai.com and docs.aws.amazon.com are visited frequently.

Step 3: Let AI Cluster the History

With the cleaned data, you can now use AI to cluster your browsing history into meaningful categories.

Using Open-Source AI Models

For privacy reasons, you might prefer to use open-source AI models on your own device. Models like LLaMA or Mistral can be used with tools such as LocalAI, Ollama, or Open WebUI. Personally, I recommend Open WebUI for its ease of use.

  1. Prepare the Prompt

Copy the output from the previous step and use the following prompt:

Cluster this browsing history and output it in a table.
  1. Run the AI Model

Paste the prompt and data into your chosen AI tool. The AI will process the data and cluster the websites into categories based on their content and your usage patterns.

AI Clustering Output
AI Clustering Output

Example of AI clustering output.

And that's it! You've successfully used AI to analyze your browsing history.

Benefits of Analyzing Your Browsing History

By clustering your browsing history, you gain valuable insights into:

  • Productivity Trends: Identify the tools and resources you use most frequently.
  • Learning Patterns: See what topics or technologies you're researching.
  • Time Management: Understand where you're spending most of your online time.
  • Reflection and Journaling: Combine this data with notes from apps like Logseq or Obsidian to create comprehensive records of your activities.

Wrap-Up

Analyzing your browsing history doesn't have to be a daunting task. With the steps outlined above, you can extract, clean, and analyze your data to gain meaningful insights. Whether for personal growth, productivity enhancement, or simple curiosity, understanding your online habits can be incredibly beneficial.