From PDF to Podcast: The MIT Tool That Goes Beyond NotebookLM

Google's NotebookLM, known for transforming texts into engaging podcast-style conversations, faces limitations like lack of customization and API. However, open-source alternatives on platforms like Hugging Face offer customizable solutions to generate podcasts from documents, offering more contr...

From PDF to Podcast: The MIT Tool That Goes Beyond NotebookLM

NotebookLM is one of Google’s more creative AI products, introduced a couple of months ago. Many people were amazed by its abilities—especially the idea of turning a long text into an interesting conversation between two podcast hosts. NotebookLM offers more than that, such as chatting (Q&A) and even generating mind maps. If you haven’t tried NotebookLM yet, I highly encourage you to experience it yourself. It’s free and really easy to use.


The Problem

This is great, but some advanced users, including developers, want more and quickly run into limitations—at least in the free version:

  • You can’t select the hosts’ character.
  • You can’t change the prompt.
  • You can’t select the length or depth of the conversation.
  • … and pretty much anything else that goes beyond uploading the document and telling NotebookLM what to focus on, which was added recently.
  • Most importantly, it has no API… yet (at least not in the free version)

---

The Solution

As we often see these days, for almost every commercial solution, open-source alternatives appear—if not many. The same goes for NotebookLM.

When you search on GitHub for NotebookLM, you will find plenty of projects.

If you just want to test a solution without installing or configuring anything, there is a Hugging Face space that can do exactly that.

Generates an engaging two-host podcast from any uploaded document

Many thanks to the developers of this lamm-mit space:

https://huggingface.co/spaces/lamm-mit/PDF2Audio


What You Need

The Cost
To get an idea of how affordable inference and TTS can be, in the experiment I share here, I made a podcast of a technology trend report with about 50 pages, resulting in a 15-minute podcast. It cost me around 26 cents.

Here is how it works in a nutshell:

  1. The user uploads a document (for example, a PDF).
  2. It uses OpenAI models (the user can choose which one) to generate the dialogue.
  3. Then it uses OpenAI TTS with user-selected characters (hosts) to turn it into a podcast. That’s it.

Let’s get started.

---

Step 1: Get the Space

Open the space and optionally clone it:
https://huggingface.co/spaces/lamm-mit/PDF2Audio

---

Step 2: Upload the Document and Customize

Upload any document you want to turn into a podcast. In this example, I used a 50-page tech report.

The best part is that you can customize everything:

  • The prompt
  • The model selection
  • The characters
  • … etc.

To avoid lengthy prompting attempts, you can also choose from a predefined set of instruction templates that set the prompts for you.

If you want to test the characters’ voices first, you can use the OpenAI playground to hear these voices:

Data Privacy | Imprint