Effortlessly Summarize Audio and Video Files Using Python
Chapter 1: Introduction to Auto Chapters
The Auto Chapters feature splits audio and video files into distinct “chapters” and generates a summary for each one. Chapters like these are already familiar from other platforms: YouTube uses them to let viewers jump directly to the content they want, and podcast applications use them to make episodes easier to search.
In this article, I will show you how to use this feature in Python with the AssemblyAI API, which converts speech to text and then generates chapters, each with a summary, a headline, and a gist.
Section 1.1: Setting Up Your Environment
In this tutorial, we’ll create automatic chapters for the famous 2005 Stanford Commencement Address delivered by Steve Jobs, available on YouTube. The video is approximately 15 minutes long, and for this demonstration, I will only use the audio component of the speech. You are welcome to use your own audio file; however, ensure that it exceeds five minutes in length for optimal results.
You also need a free AssemblyAI account to get your API key. After registering, go to the home page and copy your unique API key from the “Integrate the API” box.
Subsection 1.1.1: Uploading Your Audio File
To transcribe and summarize your audio, the first step is to upload the file to the AssemblyAI API. You will need two pieces of information: the filepath of your audio file and the API key.
Next, we define a function called read_file that reads the audio file. Its output is sent as the body of a POST request to AssemblyAI’s upload endpoint.
The response contains the URL where your audio file has been uploaded. We store it in a variable named audio_url for use in the following step.
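The upload step might look like the sketch below. It assumes the v2 upload URL, the lowercase authorization header, and the upload_url response field match AssemblyAI's public documentation, so check them against the current docs. The names upload_audio, CHUNK_SIZE, and the session parameter are chosen here for illustration and are not taken from the original code. The function accepts any object with a requests-style post method (such as requests.Session()), which keeps it easy to test without a network connection.

```python
# Assumed endpoint; check AssemblyAI's documentation for the current URL.
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"
CHUNK_SIZE = 5_242_880  # illustrative choice: read 5 MB at a time


def read_file(filename, chunk_size=CHUNK_SIZE):
    """Yield the audio file's bytes in chunks so it is never fully in memory."""
    with open(filename, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield data


def upload_audio(filename, api_key, session):
    """Upload a local file and return the URL AssemblyAI assigns to it.

    `session` is any object with a requests-style .post method,
    for example requests.Session().
    """
    response = session.post(
        UPLOAD_ENDPOINT,
        headers={"authorization": api_key},
        data=read_file(filename),  # streamed as the request body
    )
    response.raise_for_status()  # fail loudly on a bad key or server error
    return response.json()["upload_url"]
```

With the requests package installed, a call could look like audio_url = upload_audio("speech.mp3", API_KEY, requests.Session()), where the file name and API_KEY are placeholders for your own values.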
Section 1.2: Requesting the Transcript
In this phase, we will submit a POST request to the AssemblyAI transcript endpoint using the audio_url obtained earlier. Be sure to include the key/value pair 'auto_chapters': True in the JSON parameters. With this setting, the API returns the chapter summaries in addition to the transcription.
The response includes an ID that identifies our submission. We store it in a variable called transcript_id, which will be essential for the next step.
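A minimal sketch of this request follows, under the same assumptions as before: the endpoint URL, the authorization header, and the id field in the response are taken from AssemblyAI's public documentation and should be verified there. The function name request_transcript and the session parameter are illustrative.

```python
# Assumed endpoint; check AssemblyAI's documentation for the current URL.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def request_transcript(audio_url, api_key, session):
    """Submit the uploaded audio for transcription with Auto Chapters enabled.

    `session` is any object with a requests-style .post method.
    Returns the ID that identifies this submission.
    """
    response = session.post(
        TRANSCRIPT_ENDPOINT,
        headers={"authorization": api_key},
        # auto_chapters asks for chapter summaries on top of the transcript
        json={"audio_url": audio_url, "auto_chapters": True},
    )
    response.raise_for_status()
    return response.json()["id"]
```

The returned value plays the role of transcript_id in the rest of the tutorial.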
Chapter 2: Retrieving the Transcript and Summary
To access the transcript and summary, we combine the transcript_endpoint with the transcript_id to create a variable called polling_endpoint. By periodically sending GET requests to this endpoint, we can check the status of our earlier request.
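The polling loop could be sketched as follows. It assumes the status field takes the values "completed" and "error" when the job finishes, as described in AssemblyAI's documentation. The function name wait_for_transcript, the five-second default interval, and the injectable sleep parameter are illustrative choices made here.

```python
import time

# Assumed endpoint; check AssemblyAI's documentation for the current URL.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def wait_for_transcript(transcript_id, api_key, session,
                        interval=5.0, sleep=time.sleep):
    """Poll until the transcript is ready and return the final JSON result.

    `session` is any object with a requests-style .get method.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    polling_endpoint = f"{TRANSCRIPT_ENDPOINT}/{transcript_id}"
    while True:
        response = session.get(
            polling_endpoint, headers={"authorization": api_key}
        )
        response.raise_for_status()
        result = response.json()
        status = result.get("status")
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(f"Transcription failed: {result.get('error')}")
        sleep(interval)  # still queued or processing; wait and ask again
```

Waiting a few seconds between requests avoids flooding the API while the audio is still being processed.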
Once the status indicates that the process is complete, we can save the transcript in a text file and the summary in a JSON file, naming both files after the transcript_id.
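Saving the two files might look like the sketch below. It assumes the completed response carries the transcript under "text", the chapter list under "chapters", and the submission ID under "id", which is my reading of AssemblyAI's response format. The function name save_results and the out_dir parameter are illustrative.

```python
import json
from pathlib import Path


def save_results(result, out_dir="."):
    """Write the transcript to <id>.txt and the chapters to <id>.json.

    `result` is the JSON dictionary of a completed transcript request.
    Returns the two paths that were written.
    """
    out_dir = Path(out_dir)
    transcript_id = result["id"]
    txt_path = out_dir / f"{transcript_id}.txt"
    json_path = out_dir / f"{transcript_id}.json"

    # Plain transcript text
    txt_path.write_text(result["text"], encoding="utf-8")

    # Chapter summaries, pretty-printed for easier reading
    with json_path.open("w", encoding="utf-8") as f:
        json.dump(result["chapters"], f, indent=4)

    return txt_path, json_path
```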
You can now check your working directory to find a .txt file containing the transcript alongside a .json file with the summary. Here’s a sample summary, headline, and gist from a section of the video:
{
"summary": "You need to believe that the dots will connect in your future, trusting in something—your intuition, destiny, life, karma, or whatever it may be. This belief will empower you to follow your heart, even if it leads you off the conventional path.",
"headline": "This belief will empower you to follow your heart, even if it leads you off the conventional path.",
"start": 312538,
"end": 342070,
"gist": "the dots will connect"
}
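The start and end values appear to be offsets in milliseconds, which fits a 15-minute speech (312538 ms is a little over five minutes in). Under that assumption, a small helper can turn the chapters into timestamp lines like the ones used for YouTube chapters. The names format_timestamp and chapter_lines are illustrative and not part of the AssemblyAI API.

```python
def format_timestamp(ms):
    """Convert a millisecond offset to MM:SS (or H:MM:SS past one hour)."""
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes:02d}:{seconds:02d}"


def chapter_lines(chapters):
    """Build one 'timestamp gist' line per chapter."""
    return [f"{format_timestamp(c['start'])} {c['gist']}" for c in chapters]


# The sample chapter shown above, reduced to the fields used here
sample = [{"start": 312538, "end": 342070, "gist": "the dots will connect"}]
print("\n".join(chapter_lines(sample)))  # prints: 05:12 the dots will connect
```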
Now it’s your turn to implement this yourself! You can find the code used in this tutorial on my GitHub repository.
Chapter 3: Video Tutorials
The first video, "How To Summarise An Audio File In Python | Speech Recognition using Python," provides a comprehensive guide on summarizing audio files in Python, showcasing the use of the AssemblyAI API effectively.
The second video, "Speech Recognition And Summarization System In Python [Project Tutorial]," delves deeper into creating a speech recognition and summarization system using Python, offering practical insights and examples.