Posey's Tips & Tricks
Microsoft Word as an Unexpected Tool for Audio Workflows
Brien Posey explains how Microsoft Word's transcription and read-aloud features became a surprisingly practical way to sort, edit and refine AI-generated podcast audio created with Microsoft 365 Copilot Notebooks.
Recently, I have been working on an audio project, and as strange as it may seem, I have found Microsoft Word to be an incredibly useful tool in my audio workflows, but not in the way that you would probably expect.
In order to understand why in the world I am using Microsoft Word for an audio project, I need to tell you a little bit about the project that I am working on.
Microsoft 365 Copilot includes a feature called Notebooks, which allows you to assemble a collection of documents and then ask Copilot questions about the document contents. As useful as this capability might be, however, there is another feature that is a bit more unusual. Microsoft gives you the option of creating an audio overview of one or more documents within a notebook. While you can use this feature to generate a formal audio summary of a collection of documents, there is also an option to create a podcast. This causes two different AI speakers to have a casual discussion about whatever is in your documents.
After spending some time messing around with the Audio Overview feature, I quickly realized that I could use it to create podcasts related to some of the books that I have written. As such, I set out to create a series of podcasts tied to one book in particular. My idea was to create a separate podcast for each chapter.
Even though this project initially seemed like it was going to be easy, I quickly realized that it was going to be tough to get exactly the results that I wanted. When I started generating podcasts for the various chapters, I discovered that Copilot was sometimes omitting key details from the discussion. In other cases, the audio overview might talk about a topic that I wanted to cover, but approach the topic from a standpoint that did not align with my book’s original intent.
I found through trial and error that by creatively structuring the AI prompt, I could come close to getting what I was after, but Copilot never seemed to be able to produce exactly what I wanted. However, I also found that by regenerating the audio over and over again, I could eventually get the AI to cover everything that I wanted, just not all in one file. My job, therefore, was to dissect half a dozen different versions of each podcast, extract the audio that I wanted to keep, and then assemble those audio clips into a master audio file.
This ended up being a really tough job since I had three hours or more of audio for each podcast. Manually sorting through all of the audio and making notes about what I wanted to keep and what I wanted to get rid of just wasn’t practical. This is where Microsoft Word comes into play.
Microsoft Word contains an audio transcription feature that I was able to use to create written transcripts of the various recordings. To use this feature, just open Word and then click on the Dictate icon, found on the Home tab. This causes Word to display a menu asking if you want to dictate or transcribe. Choose the Transcribe option. You will now be asked if you want to upload audio or start recording audio. Choose the Upload option, and when prompted, choose an audio file to upload to OneDrive. Sadly, if your audio files are already in OneDrive, you will have to download them and reupload them. There is no option for using a file that is already in OneDrive. When the transcription is complete, click the Add to Document button.
The audio transcription process seems to work well, even if it is not perfect. In my case, however, perfection didn’t matter. The transcription was still good enough that I was able to delete the parts of the transcript that were irrelevant, leaving the parts that I wanted to keep. Better still, the transcript optionally contains speaker references (Speaker 1, Speaker 2, etc.) and time stamps, greatly simplifying the process of locating audio clips later on.
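If the trimmed transcript is saved as plain text, the time stamps and speaker labels can be pulled out with a short script rather than read off by hand. The Python sketch below is only an illustration. It assumes each entry begins with a header line such as "00:01:15 Speaker 2" followed by one or more lines of spoken text, which may not match the exact layout Word produces in every case. The names parse_transcript and Segment are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass

# Assumed header shape: "HH:MM:SS Speaker N" or "MM:SS Speaker N".
# Adjust the pattern if your transcript is laid out differently.
HEADER = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})\s+(Speaker \d+)\s*$")


@dataclass
class Segment:
    start: int     # offset from the start of the recording, in seconds
    speaker: str   # label as it appears in the transcript
    text: str      # spoken text that follows the header


def parse_transcript(raw: str) -> list[Segment]:
    """Turn transcript text into a list of time-stamped segments."""
    segments: list[Segment] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            hours, minutes, seconds, speaker = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            segments.append(Segment(start, speaker, ""))
        elif segments:
            # A text line belongs to the most recent header.
            current = segments[-1]
            current.text = f"{current.text} {line}".strip()
    return segments
```

Running parse_transcript over the text of each trimmed transcript yields a list of start offsets in seconds, which is a convenient form for locating the matching audio in an editor.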
Oh, and there is one more way that Word can sometimes help out with this process. Microsoft Word includes a Read Aloud feature (found on the Review tab). Word uses the Microsoft Ava voice when reading documents aloud. Ava is one of the primary voices used in the Audio Overviews. As such, you could conceivably use Word to generate bits of custom audio to help smooth the transition from one audio clip to the next.
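Once the start and end times of the clips worth keeping are known, the job of stitching them into a master file can also be scripted. The following is a rough sketch using only Python's standard wave module, not a description of the tools I actually used. It assumes the recordings have already been converted to uncompressed WAV files that share the same channel count, sample width and sample rate. The function names read_clip and assemble are made up for this example.

```python
import wave


def read_clip(path, start_s, end_s):
    """Return (params, raw_frames) for the span start_s..end_s of a WAV file."""
    with wave.open(path, "rb") as src:
        params = src.getparams()
        total = src.getnframes()
        rate = src.getframerate()
        # Clamp both ends so an out-of-range time cannot raise an error.
        first = min(max(int(start_s * rate), 0), total)
        last = min(max(int(end_s * rate), first), total)
        src.setpos(first)
        frames = src.readframes(last - first)
    return params, frames


def assemble(clips, out_path):
    """Write clips, given as (path, start_s, end_s), back to back into out_path."""
    if not clips:
        raise ValueError("no clips to assemble")
    with wave.open(out_path, "wb") as out:
        fmt = None
        for path, start_s, end_s in clips:
            params, frames = read_clip(path, start_s, end_s)
            this_fmt = (params.nchannels, params.sampwidth, params.framerate)
            if fmt is None:
                # The first clip decides the format of the master file.
                fmt = this_fmt
                out.setnchannels(params.nchannels)
                out.setsampwidth(params.sampwidth)
                out.setframerate(params.framerate)
            elif this_fmt != fmt:
                raise ValueError(f"{path} has a different audio format")
            out.writeframes(frames)
```

A call such as assemble([("take1.wav", 12, 47), ("take3.wav", 130, 162)], "master.wav") would join two spans from two hypothetical takes. The cuts are hard cuts with no crossfade, so the result would likely still need a pass in a proper audio editor.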
About the Author
Brien Posey is a 22-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.