Microsoft Releases Transcription and Dictation Capabilities in Word Apps
Microsoft on Tuesday announced the release of a transcription capability within its Word for the Web app for Microsoft 365 subscribers.
Additionally, users of Word for the Web and Word for mobile apps are getting access this month to an expanded Dictate feature that enables document structuring and formatting using voice commands.
Transcription in Word for the Web
Transcription in Word for the Web is available now, but only for U.S. English. It was previously in preview and reached "general availability" commercial release this month, according to a Microsoft 365 Roadmap item description.
Transcription is available only in the browser-based Word application, not in the desktop version. However, Microsoft is planning to deliver a "Transcribe in Office mobile" capability, which is expected to arrive by year's end.
Transcription in Word for the Web can record conversations that are within range of a PC's microphone. Alternatively, users can upload a prerecorded audio file for the service to transcribe. Supported audio file formats include .MP3, .WAV, .M4A and .MP4.
Microsoft described limits to Transcription in Word for the Web that may be temporary, stating that "at this time, there is a five hour limit of transcription time per month for uploaded recordings and there is a file size limit of 200mb." It's not clear how users are alerted when they approach those limits.
The Transcription service is a bare-bones application that users turn on from the "Dictate" option in Word for the Web's Ribbon menu, which includes a "Transcribe" option. Users simply let it record, then click a "Save and Transcribe Now" button and wait. The application reportedly can distinguish between speakers, who are labeled "Speaker 1," "Speaker 2" and so on, although Microsoft could likely use Microsoft Graph user information in the future to be more specific.
I tried reading two short paragraphs of medical text while using the Transcription service. It took about five minutes for the transcript of those paragraphs to complete, but the service correctly interpreted and spelled the text, which included species descriptors in Latin. However, the app is still a bit rough. It has trouble determining where periods should go. Although there was just one speaker (me), labeled "Speaker 1," it nonetheless broke up the text multiple times with repeated Speaker 1 labels.
Audio quality will be a big factor in getting a successful transcription. In another test, in which I played an audio file, the resulting transcription was so poor as to be useless.
Users can edit the transcribed text. They can save all of the transcribed text to the Word for the Web document and edit it there, or save just part of it. Once the text is saved, users will see it along with a link to the audio file within the Word document. The recordings are saved in OneDrive's "Transcribed Files" folder.
In effect, a transcription has to be saved to a Word file to be kept: selecting "New Transcription" deletes any unsaved work. That, at least, appears to be what Microsoft is saying in its support document, which outlines how to use the Transcription service.
Microsoft receives the audio files when people use the Transcription feature. However, the files are "used only to provide you with this service" and aren't stored, Microsoft's announcement promised.
Dictate with Voice Commands
The Dictate service was already available, but Dictate with voice commands reached general availability this month for Word for the Web, as well as for the Word for Android and Word for iOS mobile applications, according to the Microsoft 365 Roadmap.
These voice commands let users tell the service to do things like "bold last sentence" or "start list," Microsoft explained. Users can specify text formatting and punctuation simply by saying it. The service will also insert symbols when users say "percent sign" or "ampersand." Users can also edit text with commands like "delete," "insert space" or "backspace."
Some of the "natural language" capabilities added with voice commands stem from artificial intelligence models used with Azure Cognitive Services, Microsoft explained in an Azure blog post. The company is also working on text-to-speech capabilities in Azure Cognitive Services that could be used to "create AI-powered virtual 'teachers' that interact with students to make online courses more engaging," Microsoft contended in another post.
The Dictate with voice commands feature currently supports about 13 languages, as listed in Microsoft's support article on how to use Dictate. Dictate with voice commands in Word for the Web and the Office mobile apps is free to use, although it requires a Microsoft account.
Office 365 subscribers will be getting Dictate with voice commands in Word for the Windows desktop and Word for Mac "towards the end of the year," Microsoft promised.
Microsoft receives data when people use the Dictate service, but it's used only to provide results to the user. Microsoft says it doesn't store the audio or text files.
Kurt Mackie is senior news producer for 1105 Media's Converge360 group.