Utilising Azure Text to Speech Cognitive Services with PowerShell

Introduction

Recently I’ve been building an IoT project that leverages Azure Cognitive Services. A couple of the services I needed were Text to Speech and Speech to Text. Microsoft’s guides were pretty good, but it wasn’t obvious how to use them from native PowerShell. I’ve got it all working, so I’m documenting it for my future self and to help anyone else trying to work it out.

Accessing the Cognitive Services Text to Speech API

Azure Cognitive Services Text to Speech is a great service that, as the name suggests, converts text to speech.

First you’ll need an API key. Head to the Cognitive Services Getting Started page and select Try Text to Speech, then Get API Key. This gives you a trial key with 5,000 transactions, limited to 20 per minute. For longer-term use, provision a Speech service in the Azure Portal (the same resource covers both Text to Speech and Speech to Text).
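Before synthesising anything, the key is typically exchanged for a short-lived access token. Below is a minimal sketch, assuming the regional issueToken endpoint (https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken); the function names and the default region are hypothetical, and a trial key may use a different endpoint, so check the one shown for your key.

```powershell
# Minimal sketch: swap the Speech API key for a short-lived access token.
# ASSUMPTIONS: the region default and the issueToken URL pattern are
# illustrative; confirm the endpoint listed for your key or resource.
function Get-TtsTokenRequest {
    param(
        [Parameter(Mandatory)][string]$ApiKey,
        [string]$Region = 'westus'   # hypothetical default region
    )
    # Parameters for Invoke-RestMethod, returned as a hashtable for splatting
    @{
        Method  = 'Post'
        Uri     = "https://$Region.api.cognitive.microsoft.com/sts/v1.0/issueToken"
        Headers = @{ 'Ocp-Apim-Subscription-Key' = $ApiKey }
    }
}

function Get-TtsAccessToken {
    param(
        [Parameter(Mandatory)][string]$ApiKey,
        [string]$Region = 'westus'
    )
    $request = Get-TtsTokenRequest -ApiKey $ApiKey -Region $Region
    # The service replies with the token as plain text
    Invoke-RestMethod @request
}
```

Usage would look like `$token = Get-TtsAccessToken -ApiKey $apiKey`. Microsoft documents these tokens as valid for about 10 minutes, so fetch a new one rather than caching it indefinitely.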

The Script

I’m using a female English voice. The full list of available languages, voices and genders is in Microsoft’s Text to Speech documentation.
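The language, gender and voice are chosen in the SSML body sent to the service. Here is a minimal sketch of building that body; the voice name en-US-JennyNeural is an illustrative assumption, so pick one from Microsoft’s current voice list, and New-TtsSsml is a hypothetical helper name.

```powershell
# Minimal sketch: build an SSML request body for a female English voice.
# ASSUMPTION: the default voice name is illustrative only.
function New-TtsSsml {
    param(
        [Parameter(Mandatory)][string]$Text,
        [string]$Language = 'en-US',
        [string]$Gender = 'Female',
        [string]$VoiceName = 'en-US-JennyNeural'   # hypothetical choice
    )
    # Escape &, <, >, quotes and apostrophes so arbitrary text can't break the XML
    $safeText = [System.Security.SecurityElement]::Escape($Text)
    "<speak version='1.0' xml:lang='{0}'><voice xml:lang='{0}' xml:gender='{1}' name='{2}'>{3}</voice></speak>" -f $Language, $Gender, $VoiceName, $safeText
}
```

Escaping matters because text such as "Fish & chips" would otherwise produce invalid XML and the request would fail.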

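There are also eight audio output formats, listed below. The two I’ve used most are 16 kHz PCM for WAV output and 16 kHz MP3 for MP3 output. Note that raw-16khz-16bit-mono-pcm returns headerless PCM samples; if you want a file that plays as a .wav, riff-16khz-16bit-mono-pcm adds the RIFF (WAV) header. The script further below is configured for MP3.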

  • ssml-16khz-16bit-mono-tts
  • raw-16khz-16bit-mono-pcm
  • audio-16khz-16kbps-mono-siren
  • riff-16khz-16kbps-mono-siren
  • riff-16khz-16bit-mono-pcm
  • audio-16khz-128kbitrate-mono-mp3
  • audio-16khz-64kbitrate-mono-mp3
  • audio-16khz-32kbitrate-mono-mp3
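The format is requested through the X-Microsoft-OutputFormat header, and it helps to keep the saved file’s extension in step with it. The sketch below assumes my own extension mapping (the service just returns audio bytes) and a 32 kbps MP3 default, since the post doesn’t say which MP3 bitrate the script uses; Get-TtsFormatSettings is a hypothetical name.

```powershell
# Hypothetical mapping from X-Microsoft-OutputFormat value to a file extension.
# The extensions are a convention for saving files, not part of the service.
$TtsFormatExtensions = @{
    'riff-16khz-16bit-mono-pcm'        = '.wav'  # PCM with a RIFF (WAV) header
    'raw-16khz-16bit-mono-pcm'         = '.pcm'  # headerless PCM samples
    'audio-16khz-128kbitrate-mono-mp3' = '.mp3'
    'audio-16khz-64kbitrate-mono-mp3'  = '.mp3'
    'audio-16khz-32kbitrate-mono-mp3'  = '.mp3'
}

function Get-TtsFormatSettings {
    param(
        # ASSUMPTION: 32 kbps MP3 as the default
        [string]$OutputFormat = 'audio-16khz-32kbitrate-mono-mp3'
    )
    if (-not $TtsFormatExtensions.ContainsKey($OutputFormat)) {
        throw "No extension mapped for output format '$OutputFormat'"
    }
    [pscustomobject]@{
        # Header telling the service which audio encoding to return
        Header    = @{ 'X-Microsoft-OutputFormat' = $OutputFormat }
        Extension = $TtsFormatExtensions[$OutputFormat]
    }
}
```

For example, `(Get-TtsFormatSettings -OutputFormat 'riff-16khz-16bit-mono-pcm').Extension` gives `.wav`, which you can append to your output filename.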

The script below is pretty self-explanatory. Update line 5 with your API key, and lines 11 and 13 if you want the output audio file written to a different directory or filename. The text to be converted is on line 59.

Step through it using VSCode or PowerShell ISE.
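To see the moving parts in one place, here is a separate minimal end-to-end sketch (not the original script) that goes from key to token to SSML to an MP3 file. It assumes the regional endpoints https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken and https://<region>.tts.speech.microsoft.com/cognitiveservices/v1; the region, voice name, User-Agent string, output path and function names are all hypothetical placeholders.

```powershell
# Minimal end-to-end sketch: key -> token -> SSML -> audio file.
# ASSUMPTIONS: region, voice name, User-Agent and default path are placeholders;
# confirm the endpoints for your Speech resource.
function New-TtsSynthesisRequest {
    param(
        [Parameter(Mandatory)][string]$AccessToken,
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][string]$OutFile,
        [string]$Region = 'westus',
        [string]$VoiceName = 'en-US-JennyNeural',
        [string]$OutputFormat = 'audio-16khz-32kbitrate-mono-mp3'
    )
    # Escape the text so characters like & and < don't break the SSML
    $safeText = [System.Security.SecurityElement]::Escape($Text)
    $ssml = "<speak version='1.0' xml:lang='en-US'><voice xml:lang='en-US' xml:gender='Female' name='{0}'>{1}</voice></speak>" -f $VoiceName, $safeText
    # Parameters for Invoke-RestMethod, returned as a hashtable for splatting
    @{
        Method      = 'Post'
        Uri         = "https://$Region.tts.speech.microsoft.com/cognitiveservices/v1"
        Headers     = @{
            'Authorization'            = "Bearer $AccessToken"
            'X-Microsoft-OutputFormat' = $OutputFormat
        }
        ContentType = 'application/ssml+xml'
        UserAgent   = 'PowerShellTTS'   # hypothetical application name
        # Send UTF-8 bytes so non-ASCII text survives in every PowerShell version
        Body        = [System.Text.Encoding]::UTF8.GetBytes($ssml)
        OutFile     = $OutFile
    }
}

function Invoke-TextToSpeech {
    param(
        [Parameter(Mandatory)][string]$ApiKey,
        [Parameter(Mandatory)][string]$Text,
        [string]$OutFile = (Join-Path ([System.IO.Path]::GetTempPath()) 'speech.mp3'),
        [string]$Region = 'westus'
    )
    # 1. Exchange the key for a short-lived access token
    $token = Invoke-RestMethod -Method Post -Uri "https://$Region.api.cognitive.microsoft.com/sts/v1.0/issueToken" -Headers @{ 'Ocp-Apim-Subscription-Key' = $ApiKey }
    # 2. Post the SSML; -OutFile writes the returned audio bytes to disk
    $request = New-TtsSynthesisRequest -AccessToken $token -Text $Text -OutFile $OutFile -Region $Region
    Invoke-RestMethod @request
    # 3. Return the path of the saved audio file
    $OutFile
}
```

Calling `Invoke-TextToSpeech -ApiKey $apiKey -Text 'Hello from PowerShell'` should leave speech.mp3 in your temp folder. Splitting the request builder out keeps it easy to inspect while stepping through in VS Code or the ISE.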

Summary

Using Azure Cognitive Services, you can quickly convert text to audio. Enjoy.