Getting Started

Here is a checklist of what you will need when you are ready to use what we are showing you here.  You can follow the links to download the software.

First we will install the SAPI speech SDK.  You can find it on the Microsoft site at http://www.microsoft.com/speech/download/sdk51/ (be forewarned, it is a 68 MB download).  It's an easy install: all you have to do is run the EXE and you have all the SAPI DLLs, sample programs and documentation.  Delphi users will appreciate that a type library is included, which makes life even easier with 19 components!

Next, you must tell your development environment that your project should use the files the SDK has installed on your computer.  Here are the instructions for a few languages.  If your environment is not in this list, consult its documentation on adding references.

Once you get everything loaded, you can access the speech objects.  To understand how Microsoft put this together, it helps to understand the concept of "tokens". A token is an object representing a resource that is available on a computer, such as a voice, a recognizer, or an audio input device. A token gives an application an easy way to inspect the various attributes of a resource without having to instantiate it. The tokens are stored in the registry.  For example, the ISpeechObjectTokens collection for the voice object exposes attributes such as voice name, vendor, age, language and gender.  Using tokens, you can find only the "female" voices, find which voices are those of a "child", or get only the voices in "Spanish".  Token enumerators are COM objects that enumerate the entries for the tokens under them.  Another example is the AudioOutput tokens, which list all the available audio output devices and the parameters associated with them.
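To make the token idea concrete, here is a minimal Delphi console sketch that asks SAPI for only the female voices and prints each token's description and vendor. It is an illustration under assumptions, not code from the SDK samples: it assumes SAPI 5.1 is installed on a Windows machine, and it uses late binding through CreateOleObject rather than the imported type library. The program name is made up for the example.

```pascal
program ListFemaleVoices;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, ActiveX, ComObj, Variants;

var
  Voice, Tokens, Token: OleVariant;
  I, Count: Integer;
  Desc, Vendor: string;
begin
  CoInitialize(nil);  // COM must be initialized before creating SAPI objects
  try
    try
      // Late-bound SAPI voice object (assumes SAPI 5.1 is installed)
      Voice := CreateOleObject('SAPI.SpVoice');
      // Required attributes: Gender=Female; no optional attributes
      Tokens := Voice.GetVoices('Gender=Female', '');
      Count := Tokens.Count;
      for I := 0 to Count - 1 do
      begin
        Token := Tokens.Item(I);
        // Reading attributes from the token does not instantiate the voice
        Desc := Token.GetDescription(0);
        Vendor := Token.GetAttribute('Vendor');
        Writeln(Desc, ' | vendor: ', Vendor);
      end;
    except
      on E: Exception do
        Writeln('SAPI error: ', E.Message);
    end;
  finally
    // Release the COM references before shutting COM down
    Token := Unassigned;
    Tokens := Unassigned;
    Voice := Unassigned;
    CoUninitialize;
  end;
end.
```

Changing the attribute string to something like 'Age=Child' should narrow the list in the same way, provided a matching voice is installed.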

Setting Wave Formats

Let's start with wave formats, since speech is generated as a real-time audio stream in wave format. Wave formats are defined in the list of SAPI constants found in your Speech library file and take the following form:

      SAFTDefault             = -1
      SAFTNoAssignedFormat    = 0
      SAFTText                = 1
      SAFTNonStandardFormat   = 2 
      SAFTExtendedAudioFormat = 3
      SAFT8kHz8BitMono        = 4 
      SAFT8kHz8BitStereo      = 5 
      SAFT8kHz16BitMono       = 6 
      SAFT8kHz16BitStereo     = 7 
      SAFT11kHz8BitMono       = 8 
      SAFT11kHz8BitStereo     = 9
      ...

You will notice that the formats start with SAFT for "Speech Audio Format Type", followed by a string of numbers and letters.  What do they stand for?  The 8kHz, 11kHz, etc. stand for the sampling rate.  This is the number of times per second, expressed in Hertz, that the audio is "sampled".  In general, you want to sample at a rate at least twice the highest frequency you wish to capture.  So 8kHz, or 8000Hz, is just about perfect for sampling low quality phone audio, which has a cutoff frequency of 3500Hz.
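The "twice the highest frequency" rule is easy to check with a little arithmetic. The short console sketch below uses only the two figures quoted above (3500Hz cutoff, 8000Hz sampling); the program and constant names are made up for the example.

```pascal
program NyquistCheck;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  PhoneCutoffHz = 3500;  // cutoff frequency of phone audio quoted in the text
  SampleRateHz  = 8000;  // rate used by the SAFT8kHz... formats

begin
  // The sampling rate must be at least twice the highest frequency captured
  Writeln('Minimum sampling rate for phone audio: ', 2 * PhoneCutoffHz, ' Hz');
  // Conversely, a given rate can represent frequencies up to half that rate
  Writeln('Highest frequency captured at 8 kHz:   ', SampleRateHz div 2, ' Hz');
  if SampleRateHz >= 2 * PhoneCutoffHz then
    Writeln('8 kHz is sufficient for phone audio')
  else
    Writeln('8 kHz is too low for phone audio');
end.
```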

The number of bits indicates how many bits are used to store each sample.  A higher bit depth gives more resolution per sample.  8 bits can hold 256 levels of granularity while 16 bits can resolve 65536!  Normally, the higher the sampling rate and the more bits used to store the sampled data, the better the audio quality.  But keep in mind what your telephony device is designed to accommodate and the limitations of the type of line over which you will be sending your audio. Unless you are using digital lines on an in-house system with phones capable of extended dynamic range, the normal 8k and 11k mono formats are all you will ever need.  They will give you better performance and allow for smaller wave files.
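To see why the lower formats give smaller wave files, it helps to work out the raw data rate: sampling rate times bytes per sample times number of channels. The sketch below is a plain arithmetic illustration for uncompressed PCM; the function name BytesPerSecond is made up for the example, and 11kHz is taken as 11,025Hz.

```pascal
program WaveDataRate;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Uncompressed PCM data rate = rate * (bits / 8) * channels
function BytesPerSecond(SampleRateHz, BitsPerSample, Channels: Integer): Integer;
begin
  Result := SampleRateHz * (BitsPerSample div 8) * Channels;
end;

begin
  // Number of distinct levels each sample can take
  Writeln('Levels at 8 bits:  ', 1 shl 8);
  Writeln('Levels at 16 bits: ', 1 shl 16);
  // Storage cost of one second of audio in a few of the formats
  Writeln('8 kHz, 8-bit, mono:      ', BytesPerSecond(8000, 8, 1), ' bytes/s');
  Writeln('8 kHz, 16-bit, mono:     ', BytesPerSecond(8000, 16, 1), ' bytes/s');
  Writeln('11 kHz, 16-bit, stereo:  ', BytesPerSecond(11025, 16, 2), ' bytes/s');
end.
```

Doubling the bit depth or the channel count doubles the size of the file for the same length of audio.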

The constants listed above are the wave format values in the SpeechAudioFormatType enumeration.  There are currently about 68 formats.  You can access them by their constant name or by their index number.  This makes it very easy to select an individual wave format or to populate something with an index, like a combobox, with all the wave formats.  But what if you don't want all of them?  What if you only want the mono wave formats?  To do that, you have two choices. The first is to use the constants by name, like this (the code is very similar for Delphi and VB.NET):

MyWavFormatType := SAFT11kHz8BitMono;

But because the constant names exist only at compile time, you can't put the names in an array or a combobox and reference them later. So for now, it's enough to know that you can use the SAPI wave format constants if you want to; we will focus on how to manage a large list of wave formats and refer to them by their index.

The second choice is to reference the wave format types by their constant name, or by any text name you choose, by storing the string name of the wave format together with its index as an object.  In this way, the value stored with each combobox item will match the ID in the constants list. Here is the code:

In Delphi...

ComboBoxWaveFormats.Items.AddObject('SAFT8kHz8BitMono', TObject(4));
ComboBoxWaveFormats.Items.AddObject('SAFT8kHz16BitMono', TObject(6));
ComboBoxWaveFormats.Items.AddObject('SAFT11kHz8BitMono', TObject(8));

Notice that I am adding only the even-numbered wave formats, the ones in mono, and storing each with its index. In Visual Basic (VB 5 or VB 6), we could create a subroutine called AddFmts like this:

In VB...

Private Sub AddFmts(ByRef name As String, ByVal fmt As SpeechAudioFormatType)
   ' Use the constants in the globals section of the SAPI SpeechLib file
   ' to fill the ComboWaveFormat box with the format name and its index
   Dim Index As Integer
   ' get the count of the existing list so that we add to the bottom of the list
   Index = ComboWaveFormat.ListCount
   ' add the name to the combo box and associate the format type with the item
   ComboWaveFormat.AddItem name, Index
   ComboWaveFormat.ItemData(Index) = fmt
End Sub

and then populate a list like this:

AddFmts "SAFT8kHz16BitMono", SAFT8kHz16BitMono 
AddFmts "SAFT11kHz8BitMono", SAFT11kHz8BitMono 
AddFmts "SAFT11kHz16BitMono", SAFT11kHz16BitMono 
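Once the format ID is stored next to its name, getting it back is just the cast in the other direction. Here is a minimal Delphi console sketch of that round trip. It uses a TStringList to stand in for the combobox Items so that it runs without a form, and it assumes a Delphi version (or Free Pascal) that provides NativeInt; in a real form you would read ComboBoxWaveFormats.Items.Objects[ComboBoxWaveFormats.ItemIndex] instead.

```pascal
program FormatLookup;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils, Classes;

var
  Formats: TStringList;  // stands in for ComboBoxWaveFormats.Items
  I: Integer;
  FormatID: NativeInt;
begin
  Formats := TStringList.Create;
  try
    // Store each mono format name with its SAPI constant value
    Formats.AddObject('SAFT8kHz8BitMono', TObject(4));
    Formats.AddObject('SAFT8kHz16BitMono', TObject(6));
    Formats.AddObject('SAFT11kHz8BitMono', TObject(8));

    for I := 0 to Formats.Count - 1 do
    begin
      // Cast the stored pointer-sized value back to an integer ID
      FormatID := NativeInt(Formats.Objects[I]);
      Writeln('item ', I, ': ', Formats[I], ' -> format ID ', FormatID);
    end;
  finally
    Formats.Free;
  end;
end.
```

Note that the item index (0, 1, 2) and the format ID (4, 6, 8) differ, which is exactly why the ID is stored with the item rather than inferred from its position.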

This is what you would do if you want to access the wave formats directly, but TeleTools has its own constants list, which not coincidentally matches the order of the Microsoft list.  This is all derived from the Windows multimedia sound specification.  The benefit of using TeleTools is that you maintain consistency in your program and can use our etPlay and etRecord components along with SAPI to set your wave formats.  In addition, TeleTools allows you to refer to both the name and ID of wave formats the same way it does for things like TAPI devices.

Here are a few by ID; they all start with 'wf':

       wfUnknown     = 0
       wfPCM08000M08 = 1
       wfPCM08000S08 = 2
       wfPCM08000M16 = 3
       wfPCM08000S16 = 4

and here are the same ones by friendly name; they all start with 'Swf':

       SwfUnknown     = "Unknown"
       SwfPCM08000M08 = "PCM 8,000 Hz, 8-bit, Mono"
       SwfPCM08000S08 = "PCM 8,000 Hz, 8-bit, Stereo"
       SwfPCM08000M16 = "PCM 8,000 Hz, 16-bit, Mono"
       SwfPCM08000S16 = "PCM 8,000 Hz, 16-bit, Stereo"


So now you could create a list of audio wave formats by name like this:

ComboBoxWaveFmtNames.Items.Add('PCM 8,000 Hz, 8-bit, Mono');
ComboBoxWaveFmtNames.Items.Add('PCM 8,000 Hz, 16-bit, Mono');
ComboBoxWaveFmtNames.Items.Add('PCM 11,025 Hz, 8-bit, Mono');

or by name for display but referred to programmatically by ID like this:

ComboBoxWaveFmtIDs.Items.AddObject('wfPCM08000M08', TObject(1));
ComboBoxWaveFmtIDs.Items.AddObject('wfPCM08000M16', TObject(3));
ComboBoxWaveFmtIDs.Items.AddObject('wfPCM11025M08', TObject(5));

This can lead to code as simple as the following, where we declare a variable of type TWAVFORMATS (a TeleTools type), populate a list with all the wave formats in one line of code, and then set our desired wave format with only one more line of code:

procedure TForm1.MyProcedure(Sender: TObject);
var
   X: TWAVFORMATS;
begin
   for X := wfUnknown to wfIMAADPCM08000M04 do
      ComboBox1.Items.Add(ET_WAV_FOMAT[X].sName);
end;

procedure TForm1.SetFormats(Sender: TObject);
begin
   etRecord1.Source.Format.ID := TWAVFORMATS(ComboBox1.ItemIndex)
end;

ExceleTel created a sample program just to show you how to work with wave files. It's called etWaveFormats and you can download it HERE.

Now that we have learned how to get and set wave formats, let's see how to put it into practice as far as TAPI telephony and SAPI speech are concerned.  Click below to continue to the next page, where we will see how to use the SpMMAudioOut methods to choose our output device and the SpVoice object to speak some text.

Continue to Page 3

 



Copyright © 1997-2007 ExceleTel Inc. All rights reserved. Friday August 17, 2007 11:05:24 AM
