Speaking Some Text

In the etTTSSimple and etTextToSpeech sample programs, you can type text into the text memo box and it will be spoken through your sound card or over an active call via your telephony device.  In the previous pages we did all the setup for selecting our audio format and routing it to the proper audio device, so in this chapter we will focus on actually using text to speech to speak written text and on showing you how to control how the speech sounds.

To speak text, you can use just one line of code:

In VB...

SpVoice1.Speak "Buy TeleTools now!", iSpeechFlags

In Delphi...

SpVoice1.Speak('Buy TeleTools now!', iSpeechFlags);

Setting Speech Flags

As a matter of fact, even without all of the previous code, this one line would speak the text through your sound card.  You would need to do one thing first, though: define iSpeechFlags, an integer variable that holds the constants controlling how the speech stream is sent.  Here are your options:

  •  SVSFDefault means the Speak method will be synchronous (waits until done)
  •  SVSFlagsAsync makes the Speak method asynchronous and so it returns immediately
     (you can use events to find out when speech terminates, or call the
     WaitUntilDone method, or call SpeakCompleteEvent to receive a Win32 event
     handle, which can be passed to WaitForSingleObject).
     Note that the Speak method returns a stream number. When queuing several
     asynchronous voice streams, the stream number allows you to identify them;
     each voice event passes the stream number to which it relates as a parameter.
  •  SVSFPurgeBeforeSpeak means any text being spoken and any text queued to speak
     will be immediately cancelled.
  •  SVSFNLPSpeakPunc means punctuation marks are read out by their names, rather
     than being used as punctuation (so ? is read out as question mark)
  •  SVSFIsFilename means the first parameter is a file name containing text to speak.
  •  SVSFIsXML means the text includes XML tags to alter attributes of the spoken text

I like to speak asynchronously so that my program is free to do other things while the speech is played, and so that I can respond to speech events that tell me about its status. I also like to make sure that any text in the speech buffer is purged when I go to speak again, and to be able to use XML tags to control how the text is spoken.  Since these are bitflags, you can combine them like this:

In VB...

Dim iSpeechFlags As Integer
iSpeechFlags = SVSFlagsAsync or SVSFPurgeBeforeSpeak or SVSFIsXML

In Delphi...

var
  iSpeechFlags: Integer;
iSpeechFlags := SVSFlagsAsync or SVSFPurgeBeforeSpeak or SVSFIsXML;

You will sometimes see people use "+" to combine the bitflags. That works in this case, where each bitflag has only one unique bit set and each flag appears only once, but technically the bitwise or is the proper method. We use the "or" operator to or all the bitflags together so that every individual bit set by any of the constants is kept in the result.  If you remember binary logic from your high school math course, if you or 00000001 with 00000010, you get 00000011.  The two rightmost bits (positions 7 and 8 counting from the left) are switched on and will control your speech accordingly.
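
The difference between "or" and "+" is easy to see with the numbers written out. The sketch below is in Python rather than VB or Delphi, purely so it can run without the speech engine. The numeric flag values are assumed to match the SpeechVoiceSpeakFlags enumeration in the SAPI 5.1 type library (check them against your own SDK), and has_flag is a hypothetical helper written for this illustration.

```python
# Flag values assumed to match SAPI 5.1's SpeechVoiceSpeakFlags enumeration.
SVSFlagsAsync = 1          # 00000001
SVSFPurgeBeforeSpeak = 2   # 00000010
SVSFIsXML = 8              # 00001000


def has_flag(flags: int, flag: int) -> bool:
    """Hypothetical helper: True when every bit of `flag` is set in `flags`."""
    return (flags & flag) == flag


# Bitwise or keeps every bit that any of the constants sets.
speech_flags = SVSFlagsAsync | SVSFPurgeBeforeSpeak | SVSFIsXML
print(format(speech_flags, "08b"))

# Or-ing a flag in a second time leaves the value unchanged.
or_twice = speech_flags | SVSFIsXML

# Adding the same flag a second time carries into the next bit,
# which silently clears the flag you meant to set.
added_twice = speech_flags + SVSFIsXML

print(has_flag(or_twice, SVSFIsXML))     # flag still set
print(has_flag(added_twice, SVSFIsXML))  # flag lost
```

This is why "or" is the safer habit: it stays correct even when a flag is accidentally included twice or a constant has more than one bit set.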

Using XML to Control Text To Speech

XML, which stands for eXtensible Markup Language, offers a lot of flexibility to control how your text is spoken.  What if trying to say "ExceleTel" sounds like cursing in a bad French accent?  Or what if you want the accented syllable in "umbrella" to be where you want it?  You can do that and more with XML.  Here is the text from our etTextToSpeech sample program.  Notice all the XML tags.

<EMPH>Hello</EMPH><PRON SYM="f eh l ow">Fellow</PRON>developers. I can 
speak<PITCH MIDDLE="+10">in a high pitch like this.</PITCH>and<PITCH 
MIDDLE="-10">or a low one that sounds like this.</PITCH>I can speak<RATE 
SPEED="+5">very very quickly like a chipmunk</RATE>and <RATE SPEED="-10">or very 
very slowly.</RATE>I can speak <VOLUME LEVEL="30">quietly if you like</VOLUME> or 
<VOLUME LEVEL="100">loudly.</VOLUME>I can spell out this word. 
<SPELL>ExceleTel</SPELL><silence msec="500"/><VOICE 
REQUIRED="Gender=Female;Age!=Child">and even talk 
like a little girl</VOICE><VOICE REQUIRED="Gender=Male">or like an adult 
male</VOICE>and much much more<silence msec="500"/>Good luck with your ExceleTel 
TeleTools experience. You can start by pressing some digits when you are connected on a 
call. Enjoy!

As you read through this, notice that we are using the following tags:

  • EMPH - Give this emphasis
  • PRON SYM - use the SAPI list of phonemes to pronounce a word not in the lexicon
  • PITCH - Change the pitch of the speech to a higher or lower register
  • RATE - Alter the playback speed of the speech
  • VOLUME - Raise and lower the volume of the TTS
  • SPELL - Spell out each letter of this word
  • VOICE REQUIRED - Change the speaking voice to another in the engine
  • SILENCE - Add silence between words or syllables

There are even more, but we will leave it to you to look through the SAPI 5.1 SDK documentation to find the others.  Using these XML tags, you can control almost all of the inflections and pronunciations in your spoken text.
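
If you build the marked-up text in code rather than typing it by hand, a small helper keeps the tags balanced. The following is only a sketch in Python, not part of TeleTools or SAPI: sapi_tag and silence are hypothetical names, and escaping "&" and "<" in the spoken text assumes the speech engine's parser treats them the way ordinary XML does.

```python
from xml.sax.saxutils import escape, quoteattr


def sapi_tag(name: str, text: str, **attrs: str) -> str:
    """Hypothetical helper: wrap `text` in one tag with optional attributes."""
    attr_text = "".join(
        f" {key}={quoteattr(value)}" for key, value in attrs.items()
    )
    # escape() replaces &, < and > so stray characters cannot break the markup.
    return f"<{name}{attr_text}>{escape(text)}</{name}>"


def silence(msec: int) -> str:
    """Hypothetical helper: a pause of `msec` milliseconds."""
    return f'<silence msec="{msec}"/>'


# Assemble a short utterance from pieces, using tags from the list above.
markup = " ".join([
    sapi_tag("EMPH", "Hello"),
    sapi_tag("PITCH", "in a high pitch", MIDDLE="+10"),
    silence(500),
    sapi_tag("SPELL", "ExceleTel"),
])
print(markup)
```

The resulting string would then be passed to Speak with the SVSFIsXML flag set, exactly as with hand-written markup.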

Stay tuned...  Check back for more pages about how to change voices, use a different speech engine, monitor the status of speech using events and properties, and use TeleTools and SAPI to construct useful applications like IVRs, call centers and more.

 

 

 
 


Copyright © 1997-2008 ExceleTel Inc. All rights reserved. Friday August 17, 2007 11:05:24 AM
