How Speech Synthesizers Work


“Open the pod bay doors, HAL.” “I’m sorry, Dave. I’m afraid I can’t do that.” Back in the 1960s, 70s, and 80s, we were mesmerized by computerized voices in our movies and television shows. “Negative, a close copy.” “No alternative possible, Master.” “I’m sorry, Michael, but I can’t do that.” “No good, I’ve got three.” “Warp energy has increased 14 percent.” There’s only one problem: they were all a lie. None of these were actual computer voices. Even the movie WarGames, with its very convincing-sounding computer, wasn’t real either. “Shall we play a game?”
If you’re wondering why it sounds so artificial, it’s because they had the actor read the words in reverse order. For example, instead of saying “Would you like to play a game of chess?” the actor read it like this: “Chess of game a play to like you would?” I’ll use Audacity to illustrate what they did. They cut the individual words out like this and rearranged them in the correct order.
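The editing trick described above, recording the words back to front and then splicing them into the right order, can be sketched in a few lines of Python. This is only an illustration, and it assumes each recorded word already exists as a separate clip. Clips are represented here as plain lists of sample values, and the function names `script_for_actor` and `splice_in_order` are hypothetical.

```python
def script_for_actor(sentence):
    """Return the words in the order the actor reads them: last word first."""
    words = sentence.rstrip("?.!").split()
    return words[::-1]


def splice_in_order(recorded_clips):
    """Join clips recorded in reverse word order back into sentence order.

    recorded_clips[0] is the first clip the actor recorded (the sentence's
    last word), so the list is walked backwards and the samples concatenated.
    """
    samples = []
    for clip in reversed(recorded_clips):
        samples.extend(clip)
    return samples


print(" ".join(script_for_actor("would you like to play a game of chess?")))
# prints: chess of game a play to like you would

# Toy clips: each "clip" holds one sample equal to the word's position in the sentence.
recorded = [[position] for position in range(8, -1, -1)]  # "chess" first, "would" last
print(splice_in_order(recorded))
# prints: [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Walking the recorded list backwards is all the reordering step amounts to. The pitch change and effects mentioned in the video would be separate processing applied to the spliced result.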
“Would you like to play a game of chess?” I think they also changed the pitch a bit, so we’ll do that too. “Would you like to play a game of chess?” Then they probably added some sort of effect; I’ll experiment with one here and see what I can get. “Would you like to play a game of chess?” So that’s roughly how it was done. Why they didn’t just use a real speech synthesizer, I have no idea. Speech synthesizers did exist at that time. However, since the movie came out in 1983, it was probably in pre-production for a year or two before that, and while speech synthesizers existed, they weren’t terribly common yet. So who knows?

Of course, there have been talking devices such as dolls ever since Edison’s first model came out in 1890. These essentially had a miniature phonograph inside, like this one shown in Get Smart back in 1969.
“My name is Mary Lou.” Or better yet, the miniature Yogurt doll from Spaceballs. “May the Schwartz be with you.” These worked very similarly to another device, the popular See N Say by Mattel. “The cow says moooo.” The original model used a kind of internal phonograph as well, and its design is quite interesting. I’ve taken this one apart so you can see how it works. Everything fits inside this mechanism here. Let me show you what we’re looking at. This is the record itself, which is made of plastic. This part here is the tone arm, and you can see that the stylus (very dirty at the moment) is slightly lifted off the record. This part here is a rudimentary speaker, which amplifies vibrations by staying in contact with the tone arm. Let’s see it in action. “The cow says moooo.”

You may notice that each time a track is played, the stylus travels all the way across the record, so you may be wondering how there is room for multiple sounds. Here’s how that works. On a normal vinyl album, the songs are laid out one after another, and you can see the divisions between them. The See N Say works very differently: its tracks are all wound together like this. If you look closely at the outer edge of the record, you’ll see several entry grooves. Each one is the start of a specific track. When you point the arrow at the sound you want to hear, the mechanism lines the stylus up with the entry groove of that track. Neat, huh?
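The interleaved layout can be modelled as a multi-start spiral, like the threads on a multi-start screw. The Python sketch below makes an idealized assumption that was not measured from a real See N Say: the grooves start at evenly spaced points around the rim and all move inward at the same rate. Under that assumption, it lists the order in which the grooves cross one radial line. The function names and every parameter are hypothetical.

```python
def entry_angle(track, n_tracks):
    """Angle (degrees) around the rim where the given track's groove begins."""
    return 360.0 * track / n_tracks


def crossings_at_reference(n_tracks, outer_radius, pitch, laps):
    """List (radius, track) for every groove crossing the line through track 0's start.

    Idealized model (an assumption): track k starts k/n_tracks of a turn further
    along the rim, in the direction the stylus travels, than track 0, and every
    groove moves inward by `pitch` per revolution.
    """
    crossings = []
    for lap in range(laps):
        for track in range(n_tracks):
            # Revolutions travelled along this groove before it reaches the reference line.
            turns = ((n_tracks - track) % n_tracks) / n_tracks + lap
            crossings.append((outer_radius - pitch * turns, track))
    # Outermost first: the order a stylus would meet them moving inward.
    return sorted(crossings, reverse=True)


for radius, track in crossings_at_reference(4, 100.0, 4.0, 2):
    print(f"r = {radius:5.1f}  track {track}")
```

With four tracks, neighbouring grooves sit `pitch / 4` apart and the track order repeats 0, 3, 2, 1 moving inward. Every sound gets a full-length spiral in this model, which is consistent with the stylus crossing the whole record for each one. Choosing a sound only changes which entry angle the stylus drops in at.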
Oh, and by the way, you can play sounds with just these two parts, but it takes practice.

So what about those talking cars of the 1980s? “Don’t forget your keys.” You might be tempted to think these used computerized synthesizers, but they didn’t. Murilee Martin from Autoweek recently took one of the speech boxes apart and showed that they were actually little phonographs that work very much like a See N Say. “Parking brake is on.” The only real difference is that the amplifier is electronic instead of mechanical. They even have the same selection of entry grooves to pick which sound to play. So these aren’t real speech synthesizers either.

What about those early computer games that incorporated speech? “Ghostbusters!” These games were extremely impressive in the early 1980s, but as cool as they were, they weren’t real speech synthesizers either.
These games simply used digitized recordings of speech. The sounds could just as easily have been dogs barking, cats meowing, or anything else. In essence, these games were the digital equivalent of a See N Say, so they could never say anything other than what had been pre-recorded for them.

The Speak and Spell was one of the first consumer devices to start crossing the line into actual speech synthesis. “E A R N. That is correct. Now spell ‘one,’ as in ‘one word.’” But I hesitate to call the Speak and Spell a true speech synthesizer, because it only knows a little over 200 words, and those words are all pre-recorded. In essence, it’s a See N Say that happens to know 200-odd words. In fact, if you wanted your Speak and Spell to know more words, you had to buy vocabulary cartridges containing additional recorded sounds in ROM, which were inserted through the battery compartment. Nevertheless, having appeared in 1978, it was one of the first talking electronic devices to reach the consumer market.

The Speak and Math, however, starts to blur the lines. It can pronounce practically any number, because it has recordings of all the individual words that make up numbers. “That’s correct! Now try forty-five thousand eight hundred three.”
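The Speak and Math’s trick of composing numbers from a small set of recorded words can be sketched as a function that turns an integer into the list of clips to play back. This is a hypothetical Python illustration. The clip names, the 0 to 999,999 range, and the choice to leave out “and” are assumptions, not details of the real device’s ROM.

```python
# Clip names for the recorded words (assumed; one recording per name).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def under_thousand(n):
    """Clips for 1 <= n <= 999, spoken without 'and' (e.g. 'eight hundred three')."""
    clips = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        clips += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        clips.append(TENS[tens])
        if ones:
            clips.append(ONES[ones])
    elif rest:
        clips.append(ONES[rest])
    return clips


def number_to_clips(n):
    """List the recorded clips to play back, in order, to say n."""
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only covers 0 to 999,999")
    if n == 0:
        return ["zero"]
    thousands, rest = divmod(n, 1000)
    clips = []
    if thousands:
        clips += under_thousand(thousands) + ["thousand"]
    if rest:
        clips += under_thousand(rest)
    return clips


print(number_to_clips(45803))  # ['forty', 'five', 'thousand', 'eight', 'hundred', 'three']
```

Under these assumptions the whole vocabulary is 30 clips. Saying 45,803 uses six of them, and 46,803 differs from it in exactly one clip.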
So, to say a number like 45,803, the device strings together six separately recorded sounds. If you wanted to change it to 46,803, you would replace just one sound. The Radio Shack VoxClock works in a very similar way: it has only a few dozen pre-recorded sounds, and it mixes and matches them to announce the time. “It’s five seventeen PM.” The Tel Star answering system from the same period is similar. “The time is eleven forty-two AM, March nineteen.”

Next, I want to show you the Commodore Magic Voice speech cartridge. On the side it has an audio in and an audio out. It plugs into your C64 like so, and all you need to do is run an audio cable out to an amplifier, or in my case, the television I’m using. If you still want to hear the Commodore 64’s internal sound, you run that into the audio input like so. Then you just fire up the C64, and you can use the SAY command to type something like this. “Commodore!” Or this. “Computer!” But it’s rather limited: you can only say one word at a time. To say something more complex, you can write it as a BASIC program. “Commodore. Computer.” But what if I try something like this? As you can see, it doesn’t work. Believe it or not, the Magic Voice cartridge is not a true speech synthesizer. It has a list of 234 pre-recorded words that it can say. In fact, if you give it a number, it will say the word that corresponds to that number.
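The Magic Voice behaviour shown here can be modelled as a lookup table. It has a fixed vocabulary that can be addressed either by word or by number, and it has nothing to say for any other word. The Python sketch below is hypothetical throughout. The eight-entry table, its numbering, and the helpers `lookup`, `say_sentence`, and `say_range` are illustrative stand-ins, not the cartridge’s real 234-word list or its BASIC commands.

```python
# Hypothetical stand-in for a fixed, numbered vocabulary. The real Magic Voice
# word list and its numbering are NOT reproduced here.
VOCABULARY = {1: "commodore", 2: "computer", 3: "control", 4: "find",
              5: "get", 6: "have", 7: "hear", 8: "help"}
NUMBER_OF = {word: number for number, word in VOCABULARY.items()}


def lookup(item):
    """Return the clip for a vocabulary number or a word, or None if it isn't stored."""
    if isinstance(item, int):
        return VOCABULARY.get(item)
    word = item.lower()
    return word if word in NUMBER_OF else None


def say_sentence(text):
    """Return the clips for every word, or raise if any word is missing."""
    clips = []
    for word in text.split():
        clip = lookup(word)
        if clip is None:
            raise KeyError(f"{word!r} is not in the vocabulary")
        clips.append(clip)
    return clips


def say_range(first, last):
    """Like the demo's BASIC loop: play every stored entry numbered first..last."""
    return [VOCABULARY[n] for n in range(first, last + 1) if n in VOCABULARY]


print(say_sentence("Commodore computer"))  # ['commodore', 'computer']
print(say_range(4, 6))                     # ['find', 'get', 'have']
```

Because a lookup either succeeds or fails, a sentence containing a single unstored word cannot be spoken at all. That matches the failed attempt in the demo.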
“Control!” To demonstrate this further, I’ll write a little BASIC program that says all the words numbered 100 through 200. “Find! Get! Have! Hear! Help! Is! Know! Like! Presents!” So the Magic Voice cartridge is also just a digital equivalent of a See N Say that happens to know 234 words. “The rooster says, ‘Cock-a-doodle-doo!’”

Fortunately, other software can add new phrases to it. For example, the Gorf game cartridge adds phrases used in the game. I’ll just plug it into the little pass-through connector here, and let’s check it out. “Commodore presents Gorf, a Bally/Midway game.” As you play, the enemy taunts you with insults, among other information. “Ha! Ha! Ha! Gorfian robots attack! Attack!” It’s a neat gimmick, but in my opinion it doesn’t really add much to the game. In fact, very few games support this.

Mattel introduced a very similar device for its Intellivision console around the same time, called the Intellivoice. One side plugs into the game console, and the other end is where you put the game cartridge. On the front is a volume control for the voice. Only five games ever supported it, and I have three of them right here. These games will work without the Intellivoice; they just won’t have any speech. So let’s try this thing out. The first game I’ll try is Bomb Squad. Let’s power it on. “Mattel Electronics presents Bomb Squad! They’ll never do it in time! The code! The code! Figure out the code! It won’t be easy. Replace this first, this third, this second.” Next, let’s try out Tron: Solar Sailer. “Mattel Electronics presents Tron.”
“7 4 7 8 2. Energy high.” Again, the speech is a nice gimmick, but it isn’t really all that useful. It’s not surprising that the product was considered a flop.

Up to this point, everything I have shown you has been a device that, while it can speak, is really only playing back selected pre-recorded sounds, so it is pretty limited in what it can say. Now I want to show you some true speech synthesizers. These devices build words out of allophones, the basic building blocks of speech: vowels, consonants, and the other sounds we make when we talk. (Strictly speaking, an allophone is one particular variant of a phoneme, such as the slightly different “p” sounds in “pin” and “spin.”)

The first one I want to show you is the Currah Speech 64 cartridge for the Commodore 64, which was also marketed under the name Voice Messenger. You may notice this DIN cable hanging out of the side. Let me show you how this works. The cartridge plugs in like any other, but this part plugs into the monitor port on the C64. It makes use of the Commodore 64’s seldom-used audio input line, which passes audio through the sound chip and back out, mixing it with the internal sound along the way. Most Commodore users back then were using a television as their display, so this was a pretty elegant design. It was supposed to come with a breakout cable for use with a monitor. Mine didn’t, so I will make my own so that I can get some clear recordings of it.

When you power on your C64, you’ll need to type INIT. “Return!” At this point, it will literally announce every key you press on the keyboard. “A B C D E F G Return!” That’s sort of annoying. However, you can also tell it to say a word. “Return. Hello!” If you get tired of hearing every key press, you can type KOFF to turn keyboard speech off. “K O F F Return.” Now here’s where things get interesting. “Hello.” Not only can I say a single word, but I can type literally anything inside these quotes and it will say it. “Hello there, how are you doing?” Of course, it works off English spelling rules, which, to say the least, aren’t very consistent, so it isn’t perfect. Let me give you an example. “Harry Potter.” OK, it gets pretty close on that one. But let’s try Hermione Granger. “Hermione Granger.” Yeah, it totally fails on that one.
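Reading ordinary English text aloud means first turning spelling into speech sounds. The toy Python sketch below shows the general idea of rule-based letter-to-sound conversion, applying the longest matching rule from a small table, and why it breaks on names like “Hermione.” The rule table, the ARPAbet-style sound symbols, and the function name are assumptions for illustration. They are not the Currah cartridge’s actual rules or allophone codes.

```python
# Toy letter-to-sound rules with ARPAbet-style symbols. Illustrative only:
# not the rule set or sound codes of the Currah cartridge or any other product.
RULES = {
    "tt": ["T"], "rr": ["R"], "er": ["ER"],
    "a": ["AE"], "e": ["EH"], "g": ["G"], "h": ["HH"], "i": ["IH"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "r": ["R"],
    "t": ["T"], "y": ["IY"],
}
LONGEST_RULE = max(len(key) for key in RULES)


def letters_to_sounds(word):
    """Convert one word to sounds by repeatedly applying the longest matching rule."""
    word = word.lower()
    sounds, i = [], 0
    while i < len(word):
        for size in range(min(LONGEST_RULE, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                sounds.extend(RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this letter; this sketch simply skips it
    return sounds


for name in ["Harry", "Potter", "Hermione", "Granger"]:
    print(name, letters_to_sounds(name))
```

With these rules, “Harry” and “Potter” come out as HH AE R IY and P AA T ER, close to the usual pronunciation. “Hermione” becomes HH ER M IH AA N EH (roughly “her-mih-ah-neh”), and “Granger” gets a short A and a hard G. Practical rule sets are much larger and look at neighbouring letters, but irregular names still trip them up.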
You can also switch voices by putting a 0 or a 1 in front of the sentence to be spoken. So here’s voice zero. “This is voice zero.” And you’ve already heard voice one, which is the default. “This is voice one.”

Next, I want to show you the Speech/Sound Program Pak for the Tandy Color Computer. This cartridge contains not only a speech synthesizer but also a slightly better sound chip for the CoCo. Let’s pop it in. On bootup, the computer doesn’t do anything different. There’s no SAY command in BASIC or anything like that. Fortunately, mine came with the user’s manual, and it looks like if I want to test the speech, I’ll have to type in this little program. Oh, the joys of typing in BASIC programs from a book! OK, all done. Now let’s test it out. “Test.” It works! “This is a Tandy.” As you can see, this is a true speech synthesizer that can say anything you type. Of course, just like the others, certain words will throw it for a loop. “Hermione Granger.” However, you can always get around this by typing in the correct sounds, like this. “Hermione Granger.”

OK, I have another type of speech synthesizer to show you. This one is called SAM, short for Software Automatic Mouth. It’s a software-only speech program that was made for the Commodore 64, Atari, and Apple II computers. This particular disk is for the Commodore 64, so I’ll put it in and load it up. First you load the actual speech engine into RAM, and then you load a small interface program. It was done this way so that you could use SAM with other programs if you wanted. OK, let’s see what it sounds like. “This is a test.” Let’s try some other things. “I love the Commodore 64.” SAM is a true speech synthesizer, since it can say anything you throw at it. “I can say anything you want.” Of course, with certain limitations. “Harry Potter.” Again, you can’t expect a computer with 64K of RAM to hold a database of every possible English word, so it has to make some assumptions.
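A 64K machine can’t store a pronunciation for every word. One practical fix, which the narrator demonstrates next, is to change the spelling. A small table of respellings applied before the text reaches the synthesizer automates that step. The Python sketch below is hypothetical: the `RESPELLINGS` table, its single entry, and the `respell` helper are illustrations and not part of SAM.

```python
import re

# Hypothetical respelling table: each problem word maps to an ordinary-letter
# spelling that a rule-based reader is more likely to pronounce as intended.
# The entry below is illustrative, not the spelling used in the video.
RESPELLINGS = {"hermione": "her my oh nee"}


def respell(text, table=RESPELLINGS):
    """Replace known problem words before the text is handed to the synthesizer."""
    def fix(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", fix, text)


print(respell("Hermione Granger"))  # her my oh nee Granger
print(respell("Harry Potter"))      # Harry Potter
```

Keeping only the exceptions in a table, and letting the rules handle everything else, is what makes this approach affordable on a machine with very little memory.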
“Hermione Granger.” But again, you can get around this by tweaking the spelling of the words until you get the pronunciation you want. “Hermione Granger.” SAM was also very configurable; you could change all sorts of aspects of the voice. Here’s a higher pitch. “I love the Commodore 64.” And here’s me changing the mouth variable. “I love the Commodore 64.”

So you might ask: if you could do true speech synthesis entirely in software, why did these cartridges exist? Well, one thing you might notice is that every time you tell SAM to say something, the screen blanks, because producing the sound requires every cycle of the CPU, leaving no time for it to do anything else. In fact, even in games that used speech, the entire game typically comes to a halt while the speech plays. A speech synthesizer cartridge, on the other hand, can handle the work of producing sound while the computer keeps doing other things. “Another enemy ship destroyed. Ha! Ha! Ha!”

By the way, it’s worth mentioning that SAM has been reverse engineered and reimplemented as a website you can use today, so it’s really easy to try out. “Yes, I sound just like the Commodore 64 version.”

So what were the practical uses for speech synthesis? Well, when we were 10-year-old kids, probably our favorite thing to do with them was to make the computer say all kinds of filthy curse words. And yes, programs like SAM could say absolutely anything you wanted. When we were 10, that alone could provide hours of entertainment, but I think the second most popular use was making prank telephone calls. Back in those days we had phones like these and no caller ID, so we didn’t know who was calling until we answered the phone and talked to them. It was hilarious to type out some insulting message like this.
Then we’d dial somebody’s phone number, hold the handset up to the television like this, and wait for them to answer. “Hello?” “Hey there, Techmoan! I just have to tell you that your YouTube channel is total crap! Flippin’ idiot!”

But seriously, speech synthesis has found numerous uses over the years, such as serving as the voice of Stephen Hawking. “The first question they asked it was: is there a God?” It has also powered automated telephone services like this one. “Hello, and welcome to Moviefone. If you know the name of the movie you’d like to see, press 1.” And it has continued to improve over the years with things like Siri. “Hey Siri, what is speech synthesis?” “Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer.” In fact, I’m using an online speech service to narrate this section of the video. Pretty neat, huh?

If you are interested in the early development of speech synthesizers, I recommend checking out the VODER, one of the first of its kind. It came out in the 1930s and was completely analog. “Say ‘She saw me’ with no expression.” “She saw me.” “Now say it in answer to these questions. Who saw you?” “She saw me.” Since it had no CPU, of course, it required a human operator to play the different sounds, much like playing a piano. And that about wraps it up for this episode. As always, thanks for watching!

100 Replies to “How Speech Synthesizers Work”

  1. An interesting “retro” talking balloon appeared in the mid to late 1980s using no electronics whatsoever. A phrase was recorded using Edison’s “hill and dale” technology on one side of a thin semi-stiff plastic strip (about 2 mm wide and 1 mm thick). A blank area at one end was used to tape it to an inflated balloon, smooth side down; the strip also served as a flexible stick to hold the balloon.

    By “hugging” the balloon like a bagpipe, or an American football, putting the thumbnail against the recorded (bumpy) side and pulling the strip gently at as near a constant speed as possible, the thumbnail would cause the strip to vibrate, and the vibrations would be amplified by the air in the balloon.

    Playing back a recording with your THUMBNAIL! How cool is that?

  2. I have the Color Computer 2 with all the plugins; the voice module is pretty cool. Watching your vid has me wanting to go dig up my CoCo from storage! Great info and research as always, thanks for another fun video😀 Anyone else remember waiting for the mail every month for a new copy of your fav magazine to show up so you could stay up all weekend typing in BASIC and then spend another 2 days fixing syntax errors!?

  3. Bruh when he booted SAM up it reminded me of those terrifying sounds that would occasionally play on the original Xbox menu

  4. The talking cars actually came in two possible technologies: Tech similar to the Speak-and-Spell (Electronic), and tech similar to the See-and-Say (Analog)

  5. On the vinyl records' part, I think I watched another video way back that introduced actual full length vinyl records like this so every time you play them you have a chance of getting different songs, it's neat tho!

  6. You are the best. I really appreciate what you do. I especially loved the episode about the gun for the old Atari game console. I am a computer engineer and I did not know how that gun worked until I saw that episode of yours.
    Thank you so much.
    I owe you, man…

  7. You could end somebody's marriage with these modern speech synthesizers. Just call somebody up and ask for them when their spouse answers. Then hang up.

    "Hey, Jenny! Who the hell is Microsoft Sam? That's it! I want a divorce!"

  8. I remember in school in the 90s we would “draw” by pressing d d d. To draw straight or turn left or right …it was like lines and then u could “paint” inside the shapes if u left a tiny hole it would bleed out , it wasn’t “paint” but I wish I could remember it

  9. Awesome video. I wish you had gone into that “VODER” thing at the end more. Also, it’s ann-a-log, not anal-og. Is that an intentional troll?

  10. I’ve had this SAM, but a Polish version; it was called Black Box and it was a cartridge. It could even sing. Nostalgic fun times.

  11. That Chrysler "Electronic Voice Alert" at 4:00 was actually the same thing as the speak and spell: https://en.wikipedia.org/wiki/Electronic_voice_alert

  12. I came back here because apparently the arcade game "Gauntlet" used the same chip (or one from the same family) as the speak and spell for its voice announcements. Not sure of the availability of these chips, now, but it'd be pretty cool to record and play your own voice snippets with them.

  13. A nice use of the “fake” voice synths was for railroads to use on their defect detectors, since they never needed them to say anything more than about 10 words and a list of numbers. Obviously the detectors now use true voice synths, but up until recently, and in some cases still, they were the old pre-recorded, real-voiced synths. Some of those detectors cut out after so many digits; a good example is train length: a lot of old detectors cut out at 9,999 ft, so those had to be removed once trains began to surpass 10,000 ft.

  14. the SAM software is also used in a popular indie horror game, FAITH, and it's used to play up the horror aspect which works really well with the inhuman sounding speech!

  15. When I was a teenager (in the 90s), I found a spot in the circuit panel of the Speak 'N Spell, where if you jammed a flathead screwdriver between two of the solder points of two different chips, it would cause most of the keys to spit out garbled speech, and one key to say "relief." I really thought I was on to something there…

  16. @6:44 Sounds a lot like the Half Life 1 sound system guy. They must've done something similar to get that voice.

  17. Whoa we had the Currah cartridge growing up. Didn’t even know about the different voices.

    Edit: haha, the “Hey Siri” called up Siri on the iPad I was watching this on.

  18. Speech synthesizers (and human speakers) use phonemes, which are the sounds that make up a language. Allophones are different sounds that are equivalent in a language (though they may be different in another language).

  19. Less than a minute into this and I realize that what I need most in my life is Majel Barrett’s voice on a smart speaker

  20. I had S.A.M. for my Atari 800XL back in the mid 80's. And yes, it was primarily used for prank calling.
    Later in '88 or 89 I had one for the Macintosh called Talking Moose, which was capable of an amazing variety of inflection.

  21. Fast forward 40 years and we have software that lets you type in entire paragraphs and simulates actual human voices, both men’s and women’s.

  22. Normal languages would work. : – DDD
    I am trying to say that English is the weirdest shit where written letters can have multiple pronunciations… it is not even close to phonetic.

  23. It's funny how Microsoft calls their text-to-speech voice "Sam", and that it's the same voice the group Anonymous uses lol
