Voder 2025 – Speech Synthesizer

Speech Synthesizer – Recreated from 1939

Mode: voiced (press SPACE to toggle)

Pitch: 120 Hz (A = down, S = up; D-F-G-H-J-K-L = presets)

Quiet Mode: OFF (hold SHIFT)

10 Resonator Filters – MONOPHONIC (Q-W-E-R-T-Y-U-I-O-P – one at a time)

Stop Consonants (Always Unvoiced)

Common Sounds (Always Unvoiced)

Pitch Control (Foot Pedal): Deep Bass (30 Hz) to High Soprano (600 Hz)

Pitch Presets (D F G H J K L – instant jump to pitch)

Quick Start:

• Click “Start Audio” then “Record” to capture your performance!
• Press Q-W-E-R-T-Y-U-I-O-P keys (MONOPHONIC – one filter at a time)
• SPACE bar cycles the mode: Voiced → Unvoiced → Both
• Hold A (pitch down) or S (pitch up) for continuous slides!
• Tap D F G H J K L to jump to preset pitches instantly!
• Z, X, C for stop consonants (always unvoiced bursts)
• Hold V for “SSSSS” sound (always unvoiced)
• Hold SHIFT for quiet mode (softer consonants)
• Download as high-quality WAV when done!
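
The keyboard controls listed above amount to a small state machine. The following is a minimal Python sketch of that logic, not the page's actual implementation: the class name VoderControls, the step size SLIDE_STEP_HZ, and the wrap from "both" back to "voiced" are assumptions made for illustration. The 30 Hz to 600 Hz limits and the 120 Hz starting pitch come from the controls shown on this page. The D-F-G-H-J-K-L presets are left out because the page does not list their frequencies.

```python
from dataclasses import dataclass
from typing import Optional

MODES = ["voiced", "unvoiced", "both"]       # order from the Quick Start list
PITCH_MIN_HZ, PITCH_MAX_HZ = 30.0, 600.0     # range shown on the pitch control
FILTER_INDEX = {k: i for i, k in enumerate("QWERTYUIOP")}  # one key per filter
SLIDE_STEP_HZ = 2.0                          # hypothetical step per key event


@dataclass
class VoderControls:
    """Hypothetical model of the widget's keyboard state."""
    mode: str = "voiced"
    pitch_hz: float = 120.0
    active_filter: Optional[int] = None      # monophonic: at most one filter

    def press(self, key: str) -> None:
        key = key.upper()
        if key == " ":
            # Advance to the next mode; wrapping back to "voiced" is assumed.
            self.mode = MODES[(MODES.index(self.mode) + 1) % len(MODES)]
        elif key == "A":
            # Slide down, clamped at the bottom of the range.
            self.pitch_hz = max(PITCH_MIN_HZ, self.pitch_hz - SLIDE_STEP_HZ)
        elif key == "S":
            # Slide up, clamped at the top of the range.
            self.pitch_hz = min(PITCH_MAX_HZ, self.pitch_hz + SLIDE_STEP_HZ)
        elif key in FILTER_INDEX:
            # A new filter key replaces the previous one (one at a time).
            self.active_filter = FILTER_INDEX[key]
```

Holding A or S would correspond to calling press repeatedly, so the pitch slides until it reaches either end of the range and then stays there.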

The technology of speech processing, which includes speech modeling, synthesis, encoding, and recognition, dates back to the parametric techniques introduced by Homer Dudley in the late 1930s and early 1940s. These methods are “parametric” in the sense that they construct a model of the acoustic properties of the human vocal tract and then analyze speech by determining the values of the model’s parameters. Below is a rendition of the basic model from Dudley’s 1940 paper, “The Carrier Nature of Speech,” published in The Bell System Technical Journal.
At the 1939 World’s Fair in New York, Bell Labs demonstrated this principle with a device called the “Voder,” shown below in action.

The Voder was operated by highly trained technicians (who at the time were called “girls”). A technician would manipulate a set of analog (continuous) controls to produce speech-like sounds, as in the sentence “greetings everybody.” The Voder was carefully designed to reconcile the limitations of a human operator with the needs of modeling speech. It is shown in the following schematic:

Ten “spectrum keys,” one per finger, control the gains of ten bandpass filters. Together they crudely determine the spectral content of the speech signal; ten is the most keys a human operator can control at once. A wrist bar switches between a periodic excitation (“buzz-type energy”) and a white-noise excitation (“hiss-type energy”). Periodic excitation produces voiced sounds (like “aaaaa”), while white-noise excitation produces unvoiced sounds (like “sssss”). A foot pedal controls the frequency of the periodic excitation and thereby the inflection of the speech.
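
The signal path described above (an excitation source feeding ten gain-controlled bandpass filters) can be sketched in plain Python. This is an illustration under stated assumptions, not a model of the original circuit or of the synthesizer on this page: the sample rate, the band center frequencies, the filter Q, and all function names are hypothetical, and the bandpass uses a standard second-order (biquad) design rather than anything taken from Dudley’s hardware.

```python
import math
import random

SAMPLE_RATE = 16000  # Hz; assumed for this sketch

# Hypothetical center frequencies for the ten bands (not the Voder's values).
BAND_CENTERS_HZ = [112, 338, 575, 850, 1200, 1700, 2350, 3250, 4600, 6450]


def buzz(pitch_hz, n_samples):
    """Periodic excitation: one unit impulse per pitch period."""
    out, phase = [], 0.0
    for _ in range(n_samples):
        phase += pitch_hz / SAMPLE_RATE
        if phase >= 1.0:
            phase -= 1.0
            out.append(1.0)
        else:
            out.append(0.0)
    return out


def hiss(n_samples, seed=0):
    """Noise excitation: uniform white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def bandpass(signal, center_hz, q=4.0):
    """Second-order bandpass with unity peak gain (common biquad form)."""
    w0 = 2.0 * math.pi * center_hz / SAMPLE_RATE
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this design
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def voder_frame(key_gains, voiced, pitch_hz=120.0, n_samples=1600):
    """Mix the ten filtered bands, each scaled by its spectrum-key gain."""
    excitation = buzz(pitch_hz, n_samples) if voiced else hiss(n_samples)
    mix = [0.0] * n_samples
    for gain, center in zip(key_gains, BAND_CENTERS_HZ):
        if gain == 0.0:
            continue  # key not pressed: this band contributes nothing
        for i, v in enumerate(bandpass(excitation, center)):
            mix[i] += gain * v
    return mix
```

In this sketch, the wrist bar corresponds to the voiced flag, the foot pedal to pitch_hz, and the ten spectrum keys to key_gains. For example, voder_frame([0, 1, 1, 0, 0, 0, 0, 0, 0, 0], voiced=True) would produce 0.1 seconds of a buzzy, vowel-like tone with energy concentrated in the second and third bands.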

Listen to the complete Voder demonstration:
