The vowel sound I, as in “fine,” it should be added, is not a simple sound, but diphthongal. The two sounds whose succession gives the sound we represent (erroneously) by a single letter I (long), are not very different from “a” as in “far,” and “ee” (or “i” as in “ravine”); they lie, however, in reality, respectively between “a” in “far” and “fat,” and “i” in “ravine” and “pin.” Thus the tones and overtones necessary for sounding “I” long, do not require a separate description, any more than those necessary for sounding other diphthongs, as “oi,” “oe,” and so forth.
We see, then, that the sound-waves necessary to reproduce accurately the various vowel sounds are more complicated than those which would correspond simply to the fundamental tone in which a sound may be uttered. There must not only be in each case certain overtones, but each overtone must be sounded with its due degree of strength.
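The point may be illustrated numerically by a minimal sketch, in which a wave is built by adding to a fundamental tone a set of overtones, each with its own strength. The function names, the sampling rate, and the two strength profiles are assumptions made for illustration only; they are not measured values for any actual vowel.

```python
import math


def vowel_sample(t, fundamental_hz, overtone_strengths):
    """Sum the fundamental and its overtones at time t (in seconds).

    overtone_strengths[k] is the relative strength of harmonic k + 1,
    so index 0 belongs to the fundamental itself.
    """
    return sum(
        strength * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
        for k, strength in enumerate(overtone_strengths)
    )


def vowel_wave(fundamental_hz, overtone_strengths, duration_s=0.02, rate_hz=8000):
    """Return a list of samples of the compound wave."""
    n = round(duration_s * rate_hz)
    return [
        vowel_sample(i / rate_hz, fundamental_hz, overtone_strengths)
        for i in range(n)
    ]


# Hypothetical strength profiles: the same fundamental tone,
# but the overtones are weighted differently.
PROFILE_A = [1.0, 0.2, 0.6, 0.1]
PROFILE_B = [1.0, 0.7, 0.1, 0.4]

wave_a = vowel_wave(110.0, PROFILE_A)
wave_b = vowel_wave(110.0, PROFILE_B)
```

The two waves share the same pitch, yet differ in form wherever the overtone strengths differ, which is the distinction the text draws between the fundamental tone and the quality of a vowel.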
But this is not all, even as regards the vowel sounds, the most readily reproducible peculiarities of ordinary speech. Spoken sounds differ from musical sounds properly so called, in varying in pitch throughout their continuance. So far as tone is concerned, apart from vowel quality, the speech note may be imitated by sliding a finger up the finger-board of a violin while the bow is being drawn. A familiar illustration of the varying pitch of a speech note is found in the utterance of Hamlet’s question, “Pale, or red?” with intense anxiety of inquiry, if one may so speak. “The speech note on the word ‘pale’ will consist of an upward movement of the voice, while that on ‘red’ will be a downward movement, and in both words the voice will traverse an interval of pitch so wide as to be conspicuous to ordinary ears; while the cultivated perception of the musician will detect the voice moving through a less interval of pitch while he is uttering the word ‘or’ of the same sentence. And he who can record in musical notation the sounds which he hears, will perceive the musical interval traversed in these vocal movements, and the place also of these speech notes on the musical staff.” Variations of this kind, only not so great in amount, occur in ordinary speech; and no telephonic or phonographic instrument could be regarded as perfect, or even satisfactory, which did not reproduce them.
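The sliding of pitch described above, like that of a finger moved along a violin string while the bow is drawn, can be sketched as a tone whose frequency changes steadily during its continuance. The following is only an illustration under assumed values: the function name, the limits of pitch, the duration, and the uniform rate of glide are all hypothetical, and real speech notes are not so simple.

```python
import math


def glide(start_hz, end_hz, duration_s=0.3, rate_hz=8000):
    """Return (samples, frequencies) for a tone sliding from start_hz to end_hz.

    The phase is accumulated sample by sample, so that the tone stays
    continuous while its pitch changes.
    """
    n = round(duration_s * rate_hz)
    phase = 0.0
    samples, freqs = [], []
    for i in range(n):
        # Frequency at this instant, changing uniformly (an assumption).
        f = start_hz + (end_hz - start_hz) * i / (n - 1)
        freqs.append(f)
        samples.append(math.sin(phase))
        phase += 2 * math.pi * f / rate_hz
    return samples, freqs


# An upward movement of the voice and a downward one (values hypothetical).
rising, rising_hz = glide(120.0, 200.0)
falling, falling_hz = glide(200.0, 120.0)
```

An instrument which reproduced only a fixed pitch for each of these notes would return two identical tones, and the inflexion of the voice would be lost.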
But the vowel sounds are, after all, combinations and modifications of musical tones. It is otherwise with consonantal sounds, which, in reality, result from various ways in which vowel sounds are commenced, interrupted (wholly or partially), and resumed. In one respect this statement requires, perhaps, some modification, on a point which has not been much noticed by writers on vocal sounds. In the case of liquids, vowel sounds are not partially interrupted only, as is commonly stated. They cease entirely as vowel sounds, though the utterance of a vocal sound is continued when a liquid consonant is uttered. Let the reader utter any word in which a liquid occurs, and he will find that while the liquid itself is sounded the vowel sounds preceding or following the liquid cease entirely. Repeating slowly, for example, the word “remain,” dwelling on all the liquids, we find that while the “r” is being sounded the “ē” sound cannot be given, and this sound ceases so soon as the “m” is sounded; similarly the long “a” sound can only be uttered when the “m” sound ceases, and cannot be carried on into the sound of the final liquid “n.” The liquids are, in fact, improperly called semi-vowels, since no vowel sound can accompany their utterance. The tone, however, with which they are sounded can be modified during their utterance. In sounding liquids the emission of air is not stopped completely at any moment. The same is true of the sibilants s, z, sh, zh, and of the consonants g, j, f, v, th (hard and soft). These are called, on this account, continuous consonants. The only consonants in pronouncing which the emission of air is for a moment entirely stopped, are the true mutes, sometimes called the six explosive consonants, b, p, t, d, k, and g.
To reproduce artificially sounds resembling those of the consonants in speech, we must for a moment interrupt, wholly for explosive and partially for continuous consonant sounds, the passage of air through a reed pipe. Tyndall thus describes an experiment of this kind in which an imperfect imitation of the sound of the letter “m” was obtained—an imitation only requiring, to render it perfect, as I have myself experimentally verified, attention to the consideration respecting liquids pointed out in the preceding paragraph. “Here,” says Tyndall, describing the experiment as conducted during a lecture, “is a free reed fixed in a frame, but without any pipe associated with it, mounted on the acoustic bellows. When air is urged through the orifice, it speaks in this forcible manner. I now fix upon the frame of the reed a pyramidal pipe; you notice a change in the clang, and, by pushing my flat hand over the open end of the pipe, the similarity between the sounds produced and those of the human voice is unmistakable. Holding the palm of my hand over the end of the pipe, so as to close it altogether, and then raising my hand twice in quick succession, the word ‘mamma’ is heard as plainly as if it were uttered by an infant. For this pyramidal tube I now substitute a shorter one, and with it make the same experiment. The ‘mamma’ now heard is exactly such as would be uttered by a child with a stopped nose. Thus, by associating with a vibrating reed a suitable pipe, we can impart to the sound of the reed the qualities of the human voice.” The “m” obtained in these experiments was, however, imperfect. To produce an “m” sound such as an adult would utter without a “stopped nose,” all that is necessary is to make a small opening (experiment readily determines the proper size and position) in the side of the pyramidal pipe, so that, as in the natural utterance of this liquid, the emission of air is not altogether interrupted.
I witnessed in 1874 some curious illustrations of the artificial production of vocal sounds, at the Stevens Institute, Hoboken, N.J., where the ingenious Professor Mayer (who will have, I trust, a good deal to say about the scientific significance of telephonic and phonographic experiments before long) has acoustic apparatus, including several talking-pipes. By suitably moving his hand on the top of some of these pipes, he could make them speak certain words with tolerable distinctness, and even utter short sentences. I remember the performance closed with the remarkably distinct utterance, by one profane pipe, of the words euphemistically rendered by Mark Twain (in his story of the Seven Sleepers, I think), “Go thou to Hades!”
Now, the speaking diaphragm in the telephone, as in the phonograph, presently to be described, must reproduce not only all the varieties of sound-wave corresponding to vowel sounds, with their intermixtures of the fundamental tone and its overtones and their inflexions or sliding changes of pitch, but also all the effects produced on the receiving diaphragm by those interruptions, complete or partial, of aerial emission which correspond to the pronunciation of the various consonant sounds. It might certainly have seemed hopeless, from all that had been before known or surmised respecting the effects of aerial vibrations on flexible diaphragms, to attempt to make a diaphragm speak artificially—in other words, to make the movements of all parts of it correspond with those of a diaphragm set in vibration by spoken words—by movements affecting only its central part. It is in the recognition of the possibility of this, or rather in the discovery of the fact that the movements of a minute portion of the middle of a diaphragm regulate the vibratory and other movements of the entire diaphragm, that the great scientific interest of Professor Graham Bell’s researches appears to me to reside.
It may be well, in illustration of the difficulties with which formerly the subject appeared to be surrounded, to describe the results of experiments which preceded, though they can scarcely be said to have led up to, the invention of artificial ways of reproducing speech. I do not now refer to experiments like those of Kratzenstein of St. Petersburg, and Von Kempelen of Vienna, in 1779, and the more successful experiments by Willis in later years, but to attempts which have been made to obtain material records of the aerial motions accompanying the utterance of spoken words. The most successful of these attempts was that made by Mr. W. H. Barlow. His purpose was “to construct an instrument which should record the pneumatic actions” accompanying the utterance of articulated sounds “by diagrams, in a manner analogous to that in which the indicator-diagram of a steam-engine records the action of the engine.” He perceived that the actual aerial pressures involved being very small and very variable, and the succession of impulses and changes of pressure being very rapid, it was necessary that the moving parts should be very light, and that the movement and marking should be accomplished with as little friction as possible. The instrument he constructed consisted of a small speaking-trumpet about four inches long, having an ordinary mouthpiece connected to a tube half an inch in diameter, the thin end of which widened out so as to form an aperture of 2¼ inches diameter. This aperture was covered with a membrane of goldbeater’s skin, or thin gutta-percha. A spring carrying a marker was made to press against the membrane with a slight initial pressure, to prevent as far as possible the effects of jarring and consequent vibratory action. A light arm of aluminium was connected with the spring, and held the marker; and a continuous strip of paper was made to pass under the marker in the manner employed in telegraphy. 
The marker consisted of a small, fine sable brush, placed in a light tube of glass one-tenth of an inch in diameter, the tube being rounded at the lower end, and pierced with a hole about one-twentieth of an inch in diameter. Through this hole the tip of the brush projected, and was fed by colour put into the glass tube by which it was held. It should be added that, to provide for the escape of air passing through the speaking-trumpet, a small opening was made in the side, so that the pressure exerted upon the membrane was that due to the excess of air forced into the trumpet over that expelled through the orifice. The strength of the spring which carried the marker was so adjusted to the size of the orifice that, while the lightest pressures arising under articulation could be recorded, the greatest pressures should not produce a movement exceeding the width of the paper.
“It will be seen,” says Mr. Barlow, “that in this construction of the instrument the sudden application of pressure is as suddenly recorded, subject only to the modifications occasioned by the inertia, momentum, and friction of the parts moved. But the record of the sudden cessation of pressure is further affected by the time required to discharge the air through the escape-orifice. Inasmuch, however, as these several effects are similar under similar circumstances, the same diagram should always be obtained from the same pneumatic action when the instrument is in proper adjustment; and this result is fairly borne out by the experiments.”
The defect of the instrument consisted in the fact that it recorded changes of pressure only; and in point of fact it seems to result, from the experiments made with it, that it could only indicate the order in which explosive, continuant, and liquid consonants succeeded each other in spoken words, the vowels being all expressed in the same way, and only one letter—the rough R, or R with a burr—being always unmistakably indicated. The explosives were represented by a sudden sharp rise and fall in the recorded curve; the height of the rise depending on the strength with which the explosive is uttered, not on the nature of the consonant itself. Thus the word “tick” is represented by a higher elevation for the “t” than for the “k,” but the word “kite” by a higher elevation for the “k” than for the “t.” It is noteworthy that there is always a second smaller rise and fall after the first chief one, in the case of each of the explosives. This shows that the membrane, having first been forcibly distended by the small aerial explosion accompanying the utterance of such a consonant, sways back beyond the position where the pressure and the elasticity of the membrane would (for the moment) exactly balance, and then oscillates back again over that position before returning to its undistended condition. Sometimes a third small elevation can be recognized, and when an explosive is followed by a rolling “r” several small elevations are seen. The continuous consonants produce elevations less steep and less high; aspirates and sibilants give rounded hills. But the results vary greatly according to the position of a consonant; and, so far as I can make out from a careful study of the very interesting diagrams accompanying Mr. Barlow’s paper, it would be quite impossible to define precisely the characteristic records even of each order of consonantal sounds, far less of each separate sound.
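The second, smaller rise which follows each explosive may be imitated by treating the membrane and spring as a weighted spring with some friction, set moving by a brief pulse of pressure. This is a sketch under assumed conditions, not a reconstruction of Mr. Barlow's instrument: the stiffness, friction, mass, and the length of the pulse are hypothetical numbers chosen only so that the motion dies away after a few swings.

```python
def membrane_trace(pulse_strength, pulse_steps=20, steps=2000, dt=1e-4,
                   stiffness=4.0e4, damping=60.0, mass=1.0):
    """Displacement of a damped spring driven by a short pressure pulse.

    All physical constants are assumed values, in arbitrary units.
    """
    x, v = 0.0, 0.0
    trace = []
    for i in range(steps):
        # The pulse of pressure acts only during the first few steps.
        force = pulse_strength if i < pulse_steps else 0.0
        a = (force - damping * v - stiffness * x) / mass
        v += a * dt  # update the velocity first (semi-implicit Euler)
        x += v * dt  # then the position, using the new velocity
        trace.append(x)
    return trace


def local_maxima(trace):
    """Heights of the successive elevations in the recorded curve."""
    return [
        trace[i]
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] >= trace[i + 1]
    ]


weak = local_maxima(membrane_trace(1.0))
strong = local_maxima(membrane_trace(2.0))
```

In this sketch the first elevation is followed by a smaller second one, and sometimes by a third, while a stronger pulse raises the first elevation without altering the form of the curve. This agrees with the remark that the height of the rise depends on the strength of utterance and not on the nature of the consonant.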