(part 4 of 6)
For people with normal sight and hearing, hearing is the most important sense after vision in any interaction. Most people can hear sound in the frequency range from 20 Hz (Hertz = cycles per second) up to 20,000 Hz, but both the upper and lower frequency limits tend to deteriorate with age and ill-health. Hearing is most sensitive within the range 1,000-4,000 Hz, which in musical terms corresponds approximately to the top two octaves of the piano keyboard.
The ear: OHP from Oborne. Note the pinnae, membrane, auditory nerve, semi-circular canals, consider pitch, sound waves (pulses), encoding on the auditory nerve.
Thus, the stimulus for audition is any vibration that will set the ossicles of the ear in motion between about 20 and 20,000 Hz. Ordinarily, this means vibrations of the air but vibrations transmitted through the bones also contribute to auditory sensation. (Having a tooth extracted or drilled will almost convince you that the jaw was designed to transmit vibrations to the ear in the most efficient manner possible.)
It is convenient to consider the stimulus for sound to be made up of successive compressions and rarefactions of air that follow a sine wave over time. There are at least two reasons for using the sine function. The first is that a pure tone produced by an electronic oscillator or a theoretically perfect tuning fork is a sine wave. The second and more important reason is that theoretically a wave of any shape whatever can be analysed into components that will be sine waves. This is known as Fourier analysis. Work with sine waves thus provides a standard for comparison across different types of sounds.
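A rough illustration of this idea (my addition, not from the notes; the component frequencies, amplitudes and sample rate are arbitrary choices): the Python sketch below builds a complex wave by summing two sine components and then recovers those components with a discrete Fourier transform.

    # Sketch: Fourier analysis of a complex wave into its sine components.
    import numpy as np

    fs = 8000                        # sample rate in Hz (assumed)
    t = np.arange(0, 1.0, 1 / fs)    # one second of time points

    # A "complex" wave made from two pure (sine) components.
    wave = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

    # Fourier analysis: the spectrum shows peaks only at the component frequencies.
    spectrum = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), 1 / fs)
    print(freqs[spectrum > 0.1 * spectrum.max()])   # ~[440. 1000.]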
Loudness of the sensation is largely dependent on the amplitude of the wave. However, the ear is not equally sensitive to all frequencies.
The pitch of the tone depends primarily on the frequency of the sine wave, but not completely. Pitch is also dependent on amplitude. The apparent pitch of high frequency tones will increase with increasing amplitude, but the apparent pitch of low tones decreases with increasing intensity. The loudness of a tone will also depend on the phase relationships of the component frequencies of the stimulus (that is, do they all start at once or one after another). Timbre is a quality that depends on the purity of the tone: is it made up of one sine wave or a broad mixture of frequencies? A tuning fork has a relatively pure tone and therefore little timbre. On the other hand, a piano or other musical instrument has timbre because of the other frequencies present in its sounds.
The ear can be viewed as performing a type of Fourier analysis of the auditory stimulus, separating a complex wave into its sine components. However, there are certain situations for which this analysis breaks down. One of these is when two stimuli of approximately equal intensity and frequency are presented to the ear at the same time. Instead of hearing both tones as a linear Fourier analysis should permit, a single tone is heard that varies in loudness in a periodic manner. You've probably heard this when two people sing together or two instruments are played together. This effect can be pleasant or unpleasant depending on the frequency of the beats.
The basis of beats is the following. If you have a pure tone of 256 Hz and another of 257 Hz, each one separately would produce a steady pitch that would be difficult to distinguish from the other. When the two are played together the condensations and rarefactions (compressions and expansions) of the air produced by the two tones will at some point be in phase (synchronised) and the two tones will add together. However, since the frequency of one is slightly greater than the other, they will get out of phase after a while and their effects will cancel each other out. As this process repeats itself, they will come in and out of phase as many times per second as the one tone has more cycles per second. In this example, it would be once per second and so you will hear one beat per second. This provides a very accurate way of measuring the difference between two tones, far better than the ear could discriminate if the two tones were presented separately. This fact is used to good effect by piano tuners. They tune one note until it no longer beats with the standard tuning fork. Then the other notes are tuned until their harmonics do not beat with the first note.
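The arithmetic of beats can be checked numerically. In this sketch (my addition, using the 256 Hz and 257 Hz tones from the example above), the summed waveform is loud where the tones are in phase and nearly silent half a second later where they cancel.

    # Sketch: beats between a 256 Hz and a 257 Hz tone (difference = 1 beat per second).
    import numpy as np

    fs = 44100                          # sample rate in Hz
    t = np.arange(0, 2.0, 1 / fs)       # two seconds of signal

    mixture = np.sin(2 * np.pi * 256 * t) + np.sin(2 * np.pi * 257 * t)

    # sin(a) + sin(b) = 2 cos((a-b)/2) sin((a+b)/2): the envelope rises and falls
    # at the 1 Hz difference frequency, i.e. one beat per second.
    loud = np.max(np.abs(mixture[:441]))                                # ~10 ms around t = 0 (in phase)
    quiet = np.max(np.abs(mixture[int(0.5 * fs):int(0.5 * fs) + 441]))  # ~10 ms around t = 0.5 s (cancelling)
    print(loud, quiet)                  # roughly 2.0 versus 0.06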
The decibel scale. The decibel scale for sound intensity is a relative logarithmic scale where 10 decibels = 1 log unit ratio of energy.
If the threshold of hearing is 0 dB then a whisper registers 20 dB and normal conversation registers between 50 dB and 70 dB.
Ear damage is likely to occur if the sound exceeds 140 dB. The ear is insensitive to frequency changes below about 20 dB (i.e. below a whisper). The sensitivity to both frequency and loudness varies from person to person.
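As a small worked example (mine, not the notes'): a sound's decibel level is ten times the base-10 logarithm of the ratio of its intensity to the threshold intensity, which reproduces the whisper, conversation and rock-band figures used here.

    # Sketch: converting intensity ratios to decibels (10 dB = 1 log unit of energy).
    import math

    def decibels(intensity_ratio):
        """Intensity relative to the threshold of hearing, expressed in dB."""
        return 10 * math.log10(intensity_ratio)

    print(decibels(10 ** 2))     # 20 dB  - a whisper, 100 times threshold energy
    print(decibels(10 ** 6))     # 60 dB  - conversation, a million times threshold
    print(decibels(10 ** 14))    # 140 dB - amplified rock band, 100 trillion times threshold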
The intensities of various common sounds are plotted on the scale below. The energy in much rock music is thus 100 trillion times threshold (or more than 100,000 times the intensity at which permanent hearing loss begins to be produced by long-term exposure). This level, indicated by the dashed line, is often surpassed in industrial jobs.
dB
140   Rock band (amplified) at close range
120   Loud thunder
110   Jet plane at 500 ft
100   Subway train at 20 ft
----  (level above which long-term exposure begins to produce permanent hearing loss)
 80   Busy street corner
 60   Normal conversation
 40   Typical room
 20   Whisper
  0   Threshold of hearing
This scale does not describe perceived loudness well. Increasing an auditory stimulus by equal ratios does not produce equal increments in sensation. It is obvious that the difference in loudness between a whisper and a normal conversation is less than the difference between a normal conversation and a subway train, yet the energy in a whisper stands in the same ratio to the energy in conversation as the energy in conversation stands to that in a subway train.
This means that, when measuring loudness, asking people to estimate apparent loudness directly (known as magnitude estimation) is often the best way to quantify sounds.
Volume and density. These are more complicated terms that defy simple description. Low tones of equal loudness seem to occupy more space and thus are said to have more volume than high tones. On the other hand, high tones have a greater density than low tones of equal loudness. The volume and density of tones are each a joint function of intensity and frequency of the tones. However, they seem to be as real as pitch and loudness which have simpler bases. By "real" I mean that subjects have no difficulty making reliable judgments of volume or density of tones that differ in frequency and intensity when asked to do so.
How do we get spatial cues from sound? The idea that localisation is based on inter-aural time differences at low frequencies and inter-aural intensity differences at high frequencies is called the 'duplex' theory and dates back to Lord Rayleigh (1907). This does not hold for complex sounds.
We can identify the location of a sound from the time taken for waves to reach the ears, coupled with information from head and shoulder movements. Sound reaching the far ear will be delayed in time and will be less intense relative to that reaching the nearer ear. Thus, there are two possible cues as to the location of the sound source.
Owing to the physical nature of the sounds, these cues are not equally effective at all frequencies.
Low frequency sounds have a wavelength that is long compared with the size of the head, and thus the sound "bends" around the head very well. This process is known as 'diffraction', and the result is that little or no 'shadow' is cast by the head. On the other hand, at high frequencies, where the wavelength is short compared with the dimensions of the head, little diffraction occurs and a "shadow" almost like that produced by a beam of light results.
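To make the comparison concrete (the head width and the particular frequencies below are my illustrative assumptions), wavelength is simply the speed of sound divided by frequency, so it is easy to see which sounds are shadowed by the head:

    # Sketch: wavelength versus head size decides whether the head casts a sound shadow.
    SPEED_OF_SOUND = 343.0   # m/s in air, approximate
    HEAD_WIDTH = 0.18        # m, rough assumed head diameter

    for f in (200, 1000, 6000):                  # Hz, illustrative frequencies
        wavelength = SPEED_OF_SOUND / f
        shadowed = wavelength < HEAD_WIDTH       # short waves diffract little
        print(f"{f} Hz: wavelength {wavelength:.2f} m, head shadow: {shadowed}")
    # 200 Hz  -> ~1.72 m: much longer than the head, bends around it, no shadow
    # 6000 Hz -> ~0.06 m: much shorter than the head, a clear shadow is cast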
Inter-aural differences in intensity are negligible at low frequencies, but may be as large as 20 dB at high frequencies. This is easily illustrated by placing a small transistor radio close to one ear. If that ear is then blocked with a finger, only the sound bending around the head and entering the other ear will be heard. The sound will be much less "tinny" since high frequencies will have been attenuated more than low; the head effectively acts like a low pass filter (allowing only low frequency sounds). Inter-aural intensity differences are thus more important at high frequencies than at low.
If a tone is delayed at one ear relative to the other, there will be a phase difference between the two ears (the peaks of the sine waves will arrive at different times); thus, if nerve impulses occur at a particular phase of the stimulating waveform, the relative timing of the nerve impulses at the two ears will be related to the location of the sound source. However, for sounds whose wavelength is comparable with or less than the distance between the two ears there will be ambiguity. The maximum path difference between the two ears is about 23 cm, which corresponds to a time delay of about 690 microseconds. Ambiguities occur when half the wavelength of the sound is about 23 cm, i.e. when the frequency of the sound is about 750 Hz. A sinusoid of this frequency lying to one side of the head produces waveforms at the two ears that are in opposite phase (a phase difference between the two ears of 180 degrees). From the observer's point of view, the location of the sound source is now ambiguous, since the waveform at the right ear might be either a half-cycle behind that at the left ear or a half-cycle ahead. Head movements, or movement of the sound source, may resolve this ambiguity, so there is no abrupt upper limit on our ability to use phase differences between the two ears. However, when the wavelength of the sound is less than the path difference between the two ears, the ambiguities increase; the same phase difference could be produced by a number of different source locations.
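The figures quoted in this paragraph can be reproduced directly (a sketch of mine; the exact delay depends on the speed of sound assumed, and the notes' ~690 microseconds implies a slightly lower speed than the 343 m/s used here):

    # Sketch: maximum inter-aural time delay and the frequency at which phase cues
    # become ambiguous, from the ~23 cm maximum path difference quoted above.
    SPEED_OF_SOUND = 343.0     # m/s, approximate
    PATH_DIFFERENCE = 0.23     # m, maximum extra distance to the far ear

    max_delay = PATH_DIFFERENCE / SPEED_OF_SOUND
    print(f"maximum inter-aural delay: {max_delay * 1e6:.0f} microseconds")   # ~670

    # Ambiguity sets in when half a wavelength equals the path difference,
    # i.e. the waveforms at the two ears are 180 degrees out of phase.
    ambiguous_from = SPEED_OF_SOUND / (2 * PATH_DIFFERENCE)
    print(f"phase cues become ambiguous around {ambiguous_from:.0f} Hz")      # ~750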
Two different mechanisms for sound localisation: one operates best at high frequencies and the other at low frequencies. For middle frequencies neither mechanism operates efficiently, and errors are at a maximum. Stevens and Newman (1936) investigated the localisation of single bursts of sound with smooth onsets and offsets, with observers on the roof of a building so that reflections were minimised. The listeners had to report the direction of the source in the horizontal plane, to the nearest 15 degrees. Although left-right confusions were rare, a low frequency sound in front was often indistinguishable from its mirror image behind. If these front-back confusions were discounted, the error rate was low at very low and very high frequencies and showed a maximum for mid-range frequencies (around 3,000 Hz).
Intensity differences are more important at high frequencies; phase differences provide usable cues for frequencies below about 1,500 Hz.
Consider the cone of confusion and timing: azimuth is dealt with by timing, up and down by the pinnae. Note that there are limitations on the system because the sound is encoded on the auditory nerve, and above about 18 kHz the sound is not heard because of physiological limitations. High pitch sounds are easier to localise because low pitch sounds are shadowed less effectively.
Our abilities in discrimination depend upon whether we mean absolute discrimination or relative discrimination (this applies to vision too). Absolute discrimination is quite poor (e.g. systems should not rely on remembering a tone or sound), but relative discrimination is very good (e.g. comparisons of outputs coded into colours will be efficient, scientific visualisation is based on this). With vision (e.g. colour) and sounds we can remember no more than 5-7 items for absolute discrimination (unless we can attach meaning to them - colour labels, pitch labels). Also as we vary more dimensions of the stimulus (increasing its complexity), so we increase our ability to discriminate (up to 150 sounds - varying in frequency, rhythm, location, duration, volume, etc.).
Sounds evoke certain reactions. Some sounds can be hard to localise. Knowledge of the properties of sounds enables us to design effective alarm systems, for example.
A square-ended waveform (i.e. sudden-onset) will evoke a startle reaction, and if the sound is at high volume, the effect can cause quite severe panic. Bell-based alarm clocks are a good example - people often train themselves to wake up just before the alarm in order to turn it off! The startle reaction is automatic, even if you know it is going to happen, you are still startled.
Sounds from klaxons and bells tend to be square-wave sounds. They also tend to be hard to localise and can be hard to discriminate from one another, especially when several sound at the same time.
A typical reaction to such alarms is panic: a concern with stopping the loud noise rather than with dealing with the emergency. The aircraft in the Kegworth crash had poor alarms. In a number of disasters in environments with high false-alarm rates, investigators have found alarms partially disabled because of the unpleasantness of the noise.
ALARMS should not be unpleasant. For example, the desired reaction from a fire alarm is an orderly exit via the nearest door, not a panic rush for the door you came in by. This is the behaviour observed by investigators following fire tragedies - people have a strong instinct to go out the way they came in when panicked.
If alarms are to evoke the "right" responses then they need to be carefully designed. No square-ended waves - it turns out that the onset can be very steep and still evoke no startle reaction; use distinctive but not very high intensity sounds. High intensity causes panic - it does not denote urgency. Urgency can be conveyed by the speed of an alarm - a tri-tone alarm which doubles its speed will suddenly sound very urgent.
Principles to derive from this: avoid sudden-onset (square-ended) waveforms; use distinctive sounds at moderate rather than high intensity; convey urgency through the tempo of the alarm rather than its loudness.
These principles have been applied in hospital intensive care wards and in helicopters, and soon will be in many aeroplanes. They have been an enormous success - yet the work is not widely known.
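As one possible concretisation of the "no square-ended waves" principle (entirely a sketch of my own; the 800 Hz carrier, 20 ms ramps and pulse length are illustrative assumptions, not figures from the alarm-design work mentioned above), an alarm pulse can be given a steep but smooth raised-cosine onset and offset, so that it reaches full volume quickly without the instantaneous step that triggers the startle reaction:

    # Sketch: an alarm pulse with a steep but smooth onset instead of a square-ended one.
    import numpy as np

    fs = 44100                            # sample rate in Hz
    duration = 0.5                        # seconds per alarm pulse (illustrative)
    ramp = 0.02                           # 20 ms onset/offset ramp: steep, not abrupt

    t = np.arange(0, duration, 1 / fs)
    tone = np.sin(2 * np.pi * 800 * t)    # 800 Hz carrier, arbitrary choice

    # Raised-cosine ramps at both ends remove the sudden onset and offset.
    n_ramp = int(ramp * fs)
    envelope = np.ones_like(t)
    envelope[:n_ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    envelope[-n_ramp:] = envelope[:n_ramp][::-1]

    pulse = tone * envelope               # urgency would come from repeating pulses
                                          # faster, not from making them louder
    print(pulse.shape, envelope[0], envelope[n_ramp])   # starts at 0, ramps to 1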
Pickles, James O. (1988) An Introduction to the Physiology of Hearing (2nd ed.). Academic Press.
Moore, Brian C. J. (1989) An Introduction to the Psychology of Hearing (3rd ed.). Academic Press.