01. Principles of Sound and Vision

1.1 Fundamentals of Sound
       Sound is a physical disturbance in a medium. In air, sound consists of pressure waves that cause our eardrums to vibrate. For sound to propagate from one place to another, a medium with elasticity and inertia is required; it is by virtue of the elastic and inertial forces acting on air particles that sound is transmitted through the air. Particles may vibrate in circular motion, as in water waves; in transverse wave motion, as in stretched strings; or in longitudinal waves, as in sound. All three forms of particle vibration are simple harmonic motion.
       Sounds in the air are commonly produced by the vibration of diaphragms, vocal cords or instrument strings. These waves are called periodic waves because they repeat over and over, with continuous and generally smooth vibrations. Such analog signals have characteristics such as loudness and pitch, which correspond directly to the amplitude and the rate of the pressure variations in the air.
       The amplitude of the pressure wave determines how loud it sounds and the frequency determines whether it sounds high or low in tone. Low notes or bass have low frequencies and high notes or treble have high frequencies.
       A pure single tone has a single frequency. More complex sounds (e.g. human voice, music) are made up of many individual sine waves of different frequencies and amplitudes, all added together.
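       As a simple illustration, such a complex sound can be synthesized numerically by adding sine waves together. The sketch below uses arbitrary example frequencies and amplitudes rather than values for any particular voice or instrument:

    import numpy as np

    SAMPLE_RATE = 48_000   # samples per second
    DURATION = 0.01        # seconds of signal to generate

    def sine(freq_hz, amplitude, t):
        """One pure tone: a single sine wave of given frequency and amplitude."""
        return amplitude * np.sin(2 * np.pi * freq_hz * t)

    t = np.arange(0, DURATION, 1 / SAMPLE_RATE)

    # A pure single tone has a single frequency...
    pure_tone = sine(440.0, 1.0, t)

    # ...while a complex sound is many sine waves of different frequencies
    # and amplitudes, all added together (illustrative values).
    complex_sound = sine(440.0, 1.0, t) + sine(880.0, 0.5, t) + sine(1320.0, 0.25, t)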

1.2 Sound Propagation
       Sound waves are represented by wavefronts, which are defined as lines of points in the medium that are at the same point in the vibration cycle and equidistant from the source. A point source in free space sends out spherical wavefronts that tend to become plane wavefronts at great distances from the source. When a sound occurs, if there are surfaces nearby (e.g. walls, floors, ceilings, objects), the sound will come into contact with these surfaces and can be affected in several ways as a result.

  1. Reflected sound bounces back from nearby surfaces, especially if the surfaces are hard and flat.
  2. If nearby objects are porous or thin, some sound will be reflected while some will also easily penetrate through the surface to be heard on the other side.
  3. If nearby surfaces are splayed at angles other than 90 degrees, reflections will be sent in different directions, producing smoother and more desirable reflected sound known as diffusion.
  4. If nearby surfaces are soft or designed to absorb sound, sounds can be absorbed into them rather than being reflected. 
1.2.1 Reflection of Sound 
       If the wavelength of sound is small compared to the dimensions of a smooth, hard surface, reflection will take place. As with light, the angle of incidence of sound onto a plane surface equals the angle of reflection. A convex surface scatters the rays of sound after reflection, while a concave surface tends to focus sound; a parabolic surface focuses sound very accurately. When sound is incident on two wall surfaces forming a 90 degree corner, the incident ray is sent back towards the source. Corner reflections from the tricorners formed by two walls and the ceiling, or two walls and the floor, involve two or more surface reflections, which reduce their amplitude.
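       The law of reflection lends itself to a short vector sketch: the reflected ray is the incident ray with its component along the surface normal reversed. The incident direction and surface below are arbitrary example values:

    import numpy as np

    def reflect(incident, normal):
        """Reflect a ray direction off a plane surface: r = d - 2(d.n)n,
        so the angle of reflection equals the angle of incidence."""
        n = normal / np.linalg.norm(normal)
        return incident - 2 * np.dot(incident, n) * n

    # A ray travelling down and to the right strikes a horizontal floor
    # whose normal points straight up.
    incident = np.array([1.0, -1.0])
    floor_normal = np.array([0.0, 1.0])
    print(reflect(incident, floor_normal))   # [1. 1.]: same angle, now upward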


1.2.2 Refraction of Sound 
       Sound travels at different speeds in different media. As sound waves pass from one medium to another of different density, there will be refraction, or bending, of the direction of propagation. In colder, denser air sound travels more slowly, because the speed of sound in air increases with temperature; thus, sound travels faster in warm air than in cold air. During the early morning hours, the warmth of the earth causes the lower layers of air to be warmer than the upper layers. As a result, sound rays are bent upward and do not appear to travel far. During the day, the sun causes the upper layers to be warmer than the lower layers, resulting in a bending of sound down towards the earth.
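       The temperature dependence can be illustrated with the common linear approximation v ≈ 331 + 0.6T m/s (T in degrees Celsius), which holds reasonably well near ordinary air temperatures:

    def speed_of_sound(temp_celsius):
        """Approximate speed of sound in air, m/s (linear approximation)."""
        return 331.0 + 0.6 * temp_celsius

    # Warmer air carries sound faster, which is what bends (refracts)
    # sound rays toward cooler, slower layers of air.
    for temp in (0, 10, 20, 30):
        print(f"{temp:3d} degC -> {speed_of_sound(temp):.1f} m/s")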

1.2.3 Diffraction of Sound
       When sound strikes a barrier that is large compared to the wavelength of the sound, a shadow zone is found just behind the barrier. However, some sound is directed into the shadow area by diffraction. The following four effects of obstacles on the flow of sound are seen.


  • A sound having a wavelength large compared to the size of the obstacle is scarcely impeded by the obstacle.
  • An obstacle large in terms of wavelength of the sound casts a sound shadow. Diffraction sends some sound into the shadow.
  • An aperture large compared to the wavelength of sound allows sounds to pass through readily. Again, diffraction sends some sound into the shadow zone.
  • An aperture small in terms of the wavelength of the impinging sound acts as a new point source radiating waves of spherical wave front by diffraction.
       A brick wall reflects high frequency sound energy impinging on it because the wall is large compared to the wavelength of the sound. However, diffraction will tend to direct low frequency sound into the shadow zone.

1.2.4 Diffusion of Sound
       Different absorptive and reflective materials are used for room acoustics. The diffusing effect of elements of different shapes is realized only if the depth of the elements is at least of the order of one-seventh of a wavelength. Patches of absorptive material offer much the same amount of diffusion whatever their shape (e.g. square or rectangular). Likewise, tilting of room surfaces offers little diffusion.

1.2.5 Superposition of Sound
       The principle of superposition states that the same portion of a medium can simultaneously transmit any number of different sound waves with no adverse mutual effects. If several sound waves travel simultaneously through a given region of the air, the air particles in that region respond to the vector sum of the displacements required by each wave system. Thus, two waves of equal amplitude and in phase combine to give double the amplitude; this is called constructive interference. Similarly, two waves of equal amplitude but 180 degrees out of phase cancel each other; this is known as destructive interference. When the combining waves have different amplitudes, phases and frequencies, the principle of superposition still holds true.
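       These two limiting cases can be checked numerically; the frequency below is an arbitrary test value:

    import numpy as np

    t = np.linspace(0, 0.01, 1000)   # 10 ms time axis
    f = 440.0                        # arbitrary test frequency, Hz

    wave_a = np.sin(2 * np.pi * f * t)
    wave_b_in_phase = np.sin(2 * np.pi * f * t)          # same phase
    wave_b_opposed = np.sin(2 * np.pi * f * t + np.pi)   # 180 degrees out of phase

    constructive = wave_a + wave_b_in_phase   # amplitudes add: double amplitude
    destructive = wave_a + wave_b_opposed     # amplitudes cancel

    print(np.max(np.abs(constructive)))   # ~2.0
    print(np.max(np.abs(destructive)))    # ~0.0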

1.3 Speed, Frequency and Wavelength Relation 
       Speed measures how fast sound travels through the medium; the units are meters per second or miles per hour. The speed of sound in air is approximately 344 m/s under normal conditions. If a sound wave is traveling past a fixed point, the distance between successive peaks of the wave is the wavelength, and the time it takes for one wavelength to pass is given by:
t = λ / v
Where, t is the period of one cycle in seconds
λ is the wavelength in meters
v is the speed of sound in meters per second
      Since one complete cycle passes in time t, the frequency, or number of cycles passing per second, is,
            f = 1/t = v / λ
The relation is otherwise written as,
            λ = 344 / f (m)
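These relations are easily evaluated; for example, using the 344 m/s figure from above:

    SPEED_OF_SOUND = 344.0   # m/s in air, normal conditions

    def wavelength_m(frequency_hz):
        """Wavelength in meters: lambda = v / f."""
        return SPEED_OF_SOUND / frequency_hz

    def period_s(frequency_hz):
        """Period of one cycle in seconds: t = lambda / v = 1 / f."""
        return 1.0 / frequency_hz

    # Bass notes have long wavelengths; treble notes have short ones.
    for f in (20, 100, 1_000, 15_000):
        print(f"{f:6d} Hz -> {wavelength_m(f):8.4f} m, period {period_s(f):.6f} s")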

1.4 Audio Spectrum
       Sound pressure is normally measured in decibels. The difference in pressure levels at two different points, or with respect to a reference sound pressure, Lp, is given by,
Lp = 20 log (P1/P2) (dB)
      The international reference level for sound in air is 20 μPa, which is the threshold of human hearing. The inverse square law applies to sound intensity; since the pressure-distance product is constant, the corresponding law for sound pressure is called the inverse distance law. So, the difference between the sound pressure levels at two points can be expressed as,

            Difference = 20 Log (d1/d2)  (dB)
       Spectrum level is the sound pressure level of a sound measured in a bandwidth of 1 Hz. The spectrum level can be obtained from any band level by,
            SL = BL – 10 Log (f2 – f1)    (dB)
Where,
f2–f1 is the width of the band giving the band level reading.
BL is the sound pressure level in the band.
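A short sketch combining the three formulas of this section; the example pressures, distances and band edges are arbitrary:

    import math

    def level_difference_db(p1, p2):
        """Sound pressure level difference: Lp = 20 log10(p1/p2) dB."""
        return 20 * math.log10(p1 / p2)

    def inverse_distance_db(d1, d2):
        """Inverse distance law: level difference between distances d1 and d2."""
        return 20 * math.log10(d1 / d2)

    def spectrum_level_db(band_level_db, f1, f2):
        """Spectrum level: SL = BL - 10 log10(f2 - f1) dB for a band f1..f2."""
        return band_level_db - 10 * math.log10(f2 - f1)

    print(level_difference_db(0.2, 20e-6))     # 0.2 Pa re 20 uPa -> 80 dB
    print(inverse_distance_db(2.0, 1.0))       # doubling distance changes level by ~6 dB
    print(spectrum_level_db(80.0, 500, 1000))  # 80 dB in a 500 Hz wide band -> ~53 dB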

1.5 Audio
       The electrical waveform corresponding to a sound pressure wave is called the audio signal. It is a varying voltage that can be recorded and processed in many ways, and ultimately transmitted over the air. The range of frequencies in an audio signal, from the lowest to the highest frequency carried, is known as the bandwidth of the signal. Sound is vibration of the air, and the average ear can hear sounds between 20 Hz and 15,000 Hz. The range of frequencies that can be passed without substantial change in amplitude, and how accurately their amplitudes are reproduced, is known as the frequency response of the equipment or system.
       AM radio is monaural, or mono, sound: everything is on one channel and comes from one speaker. But natural sound does not come from only one direction. Stereophonic, or stereo, is a system that gives sound direction, so that it seems to come from different locations. Traditional stereo uses two channels and two speakers to carry the complete audio message. Movies may use surround sound, in which several channels are played through a number of speakers located around the theater. The HDTV standard adopted for the United States includes a 5.1-channel surround sound standard. Five main speakers are used, labeled left front, center front, right front, right surround and left surround, plus a low-frequency effects channel (the ".1"). A listener seated in the center will hear sound coming from all around the room.
       Consumer audio is usually unbalanced. Unbalanced audio uses only two wires to carry the message and does not resist induced noise. Balanced audio systems use three wires to carry the message and do resist induced noise. This resistance to noise, resulting in a cleaner signal, is the big advantage of balanced audio. Consumer equipment, then, is generally high-impedance and unbalanced. Every piece of audio equipment except microphones produces a fairly strong signal, usually called a line-level or high-level signal. Microphones generate their own tiny electrical signals and as a result produce very weak output. Mic-level or low-level signals may be 40 to 60 dB weaker than line-level signals.
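       To see what 40 to 60 dB means as a voltage ratio, the standard 20 log10 relation can be inverted; the dB range is from the text, the rest is illustrative:

    import math

    def db_to_voltage_ratio(db):
        """Convert a level difference in dB to a voltage ratio: 10^(dB/20)."""
        return 10 ** (db / 20)

    # A signal 40 to 60 dB below line level is 100x to 1000x smaller in voltage,
    # which is why microphones need a preamplifier to reach line level.
    for db in (40, 60):
        print(f"{db} dB weaker -> 1/{db_to_voltage_ratio(db):.0f} of line-level voltage")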


1.6 Television Systems
        The aim of a television system is to extend the sense of sight beyond its natural limits and to transmit the sound associated with the scene. The picture signal is generated by a TV camera and the sound signal by a microphone. A television system is required to reproduce the structural content, tonal content, motion, chromatic content and perspective of the scene, as well as the sound associated with it. The picture and sound signals are each modulated differently before transmission.
       A large amount of information must be broadcast by a television transmitter. Each television station is assigned its carrier frequencies within a prescribed channel. In each TV channel, the picture carrier frequency is 1.25 MHz above the bottom edge of the channel, and the difference between the picture and sound carrier frequencies in the PAL system is 5.5 MHz.

1.6.1 Light and Color

        Light is an electromagnetic wave generated by a light source. It stimulates the retina in the eye so that we can see. The amplitude of the light wave determines the brightness of the light, which is referred to as the luminance level. The frequency of the wave determines the color we see, referred to as its hue. Saturation refers to the color's intensity, or how little it has been diluted with white light.
        White light is made up of a mixture of many colors that the brain interprets as being white. This can be demonstrated by shining white light through a glass prism, which splits up the different colors so they can be seen individually. The same sensation of white light in the eye can be produced by mixing together just three colors in the right proportions. To the eye, white light is effectively 30% red, 59% green and 11% blue. For light mixing, the primary colors are red, green and blue, also referred to as R, G and B. By mixing two of these colors together, we can produce the secondary colors. For example:
        Red + Green = Yellow            Red + Blue = Magenta       Green + Blue = Cyan
        By mixing the primary colors in different proportions, we can actually produce most other visible colors. An infinite number of colors in the visible spectrum can be produced by changing the proportions of the mix, brightness and saturation of the colors.
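        A small sketch of additive mixing with normalized (R, G, B) values, using the 30/59/11 white-light proportions quoted above:

    # Additive light mixing with (R, G, B) values in the range 0..1.
    RED   = (1.0, 0.0, 0.0)
    GREEN = (0.0, 1.0, 0.0)
    BLUE  = (0.0, 0.0, 1.0)

    def mix(*colors):
        """Add light sources channel by channel, clipping at full intensity."""
        return tuple(min(1.0, sum(channel)) for channel in zip(*colors))

    def luminance(rgb):
        """Perceived brightness: 30% red + 59% green + 11% blue."""
        r, g, b = rgb
        return 0.30 * r + 0.59 * g + 0.11 * b

    print(mix(RED, GREEN))   # (1.0, 1.0, 0.0) -> yellow
    print(mix(RED, BLUE))    # (1.0, 0.0, 1.0) -> magenta
    print(mix(GREEN, BLUE))  # (0.0, 1.0, 1.0) -> cyan
    print(luminance(mix(RED, GREEN, BLUE)))   # white -> 1.0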

1.6.2 Video
        The human eye and brain perceive different light frequencies as different colors. One characteristic of human vision is that the eye sees much less color detail than brightness detail, or luminance, in a scene. This greatly affects the way that color signals are carried in color television systems. Another characteristic is called persistence of vision: after an image being viewed has disappeared, the eye still sees the image for a fraction of a second. This allows a series of still pictures, in both television and cinematography, to create the illusion of a continuous moving picture.
        The video signal that carries the image is a varying voltage, and that signal can be recorded and processed in many ways and transmitted over the air. The electrical video signal is a direct equivalent of the luminance of the light it represents. However, an analog signal can have distortion and noise added to it as it passes through the signal chain from source to viewer. Therefore, analog video is often converted to more robust digital signals for processing and transmission, with many advantages. The audio and video signals, directly representing sound and image information, are referred to as baseband signals. They can be carried as varying voltages over wires and cables and can be processed by various types of equipment.

1.6.3 Video Frames, Luminance and Chrominance
        Video signals produce a rapid-fire series of still pictures that are displayed on a television receiver or monitor. Each of these pictures is called a frame. Because of the phenomenon known as persistence of vision, the eye does not see a series of still pictures, but rather the illusion that the picture on the TV is moving. If the camera produces only a black-and-white image based on the brightness of the scene, that signal is called luminance. Color cameras, however, actually generate three separate signals for the three primary colors red, green and blue.

        Each analog standard defines how these signals should be combined. This is done by adding R, G and B in the correct proportions to make the luminance signal, which is always referred to as Y. The process of combining the three color signals is called encoding, and the encoded signal, called a composite signal, comprises two parts: the chroma, or color, signal and the luminance, or brightness, signal. By subtracting the luminance from each of the red and blue signals, two color difference signals are produced, called R-Y and B-Y; together these are called chrominance. Because the eye is less sensitive to color detail than to brightness, the chrominance signals have their bandwidth reduced, which makes it possible to transmit color in the same transmission channel as black-and-white television. They are then modulated onto a subcarrier, also known as a color subcarrier, and the subcarrier chrominance signal is added to the luminance signal. Both can then be carried on the same cable from the camera and around the studio, and ultimately transmitted over the air. The resulting signal that combines luminance and chrominance picture information with sync pulses, all carried together on one cable, is known as composite video.
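        The encoding arithmetic described above can be sketched as follows. Real standards also scale and band-limit the color difference signals, which is omitted here:

    def encode(r, g, b):
        """Turn R, G, B (0..1) into luminance Y plus two color difference signals."""
        y = 0.30 * r + 0.59 * g + 0.11 * b   # luminance
        return y, r - y, b - y               # Y, R-Y, B-Y

    def decode(y, r_minus_y, b_minus_y):
        """Recover R, G, B from Y, R-Y and B-Y; G follows from
        Y = 0.30R + 0.59G + 0.11B once R and B are known."""
        r = r_minus_y + y
        b = b_minus_y + y
        g = (y - 0.30 * r - 0.11 * b) / 0.59
        return r, g, b

    y, ry, by = encode(0.8, 0.4, 0.2)
    print(decode(y, ry, by))   # approximately (0.8, 0.4, 0.2)
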
        The color TV has a decoder built in to re-create the three separate R, G and B signals. The Y/C, RGB and color difference systems are all different types of component video; most systems that are set up for RGB can also use the color difference system. Many pieces of video equipment need to decode the composite signal into its component parts in order to process and use it. Of course, the component signals must then be re-encoded before the composite signal can be sent on its way from that piece of equipment. Every time the signal is encoded or decoded, it is distorted a little and some noise is added. For testing TV signals, a composite waveform carrying the well-known color bars test signal is used. Figure 1.1 shows how this signal looks on a waveform monitor; on a picture monitor, vertical bands of color are seen on the display.


1.6.4 Visual Display Units
        The television picture tube that displays the picture is properly called a cathode ray tube (CRT). Traditionally, the CRT is a large glass vacuum tube. The inside front of the tube is covered with a phosphorescent substance that glows when struck by a beam of electrons. At the back of the CRT is a narrow neck containing an electron gun, in which a heated cathode emits a beam of electrons. The direction of the electron beam is controlled by the deflection yoke. Variations in the strength of the electron beam cause the phosphor to produce different levels of brightness, corresponding to the content of the original scene. The beam of electrons from the electron gun moves in a left-to-right, top-to-bottom manner. Each time the beam reaches the right edge of the picture screen, it stops and then moves back to the left-hand side of the screen in order to start painting the next line.
        Similarly, after laying down a field of information, the electron beam is turned down to a low voltage before it retraces back to the top of the image.
        Once in position at the top of the screen, the beam's voltage is turned back up and it starts scanning a new field. If the electron beam were at full power when it returned to the beginning of a new line or field, it would illuminate phosphors on the CRT face and interfere with the information previously laid down. In order to prevent this from happening, the electron beam is turned down to a very low power so that it can return to the beginning of a new line or field. This period of time when the electron beam is turned down is called blanking.
        The duration of the lowered voltage, from the end of one line to the beginning of the next, is called horizontal blanking, and the return of the electron beam from one side of the CRT to the other during this period is called retrace. The time from when the beam's voltage is turned down until it is turned back up for the next field is called vertical blanking, or the vertical interval. The signal that triggers the electron beam's retrace to the beginning of a new field is called vertical sync. Most modern cameras use pickup devices such as charge-coupled devices (CCDs), and modern picture displays often use liquid crystal display (LCD) and plasma screens; these do not require blanking intervals for beam retrace.

Fig. 1.2: Video signal waveform in a horizontal line.

1.6.5 Interlaced Scanning
        When the NTSC standard was developed, 30 frames of video per second was about the best that available technology could process and transmit in a 6 MHz-wide television transmission channel. However, pictures presented at 30 frames per second appear to flicker terribly. NTSC television reduces this flicker without actually increasing the frame rate by using an arrangement called interlacing. Rather than sweeping all 525 lines at once, the system sweeps the odd lines (line numbers 1, 3, 5, 7, 9, ...) first and then goes back and sweeps the even lines (line numbers 2, 4, 6, 8, 10, ...). Thus, there are two separate fields of 262.5 lines each. When these two fields are combined, they compose a single video frame, a complete picture of 525 lines. This produces 60 fields and 30 frames every second. The process is called interlaced scanning, and it ensures that the picture has an even brightness throughout instead of separate bright and dark bands.
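        The arithmetic of interlacing, using the NTSC figures from this section, can be sketched as:

    TOTAL_LINES = 525        # lines per frame
    FRAME_RATE = 30          # frames per second (nominal)
    FIELDS_PER_FRAME = 2     # one odd field plus one even field

    field_rate = FRAME_RATE * FIELDS_PER_FRAME         # 60 fields per second
    lines_per_field = TOTAL_LINES / FIELDS_PER_FRAME   # 262.5 lines
    line_period_us = 1e6 / (TOTAL_LINES * FRAME_RATE)  # ~63.5 microseconds per line

    odd_field = list(range(1, TOTAL_LINES + 1, 2))     # lines 1, 3, 5, ...
    even_field = list(range(2, TOTAL_LINES + 1, 2))    # lines 2, 4, 6, ...

    print(field_rate, lines_per_field, round(line_period_us, 1))
    print(odd_field[:5], even_field[:5])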

1.6.6 Progressive Scanning
        Computers and some formats of digital television use progressive scanning, in which there are no interlaced fields. Each frame is scanned line by line in progressive order (1, 2, 3, 4, 5, 6, ...) from top to bottom, so each frame is painted on the screen in its entirety. This is called progressive scan.