P&S Theory Part #2

If you haven't had a look at the first part of this series, please do so to refresh your brain (it can be found on the resources section of my website).

I want to admit that I really struggled to define the technical concepts found within this post before I started writing this series.

How do you define sampling rate, or bit-depth? What is quantization? Can terms such as sampling rate and bit-depth describe both audio and video concepts? For my sanity and peace of mind, I have finally explored these questions and come up with a few answers. To properly explore these questions, we will all need to rely on our strong understanding of the signal path.

Let's start from the beginning, and pull it apart!

The signal path is composed of three domains: the acoustic domain, the analogue domain and the digital domain.


Both light and sound waves live and breathe in the acoustic domain. The acoustic domain is the world where photons bounce around and molecules are disturbed by sound - it's the raw, uncompressed, native source from which we record all media.

When we try to record the information around us in the acoustic domain, there is an inevitable loss of quality and fidelity. However, loss of fidelity can be minimised through the use of high-quality recording equipment and proper recording technique.


To get from the acoustic domain to the analogue domain, you need a device that can take in the acoustic sound waves and photons and convert them into an electrical voltage, a measurable signal. This is where you will hear words such as transducers, photosites, mic-level and line-level - but we won't spend too much time here. The main focus is on the digital side of things.


To get from the analogue domain to the digital domain, you need the electrical voltage to be translated, recorded and stored as a discrete set of digital values on a hard drive/storage device. In this section, we'll be talking about sampling rate, bit depth and, curiously, apple pie...

Let's delve into the audio signal path using the context from the three domains above. Don't worry picture-lovers, we'll be diving into the video signal path very shortly.


The process of travelling from the acoustic domain to the analogue domain starts with a device called a transducer. A transducer is simply a device that converts variations of a physical quantity, like sound waves, into an electrical signal. So all microphones are actually transducers.

Every microphone/transducer is a wee bit different, but all microphones have something in common: they have a diaphragm.

The diaphragm is a thin piece of material that vibrates when in contact with a sound wave - and this is where the magic happens. All of the vibrations from this material are translated into electrical voltages, and this initial voltage is referred to as a 'mic-level' signal. This low-level, weak signal travels through a microphone cable into the sound mixer/recorder's pre-amplifier and gets boosted to a stronger 'line-level' signal.
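As a rough sketch of what that pre-amp boost looks like in numbers - the specific voltages and gain figure below are illustrative assumptions, since mic and line levels vary between equipment:

```python
# Illustrative sketch: a pre-amplifier's gain (expressed in dB)
# scales a weak mic-level voltage up towards line level.
# The voltages here are assumptions for illustration only.

def apply_gain(voltage, gain_db):
    """Scale a voltage by a gain expressed in decibels."""
    return voltage * 10 ** (gain_db / 20)

mic_level = 0.002                    # ~2 mV, a weak microphone signal
boosted = apply_gain(mic_level, 55)  # 55 dB of pre-amp gain
print(f"{boosted:.3f} V")            # roughly line level (~1.1 V)
```

The key idea is that decibels are a ratio: every 20 dB of gain multiplies the voltage by ten.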

This is where it turns a bit... digital.

The line-level electrical signal travels to an analogue to digital converter, or ADC.

An ADC measures, or samples, the incoming voltage thousands of times per second and converts each measurement into a digital output value. This continuous measuring process is called sampling.

The sampling rate is one of two key concepts that we will discuss when referencing the analogue to digital signal conversion.


The rate at which the ADC measures the incoming voltage per second is called the sampling rate, or sampling frequency. The sampling rate is measured in Hertz - common sampling rates for audio range from 44,100 samples per second (44.1kHz) to 48,000 samples per second (48kHz).

A higher sampling rate means more detail and accuracy can be captured during an audio recording:


So how many samples do you need in order to accurately record your sound?

Well, any given sampling rate can accurately record audio frequencies up to one half of itself (known as the Nyquist limit), i.e. a sampling rate of 48kHz can accurately record frequencies just under 24kHz. If you remember from the previous blog post, humans can hear from around 20Hz to 20kHz, so recording audio with a sampling rate of 44.1kHz or higher covers the full range of frequencies that a human can hear.
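You can see why the limit is one half of the sampling rate with a few lines of Python: a tone above that limit produces exactly the same samples as a lower "alias" frequency, so the recording can't tell them apart.

```python
import math

fs = 48_000            # sampling rate in Hz
f_high = 30_000        # a tone above the limit (fs / 2 = 24 kHz)
f_alias = fs - f_high  # the frequency it "folds back" to: 18 kHz

# Sample both tones at 48 kHz: the sample values come out identical,
# so a 30 kHz tone is indistinguishable from an 18 kHz one (aliasing).
for n in range(10):
    s_high = math.cos(2 * math.pi * f_high * n / fs)
    s_alias = math.cos(2 * math.pi * f_alias * n / fs)
    assert abs(s_high - s_alias) < 1e-9

print("30 kHz sampled at 48 kHz looks exactly like 18 kHz")
```

This is exactly why real recorders filter out frequencies above half the sampling rate before the ADC ever sees them.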

So to recap - sampling rate is simply the number of times the analogue electrical voltage is recorded per second, i.e. 48,000 times per second.

Bit depth is the accuracy with which those samples can be taken. Let me explain.


Computers work with binary numbers to record and store values, and digital audio samples are no different. Samples are recorded as strings of binary digits. Here is an example of one individual audio sample:


If the number of binary digits (or bits) used to describe a sample increases, sound wave amplitudes can be recorded with higher accuracy and the dynamic range of your signal will increase. The number of 'bits' used to record any given sample is referred to as the bit-depth.

This concept can be highly simplified with a crude example. Imagine this: One-third of a delicious apple pie is sitting on your kitchen bench. If you had to write down how much pie was left using integers and decimals, you would probably start off by writing down 0.3. That's close, but not quite accurate. 0.33 is closer, right? But 0.333333 is closer still.

Well, just like decimal points can be added to increase accuracy of a fraction, the same is true for binary digits. Bits can be added to samples to increase the accuracy of the recorded amplitude.

A higher bit depth (more 0s and 1s to play with) allows the computer to divide the range of possible amplitudes with more precision. 24-bit audio, for example, which is a high-quality standard for recording on-set, divides the amplitude range into over 16 million steps (2^24 = 16,777,216).

The most dramatic effect of a higher bit depth is increased dynamic range - so it is an important concept to appreciate and understand.
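That relationship can be sketched in a few lines - the "roughly 6 dB per bit" figure below is the commonly quoted rule of thumb, not an exact law:

```python
# Each extra bit doubles the number of amplitude steps, and (as a
# common rule of thumb) adds roughly 6 dB of dynamic range.

for bits in (8, 16, 24):
    levels = 2 ** bits
    approx_dr = 6.02 * bits  # rule-of-thumb dynamic range in dB
    print(f"{bits}-bit: {levels:,} levels, ~{approx_dr:.0f} dB dynamic range")
```

So stepping up from 16-bit to 24-bit doesn't just add "a bit more" precision - it multiplies the number of available amplitude steps by 256.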

NB: Let's talk about that scary term "quantization" that I mentioned earlier:

An audio sample (hypothetically) could be recorded with an infinite number of bits, right? You could always add another decimal to your apple pie fraction to minimally improve its accuracy! So, regardless of how high your bit-depth is, at some point, you will always need to truncate or round off your recorded digital value. This is called "quantization". Quantization is a complicated word that simply means "round off your number" to the precision that the bit-depth allows.
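Here's a minimal sketch of that rounding, applied to our apple pie fraction - `quantize` is a hypothetical helper, not a real audio API:

```python
# Quantization sketch: round a continuous amplitude (between -1.0 and
# 1.0) to the nearest step that a given bit depth can represent.

def quantize(amplitude, bits):
    levels = 2 ** (bits - 1)  # steps on each side of zero
    return round(amplitude * levels) / levels

x = 1 / 3                 # our "apple pie" fraction
print(quantize(x, 4))     # coarse: 0.375
print(quantize(x, 16))    # much closer to 1/3
```

More bits means finer steps, so the rounded value lands closer to the true amplitude - but some rounding always happens.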


We're nearly there!

Luckily, with video, the signal path is nearly identical to its audio counterpart - so we can pretty much re-summarise and get straight to the point.

In the video world, the camera sensor (the transducer) is used to convert incoming photons into measurable voltages, and an ADC on the camera converts these voltages into digital values. To properly understand sampling rate and bit depth for video - we need to understand a little bit about camera sensors.


Camera sensors are composed of millions of photosites - tiny, photon-collecting light buckets.


On a standard single-chip camera, these photosites are filtered by microscopic red, green and blue filters at each site. These "colour filter arrays" allow light from only one primary colour to filter through to each photosite.


The most common camera filter array is called the "Bayer Pattern" (as seen above). A fully fledged RGB pixel is produced by combining samples from adjacent R, G and B photosites. This combining process is called demosaicing or debayering, and it's pretty amazing. There is some incredibly clever mathematics going on during this stage to read in between the lines and generate the full RGB pixels that we expect to see.
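As a grossly simplified sketch of the idea (real debayering interpolates a full RGB value at every photosite using its neighbours; this toy version just collapses one RGGB tile into one pixel, which halves the resolution):

```python
# Toy demosaicing sketch: each photosite records only one colour, and
# the missing colours are estimated from neighbouring photosites.
# Here, one 2x2 RGGB Bayer tile becomes a single RGB pixel by
# averaging the two green samples. Real debayering is far cleverer.

def debayer_tile(r, g1, g2, b):
    """Collapse one 2x2 RGGB Bayer tile into a single RGB pixel."""
    return (r, (g1 + g2) / 2, b)

pixel = debayer_tile(r=200, g1=120, g2=130, b=90)
print(pixel)  # -> (200, 125.0, 90)
```

The Bayer pattern uses two green sites per tile because human vision is most sensitive to green - which is also why the green channel gets averaged here.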


When light floods onto the camera sensor, a voltage per photosite is generated depending on the intensity of the light that reached that particular photosite. For example, a high number of photons entering a photosite would generate a higher voltage out, whereas less light entering the photosite would generate a lower voltage out.

This voltage is sampled and is run through an ADC (just like audio) to be translated into digital bits to be recorded and stored.

NB: For the life of me I can't find what "common" sampling rates are for DSLR and cinema cameras - so I can only assume that any given camera has in-built sampling rates dictated by the camera manufacturer. To my knowledge, sampling rate is not an option you can alter onboard the camera interface.


Bit-depth is a setting that you can occasionally choose onboard your camera interface. 8-bit, 10-bit and 12-bit options may be available to you alongside resolution and bit-rate - but what do they really mean?

Well, with audio, a higher bit depth equates to a wider range of possible amplitudes and an increased dynamic range. In the video world, a higher bit-depth equates to a broader palette of colours and an increased dynamic range.

It's essentially the same concept as before:


Most DSLRs record 8-bit images, meaning the ADC records the incoming voltage from each photosite using eight binary digits (eight 0s and 1s).

Cambridge In Colour does a fantastic job of summing it up, so I'm not going to try and say it better: "[8-bit recording] allows for 2^8 or 256 different combinations—translating into 256 different intensity values for each primary colour. When all three primary colours are combined at each pixel, [8-bit] allows for as many as 2^8*3 or 16,777,216 different colours... This is referred to as 24 bits per pixel since each pixel is composed of three 8-bit colour channels."
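The arithmetic behind that quote takes two lines to verify:

```python
# 8 bits per channel gives 2**8 intensity values per primary colour,
# and three channels per pixel gives 2**(8*3) possible colours.

values_per_channel = 2 ** 8
colours_per_pixel = values_per_channel ** 3
print(values_per_channel)  # -> 256
print(colours_per_pixel)   # -> 16777216
```

Note this is the same 16.7 million figure as 24-bit audio's amplitude steps - same maths, different domain.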


The higher the bit depth (i.e. 12-bit), the more flexibility you will have with the video image in post-production, especially with heavy colour manipulation/strong grades.

So, there we have it - a fairly broad stroke overview of picture and sound. Happy New Year!

This is the second half of a two-part series on Picture vs. Sound. If you have any questions you'd like to ask, flick me an email at lukerosspost@gmail.com