In order to fully understand speech and channel coding it is easier to
start from the very beginning of the process. The first step in speech
coding is to transform the sound waves of our voices (and other ambient
noise) into an electrical signal. This is done by a microphone.
A microphone consists of a diaphragm, a magnet, and a coil of wire.
When you speak into it, the sound waves created by your voice vibrate
the diaphragm, which is connected to the magnet, which sits inside the
coil of wire. These vibrations cause the magnet to move inside the coil
at the same frequency as your voice. A magnet moving in a coil of wire
creates an electric current. This current, which has the same frequency
as the sound waves, is carried by wires to wherever you wish it to go,
such as an amplifier or a transmitter. Once it gets to its destination,
the process is reversed and it comes out as sound; a speaker is
basically the opposite of a microphone. The signal created by a
microphone is an analog signal. Since GSM is an all-digital system,
this analog signal is not suitable for use on a GSM network and must be
converted into digital form. This is done by an Analog-to-Digital
Converter (ADC).
In order to reduce the amount of data needed to represent the sound wave, the analog signal is first fed into a band pass filter.
Band pass means that the filter only allows signals that fall within a
certain frequency range to pass through it; all other signals are
cut off, or attenuated. The band pass (BP) filter only allows frequencies
between 300 Hz and 3.4 kHz to pass through it. This limits the amount of
data that the analog/digital converter is required to process.
Figure: Band Pass Filter
The filtered signal is fed into the analog/digital
converter. The analog/digital converter performs two tasks: it converts
an analog signal into a digital signal, and it does the opposite,
converting a digital signal into an analog signal.
In the case of a cell phone, the analog signal created by the microphone
is passed to the analog/digital converter. The A/D converter measures,
or samples, the analog signal 8000 times per second. This means that the
ADC takes a sample of the analog signal every 0.125 ms (125 µs). Each
sample is quantized into a 13-bit data block. Multiplying 13 bits per
sample by 8000 samples per second gives a data rate of 104,000 bits per
second, or 104 kb/s.
Figure: Analog/Digital Converter
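
The sampling arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the `quantize` helper assumes a simple uniform quantizer over the range -1.0 to 1.0, which is a simplification of what a real converter does.

```python
SAMPLE_RATE_HZ = 8000    # samples per second (from the text)
BITS_PER_SAMPLE = 13     # bits per quantized sample (from the text)


def sample_period_us(sample_rate_hz):
    """Time between two consecutive samples, in microseconds."""
    return 1_000_000 / sample_rate_hz


def raw_bit_rate(sample_rate_hz, bits_per_sample):
    """Uncompressed bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample


def quantize(value, bits=BITS_PER_SAMPLE):
    """Map a value in [-1.0, 1.0] to a signed integer code.

    Assumption: a uniform quantizer, used here for illustration only.
    """
    levels = 2 ** (bits - 1)                 # 4096 for 13 bits
    code = int(round(value * (levels - 1)))  # scale to the integer range
    return max(-levels, min(levels - 1, code))


if __name__ == "__main__":
    print(sample_period_us(SAMPLE_RATE_HZ), "microseconds per sample")
    print(raw_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE), "bits per second")
```

With 13 bits there are 8192 possible codes, so each sample is stored as one of 8192 levels.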
A data rate of 104 kbps is far too large to be economically
handled by a radio transmitter. In order to reduce the bitrate, the
signal is fed into a speech encoder. A speech encoder is a device
that compresses the data of a speech signal. There are many types of
speech encoding schemes available. The speech encoder used in GSM
combines Linear Predictive Coding (LPC) with Regular Pulse Excitation
(RPE). LPC is a complicated and math-heavy process, so it will
only be summarized here.
Remember that the ADC quantizes each audio sample into a 13-bit
"word". In LPC, 160 of the 13-bit samples from the converter are
collected and stored in short-term memory. Since a sample is taken
every 125 µs, 160 samples cover an audio block of 20ms. This 20ms
audio block consists of 2080 bits. LPC/RPE analyzes each 20ms set of
data and determines 8 coefficients used for filtering as well as an
excitation signal. The coefficients model the aspects of the human
voice shaped by vocal modifiers (teeth, tongue, etc.), while the
excitation signal represents things like pitch and loudness. In this
way LPC identifies correlations and redundancies in human speech
and removes them.
The LPC/RPE sequence is then fed into the Long-Term Prediction
(LTP) analysis function. The LTP function compares the sequence it
receives with earlier sequences stored in its memory and selects the
earlier sequence that most resembles the current one. The LTP function
then calculates the difference between the two sequences. Now the
encoder only has to transmit the difference value together with a
pointer indicating which earlier sequence was used for comparison.
Doing this avoids encoding redundant data.
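
The idea behind long-term prediction can be sketched in a few lines of Python. This is a toy illustration and not the GSM algorithm itself: the function names, the lag range, and the use of a plain squared-error match are assumptions made for the example, and a real codec also scales the prediction by a gain factor.

```python
def best_lag(current, history, min_lag, max_lag):
    """Find how far back in `history` the best match for `current` starts.

    `history` holds earlier samples, oldest first. A lag of L means the
    candidate segment starts L samples before the end of `history`.
    Assumes min_lag >= len(current) so every candidate is fully in the past.
    """
    n = len(current)
    best, best_error = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        segment = history[start:start + n]
        # Sum of squared differences: smaller means "more alike".
        error = sum((c - s) ** 2 for c, s in zip(current, segment))
        if error < best_error:
            best, best_error = lag, error
    return best


def ltp_encode(current, history, min_lag, max_lag):
    """Return (lag, residual): the pointer and the difference values."""
    lag = best_lag(current, history, min_lag, max_lag)
    start = len(history) - lag
    predicted = history[start:start + len(current)]
    residual = [c - p for c, p in zip(current, predicted)]
    return lag, residual


def ltp_decode(lag, residual, history):
    """Rebuild the current sequence from the pointer and the difference."""
    start = len(history) - lag
    predicted = history[start:start + len(residual)]
    return [p + r for p, r in zip(predicted, residual)]
```

When the current sequence closely repeats an earlier one, the residual is mostly zeros, which is far cheaper to send than the sequence itself. The receiver holds the same history, so the pointer and the residual are enough to rebuild the sequence.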
You can envision this by thinking about the sounds we make when we
talk. When we pronounce a syllable, each little sound has a specific
duration that seems short when we are talking but often lasts longer
than 20ms. So, one sound might be represented by several 20ms blocks of
nearly the same data. Rather than transmit redundant data, the encoder
only includes data that tells the receiving end which data is redundant
so that it can be recreated there.
Using LPC/RPE and LTP, the speech encoder reduces the 20ms
block from 2,080 bits to 260 bits, a reduction by a factor of
eight. 260 bits every 20ms gives us a net data rate of 13
kilobits per second (kbps).
Figure: Speech Encoding
This bitrate of 13kbps is known as Full Rate Speech (FS). There is
another method for encoding speech called Half Rate Speech (HS), which
results in a bit rate of approximately 5.6kbps. The explanations in the
remainder of this tutorial are based on a full-rate speech bitrate
(13kbps).
Calculate the net data rate:
Description                 Formula                   Result
Convert ms to sec           20 ms ÷ 1000              0.02 seconds
Calculate bits per second   260 bits ÷ 0.02 seconds   13,000 bits per second (bps)
Convert bits to kilobits    13,000 bps ÷ 1000         13 kilobits per second (kbps)
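
The same figures can be reproduced in Python as a quick sanity check. The constants come from the text; the function names are only illustrative, and integer arithmetic is used to avoid floating-point rounding.

```python
SAMPLE_RATE_HZ = 8000          # samples per second
BITS_PER_SAMPLE = 13           # bits per sample out of the ADC
SAMPLES_PER_BLOCK = 160        # samples collected per audio block
ENCODED_BITS_PER_BLOCK = 260   # bits per block after speech encoding


def block_duration_ms():
    """Duration of one audio block in milliseconds."""
    return 1000 * SAMPLES_PER_BLOCK // SAMPLE_RATE_HZ


def raw_bits_per_block():
    """Bits in one block before speech encoding."""
    return SAMPLES_PER_BLOCK * BITS_PER_SAMPLE


def compression_ratio():
    """How many times smaller the encoded block is."""
    return raw_bits_per_block() // ENCODED_BITS_PER_BLOCK


def net_bit_rate_bps():
    """Bits per second after speech encoding."""
    return ENCODED_BITS_PER_BLOCK * SAMPLE_RATE_HZ // SAMPLES_PER_BLOCK


if __name__ == "__main__":
    print(block_duration_ms(), "ms per block")
    print(raw_bits_per_block(), "raw bits per block")
    print(compression_ratio(), "times smaller after encoding")
    print(net_bit_rate_bps(), "bits per second")
```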
The audio signal must be transmitted across a
radio link from the handset to the Base Transceiver Station (BTS). The
signal on this radio link is subject to atmospheric interference and
fading, which results in a large amount of data loss and degrades the
audio. In order to limit this degradation, the data stream is put through a
series of error detection and error correction procedures called channel coding.
The first phase of channel coding is called block coding.
A single 260-bit (20ms) audio block is delivered to the block coder.
The 260 bits are divided into classes according to their importance
in reconstructing the audio. Class I bits (182 bits) are the most
important in reconstructing the audio. Class II bits (78 bits) are less
important. Class I bits are further divided into two categories,
Ia (50 bits) and Ib (132 bits).
Figure: Classes of Bits
The class Ia bits are protected by a cyclic code. The cyclic
code is run on the 50 Ia bits and calculates 3 parity bits, which are
then appended to the end of the Ia bits. Only the class Ia bits are
protected by this cyclic code. The Ia and Ib bits are then combined and
an additional 4 bits are added to the tail of the class I bits (Ia and
Ib together). All four bits are zeros (0000) and are needed for the
next step, which is convolutional coding. There is no protection for
class II bits. Block coding therefore adds seven bits to the
audio block (3 parity bits and 4 tail bits), so a 260-bit block
becomes a 267-bit block.
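
The block coding step described above can be sketched as follows. This is a simplified illustration, not a bit-exact implementation: the generator polynomial x^3 + x + 1 is assumed here, the function names are made up, and the bits are laid out exactly as the text describes (Ia, parity, Ib, tail), whereas the real specification differs in details such as bit ordering.

```python
def cyclic_parity(bits, generator=(1, 0, 1, 1)):
    """Compute parity bits by polynomial division over GF(2).

    `generator` lists coefficients from the highest power down, so
    (1, 0, 1, 1) means x^3 + x + 1 (an assumption for this sketch).
    Returns the remainder of bits(x) * x^3 divided by the generator.
    """
    degree = len(generator) - 1
    work = list(bits) + [0] * degree       # make room for the remainder
    for i in range(len(bits)):
        if work[i]:                        # leading term present: subtract (XOR)
            for j, g in enumerate(generator):
                work[i + j] ^= g
    return work[-degree:]


def block_code(class_ia, class_ib, class_ii):
    """Return (class_i, class_ii) after block coding: 189 + 78 = 267 bits."""
    if (len(class_ia), len(class_ib), len(class_ii)) != (50, 132, 78):
        raise ValueError("expected 50 Ia, 132 Ib and 78 class II bits")
    parity = cyclic_parity(class_ia)       # 3 parity bits protect class Ia only
    tail = [0, 0, 0, 0]                    # 4 zero bits for the convolutional coder
    class_i = list(class_ia) + parity + list(class_ib) + tail
    return class_i, list(class_ii)
```

On the receiving end, running the same division over the 50 Ia bits plus the 3 received parity bits gives a zero remainder when no error is detected.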
This 267-bit block is then fed into a convolutional coder.
Convolutional coding allows errors to be detected and, to a limited
degree, corrected. The class I "protected" bits are fed into a
convolutional code that outputs 2 bits for every bit that
enters it. The second bit that is produced is known as a redundancy
bit. The number of class I bits is thereby doubled from 189 to 378.
This coding uses 5 consecutive bits to calculate the redundancy bit,
which is why 4 tail bits were added to the class I bits during block
coding. The four tail bits follow the last data bit through the
register so that it is encoded with the same protection as every other
bit. The class II bits are not run through the convolutional
code. After convolutional coding, the audio block is 456 bits long
(378 coded class I bits plus 78 unprotected class II bits).
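
A rate 1/2 convolutional coder with a 5-bit window can be sketched as below. The tap patterns G0 and G1 are an assumption based on the polynomials 1 + D^3 + D^4 and 1 + D + D^3 + D^4; with these taps both output bits are parity combinations of the window rather than one data bit plus one redundancy bit, so treat this as an illustration of the mechanism rather than a reference implementation.

```python
# Tap patterns over the 5-bit window [current, D, D^2, D^3, D^4].
# Assumed polynomials: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4.
G0 = (1, 0, 0, 1, 1)
G1 = (1, 1, 0, 1, 1)


def conv_encode(bits):
    """Encode `bits` at rate 1/2: two output bits per input bit."""
    state = [0, 0, 0, 0]            # the previous four input bits, newest first
    out = []
    for b in bits:
        window = [b] + state        # 5 consecutive bits
        out.append(sum(w & g for w, g in zip(window, G0)) % 2)
        out.append(sum(w & g for w, g in zip(window, G1)) % 2)
        state = window[:4]          # shift the register by one bit
    return out


def channel_encode(class_i, class_ii):
    """Code the 189 class I bits and append the 78 class II bits as they are."""
    return conv_encode(class_i) + list(class_ii)
```

Because the last four class I bits are zeros, the register ends up back in the all-zero state, which gives the decoder a known end point.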
Now, one problem remains. All of this error detection and
error correction coding will not do any good if the entire 456-bit
block is lost or garbled. In order to alleviate this, the bits are
reordered and partitioned into eight separate sub-blocks. If one
sub-block is lost, then only one-eighth of the data for each audio block
is lost, and those bits can be recovered using the convolutional code on
the receiving end. This is known as interleaving.
Each 456-bit block is reordered and partitioned into 8 sub-blocks of 57 bits each.
These eight 57-bit sub-blocks are then interleaved onto 8 separate
bursts. As you remember from the TDMA Tutorial, each burst is composed
of two 57-bit data blocks, for a total data payload of 114 bits.
The first four sub-blocks (0 through 3) are mapped onto the
even bits of four consecutive bursts. The last four sub-blocks (4
through 7) are mapped onto the odd bits of the next 4 consecutive
bursts. So, the entire block is spread out across 8 separate bursts.
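
The mapping just described can be sketched in Python. Two assumptions are made for the example: bit k of a block is assigned to sub-block k mod 8, and the bits of a sub-block are placed on the burst in order. The real scheme also shuffles positions within each burst, which is left out here. The function names are illustrative.

```python
def partition(block):
    """Split a 456-bit block into 8 sub-blocks of 57 bits each.

    Assumption: bit k goes to sub-block k mod 8.
    """
    if len(block) != 456:
        raise ValueError("expected a 456-bit block")
    return [block[b::8] for b in range(8)]


def interleave(blocks):
    """Map consecutive 456-bit blocks onto bursts with 114 payload positions.

    Block n starts on burst 4 * n. Sub-blocks 0-3 go on the even positions
    of four consecutive bursts, sub-blocks 4-7 on the odd positions of the
    next four. Positions that no block has filled are left as None.
    """
    bursts = [[None] * 114 for _ in range(4 * len(blocks) + 4)]
    for n, block in enumerate(blocks):
        for b, sub in enumerate(partition(block)):
            burst = bursts[4 * n + b]
            offset = 0 if b < 4 else 1     # even or odd positions
            burst[offset::2] = sub         # 57 positions, every other bit
    return bursts
```

For example, with three blocks A, B, and C, the burst carrying sub-block 3 of B on its even positions also carries sub-block 7 of A on its odd positions.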
Taking a look at the diagram below we see three 456-bit
blocks, labeled A, B, and C. Each block is sub-divided into eight
sub-blocks numbered 0-7. Let's take a look at Block B. We can see that
each sub-block is mapped to a burst on a single time-slot. Block B is
mapped onto 8 separate bursts or time-slots. For illustrative purposes,
the time-slots are labeled S through Z.
Let's expand time-slot V for a close-up view. We can see how
the bits are mapped onto a burst. The bits from Block B, sub-block 3
(B3) are mapped onto the even numbered bits of the burst (bits
0,2,4....108,110,112). You will also notice that the odd bits are being
mapped from data from block A, sub-block 7 (bits 1,3,5....109,111,113).
Each burst contains 57 bits of data from each of two separate 456-bit
blocks. This process is known as interleaving.
Figure: Reordering, Partitioning, and Interleaving
In the following diagram, we examine time-slot W. We see that bits from
B4 are mapped onto the odd-numbered bits (bits 1,3,5....109,111,113) and
we would see bits from C0 mapped onto the even-numbered bits (bits
0,2,4....108,110,112). Time-slots W, X, Y, and Z would all be mapped
in the same way. The next time-slot would have data from Block C and
Block D mapped onto it. This process continues for as long as there is
data being generated.
Figure: Interleaving
The process of interleaving effectively distributes a single
456 bit audio block over 8 separate bursts. If one burst is lost, only
1/8 of the data is lost, and the missing bits can be recovered using
the convolutional code.
Now, you might notice that the data it takes to represent a
20ms (456-bit) audio block is spread out across 8 time-slots. If you
remember that each TDMA frame is approximately 4.615ms, we can
determine that it takes about 37ms to transmit one single 456-bit
block. It seems like transmitting 20ms worth of audio over a period of
37ms would not work. However, this is not what is truly happening. If
you look at a series of blocks as they are mapped onto time-slots, you
will notice that one block completes every four time-slots, which is
approximately 18ms. The only effect this has is that the audio stream
is delayed by roughly 20ms, which is negligible.
In the diagram below, we can see how this works. The diagram
shows 16 bursts. Remember that a burst occurs on a single time-slot and
the duration of a time-slot is 577 µs. Eight time-slots make up a
TDMA frame, which is 4.615ms. Since a single resource is only given one
time-slot in which to transmit, we only get to transmit once every TDMA
frame. Therefore, we only get to transmit one burst every 4.615ms.
If this is not clear, please review the TDMA Tutorial.
During each time-slot, a burst is transmitted that carries data from
two different 456-bit blocks. In the diagram below, Burst 1 carries
data from A and B, burst 5 has B and C, burst 9 has C and D, etc.
Looking at the diagram, we can see that it does take approximately 37ms
for Block B to transmit all of its data, (bursts 1-8). However, in
bursts 5-8, data from block C is also being transmitted. Once block B
has finished transmitting all of its data (burst 8), block C has
already transmitted half of its data and only requires 4 more bursts to
complete its data.
Block A completes transmitting its data at the end of the
fourth burst. Block B finishes in the eighth, block C in the 12th, and
block D in the 16th. Viewing it this way shows us that every fourth
burst completes the data for one block, which takes approximately
18ms.
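
The timing can be worked through with a short Python sketch. The burst numbering follows the description above (block A finishes on burst 4, block B occupies bursts 1-8), and the frame duration is the approximate 4.615ms figure; the function names are illustrative.

```python
TDMA_FRAME_MS = 4.615    # approximate time between two bursts of one user
BURSTS_PER_BLOCK = 8     # each block is spread over 8 bursts
BLOCK_OFFSET = 4         # a new block starts every 4 bursts


def burst_span(block_index):
    """First and last burst number of a block (block A has index 0)."""
    last = BLOCK_OFFSET * (block_index + 1)
    first = last - BURSTS_PER_BLOCK + 1
    return first, last


def blocks_in_burst(burst, block_names="ABCD"):
    """Names of the blocks that have data on the given burst."""
    return [name for i, name in enumerate(block_names)
            if burst_span(i)[0] <= burst <= burst_span(i)[1]]


def elapsed_ms(n_bursts):
    """Time taken by n consecutive bursts of one user."""
    return n_bursts * TDMA_FRAME_MS


if __name__ == "__main__":
    print("Block B uses bursts", burst_span(1))
    print("Burst 5 carries blocks", blocks_in_burst(5))
    print("One block takes about", round(elapsed_ms(8), 2), "ms to send")
    print("A block completes about every", round(elapsed_ms(4), 2), "ms")
```

Block A's first burst number comes out negative simply because it began transmitting before the first burst shown in the diagram.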
The following diagram illustrates the entire process, from audio sampling to partitioning and interleaving.
Data and signalling messages will be covered in a future tutorial.