# DVB-T implementation in GNUradio – Viterbi decoder implementation

Later edit: source code added here: https://github.com/BogdanDIA/gr-dvbt

The picture above is not directly connected to the following discussion, but it shows a 64-QAM signal decoded with gr-dvbt.

In this post I want to write about the Viterbi decoder implementation, on which I spent more time than usual because I tried to understand why the whole DVB-T decoding takes so much processor time. Profiling the processor time used by each module showed that the most compute-hungry module was the Viterbi convolutional decoder.

The initial implementation of the convolutional decoder was based on the gr-trellis implementation from GNUradio, which I modified for hard-decision decoding. This means that each input symbol is a single bit, as opposed to soft-decision decoding, in which each input symbol is represented by several bits that express the confidence in the received bit.

I calculated the bitrate at the input of the convolutional decoder for various DVB-T parameters, to understand the maximum rate to expect at the input for real-time decoding. Here is the worst-case scenario, as an excerpt from the calculation done in an Excel file:

In the worst-case scenario the rate after the decoder is 34.36 Mbit/s and the rate before the decoder is 39.27 Mbit/s, for a FEC rate of 7/8. However, because the 7/8 FEC is obtained by puncturing the rate ½ mother code, the actual data rate at the input of the decoder after depuncturing is 2 × 34.36 Mbit/s ≈ 68.7 Mbit/s. This is the maximum rate that the decoder has to process in real time.
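The figures above can be cross-checked with a few lines of Python. This is only a sketch: the function and table names are mine, the carrier counts and useful symbol durations are the standard values for an 8 MHz channel, and the guard interval of 1/32 is an assumption (it is the value that reproduces the numbers quoted in this post).

```python
from fractions import Fraction

# Standard DVB-T values for an 8 MHz channel
DATA_CARRIERS = {"2k": 1512, "8k": 6048}
USEFUL_SYMBOL_US = {"2k": 224, "8k": 896}      # useful symbol duration in microseconds
BITS_PER_CARRIER = {"QPSK": 2, "16-QAM": 4, "64-QAM": 6}


def decoder_rates(mode, modulation, code_rate, guard):
    """Return (punctured input, decoder output, depunctured input) in Mbit/s."""
    symbol_us = USEFUL_SYMBOL_US[mode] * (1 + guard)
    # bits per microsecond is the same number as Mbit/s
    coded = Fraction(DATA_CARRIERS[mode] * BITS_PER_CARRIER[modulation]) / symbol_us
    decoded = coded * code_rate
    # after depuncturing the decoder always sees the rate 1/2 mother code
    return float(coded), float(decoded), float(2 * decoded)


# Worst case: 8k, 64-QAM, FEC 7/8, guard interval 1/32 (assumed)
worst = decoder_rates("8k", "64-QAM", Fraction(7, 8), Fraction(1, 32))
# My real-time target: 2k, 16-QAM, FEC 1/2, guard interval 1/32 (assumed)
target = decoder_rates("2k", "16-QAM", Fraction(1, 2), Fraction(1, 32))

print("worst case  :", [round(v, 2) for v in worst])
print("target case :", [round(v, 2) for v in target])
```

For rate ½ nothing is punctured, so the first and last values of `target` are the same.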

I tried different implementations of the convolutional decoder; the next one I tested was the Karn C implementation that already exists in GNUradio 3.7.x. It gave some speed-up, but it was still slow compared with what I needed.

I should mention that my target was real-time decoding of 16-QAM, FEC rate ½, 2k OFDM, which requires a rate of around 26 Mbit/s at the input of the decoder.

I tried another decoder implementation, namely the one from IT++, which also gave poor results in terms of the maximum data rate that can be processed.

So the next step was to craft my own Viterbi decoder implementation using SIMD instructions. The implementation requires SSE2 support in the processor, and the next step is to move it into its own VOLK kernel. Here are some results I measured with the various Viterbi decoder implementations I tried:

A note on depuncturing:

I developed the initial code for FEC rate ½, and once it worked I started thinking about how to integrate depuncturing into the whole solution, as the DVB-T standard includes FEC for several n/(n+1) rates.

Usually depuncturing is done by inserting symbols in place of the punctured ones and then running the decoder as if it had received data at the mother rate, which in my case is ½. For example, rate 2/3 is obtained from rate ½ by puncturing one bit out of every four. On the receiving side one symbol has to be added to restore the original, unpunctured rate, but there is a risk of inserting the symbol in the wrong position if the start of the depuncturing is not synchronized. Therefore the depuncturing has to start at the right position.

The next issue with depuncturing was: what value should the unknown symbol have? There are several solutions, and the choice depends on many conditions, for example on soft versus hard decoding. With soft decisions one can insert the middle value, for example 0 for symbols ranging from -127 to 127. With hard decisions, where the symbols are bits with values 0 and 1, I have seen solutions that insert alternating 0 and 1 in place of the unknown symbols. I tried this and it works for rate 2/3, but for the other rates (3/4, 5/6, 7/8) it did not work.

One solution to this problem is to make the decoder aware of the puncturing matrix (or puncturing vector) so that whenever an unknown symbol is processed, the Accumulate step of the ACS (Accumulate, Compare, Select) is skipped, meaning that the symbol does not influence the result of the ACS. This is what I used: I insert a symbol with value 2 and test for that value inside the decoder to decide whether to do the Accumulate or not.
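To make the idea concrete, here is a minimal hard-decision sketch in plain Python. It is not the SSE2 code from gr-dvbt: all names are mine, it traces back over the whole block instead of using a sliding traceback window, and it assumes that the message ends with six zero bits so that the encoder finishes in state 0. The generators (171, 133 octal) and the puncturing patterns are the ones from the DVB-T standard.

```python
import random

G1, G2 = 0o171, 0o133        # DVB-T mother code generators (X and Y outputs), K = 7
ERASED = 2                   # marker inserted in place of a punctured bit
# (keep X, keep Y) for each input bit of one puncturing period
PATTERNS = {
    "1/2": [(1, 1)],
    "2/3": [(1, 1), (0, 1)],
    "3/4": [(1, 1), (0, 1), (1, 0)],
    "5/6": [(1, 1), (0, 1), (1, 0), (0, 1), (1, 0)],
    "7/8": [(1, 1), (0, 1), (0, 1), (0, 1), (1, 0), (0, 1), (1, 0)],
}


def parity(value):
    return bin(value).count("1") & 1


def encode(bits, rate):
    """Convolutional encoder followed by puncturing (used here to build test data)."""
    pattern, state, out = PATTERNS[rate], 0, []
    for i, bit in enumerate(bits):
        sr = (bit << 6) | state              # newest bit in the MSB, 6 older bits below
        keep_x, keep_y = pattern[i % len(pattern)]
        if keep_x:
            out.append(parity(sr & G1))
        if keep_y:
            out.append(parity(sr & G2))
        state = sr >> 1
    return out


def depuncture(received, rate, n_bits):
    """Rebuild one (X, Y) pair per information bit, marking punctured bits as ERASED."""
    pattern, stream, pairs = PATTERNS[rate], iter(received), []
    for i in range(n_bits):
        keep_x, keep_y = pattern[i % len(pattern)]
        pairs.append((next(stream) if keep_x else ERASED,
                      next(stream) if keep_y else ERASED))
    return pairs


def viterbi(pairs):
    """Hard-decision Viterbi decoder that skips the Accumulate for erased bits."""
    inf = float("inf")
    metric = [0] + [inf] * 63                # the encoder starts in state 0
    history = []                             # predecessor of every state, per step
    for rx, ry in pairs:
        new_metric, prev = [inf] * 64, [0] * 64
        for state in range(64):
            if metric[state] == inf:
                continue
            for bit in (0, 1):
                sr = (bit << 6) | state
                # an erased bit adds nothing to the branch metric
                branch = ((rx != ERASED and rx != parity(sr & G1))
                          + (ry != ERASED and ry != parity(sr & G2)))
                nxt = sr >> 1
                if metric[state] + branch < new_metric[nxt]:
                    new_metric[nxt], prev[nxt] = metric[state] + branch, state
        metric = new_metric
        history.append(prev)
    state, bits = 0, []                      # assumes the tail bits flushed the encoder
    for prev in reversed(history):
        bits.append(state >> 5)              # the newest input bit is the MSB of the state
        state = prev[state]
    return bits[::-1]


random.seed(1)
message = [random.randint(0, 1) for _ in range(84)] + [0] * 6
decoded = {rate: viterbi(depuncture(encode(message, rate), rate, len(message)))
           for rate in PATTERNS}
print({rate: bits == message for rate, bits in decoded.items()})
```

Note that `depuncture` only works if it starts on the first bit of a puncturing period, which is the synchronization issue mentioned above.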

The traceback length when depuncturing:

A common rule of thumb is that the traceback length in the Viterbi decoder should be about 5*K, where K is the constraint length of the convolutional code. For the NASA code, which is also used in DVB-T, the constraint length is K=7. This gives a traceback length of 35, meaning that, with high probability, all survivor paths have merged into the same state once we trace back 5*K positions in the trellis.

However, with depuncturing, the more unknown symbols we insert, the further we need to trace back to reach a stable state. Therefore the traceback length has to be extended depending on the code rate.

To conclude: I needed to make the Viterbi decoder aware of the puncturing matrix and also adapt the traceback length to each FEC rate.

Another image, capturing reception at a lower SNR with 8k OFDM, FEC rate 7/8, 64-QAM. It still decodes the MPEG2-TS perfectly:

# DVB-T implementation in GNUradio – part 3

Later edit: source code added here: https://github.com/BogdanDIA/gr-dvbt

The blocks used for receiving are basically the transmitter blocks in reverse order, with the major difference that synchronization blocks are needed in order to obtain a clean constellation.

OFDM symbol acquisition:

The task of this block is to synchronize the receiver so that a clean time domain signal is obtained before the FFT block. There are several subtasks that this block takes care of:

1. Use the Cyclic Prefix (CP) to find the start of the OFDM symbol in the time domain. Here I’ve chosen the ML (maximum likelihood) estimator presented in [REF1 – Van de Beek]. It maximizes a log-likelihood function in order to obtain the frequency correction, epsilon (a fraction of the intercarrier spacing), and the time correction, theta (the number of samples to the detected start of the OFDM symbol). Since the CP is added in front of the OFDM symbol and is a copy of the last part of the symbol, there is a correlation maximum that signals the beginning of the symbol. This is called pre-FFT NDA (Non Data Aided) correction; it obtains sync on the beginning of the OFDM symbol and ensures that the subcarriers fall in the center of the FFT bins.

Note: The algorithms used for OFDM in 802.11 [Schmidl and Cox – REF2] use a preamble that is specially designed to have two identical halves in the time domain. In DVB-T this option is not available, and the pre-FFT correction is called Non Data Aided acquisition because no pilots are used for the synchronization task. However, pilots are used for post-FFT synchronization, which I explain later in this article.

2. Once the CP start is found and epsilon is also found, de-rotation of the signal is applied in order to obtain the correct time domain OFDM symbol.

Note: The ML estimator is quite computationally intensive because of the correlations that have to be calculated. For that reason, for the subsequent symbols only a window of +/-4 samples is searched to find the CP again. After a number of consecutive CP misses a full initial acquisition is triggered.

Note 2: The ML estimator of van de Beek requires knowledge of the SNR value. For now this is entered manually as a parameter to the processing block.

TODO: Add a method of SNR estimation

3. The CP in front of the OFDM symbol is removed.
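The estimator used in step 1 can be sketched in a few lines of plain Python. This is a simplified illustration of the metric from [REF1], not the gr-dvbt block: the function name and the test signal are mine, and the test signal is a single symbol surrounded by silence with no noise, just to keep the example deterministic. With gamma(m) the correlation between the samples in a CP-long window and the samples one FFT length later, and phi(m) the average energy of the two windows, theta is the m that maximizes |gamma(m)| - rho·phi(m) with rho = SNR/(SNR+1), and epsilon is -angle(gamma(theta))/2π.

```python
import cmath
import random


def ml_cp_estimate(r, n_fft, n_cp, snr):
    """Return (theta, epsilon): CP start index and fractional carrier offset."""
    rho = snr / (snr + 1.0)
    best_score, theta, epsilon = None, 0, 0.0
    for m in range(len(r) - n_fft - n_cp + 1):
        window = range(m, m + n_cp)
        # correlation between the window and the samples one FFT length later
        gamma = sum(r[k] * r[k + n_fft].conjugate() for k in window)
        # average energy of the two windows
        phi = 0.5 * sum(abs(r[k]) ** 2 + abs(r[k + n_fft]) ** 2 for k in window)
        score = abs(gamma) - rho * phi
        if best_score is None or score > best_score:
            best_score, theta = score, m
            epsilon = -cmath.phase(gamma) / (2 * cmath.pi)
    return theta, epsilon


# Toy test signal (all values are assumptions made for this example)
random.seed(0)
N, L, TRUE_THETA, TRUE_EPS = 64, 16, 23, 0.2
body = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
symbol = body[-L:] + body                       # CP = copy of the last L samples
signal = [0j] * TRUE_THETA + symbol + [0j] * (N + L)
# carrier offset of TRUE_EPS subcarrier spacings
r = [s * cmath.exp(2j * cmath.pi * TRUE_EPS * k / N) for k, s in enumerate(signal)]

theta_hat, eps_hat = ml_cp_estimate(r, N, L, snr=1000.0)
print(theta_hat, round(eps_hat, 6))
```

The loop over all m is the expensive part, which is why the search is restricted to a few samples around the previous position once the first symbol has been found.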

FFT for time to frequency domain conversion:

This is just plain FFT conversion for 2k, 8k or 4k depending on the parameters. Here the GNUradio block is used.

A plot of the constellation after the FFT is in order. One may see that the constellation is rotating and therefore requires more processing in order to be useful for de-mapping:

The plot is made when transmitting with a USRP N210 and receiving with a USRP N210, both equipped with WBX daughterboards. The DVB-T parameters are 8 MHz bandwidth, FFT 2k, 16-QAM, FEC 1/2.

Demodulation of Reference signals:
The DVB-T standard uses several sub-carriers to insert pilot signals that are used for synchronization, signal parameters transmission and equalization.

There are three types of pilot signals used in the DVB-T standard:

– scattered pilot signals

– continual pilot signals

– TPS (Transmission Parameter Signalling)

1. Post-FFT DA (Data Aided) synchronization.

The continual pilots and (in my implementation) the scattered pilots are used to obtain an integer frequency correction after the FFT is performed. The positions of the scattered and continual pilots are known, so by ML correlation of the signal with the expected values one can obtain the integer frequency correction. This is the number of bins by which the synchronizer has to shift the signal after the FFT.
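A simplified sketch of such a search is shown below. It is not necessarily the metric used in gr-dvbt: the pilot map is hypothetical (not the tables from the standard), only the 4/3 amplitude boost of the pilots is taken from DVB-T, and the function name is mine. The magnitude of the correlation is used so that a common phase rotation of the constellation does not matter.

```python
import cmath
import random


def integer_offset(bins, pilots, max_shift):
    """Return the bin shift d that maximizes |sum over pilots of Y[p + d] * conj(ref_p)|."""
    def metric(d):
        return abs(sum(bins[p + d] * ref.conjugate() for p, ref in pilots.items()))
    return max(range(-max_shift, max_shift + 1), key=metric)


# Hypothetical pilot map: carrier index -> boosted BPSK reference value
PILOTS = {5: 4 / 3, 11: -4 / 3, 20: 4 / 3, 26: 4 / 3,
          33: -4 / 3, 41: 4 / 3, 47: -4 / 3, 52: -4 / 3}

random.seed(0)
N_BINS, TRUE_SHIFT = 64, 3
qpsk = [cmath.exp(1j * cmath.pi * (0.25 + 0.5 * q)) for q in range(4)]
# transmitted symbol: pilots at their positions, unit-amplitude QPSK data elsewhere
tx = [complex(PILOTS[k]) if k in PILOTS else random.choice(qpsk) for k in range(N_BINS)]
# received symbol: shifted by TRUE_SHIFT bins and rotated by an unknown common phase
rotation = cmath.exp(0.7j)
rx = [tx[(k - TRUE_SHIFT) % N_BINS] * rotation for k in range(N_BINS)]

shift_hat = integer_offset(rx, PILOTS, max_shift=4)
print(shift_hat)
```

The correct shift wins because only there does every reference value meet a boosted pilot; at any other shift at least one term comes from a data cell with smaller amplitude.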

There are two sources of frequency deviation that we need to take into account besides the nature of the transmission channel: carrier and sampling-clock deviations. These produce ICI (Inter Carrier Interference), which is significant because of the small intercarrier spacing and can be modelled as noise. See [Speth et al. – REF3].

After the pre/post-FFT carrier synchronization, which is performed only once, a continuous correction still needs to be done for the residual offset that appears over time, using a PID-like algorithm also described in [Speth et al. – REF3].

TODO: implement this correction.

2. Demodulation of TPS block

Transmission Parameter Signalling is used to tell the receiver the parameters of the stream that is being received. The TPS is sent on pilots at predefined positions, on which the data is modulated using DBPSK. Being a differential modulation, it can be demodulated even though the constellation is rotating. Within one OFDM symbol every TPS carrier conveys the same information bit, so majority voting is used to decide the actual value of the bit. There are 68 symbols in a frame and 4 frames in a superframe.
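The differential demodulation with majority voting can be sketched as follows. The carrier positions and the test values are hypothetical and the function name is mine; the only thing taken from the standard is that a phase flip between two consecutive symbols on a TPS carrier encodes a 1.

```python
import cmath


def tps_bit(prev_symbol, cur_symbol, tps_carriers):
    """Decode one TPS bit from two consecutive OFDM symbols (lists of FFT bins)."""
    votes = 0
    for k in tps_carriers:
        # DBPSK: the sign of Re(cur * conj(prev)) tells whether the phase flipped
        flipped = (cur_symbol[k] * prev_symbol[k].conjugate()).real < 0
        votes += 1 if flipped else -1
    return 1 if votes > 0 else 0


# Hypothetical TPS carrier positions in a 32-bin toy symbol
CARRIERS = [3, 9, 14, 21, 30]
prev = [0j] * 32
cur = [0j] * 32
for i, k in enumerate(CARRIERS):
    prev[k] = (1 if i % 2 else -1) * cmath.exp(0.4j)   # rotated constellation
    cur[k] = -prev[k] * cmath.exp(0.1j)                # bit 1 sent, small extra rotation
cur[14] = prev[14]                                     # one carrier received in error

decoded_bit = tps_bit(prev, cur, CARRIERS)
print(decoded_bit)
```

Because only the phase difference between consecutive symbols is used, the slow rotation of the constellation cancels out, and the single bad carrier is outvoted by the other four.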

3. Equalization of signal based on pilot signals

A simple equalizer based on the continual and scattered pilots is used to correct the data. The scattered pilots have a position inside the frame given by the formula {k = Kmin + 3 × (l mod 4) + 12p | p integer, p ≥ 0, k ∈ [Kmin; Kmax]}, where l is the symbol index. The idea behind scattered pilots is to give better coverage of the whole channel when moving from one symbol to the next.
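The formula translates directly into code. In this small sketch the function name is mine, and the defaults Kmin = 0 and Kmax = 1704 are the values for 2k mode (Kmax is 6816 in 8k mode).

```python
def scattered_pilots(symbol_index, k_min=0, k_max=1704):
    """Carrier indices of the scattered pilots in OFDM symbol l = symbol_index."""
    first = k_min + 3 * (symbol_index % 4)
    return list(range(first, k_max + 1, 12))


for l in range(5):
    print(l, scattered_pilots(l)[:4])
```

The pattern repeats every 4 symbols, and over those 4 symbols every third carrier is visited by a pilot, which is what allows the channel estimate to be interpolated over the whole band.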

TODO: Implement DFE (Decision Feedback Equalizer) – See [Proakis – REF4]

The following plot shows the signal constellation for 16-QAM after the equalization:

References:

[REF1]: Jan-Jaap van de Beek, ML Estimation of Time and Frequency Offset in OFDM Systems

[REF2]: Timothy M. Schmidl and Donald C. Cox, Robust Frequency and Timing Synchronization for OFDM

[REF3]: Michael Speth, Stefan Fechtel, Gunnar Fock, Heinrich Meyr, Optimum Receiver Design for OFDM-Based Broadband Transmission – Part I and II

[REF4]: John G. Proakis, Masoud Salehi, Digital Communications

# DVB-T implementation in GNUradio – part 2

Later edit: source code added here: https://github.com/BogdanDIA/gr-dvbt

In the previous post I presented the results of using the GNUradio DVB-T transmitter together with the RTL2832U Elonics receiver. Now I’m going to dig into more technical details of the actual implementation of the DVB-T standard, and I will focus less on the transmitter side and more on the receiver side. The standard is defined in ETSI 300 744, and besides that there is doxygen documentation provided with the GNUradio modules.

TX and how to get on the air:

The chain of the transmitter is like the one described in the image below (taken from ETSI):

This is how the GNUradio implementation looks in gnuradio-companion. The majority of the blocks are implemented as new GR blocks, as nothing that already exists in GR fits this standard:

Each module is worth describing, so here they are:

Energy dispersal:

In order to disperse the energy of the MPEG2-TS source, this block XORs the actual data with the output of a PRBS generator. It also creates MUX frames composed of 8 MUX packets of 188 bytes. Synchronization is done by replacing the 0x47 sync byte with 0xb8 at the beginning of the MUX frame.
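Here is a minimal sketch of this block in Python. The function names are mine; the PRBS polynomial (1 + x^14 + x^15) and its initialization sequence 100101010000000 are the ones defined by the standard. The generator keeps running during the sync bytes of packets 2 to 8, but those bytes are left untouched.

```python
PACKET = 188
FRAME = 8 * PACKET


def prbs_bytes(count):
    """First `count` output bytes of the DVB PRBS (1 + x^14 + x^15), MSB first."""
    reg = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # registers 1..15
    out = []
    for _ in range(count):
        byte = 0
        for _bit in range(8):
            bit = reg[13] ^ reg[14]          # XOR of registers 14 and 15
            reg = [bit] + reg[:-1]           # feed back into register 1 and shift
            byte = (byte << 1) | bit
        out.append(byte)
    return out


def randomize(frame):
    """Energy dispersal of one MUX frame; applying it twice restores the input."""
    assert len(frame) == FRAME
    prbs = prbs_bytes(FRAME - 1)             # 1503 bytes, starting after the first sync
    out = bytearray(frame)
    out[0] ^= 0xFF                           # 0x47 <-> 0xB8 marks the start of the frame
    for i in range(1, FRAME):
        if i % PACKET:                       # the other 7 sync bytes are not scrambled
            out[i] ^= prbs[i - 1]
    return bytes(out)


# Example frame: 8 packets, each a sync byte followed by an arbitrary payload
frame = b"".join(bytes([0x47]) + bytes((p + n) % 256 for n in range(187))
                 for p in range(8))
scrambled = randomize(frame)
print(hex(scrambled[0]), hex(scrambled[PACKET]), [hex(b) for b in prbs_bytes(2)])
```

Since the operation is an XOR with a fixed sequence, the same function can be used in the receiver to remove the dispersal.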

Outer coder (Reed-Solomon):

Implements RS(204, 188) encoding with t=8 error correcting capability, i.e. 16 parity bytes are appended to each 188-byte packet.

Convolutional interleaver:

The whole point of using an outer coder + interleaver + inner coder is to generate a long codeword.

This outer interleaver is of Ramsey type III (TODO add IEEE paper link here).
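A minimal sketch of the convolutional interleaver and its matching deinterleaver is given below. The class name is mine; the depth I = 12 and the cell size M = 17 bytes are the values from the DVB-T standard. Branch j of the interleaver delays its bytes by j·M positions, and the deinterleaver uses the complementary delays.

```python
from collections import deque


class DelayBranches:
    """Bytes are sent cyclically through FIFO branches of different lengths."""

    def __init__(self, delays):
        self.fifos = [deque([0] * d) for d in delays]
        self.branch = 0

    def push(self, byte):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        fifo.append(byte)
        return fifo.popleft()      # a branch with delay 0 returns the byte immediately


I, M = 12, 17                       # DVB-T: 12 branches, M = 204 / 12 bytes per cell
interleaver = DelayBranches([j * M for j in range(I)])
deinterleaver = DelayBranches([(I - 1 - j) * M for j in range(I)])

data = [n % 251 for n in range(5000)]
restored = [deinterleaver.push(interleaver.push(b)) for b in data]

DELAY = I * (I - 1) * M             # end-to-end delay in bytes
print(DELAY, restored[DELAY:DELAY + 5])
```

In a real receiver the deinterleaver has to be aligned with the interleaver, which is possible because the sync bytes always travel through branch 0.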

Inner coder:

This is a convolutional encoder with a rate 1/2 mother code, punctured to obtain the 2/3, 3/4, 5/6 and 7/8 rates.
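As an illustration, the mother code and the puncturing can be sketched like this. The function names are mine; the generator polynomials (171 and 133 octal, constraint length 7) and the puncturing patterns are those from the DVB-T standard.

```python
G1, G2 = 0o171, 0o133        # generators of the X and Y outputs
# (keep X, keep Y) for each input bit of one puncturing period
PATTERNS = {
    "1/2": [(1, 1)],
    "2/3": [(1, 1), (0, 1)],
    "3/4": [(1, 1), (0, 1), (1, 0)],
    "5/6": [(1, 1), (0, 1), (1, 0), (0, 1), (1, 0)],
    "7/8": [(1, 1), (0, 1), (0, 1), (0, 1), (1, 0), (0, 1), (1, 0)],
}


def parity(value):
    return bin(value).count("1") & 1


def encode(bits):
    """Rate 1/2 mother code: one (X, Y) pair per input bit."""
    state, pairs = 0, []
    for bit in bits:
        sr = (bit << 6) | state          # newest bit in the MSB, 6 older bits below
        pairs.append((parity(sr & G1), parity(sr & G2)))
        state = sr >> 1
    return pairs


def puncture(pairs, rate):
    """Keep only the X and Y outputs selected by the puncturing pattern."""
    pattern, out = PATTERNS[rate], []
    for i, (x, y) in enumerate(pairs):
        keep_x, keep_y = pattern[i % len(pattern)]
        if keep_x:
            out.append(x)
        if keep_y:
            out.append(y)
    return out


# Which outputs are transmitted, and in which order, for rate 3/4
labels = [(f"X{i}", f"Y{i}") for i in range(1, 4)]
order_3_4 = puncture(labels, "3/4")
# Impulse response of the encoder: it reproduces the generator taps
impulse = encode([1, 0, 0, 0, 0, 0, 0])
print(order_3_4, impulse)
```

For rate 3/4, three input bits produce six coded bits, of which four are transmitted.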

Inner interleaver:

The inner interleaver is made of a bit-wise interleaver followed by a symbol interleaver. Its output is grouped into the number of bits that the mapper needs per constellation point, according to the modulation type used.

Mapper:

Maps the bits to a constellation and this depends not only on the type of modulation but also on whether the hierarchical modulation is used.

Reference signals block:

In order to support synchronization and equalization at the receiver side, the standard adds three types of pilots: continual pilots, scattered pilots and TPS. Actually TPS (Transmission Parameter Signalling) is a way to send the transmission parameters, and this is done using differential modulation, which is immune to phase rotations. Decoding the TPS is the first thing to do at the receiver after an initial synchronization.

FFT:

The COFDM (Coded OFDM) modulation used in DVB-T creates frequency domain bins and then does an IFFT to convert them to the time domain. 2k, 8k and 4k FFT sizes can be used.

Cyclic prefix:

In order to prevent ISI and ICI in the SFNs (single frequency networks) that are usually employed in DVB-T deployments, a cyclic prefix (CP) of various sizes is used. It consists of a copy of the last part of the time domain symbol placed at the beginning of the symbol. The fact that the signal becomes periodic helps synchronization at the receiver, and many algorithms for time and frequency correction are based on the CP.

Rational resampler:

The USRP N210 I’m using as receiver and transmitter has a clock of 100 MHz. Unfortunately, in DVB-T the sample rate is Fs = 8*B/7 ≈ 9.14 Msps for 8 MHz bandwidth, and for that we need a resampler. This proved to be the most time-consuming block (taken from GR) and it probably needs to be replaced with another one in the future. I also noted that the group delay varies, which needs to be investigated.
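The resampling ratio follows directly from the two rates. In the sketch below the device rate of 10 Msps (the 100 MHz clock divided by 10) is only an assumption made for the example, and the function names are mine.

```python
from fractions import Fraction


def dvbt_sample_rate(bandwidth_hz):
    """DVB-T baseband sample rate Fs = 8 * B / 7, kept as an exact fraction."""
    return Fraction(8 * bandwidth_hz, 7)


def resampling_ratio(device_rate_hz, bandwidth_hz):
    """Return (interpolation, decimation) to go from the DVB-T rate to the device rate."""
    ratio = Fraction(device_rate_hz) / dvbt_sample_rate(bandwidth_hz)
    return ratio.numerator, ratio.denominator


fs = dvbt_sample_rate(8_000_000)
# Assumed device rate: 10 Msps
interp, decim = resampling_ratio(10_000_000, 8_000_000)
print(float(fs), interp, decim)
```

With these numbers the transmitter would interpolate by 35 and decimate by 32, and the receiver would do the opposite.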

Processing power when running the transmitter:

I’m using a Core i7 processor with 4 HT cores at 3.4 GHz; transmitting consumes 170% CPU (less than 2 HT cores). This is where things become interesting, as there are a lot of improvements to make, e.g. using VOLK, or doing some calculations at init time as a trade-off between memory and processing. But this deserves a chapter of its own at the end.

Finally, I attach a screenshot that shows the constellation and the FFT at baseband. Given that the points in the WX widget are so small, I used the option to show the constellation with connected points, hope you like it 🙂

73, Bogdan – YO3IIU

# DVB-T implementation in GNUradio – part 1

Later edit: source code added here: https://github.com/BogdanDIA/gr-dvbt

Motivation for implementing DVB-T RX/TX:

I thought I should write about my work on implementing the ETSI 300 744 (DVB-T) standard in GNUradio, the challenges I needed to overcome and the challenges that still remain.

The entire work for DVB-T, both TX and RX, is going to be on github in the next couple of weeks.

The whole idea of the project was to provide a framework for researchers who want to play with various algorithms for synchronization, demodulation, equalization, etc., and no less for individuals who want a testing tool for a real-life implementation of the DVB-T standard.

The presentation here will be of low complexity, intended for understanding the principles that drive digital broadcasting. Later on I will write a more academic paper with studies of the effect of various SNR levels on the BER, MER and the entire system, together with further development directions. Given the complexity of the receiver, the tool can also be used as a benchmark to understand how different architectures can minimize the processing required and lower power consumption.

Let’s get started: looking for open source DVB-T implementations, I found none that works in real life or that can be used for real-life research (TODO: insert OpenDVBT, Barruffa, Pellegrini, other). Therefore I decided to dig into the ETSI 300 744 standard and implement it step by step.

Actually, the whole thing started when I wanted to use the cheap RTL2832U DVB-T stick on ham radio frequencies like the 1200 MHz band, where I did not have a transmitter available. As you probably know, the Elonics DVB-T stick has a Linux kernel driver available, and the chip inside can go above 1 GHz. Modifying the driver to accept frequencies above the regular DVB-T frequencies is as easy as this:

--- a/RTL2832-2.2.2_kernel-3.0.0/rtl2832u_fe.c
+++ b/RTL2832-2.2.2_kernel-3.0.0/rtl2832u_fe.c

-        .frequency_max      = 862000000,
+        .frequency_max      = 1700000000, //862000000

Now, let’s jump to some screenshots to show the functionality of the transmitter and the signal received with Elonics dvb-t stick. I am using Kaffeine in Ubuntu Linux:

Searching for the channel (see SNR):

Now the channel is found:

Display properties of the channel (see the frequency and the other parameters):

And finally a screenshot from Kaffeine playing the stream:

I used an MPEG2-TS file as the input, but I also did some tests with a Logitech C920 webcam, which is able to output an H.264 stream. I did not do many tests with the camera because an Ubuntu update changed the kernel, and after that the whole USB system on my PC (including USB mass storage) fails to sustain a constant bitrate.

As you may know or imagine, the receiver was far more time-consuming than the transmitter. I can say that implementing the RX takes 3 times longer than implementing the TX, but that will be presented in the subsequent posts, where I will dig into more technical details.

73, Bogdan – YO3IIU