** Later edit: **source code added here: https://github.com/BogdanDIA/gr-dvbt

The picture above is not directly connected to the following discussion but it is showing a 64-QAM signal decoded with the gr-dvbt.

In this post I thought I should write about the Viterbi decoder implementation on which I spent more time than usual just because I tried to understand why the whole DVB-T decoding take so much processor time. Investigating the processor time used by each module it turned out that the most computing power hungry module was the Viterbi convolutional decoder.

The initial implementation of convolutional decoder I used was based on the gr-trellis implementation from gnuradio which I modified for hard bit decision decoding. It means that each input symbol is actually one bit as compared with the soft bit decision decoding in which each input symbol is represented on a number of bits representing the probability of receiving the transmitted symbol.

I did a calculation for the bitrate at the input of the convolutional decoder using various parameters of the DVB-T just to understand what will be the maximum rate expected at the input in the case of a real time decoding. Here it is the worst case scenario as an excerpt from the calculation done in an excel file:

It is seen that the rate after decoder in the worst case scenario is **34.36Mbit/s** and the rate before decoder is **39.27Mbit/s** for a FEC rate of 7/8. However, due to the fact that the 7/8 FEC comes actually from ½ FEC rate punctured then the actual data rate at the input of the decoder is 2*34.36Mbit/s=**68.43Mbit/s**. This is the maximum rate that can be processed in real time by the decoder.

I tried different implementation of convolutional decoder and the next one I tested was the Karn C implementation already existing in gnuradio 3.7.x. It gave some speed-up but still it was slow compared with what I needed.

I should mention that my target was to run in real time decoding of QAM-16, FEC rate ½, 2k OFDM which requires around 26Mbit/s rate at the entry of the decoder.

I tried another decoder implementation, namely the one from IT++ which gave me poor results too in terms of maximum data rate that can be processed.

So the next step was to actually craft my own Viterbi decoder implementation using SIMD instructions. The implementation requires SSE2 support in the processor and the next step is to move this implementation into it’s own VOLK kernel. Here are some results I measured with the various Viterbi decoder implementations I tried:

**A note on depuncturing:**

I developed the initial code for FEC rate ½ and therefore once it worked I started thinking about how to integrate the depuncturing in the whole solution as te DVB-T standard includes FEC for several n/(n+1) rates.

Usually depuncturing is done by adding symbols in the place of punctured ones and then run the decoder as it received data at the master rate which in my case is ½. For example the rate 2/3 comes out of the ½ rate by puncturing a bit. At the receiving side there is one symbol to add in order to restore the initial, not punctured, rate but it gives the risk of inserting the symbol in the wrong position if the start of the depuncturing is not synchronized. Therefore there is need to make sure that the depuncturing starts at the right position.

The next issue appearing in the case of depuncturing was: what is the unknown symbol to add? There are several solutions used in this case and this depends on many conditions for example on the soft/hard decoding. In the case of soft bit decision one can add a median value, for example 0 in the case of a symbol ranging from -127, 127. In the case of hard bit decision when the symbols are bits with values 0,1 I have seen solutions using insertion of alternating 0, 1 in the place of the unknown symbols. I tried this and it works for rate 2/3 decoding but for the other rates like 3/4, 5/6, 7/8 it did not work.

One solution on the above problem is to modify the decoder to be aware of the puncturing matrix(or puncturing vector) so that whenever the unknown symbol is processed, the Accumulate in the ACS (Accumulate, Compare, Select) is not done meaning that the symbol does not influence the result of ACS. This is what I used in my case, specifically inserting a symbol with value 2 and testing the value inside the decoder to decide whether to do the Accumulate or not.

*The traceback length when depuncturing:*

It is known that the traceback length in the Viterbi decoder should be of 5*K where K is the constraint length of the convolutional code. For example for the NASA code that is also used in DVB-T the constraint length is K=7. It gives the traceback 35 meaning that all paths will get back to the same state if we traceback with 5*K positions in the trellis.

However, it is clear that in the case of depuncturing the more unknown symbols we add the lengthier we need to go back in order to reach a stable state. Therefore we need to extend the traceback length depending on the code rate we want to achieve.

I can conclude with the followings: I needed to make the Viterbi decoder aware of the puncturing matrix and also adapt the traceback length for each FEC rate.

Another image capturing a lower SNR receiving with 8K OFDM, FEC rate 7/8, QAM-64. It still decodes the MPEG2-TS perfectly: