Monday, February 9, 2015

February is DSP month!

DSP month continues...

Last month I unilaterally declared that January is DSP month, and posted a series of excerpts from the book I am working on, Think DSP.  Here is the series so far:

Chapter 1: Signals and Spectrums
Chapter 2: Harmonics
Chapter 3: Chirps
Chapter 4: Noise

Chapter 5: Autocorrelation

I didn't get through all of the chapters in January, so today I am happy to announce that February is also DSP month, according to me.

To celebrate, I am posting excerpts from my favorite chapter: Chapter 6, the Discrete Cosine Transform.  I present the discrete cosine transform (DCT) before the discrete Fourier transform for two reasons:

1) The DCT has many practical applications; notably, it is used by MP3 and related formats for compressing music, JPEG and similar formats for images, and the MPEG family of formats for video.

2) I found a way to present both DCT and DFT that (I think) demystifies the idea of a transform; and starting with the DCT has the advantage that everything is real-valued (no complex numbers).

My explanation follows these steps:

  1. I start with the synthesis problem: given a set of frequency components and their amplitudes, how can we construct a waveform?
  2. Next I rewrite the synthesis problem using NumPy arrays. This move is good for performance, and also provides insight for the next step.
  3. We look at the analysis problem: given a signal and a set of frequencies, how can we find the amplitude of each frequency component? I start with a solution that is conceptually simple but slow.
  4. Finally, I use some principles from linear algebra to find a more efficient algorithm.

Synthesis #1

Suppose I give you a list of amplitudes and a list of frequencies, and ask you to construct a signal that is the sum of these frequency components. Using objects in the thinkdsp module, there is a simple way to perform this operation, which is called synthesis:
def synthesize1(amps, freqs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, freqs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys

amps is a list of amplitudes, freqs is the list of frequencies, and ts is the sequence of times where the signal should be evaluated.
components is a list of CosSignal objects, one for each amplitude-frequency pair. SumSignal represents the sum of these frequency components.
Finally, evaluate computes the value of the signal at each time in ts.
We can test this function like this:
    amps = numpy.array([0.6, 0.25, 0.1, 0.05])
    freqs = [100, 200, 300, 400]
    framerate = 11025

    ts = numpy.linspace(0, 1, framerate)
    ys = synthesize1(amps, freqs, ts)
    wave = thinkdsp.Wave(ys, framerate)
    wave.play()

This example makes a signal that contains a fundamental frequency at 100 Hz and three harmonics (100 Hz is a sharp G2). It renders the signal for one second at 11,025 frames per second and plays the resulting wave.
Conceptually, synthesis is pretty simple. But in this form it doesn’t help much with analysis, which is the inverse problem: given the wave, how do we identify the frequency components and their amplitudes?

Synthesis #2

Here’s another way to write synthesize:

def synthesize2(amps, freqs, ts):
    args = numpy.outer(ts, freqs)
    M = numpy.cos(PI2 * args)
    ys = numpy.dot(M, amps)
    return ys

This function looks very different, but it does the same thing. Let’s see how it works:
  1. numpy.outer computes the outer product of ts and freqs. The result is an array with one row for each element of ts and one column for each element of freqs. Each element in the array is the product of a frequency and a time, f t.
  2. We multiply args by 2 π and apply cos, so each element of the result is cos(2 π f t). Since the ts run down the columns, each column contains a cosine signal at a particular frequency, evaluated at a sequence of times.
  3. numpy.dot multiplies each row of M by amps, element-wise, and then adds up the products. In terms of linear algebra, we are multiplying a matrix, M, by a vector, amps.
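As a quick illustration of step 1 (my own aside, not from the chapter), here is what numpy.outer produces for a tiny input:

```python
import numpy

ts = numpy.array([0.0, 0.5, 1.0])   # three sample times
freqs = numpy.array([100, 200])     # two frequencies

args = numpy.outer(ts, freqs)
# one row per element of ts, one column per element of freqs
print(args.shape)   # (3, 2)
```

Each entry args[k, j] is the product ts[k] * freqs[j], which is exactly the f t that feeds into the cosine in step 2.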
The following figure shows the structure of this computation.
[Figure: the matrix M, with one row per time tk and one column per frequency fj, is multiplied by the vector amps = (0.6, 0.25, 0.1, 0.05) to produce the vector ys. The entries of the kth row of M are labeled a, b, c, d, and the kth element of ys is labeled e.]

Each row of the matrix, M, corresponds to a time from 0.0 to 1.0 seconds; tk is the time of the kth row. Each column corresponds to a frequency from 100 to 400 Hz; fj is the frequency of the jth column.
I labeled the kth row with the letters a through d; as an example, the value of a is cos[2 π (100) tk].
The result of the dot product, ys, is a vector with one element for each row of M. The kth element, labeled e, is the sum of products:
e = 0.6 a + 0.25 b + 0.1 c + 0.05 d 

And likewise with the other elements of ys. So each element of ys is the sum of four frequency components, evaluated at a point in time and multiplied by the corresponding amplitudes. And that’s exactly what we wanted. We can use the code from the previous section to check that the two versions of synthesize produce the same results.

    ys1 = synthesize1(amps, freqs, ts)
    ys2 = synthesize2(amps, freqs, ts)
    print(max(abs(ys1 - ys2)))

The biggest difference between ys1 and ys2 is about 1e-13, which is what we expect due to floating-point errors.
One note on linear algebra vocabulary: I am using “matrix” to refer to a two-dimensional array, and “vector” for a one-dimensional array or sequence. To be pedantic, it would be more correct to think of matrices and vectors as abstract mathematical concepts, and NumPy arrays as one way (and not the only way) to represent them.
One advantage of linear algebra is that it provides concise notation for operations on matrices and vectors. For example, we could write synthesize like this:
M = cos(2 π t ⊗ f)
y = M a
where a is a vector of amplitudes, t is a vector of times, f is a vector of frequencies, and ⊗ is the symbol for the outer product of two vectors.


Analysis #1

Now we are ready to solve the analysis problem. Suppose I give you a wave and tell you that it is the sum of cosines with a given set of frequencies. How would you find the amplitude for each frequency component? In other words, given ys, ts, and freqs, can you recover amps?
In terms of linear algebra, the first step is the same as for synthesis: we compute M = cos(2 π t⊗ f). Then we want to find a so that y = M a; in other words, we want to solve a linear system. NumPy provides linalg.solve, which does exactly that.
Here’s what the code looks like:
def analyze1(ys, freqs, ts):
    args = numpy.outer(ts, freqs)
    M = numpy.cos(PI2 * args)
    amps = numpy.linalg.solve(M, ys)
    return amps

The first two lines use ts and freqs to build the matrix, M. Then numpy.linalg.solve computes amps.
But there’s a hitch. In general we can only solve a system of linear equations if the matrix is square; that is, the number of equations (rows) is the same as the number of unknowns (columns).
In this example, we have only 4 frequencies, but we evaluated the signal at 11,025 times. So we have many more equations than unknowns.
In general if ys contains more than 4 elements, it is unlikely that we can analyze it using only 4 frequencies.
But in this case, we know that the ys were actually generated by adding only 4 frequency components, so we can use any 4 values from the wave array to recover amps.
For simplicity, I’ll use the first 4 samples from the signal. Using the values of ys, freqs, and ts from the previous section, we can run analyze1 like this:
    n = len(freqs)
    amps2 = analyze1(ys[:n], freqs, ts[:n])
    print(amps2)
And sure enough, we get
[ 0.6   0.25  0.1   0.05 ]
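As an aside (not part of the original approach), we don't actually have to throw away the other 11,021 samples: numpy.linalg.lstsq finds the least-squares solution of the overdetermined system, and since the wave really is a sum of these four components, it recovers the same amplitudes. A minimal, self-contained sketch:

```python
import numpy

PI2 = 2 * numpy.pi

amps = numpy.array([0.6, 0.25, 0.1, 0.05])
freqs = [100, 200, 300, 400]
framerate = 11025
ts = numpy.linspace(0, 1, framerate)

# build the 11025 x 4 matrix and synthesize the wave
M = numpy.cos(PI2 * numpy.outer(ts, freqs))
ys = numpy.dot(M, amps)

# least squares uses all the samples, not just the first 4
amps2 = numpy.linalg.lstsq(M, ys, rcond=None)[0]
```

This is more robust than picking 4 samples (especially if the wave is noisy), at the cost of more computation.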
This algorithm works, but it is slow. Solving a linear system of equations takes time proportional to n³, where n is the number of columns in M. We can do better.

Orthogonal matrices

One way to solve linear systems is by inverting matrices. The inverse of a matrix M is written M⁻¹, and it has the property that M⁻¹M = I, where I is the identity matrix, which has the value 1 on all diagonal elements and 0 everywhere else.
So, to solve the equation y = Ma, we can multiply both sides by M⁻¹, which yields:
M⁻¹y = M⁻¹ M a
On the right side, we can replace M⁻¹M with I:
M⁻¹y = I a
If we multiply I by any vector a, the result is a, so
M⁻¹y = a
This implies that if we can compute M⁻¹ efficiently, we can find a with a simple matrix multiplication (using numpy.dot). That takes time proportional to n², which is better than n³.
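Here's a tiny check of that claim (my own sketch, on a made-up 2×2 system): multiplying by the inverse gives the same answer as numpy.linalg.solve.

```python
import numpy

# a small invertible system: y = M a
M = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
a = numpy.array([0.5, -1.0])
y = numpy.dot(M, a)

a_solve = numpy.linalg.solve(M, y)         # general solver, ~n^3
a_inv = numpy.dot(numpy.linalg.inv(M), y)  # once M^-1 is known, just a ~n^2 multiply
```

Both recover the original a, so if we can get M⁻¹ cheaply, the multiplication route wins.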
In general inverting a matrix is slow, but some special cases are faster. In particular, if M is orthogonal, the inverse of M is just the transpose of M, written Mᵀ. In NumPy, transposing an array is a constant-time operation. It doesn’t actually move the elements of the array; instead, it creates a “view” that changes the way the elements are accessed.
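We can see that the transpose is a view, not a copy (a quick aside of my own, using numpy.shares_memory):

```python
import numpy

M = numpy.arange(6).reshape(2, 3)
MT = M.T

# the shape changes, but no elements were copied:
# both arrays are backed by the same memory buffer
assert MT.shape == (3, 2)
assert numpy.shares_memory(M, MT)
```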
Again, a matrix is orthogonal if its transpose is also its inverse; that is, Mᵀ = M⁻¹. That implies that MᵀM = I, which means we can check whether a matrix is orthogonal by computing MᵀM.
So let’s see what the matrix looks like in synthesize2. In the previous example, M has 11,025 rows, so it might be a good idea to work with a smaller example:
def test1():
    amps = numpy.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = numpy.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    freqs = numpy.arange(N) / N * max_freq
    ys = synthesize2(amps, freqs, ts)

amps is the same vector of amplitudes we saw before. Since we have 4 frequency components, we’ll sample the signal at 4 points in time. That way, M is square.
ts is a vector of equally spaced sample times in the range from 0 to 1 time unit. I chose the time unit to be 1 millisecond, but it is an arbitrary choice, and we will see in a minute that it drops out of the computation anyway.
Since the frame rate is N samples per time unit, the Nyquist frequency is N / time_unit / 2, which is 2000 Hz in this example. So freqs is a vector of equally spaced frequencies between 0 and 2000 Hz.
With these values of ts and freqs, the matrix, M, is:
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]

You might recognize 0.707 as an approximation of √2/2, which is cos(π/4). You also might notice that this matrix is symmetric, which means that the element at (j, k) always equals the element at (k, j). This implies that M is its own transpose; that is, Mᵀ = M.
But sadly, M is not orthogonal. If we compute MᵀM, we get:
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]

And that’s not the identity matrix.
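For reference, here's a standalone sketch (values copied from test1) that builds this M and confirms the result:

```python
import numpy

PI2 = 2 * numpy.pi

N = 4
time_unit = 0.001
ts = numpy.arange(N) / N * time_unit
max_freq = N / time_unit / 2
freqs = numpy.arange(N) / N * max_freq

M = numpy.cos(PI2 * numpy.outer(ts, freqs))
MTM = numpy.dot(M.T, M)
# the diagonal is [4, 2, 2, 2] with nonzero off-diagonal terms,
# so M^T M is nowhere near the identity matrix
assert not numpy.allclose(MTM, numpy.eye(N))
```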

Making M orthogonal

But if we choose ts and freqs carefully, we can make M orthogonal. There are several ways to do it, which is why there are several versions of the discrete cosine transform (DCT).
One simple option is to shift ts and freqs by a half unit. This version is called DCT-IV, where “IV” is a Roman numeral indicating that this is the fourth of eight versions of the DCT.
Here’s an updated version of test1:
def test2():
    amps = numpy.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + numpy.arange(N)) / N
    freqs = (0.5 + numpy.arange(N)) / 2
    ys = synthesize2(amps, freqs, ts)

If you compare this to the previous version, you’ll notice two changes. First, I added 0.5 to ts and freqs. Second, I cancelled out time_unit, which simplifies the expression for freqs.
With these values, M is
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]

And MᵀM is
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]

Some of the off-diagonal elements are displayed as -0, which means that the floating-point representation is a small negative number. So this matrix is very close to 2I, which means M is almost orthogonal; it’s just off by a factor of 2. And for our purposes, that’s good enough.
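To double-check that claim numerically (a quick sketch of my own), we can rebuild M from the values in test2 and verify that MᵀM is twice the identity, up to floating-point error:

```python
import numpy

PI2 = 2 * numpy.pi

N = 4
ts = (0.5 + numpy.arange(N)) / N
freqs = (0.5 + numpy.arange(N)) / 2

M = numpy.cos(PI2 * numpy.outer(ts, freqs))
# M^T M should be (very nearly) 2I
assert numpy.allclose(numpy.dot(M.T, M), 2 * numpy.eye(N))
```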
Because M is symmetric and (almost) orthogonal, the inverse of M is just M/2. Now we can write a more efficient version of analyze:
def analyze2(ys, freqs, ts):
    args = numpy.outer(ts, freqs)
    M = numpy.cos(PI2 * args)
    amps = numpy.dot(M, ys) / 2
    return amps

Instead of using numpy.linalg.solve, we just multiply by M/2.
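As a sanity check (my own sketch, using standalone versions of the functions above), the two analyzers agree on the test2 data:

```python
import numpy

PI2 = 2 * numpy.pi

def synthesize2(amps, freqs, ts):
    M = numpy.cos(PI2 * numpy.outer(ts, freqs))
    return numpy.dot(M, amps)

def analyze1(ys, freqs, ts):
    # general solution: solve the linear system
    M = numpy.cos(PI2 * numpy.outer(ts, freqs))
    return numpy.linalg.solve(M, ys)

def analyze2(ys, freqs, ts):
    # fast solution: M is symmetric and M M = 2I, so M^-1 = M / 2
    M = numpy.cos(PI2 * numpy.outer(ts, freqs))
    return numpy.dot(M, ys) / 2

amps = numpy.array([0.6, 0.25, 0.1, 0.05])
N = 4
ts = (0.5 + numpy.arange(N)) / N
freqs = (0.5 + numpy.arange(N)) / 2
ys = synthesize2(amps, freqs, ts)
```

Both analyze1 and analyze2 recover the original amplitudes; only the running time differs.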
Combining test2 and analyze2, we can write an implementation of DCT-IV:
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + numpy.arange(N)) / N
    freqs = (0.5 + numpy.arange(N)) / 2
    args = numpy.outer(ts, freqs)
    M = numpy.cos(PI2 * args)
    amps = numpy.dot(M, ys) / 2
    return amps

Again, ys is the wave array. We don’t have to pass ts and freqs as parameters; dct_iv can figure them out based on N, the length of ys.
We can test dct_iv like this:
    amps = numpy.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + numpy.arange(N)) / N
    freqs = (0.5 + numpy.arange(N)) / 2
    ys = synthesize2(amps, freqs, ts)

    amps2 = dct_iv(ys)
    print(max(abs(amps - amps2)))

Starting with amps, we synthesize a wave array, then use dct_iv to see if we can recover the amplitudes. The biggest difference between amps and amps2 is about 1e-16, which is what we expect due to floating-point errors.

Inverse DCT

Finally, notice that analyze2 and synthesize2 are almost identical. The only difference is that analyze2 divides the result by 2. We can use this insight to compute the inverse DCT:

def inverse_dct_iv(amps):
    return dct_iv(amps) * 2

inverse_dct_iv takes the vector of amplitudes and returns the wave array, ys. We can confirm that it works by starting with amps, applying inverse_dct_iv and dct_iv, and testing that we get back what we started with.

    amps = [0.6, 0.25, 0.1, 0.05]
    ys = inverse_dct_iv(amps)
    amps2 = dct_iv(ys)
    print(max(abs(amps - amps2)))

Again, the biggest difference is about 1e-16.

Summary

I hope this explanation helps to demystify the discrete cosine transform.  If you understand this way of thinking about it, you are 90% of the way to understanding the discrete Fourier transform, which is the topic for next time!
