The Salty Economist

Things I Should Have Learned in High School

F# Matrix Inversion (LU Decomposition Explained)

Well, it's now time to bite off the big enchilada -- matrix inversion.

So, what method to use?  I first thought I would try the Gauss-Jordan method, but then when I picked up my copy of Numerical Recipes in C, I read the following:

"For inverting a matrix, Gauss-Jordan elimination is about as efficient as any other method.  For solving sets of linear equations, Gauss-Jordan elimination produces both the solution of the equations for one or more right-hand side vectors b, and also the matrix inverse A.  However, its principal weaknesses are (i) that is requires the right-hand sides to be stored and manipulated at the same time, and (ii) that when the inverse matrix is not desired.  Gauss-Jordan is three times slower than the best alternative technique for solving a single linear set (2.3).  The method's principal strength is that it is stable as any other direct method, perhaps even a bit more stable when pivoting is used (see below)."

"If you come along later with an additional right-hand side vector, you can multiply it by the inverse matrix, of course.  This does give an answer, but one that is quite susceptible to roundoff error, not nearly as good as if the new vector had been included with the set of right-hand side vectors in the first instance."

"For these reasons, Gauss-Jordan elimination should usually not be your method of first choice, either solving linear equations or for matrix inversion.  The decomposition methods in 2.3 are better.  Why do we give Gauss-Jordan at all?  Because it is straightforward, understandable, solid as a rock, and an exceptionally good "psychological" backup for those times that something is going wrong and you think it might be your equation solver."

"Some people believe that the backup is more than psychological, that Gauss-Jordan elimination is an "independent" numerical method.  This turns out to be mostly myth.  Except for relatively minor differences in pivoting, described below, the actual sequence of operations performed in Gauss-Jordan elimination is very closely related to that performed by routines in the next two sections."

So says Numerical Recipes.  Well, I guess we better get on the LU decomposition train.

So, just what is LU decomposition anyway?

Suppose you have a set of equations specified as such:

4x +  7y -  8z =  7
2x + 11y +  2z =  5
3x +  3y - 17z = 12

You can write this in matrix form:

X b = y

where X is a k x k matrix of the fixed values, b is a k x 1 vector of the unknowns (x, y, z), and y is a k x 1 vector of the right-hand side constants.  The problem here is to solve for x, y, and z.
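
To make that concrete, here is the example system as F# values.  This is just a sketch in the spirit of the code to come (the names X and y are my own, not the next post's code):

// The example system, as a float Array2D and a float array
let X = array2D [ [ 4.0;  7.0;  -8.0 ]
                  [ 2.0; 11.0;   2.0 ]
                  [ 3.0;  3.0; -17.0 ] ]
let y = [| 7.0; 5.0; 12.0 |]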

Now, some smart guy figured out that the X matrix can be decomposed into two matrices, say L and U, where L only has values beneath the main diagonal (with ones on the diagonal), and U has values on (and above) the main diagonal, in the form of:

L = [1  0  0]   U = [v  v  v]   where the v's are some values
    [v  1  0]       [0  v  v]
    [v  v  1]       [0  0  v]

such that LU = X.

If L & U can be easily found, then solving the matrix inversion problem is also easy.

The solution works this way:

(1) Calculate L and U.
(2) We know that LU b = y (from the fact that X = LU).
(3) Let us define a vector Z as the product of U and b: Z = U b.
(4) Substituting, we have LZ = y.
(5) Solving for Z is trivial, because L is triangular.  We calculate the first value, and from that we can forward substitute to fill in the rest of Z (see the sketch after this list).
(6) Once we know Z, we need to solve U b = Z for b (we know U and our result Z).
(7) Again, this is trivial because U is triangular, so we can back substitute to get b.
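
Here is a minimal F# sketch of steps (5) and (7) -- forward substitution, then back substitution.  The name solveLU and the float[,] representation are my own choices, not the code from the next post; it assumes L carries ones on its diagonal, and indices are 0-based here, unlike the 1-based notation in the text:

let solveLU (lower: float[,]) (upper: float[,]) (y: float[]) =
    let n = y.Length
    // Step (5): forward substitution, solving L Z = y from the top row down
    let z = Array.zeroCreate n
    for i in 0 .. n - 1 do
        let s = Seq.sum (seq { for k in 0 .. i - 1 -> lower.[i, k] * z.[k] })
        z.[i] <- y.[i] - s            // no division needed: L.[i,i] = 1
    // Step (7): back substitution, solving U b = Z from the bottom row up
    let b = Array.zeroCreate n
    for i in n - 1 .. -1 .. 0 do
        let s = Seq.sum (seq { for k in i + 1 .. n - 1 -> upper.[i, k] * b.[k] })
        b.[i] <- (z.[i] - s) / upper.[i, i]
    b

Given the L and U we are about to compute, solveLU L U y hands back the (x, y, z) vector in one pass down and one pass up.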

So, how do we decompose X into L & U?

We need to take this matrix:

X = [4   7  -8]
    [2  11   2]
    [3   3 -17]

And decompose it into:

L = [1     0    0]   U = [4   7    -8  ]
    [0.5   1    0]       [0   7.5   6  ]
    [0.75 -0.3  1]       [0   0    -9.2]

Be assured: if you multiply L by U, you will get X back.
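
If you want the computer's assurance rather than mine, a throwaway multiply routine makes the check easy (again my own sketch, not the real code):

// Naive matrix multiply for square float matrices
let multiply (a: float[,]) (b: float[,]) =
    let n = Array2D.length1 a
    Array2D.init n n (fun i j ->
        Seq.sum (seq { for k in 0 .. n - 1 -> a.[i, k] * b.[k, j] }))

let L = array2D [ [ 1.0;   0.0; 0.0 ]
                  [ 0.5;   1.0; 0.0 ]
                  [ 0.75; -0.3; 1.0 ] ]
let U = array2D [ [ 4.0; 7.0; -8.0 ]
                  [ 0.0; 7.5;  6.0 ]
                  [ 0.0; 0.0; -9.2 ] ]
multiply L U   // gives back X, up to floating-point rounding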

This is where we lean on the Smart Guy:

This guy is Prescott Durand Crout (I borrowed the picture from the Web; I hope the MIT Museum won't arrest me).  Anyway, he was a professor at MIT who in 1941 came up with an efficient, succinct way of decomposing a matrix.

Crout's algorithm works like this:

First, set up two matrices, a lower triangular one and an upper triangular one, such as

L = [1  0  0]   U = [v  v  v]   where the v's are some values
    [v  1  0]       [0  v  v]
    [v  v  1]       [0  0  v]

The elements of the two matrices are computed together, column by column, from left to right.

I'll work through our 3x3 example to show how the elements are solved.

The first column is easy.  We know that X[1,1] = 4, and that L times U must also yield 4 in the [1,1] position.  So we know that:

L[1,1] * U[1,1] + L[1,2] * U[2,1] + L[1,3] * U[3,1] = 4

Thus, U[1,1] = (4 - (L[1,2] * U[2,1] + L[1,3] * U[3,1])) / L[1,1]

So, since L[1,1] = 1, U[1,1] must equal 4.  (Note that both L[1,2] and L[1,3] are zero.)

We then solve for the remaining elements in the first column of L, where j indexes the row.

For row 2:

L[2,1] * U[1,1] + L[2,2] * U[2,1] + L[2,3] * U[3,1] = 2

L[2,1] = (2 - (L[2,2] * U[2,1] + L[2,3] * U[3,1])) / U[1,1]

L[2,1] = 0.5

For the jth row we have:

L[j,1] = (X[j,1] - (L[j,2] * U[2,1] + L[j,3] * U[3,1])) / U[1,1]

Since U[2,1] and U[3,1] are zero, this turns out to be equivalent to dividing the elements in the column by U[1,1] -- 4.0 in this case.

After solving the first column in our example we have

L = [1.0   0  0]   U = [4  v  v]
    [0.5   1  0]       [0  v  v]
    [0.75  v  1]       [0  0  v]

Now, let's work on the second column.

For U[1,2] we know:

L[1,1] * U[1,2] + L[1,2] * U[2,2] + L[1,3] * U[3,2] = 7

U[1,2] = (7 - (L[1,2] * U[2,2] + L[1,3] * U[3,2])) / L[1,1]

Thus, since L[1,1] = 1, U[1,2] must equal 7.

For U[2,2] we know:

L[2,1] * U[1,2] + L[2,2] * U[2,2] + L[2,3] * U[3,2] = 11

U[2,2] = (11 - (L[2,1] * U[1,2] + L[2,3] * U[3,2])) / L[2,2]

With our example, this means:

(0.5 * 7.0) + (1.0 * U[2,2]) + 0.0 * v = 11.0

The cool thing is, we can now solve for U[2,2]:

U[2,2] = (11.0 - (0.5 * 7))/1.0 = 7.5

So, we now have:

L = [1.0   0  0]   U = [4  7    v]
    [0.5   1  0]       [0  7.5  v]
    [0.75  v  1]       [0  0    v]

For L[3,2] we know:

L[3,1] * U[1,2] + L[3,2] * U[2,2] + L[3,3] * U[3,2] = 3;

L[3,2] = (3 - (L[3,1] * U[1,2] + L[3,3] * U[3,2])) / U[2,2]

L[3,2] = (3.0 - (0.75 * 7.0 + 1.0 * 0)) / 7.5 = -0.3

We now have:

L = [1.0   0    0]   U = [4  7    v]
    [0.5   1    0]       [0  7.5  v]
    [0.75 -0.3  1]       [0  0    v]

Now, let's work on the third column.

For U[1,3] we know:

L[1,1] * U[1,3] + L[1,2] * U[2,3] + L[1,3] * U[3,3] = -8

U[1,3] = (-8 - (L[1,2] * U[2,3] + L[1,3] * U[3,3])) / L[1,1]

Thus, since L[1,1] = 1, U[1,3] must equal -8.

For U[2,3] we know:

L[2,1] * U[1,3] + L[2,2] * U[2,3] + L[2,3] * U[3,3] = 2

U[2,3] = (2 - (L[2,1] * U[1,3] + L[2,3] * U[3,3])) / L[2,2]

With our example, this means:

0.5 * -8.0 + 1.0 * U[2,3] + 0.0 * v = 2.0

We solve for U[2,3]:

U[2,3] = (2.0 - (0.5 * -8.0)) / 1.0 = 6.0

So, we now have:

L = [1.0    0    0]   U = [4   7    -8]
    [0.5    1    0]       [0   7.5   6]
    [0.75  -0.3  1]       [0   0     v]

Last, but not least, for U[3,3] we know:

L[3,1] * U[1,3] + L[3,2] * U[2,3] + L[3,3] * U[3,3] = -17

U[3,3] = (-17 - (L[3,1] * U[1,3] + L[3,2] * U[2,3])) / L[3,3]

With our example, this means:

0.75 * -8.0 + (-0.3 * 6) + 1.0 * U[3,3] = -17

We now solve for U[3,3]:

U[3,3] = -17.0 - (0.75 * -8.0) - (-0.3 * 6) = -9.2

Finally we have:

L = [1.0   0    0]   U = [4  7    -8  ]
    [0.5   1    0]       [0  7.5   6  ]
    [0.75 -0.3  1]       [0  0    -9.2]

which is the result we are looking for.

Generalizing, we see that the lower triangular terms are calculated as

L[i,j] = (X[i,j] - Sum) / U[j,j]

where Sum = the sum of the L[i,k] * U[k,j] terms over the range k = 1 to j-1.

The upper triangular terms are calculated as

U[i,j] = (X[i,j] - Sum) / L[i,i]

and because L[i,i] = 1, this is just

U[i,j] = X[i,j] - Sum

where Sum = the sum of the L[i,k] * U[k,j] terms over the range k = 1 to i-1.
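
Translating those two formulas directly into F# gives a compact Crout routine.  This is my own sketch, not the next post's code: it uses 0-based indices (the notation above is 1-based) and does no pivoting, so it assumes U[j,j] never turns up zero.

// Crout decomposition: returns (L, U) with ones on L's diagonal, such that L * U = X
let decompose (x: float[,]) =
    let n = Array2D.length1 x
    // L starts as the identity; U starts as zeros
    let lower = Array2D.init n n (fun i j -> if i = j then 1.0 else 0.0)
    let upper = Array2D.zeroCreate n n
    for j in 0 .. n - 1 do
        // Upper terms of column j: U[i,j] = X[i,j] - Sum  (since L[i,i] = 1)
        for i in 0 .. j do
            let s = Seq.sum (seq { for k in 0 .. i - 1 -> lower.[i, k] * upper.[k, j] })
            upper.[i, j] <- x.[i, j] - s
        // Lower terms of column j: L[i,j] = (X[i,j] - Sum) / U[j,j]
        for i in j + 1 .. n - 1 do
            let s = Seq.sum (seq { for k in 0 .. j - 1 -> lower.[i, k] * upper.[k, j] })
            lower.[i, j] <- (x.[i, j] - s) / upper.[j, j]
    lower, upper

Running decompose on our X and feeding the result to the solveLU sketch above solves the original system; doing the same once per column of the identity matrix assembles the inverse, column by column.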

Ok, that's enough of the tedium, let's write some F# code.

Ughh... this post is already too long -- the code is in the next post.

posted on Friday, July 24, 2009 9:35 AM