I'm an applied math PhD who thinks linear algebra is the best thing ever, and it's the nuts and bolts of modern AI, so for fun and profit I'll attempt a quick cheat sheet.

To manage expectations, this won't be very satisfying by itself. You have to do a lot of exercises for this stuff to become second nature. But hopefully it at least imparts a sense that the topic is conceptually meaningful and not just a profusion of interacting symbols. For brevity, I'll pretend real numbers are the only numbers that exist, assume basic knowledge of vectors, and say nothing about eigenvalues.

1. The most important thing to know about matrices is that they are linear maps. Specifically, an m x n matrix is a map from n-dimensional space (R^n) to m-dimensional space (R^m). That means that you can use the matrix as a function, one which takes as input a vector with n entries and outputs a vector with m entries.
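
A minimal NumPy sketch of this "matrix as function" view (the specific matrix and vector values are just made-up examples):

```python
import numpy as np

# A 3 x 2 matrix: a linear map from R^2 to R^3.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

x = np.array([1.0, -1.0])  # input: a vector with 2 entries

y = A @ x                  # "apply the map"
print(y)                   # [-1. -1. -1.] -- output has 3 entries
print(y.shape)             # (3,)
```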

2. The columns of a matrix are vectors. They tell you what outputs are generated when you take the standard basis vectors and feed them as inputs to the associated linear map. The standard basis vectors of R^n are the n vectors of length 1 that point along the n coordinate axes of the space (the x-axis, y-axis, z-axis, and beyond for higher-dimensional spaces). Conversely, a vector with n entries is also an n x 1 column matrix.
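
To see this concretely, here's a quick NumPy check (again with made-up values) that feeding the standard basis vectors into the map recovers the columns of the matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

e1 = np.array([1.0, 0.0])  # standard basis vectors of R^2
e2 = np.array([0.0, 1.0])

# Applying the map to e_i returns column i of the matrix.
print(np.array_equal(A @ e1, A[:, 0]))  # True
print(np.array_equal(A @ e2, A[:, 1]))  # True
```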

3. Every vector can be expressed uniquely as a linear combination (weighted sum) of standard basis vectors, and linear maps work nicely with linear combinations. Specifically, F(ax + by) = aF(x) + bF(y) for any real-valued "weights" a,b and vectors x,y. From this, you can show that a linear map is uniquely determined by what it maps the standard basis vectors to. This + #2 explains why linear maps and matrices are equivalent concepts.
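
A quick numerical sanity check of the linearity property, using a random matrix as the map F and arbitrary made-up weights:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 2))   # a random linear map from R^2 to R^3
x = rng.standard_normal(2)
y = rng.standard_normal(2)
a, b = 2.5, -1.0

# F(ax + by) == a F(x) + b F(y), up to floating-point error
print(np.allclose(F @ (a * x + b * y), a * (F @ x) + b * (F @ y)))  # True
```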

4a. The way you apply the linear map to an arbitrary vector is by matrix-vector multiplication. If you write out (for example) a 3 x 2 matrix and a 2 x 1 vector, you will see that there is only one reasonable way to do this: each 1 x 2 row of the matrix must combine with the 2 x 1 input vector to produce an entry of the 3 x 1 output vector. The combination operation: flip the row from horizontal to vertical so it's a vector, then take its dot product with the input vector.
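
Here's a sketch of that row-by-row recipe written out by hand and checked against NumPy's built-in @ operator (the helper name matvec is just for illustration; NumPy's dot treats both 1-D arrays as vectors, so no explicit flipping is needed):

```python
import numpy as np

def matvec(A, x):
    # One output entry per row of A: dot each row with the input vector.
    return np.array([np.dot(row, x) for row in A])

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([1.0, -1.0])

print(matvec(A, x))                         # [-1. -1. -1.]
print(np.array_equal(matvec(A, x), A @ x))  # True
```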

4b. Notice how when you multiply a 3x2 matrix with a 2x1 vector, you get a 3x1 vector. In the "size math" of matrix multiplication, (3x2) x (2x1) = (3x1); the inner 2's go away, leaving only the outer numbers. This "contraction" of the inner dimensions, which happens via the dot product of matching vectors, is a general feature of matrix multiplication. Contraction is also the defining feature of how we multiply tensors, the 3D and higher-dimensional analogues of matrices.
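
A sketch of the size math, using np.einsum to make the contracted index explicit (the shapes here are arbitrary examples):

```python
import numpy as np

A = np.ones((3, 2))
x = np.ones(2)
print((A @ x).shape)  # (3,) -- the inner 2's contracted away

# The same contraction written explicitly: sum over the
# shared index j in A[i, j] * x[j].
print(np.einsum('ij,j->i', A, x).shape)  # (3,)

# Higher-dimensional analogue: contract a 3D tensor with a matrix
# along their shared index j.
T = np.ones((4, 3, 2))
B = np.ones((2, 5))
print(np.einsum('abj,jc->abc', T, B).shape)  # (4, 3, 5)
```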

5. Matrix-matrix multiplication is just a bunch of matrix-vector multiplications put side by side into a single matrix. That is to say, if you multiply two matrices A and B, the columns of the resulting matrix C are just the matrix-vector products of A with the individual columns of B.
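
A quick sketch of this column-by-column view, checked against the built-in product (the matrices are random, made-up examples):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))

# Build C = AB one column at a time: column k of C is A @ (column k of B).
C = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])

print(np.allclose(C, A @ B))  # True
```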

6. Many basic geometric operations, such as rotation, shearing, and scaling, are linear operations, so long as you use a version of them that keeps the origin fixed (maps the zero vector to the zero vector). This is why they can be represented by matrices and implemented on computers with matrix multiplication.
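
For instance, here's a sketch of the standard 2D rotation and scaling matrices in NumPy; the angle and scale factors are arbitrary examples:

```python
import numpy as np

theta = np.pi / 2  # rotate 90 degrees counterclockwise about the origin

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

S = np.diag([2.0, 0.5])  # scale x by 2, y by 0.5

x = np.array([1.0, 0.0])
print(np.round(R @ x, 10))  # [0. 1.] -- the x-axis rotates onto the y-axis
print(S @ x)                # [2. 0.]
print(R @ np.zeros(2))      # [0. 0.] -- the origin stays fixed
```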