To remind oneself how to multiply matrices together, it suffices to remember how to apply a matrix to a column vector, and that ((A B) v) = (A (B v)).

For each one-hot vector e_i (i.e. the column vector that has a 1 in the i-th position and 0s elsewhere), applying B to e_i gives B e_i, the i-th column of the matrix B. Then, applying the matrix A to the result gives A (B e_i), which equals (A B) e_i, and this is the i-th column of the matrix A B. And, when applying the matrix A to some column vector v, each entry/row of the resulting vector is obtained by combining the corresponding row of A with the column vector v.
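This column-extraction trick can be checked directly. A minimal sketch in NumPy (the specific matrices here are just illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# e0 is the one-hot (standard basis) column vector with a 1 in position 0.
e0 = np.array([1.0, 0.0])

# B @ e0 picks out the 0th column of B.
assert np.array_equal(B @ e0, B[:, 0])

# A @ (B @ e0) equals (A @ B) @ e0, i.e. the 0th column of A B.
assert np.array_equal(A @ (B @ e0), (A @ B) @ e0)
```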

So, to get the entry at the j-th row of the i-th column of (A B), one combines the j-th row of A with the i-th column of B. Or, alternatively/equivalently, you can just compute the matrix (A B) column by column: for each e_i, the i-th column of (A B) is (A (B e_i)) (which is how I usually think of it).
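Both readings can be written out explicitly. A sketch of the entry-wise rule and the column-by-column construction (again with made-up example matrices):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Entry at row j, column i of A B: dot the j-th row of A
# with the i-th column of B.
j, i = 1, 0
entry = A[j, :] @ B[:, i]
assert entry == (A @ B)[j, i]

# Building A B column by column: the i-th column of A B
# is A applied to the i-th column of B.
AB = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])
assert np.array_equal(AB, A @ B)
```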

To be clear, I don't have the process totally memorized; I actually use the above reasoning to remind myself of the computation a fair portion of the time that I need to compute actual products of matrices, which is surprisingly often.

When I took linear algebra, the professor emphasized the linear maps, and somewhat de-emphasized the matrices that are used to notate them. I think this made understanding what is going on easier, but made the computations less familiar. I very much enjoyed the class.