Consider that OpenGL used all those 2-, 3- and 4-D critters (vectors and matrices) at the API level. It must be very hardware-friendly to reduce your pipeline to matrix products. Your scene graph (a tree) is just this too: you attach relative rotations and translations to the graph nodes and hang your meshes (streams of triangles) on them. Composing the relative transforms from a node up to the root is a matrix product, and the result is what transforms that node's mesh on its way into the pipeline (the inverse comes in for the camera: the view matrix is the inverse of the camera's own transform). For instance, character skeletons are scene subgraphs: bones carry translations, joints carry rotations. That's why it is so convenient to have rotations and translations in a common representation, and a linear one (a 4×4 matrix acting on homogeneous coordinates) is super. All this leaves aside materials, textures, and so on, I mean.
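To make the scene-graph point concrete, here is a minimal sketch in plain Python. It is not OpenGL code: `Node`, `world`, and the helper names are made up for illustration, and it assumes column vectors, so a node's world transform is the product of local matrices taken root-first. The toy "arm" has a translation for each bone and a rotation at each joint, all stored as 4×4 matrices.

```python
import math

def mat_mul(a, b):
    """Product of two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotation_z(angle):
    """4x4 homogeneous rotation about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point written as (x, y, z, 1)."""
    v = (p[0], p[1], p[2], 1.0)
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])

class Node:
    """Hypothetical scene-graph node holding a transform relative to its parent."""
    def __init__(self, local, parent=None):
        self.local = local
        self.parent = parent

    def world(self):
        """Compose local transforms from this node up to the root."""
        m = self.local
        node = self.parent
        while node is not None:
            m = mat_mul(node.local, m)  # each ancestor multiplies on the left
            node = node.parent
        return m

# Toy arm: the shoulder joint rotates 90 degrees, the upper arm is a bone of
# length 1, the elbow joint rotates another 90 degrees, the forearm has length 1.
shoulder = Node(rotation_z(math.pi / 2))
elbow = Node(mat_mul(translation(1, 0, 0), rotation_z(math.pi / 2)), parent=shoulder)
hand = Node(translation(1, 0, 0), parent=elbow)

hand_pos = transform_point(hand.world(), (0, 0, 0))
print(hand_pos)  # approximately (-1, 1, 0), up to floating-point error
```

Because translations and rotations are both 4×4 matrices here, the tree walk never needs to know which kind it is composing. With a row-vector convention the multiplication order would simply be reversed.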