This looks really nice, but I can't help thinking this is a stopgap solution like other splatting techniques. It's certainly better than NeRFs, where the whole scene is contained in a black box, but reality is not made up of a triangle soup or Gaussian blobs. Most of the real world is made up of volumes, but it can often be thought of as surfaces. It makes sense to represent the ground, a table, walls, etc. with planes, not with a cloud of semi-translucent triangles. This is like pouring LEGO on the floor and moving the pieces around until you get something that looks OK from a distance, instead of actually putting them together. Obviously, looking good is often all that's needed, but it doesn't feel very elegant.

Then again, the normals look pretty good in their example images, so maybe you can get good geometry out of this with some post-processing? But then, is a triangle soup really the best way of doing that? My impression is that this representation is chosen specifically because it is efficient to render on GPUs. I haven't done any graphics programming in years, but I thought you'd want to keep the number of draw calls down. Do you need to cluster these triangles into fewer draw calls?

Is there any work being done to optimize a volumetric representation of scenes and, from that, create a set of surfaces with realistic-looking shaders or similar? I know one of the big benefits of these splatting techniques is that they capture reflections, opacity, anisotropy, etc., so "old school" photogrammetry with marching cubes and textured meshes has a hard time competing on visual quality.

> I haven't done any graphics programming in years, but I thought you'd want to keep the number of draw calls down. Do you need to cluster these triangles into fewer draw calls?

GPUs can draw tens of thousands of vertices per draw call, whether they are connected into logical objects or are "triangle soup" like this. There is some benefit to having triangles connected so they can share vertices, but not as much as you might think. Since GPUs are massively parallel, it doesn't matter much where on the screen or where in the buffer your data is.
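As a rough illustration of why the savings from shared vertices are smaller than they look, here is a back-of-the-envelope sketch (my own numbers: positions only at 12 bytes per vertex, 32-bit indices; real meshes with normals/UVs shift the ratio):

```typescript
// Back-of-the-envelope comparison: "triangle soup" vs. an indexed mesh
// for an n x n grid of quads (2*n*n triangles). Positions only.

function soupBytes(n: number): number {
  const triangles = 2 * n * n;
  const vertices = 3 * triangles;           // every triangle stores its own 3 vertices
  return vertices * 3 * 4;                  // 3 floats (x, y, z) at 4 bytes each
}

function indexedBytes(n: number): number {
  const uniqueVertices = (n + 1) * (n + 1); // grid corners are shared between triangles
  const indices = 3 * (2 * n * n);          // 3 indices per triangle
  return uniqueVertices * 3 * 4 + indices * 4; // positions + 32-bit index buffer
}

const n = 1000;
console.log(`soup:    ${(soupBytes(n) / 1e6).toFixed(1)} MB`);
console.log(`indexed: ${(indexedBytes(n) / 1e6).toFixed(1)} MB`);
// Roughly a 2x difference, not 6x: the index buffer eats into the savings,
// and either way the whole thing can be submitted in a single draw call.
```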

> Is there any work being done to optimize a volumetric representation of scenes and, from that, create a set of surfaces with realistic-looking shaders or similar?

This is basically where the field was going until NeRFs and splats. But then NeRFs and splats were such HUGE steps in fidelity that they inspired a ton of new research in that direction, and I think rightfully so! The truth is that reality is really messy, so trying to reconstruct logically separated meshes for everything you see is a very hard way to try to recreate it. NeRFs and splats recreate reality much more easily.

A digital image is a soup of RGB dots of various sizes.

Gaussian splatting radically changed the approach to photogrammetry. Prior approaches, which generate surface models and map the captures to materials that a renderer then rasterizes with more or less physical accuracy, were hitting the ceiling of what the technique could do.

NeRF was also a revolution, but it is very compute-intensive.

Even in a browser, on a mid-range GPU, you can render millions of splats at 60 frames per second. That's how fast it is, and a dense scene of under a million splats can already completely fool the eye from most viewing angles.

Splatting is the most advanced and promising technique for photogrammetry, and it has already delivered on that promise. The limitation is that you can't modify a point cloud as much as you can a surface with good PBR attributes.

No, an image is a well-ordered grid of pixels. The 3D equivalent would be voxels, and Nvidia recently released a project that does scene reconstruction with sparse voxels [0].
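To make the analogy concrete, here is a small sketch (my own illustration, not taken from the SVRaster paper) of how the "well-ordered grid" generalizes from pixels to voxels, and why the 3D grid has to be stored sparsely:

```typescript
// A 2D image is a dense grid: one value per (x, y).
// The 3D analogue is a dense voxel grid: one value per (x, y, z).
// Flat-array indexing is the same idea in both cases.

function pixelIndex(x: number, y: number, width: number): number {
  return y * width + x;
}

function voxelIndex(x: number, y: number, z: number, w: number, h: number): number {
  return z * w * h + y * w + x;
}

// The catch: memory grows with the cube of the resolution,
// which is why practical voxel methods store the grid sparsely.
const res = 1024;
const bytesPerVoxel = 4; // e.g. RGBA8, ignoring density/extra channels
console.log(`${res}^3 dense grid: ${(res ** 3 * bytesPerVoxel / 2 ** 30).toFixed(1)} GiB`);
// ~4 GiB for a single dense 1024^3 grid, most of which is empty space.
```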

If you take these triangles, make them share vertices, and order them in a certain way, you have a mesh. You can then combine some of them into larger flat surfaces where that makes sense, draw thousands of them in one draw call, calculate intersections, volumes, physics, and LODs, and use textures with image compression instead of millions of colored objects. Splatting is one way of answering the question "how do we reproduce these images in a way that lets us generate novel views of the same scene?", not "what is the best representation of this 3D scene?".
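A minimal sketch of that first step: welding a triangle soup into an indexed mesh by deduplicating quantized vertex positions. The data layout and tolerance here are just illustrative assumptions, not anything from the paper:

```typescript
// Weld a triangle soup (flat [x, y, z, x, y, z, ...] array, 3 vertices per
// triangle) into an indexed mesh by merging vertices that land on the same
// quantized position. The tolerance should be picked per scene scale.

function weld(soup: Float32Array, tolerance = 1e-4): { positions: number[]; indices: number[] } {
  const positions: number[] = [];
  const indices: number[] = [];
  const seen = new Map<string, number>(); // quantized position -> vertex index

  for (let v = 0; v < soup.length; v += 3) {
    const [x, y, z] = [soup[v], soup[v + 1], soup[v + 2]];
    const key = [x, y, z].map(c => Math.round(c / tolerance)).join(",");
    let index = seen.get(key);
    if (index === undefined) {
      index = positions.length / 3;
      positions.push(x, y, z);
      seen.set(key, index);
    }
    indices.push(index);
  }
  return { positions, indices };
}

// Two triangles sharing an edge: 6 soup vertices become 4 unique vertices.
const soup = new Float32Array([
  0, 0, 0,  1, 0, 0,  0, 1, 0,   // triangle A
  1, 0, 0,  1, 1, 0,  0, 1, 0,   // triangle B
]);
const mesh = weld(soup);
console.log(mesh.positions.length / 3, "vertices,", mesh.indices);
// -> 4 vertices, [0, 1, 2, 1, 3, 2]
```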

The aim is to find the light field that describes the scene, and if you have solid objects, that function can be described on the surfaces of those objects. That seems like a much more elegant end result than a cloud of separate objects, whatever their shape, since it's much closer to how reality works. Obviously we need to handle volumetrics and translucency as well, but if we model the real surfaces as virtual surfaces, I think things like reflections and shadow removal will be easier. At least Gaussian splats have a hard time with reflections: they look good from some viewing angles, but the reflections are often handled as geometry [1].

I'm not arguing that it doesn't look good or that it doesn't serve a purpose; sometimes a photorealistic novel view of a real scene is all you want. But I still don't think it's the best representation of scenes.

[0] https://svraster.github.io/

[1] https://www.youtube.com/watch?v=yq6gtdpLUCo

I still love this older paper on Plenoxels: https://alexyu.net/plenoxels/

It made so much sense to me: voxels with view-dependent color, using e.g. spherical Gaussians.
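For a sense of what "view-dependent color" means there, here is a rough sketch of evaluating a color from a few spherical Gaussian lobes (Plenoxels itself stores spherical harmonics coefficients per voxel, but the lobe-mixture idea is similar; the lobe parameters below are made up):

```typescript
// View-dependent color from a mixture of spherical Gaussian lobes.
// Each lobe: an RGB amplitude, a unit axis, and a sharpness lambda.
// color(view) = sum_k amplitude_k * exp(lambda_k * (dot(axis_k, view) - 1))

type Vec3 = [number, number, number];

interface SGLobe {
  amplitude: Vec3;   // per-channel lobe strength
  axis: Vec3;        // unit direction the lobe points toward
  sharpness: number; // larger = narrower lobe (more "specular")
}

const dot = (a: Vec3, b: Vec3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

function viewDependentColor(lobes: SGLobe[], viewDir: Vec3): Vec3 {
  const color: Vec3 = [0, 0, 0];
  for (const { amplitude, axis, sharpness } of lobes) {
    const w = Math.exp(sharpness * (dot(axis, viewDir) - 1)); // 1 when aligned, falls off otherwise
    color[0] += amplitude[0] * w;
    color[1] += amplitude[1] * w;
    color[2] += amplitude[2] * w;
  }
  return color;
}

// Example: a broad "diffuse-ish" lobe plus a narrow highlight lobe.
const lobes: SGLobe[] = [
  { amplitude: [0.4, 0.3, 0.2], axis: [0, 0, 1], sharpness: 1 },
  { amplitude: [0.8, 0.8, 0.8], axis: [0.577, 0.577, 0.577], sharpness: 40 },
];
console.log(viewDependentColor(lobes, [0, 0, 1]));             // looking along +z
console.log(viewDependentColor(lobes, [0.577, 0.577, 0.577])); // along the highlight axis
```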

I don't know how it compares to newer techniques; probably badly, since nobody seems to be talking about it.

They're mentioned in the SVRaster paper.

https://svraster.github.io/images/teaser.jpg