Mobile-first is how computing got degraded in the first place.

Sad to see indeed; "Augmented Reality" in a rather broad sense, and opportunistic computing ("I have a calculator with me because my phone has calculator software installed, and I keep a phone with me for the tasks it's objectively rather good at, say FaceTime-style video calls with the camera switchable between selfie (front) and surroundings (back)"), seem like the only genuinely good/deserving cases for mobile-first.

[Plane in the following refers to the image/sensor plane of the camera.] My understanding is that with both an in-plane and a plane-normal component of translation, together with enough 3D rotation to anchor/calibrate the gyro, the 3D accelerometer's absolute scale can be transferred onto the Structure-from-Motion feature/anchor points of a static, fixed-to-earth scene (a swaying skyscraper doesn't count; on-a-train doesn't count). In-plane translation alone just gives you parallax, which tells you the _distance ratios_ of two objects that parallax against one another as you translate; but once you add plane-normal translation, an absolute translation enters both objects' distances additively, letting you recover absolute scale rather than just distance ratios. Of course you'd hope for suitably good features dense enough in the scene, and you'd start out with optical flow or something similar to get a baseline for gyro calibration and (translational/linear) velocity zeroing, so that you then have a decent shot at using the SfM point features with very little "brute force" in the SfM alignment/solution step.
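A toy pinhole-geometry sketch of that argument (made-up numbers of my own, not from any real system): it checks that in-plane parallax between two points only pins down their depth ratio, while a known metric plane-normal translation, say from the integrated accelerometer, shifts both depths additively and pins down absolute depth.

    # Ideal calibrated pinhole camera, two static scene points.
    f = 1.0                      # focal length (normalized units)
    Z1, Z2 = 2.0, 5.0            # true depths (unknown to the solver)
    X1, X2 = 0.3, -0.4           # lateral positions of the points

    def project(X, Z):
        return f * X / Z

    # --- In-plane translation by t, magnitude treated as unknown ---
    t = 0.1
    d1 = project(X1 - t, Z1) - project(X1, Z1)   # image displacement of point 1 (= -f*t/Z1)
    d2 = project(X2 - t, Z2) - project(X2, Z2)   # image displacement of point 2 (= -f*t/Z2)
    print("parallax ratio d2/d1 =", d2 / d1, "equals Z1/Z2 =", Z1 / Z2)
    # Without knowing |t| in metres, only the depth *ratio* is observable.

    # --- Plane-normal translation by a known metric amount s (e.g. from the IMU) ---
    s = 0.5
    x_before = project(X1, Z1)
    x_after  = project(X1, Z1 - s)               # depth shifts additively: Z -> Z - s
    scale = x_after / x_before                   # = Z1 / (Z1 - s), depends on absolute Z1
    Z1_recovered = scale * s / (scale - 1.0)
    print("recovered Z1 =", Z1_recovered, "true Z1 =", Z1)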

AR interactivity lets you direct the camera operator to collect appropriately dense coverage of the scene/area before conditions change (illumination, plant growth; people moving furniture; people moving back in to actively use/occupy the space); the software can then refine the whole reconstruction as a background task. Once sufficient refinement has been done (during which you'd prefer to redirect the interactive-AR compute resources to that refining task), you could quickly lock back onto the now-static scene and render the captured version anchored to real-time camera feedback from the real location, practically eliminating the traditionally annoying drift/tracking artifacts. At least in places with enough light for the camera to track non-blurry views of the reference features despite the obviously interactive motion.
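A rough sketch of that capture, then background-refine, then relock flow as a three-phase loop (every name and the stub session below are hypothetical, purely to make the phases concrete):

    from enum import Enum, auto

    class Phase(Enum):
        GUIDED_CAPTURE = auto()   # steer the operator toward under-covered areas
        REFINE = auto()           # refine the captured map as a background task
        LOCKED = auto()           # track live frames against the refined, static map

    def step(phase, session):
        """Advance the hypothetical AR session by one frame."""
        if phase is Phase.GUIDED_CAPTURE:
            session.hint_operator(session.least_covered_region())
            return Phase.REFINE if session.coverage_dense_enough() else phase
        if phase is Phase.REFINE:
            session.refine_in_background()    # interactive compute budget goes here
            return Phase.LOCKED if session.refinement_converged() else phase
        # LOCKED: anchor the captured/refined model to the live camera pose
        session.render_anchored(session.relocalize())
        return phase

    class StubSession:
        """Minimal stand-in so the sketch runs; a real one would wrap the SLAM/AR stack."""
        def __init__(self):
            self.frames = 0
        def least_covered_region(self):
            return "north wall"
        def hint_operator(self, region):
            print("please capture:", region)
        def coverage_dense_enough(self):
            self.frames += 1
            return self.frames > 3
        def refine_in_background(self):
            print("refining map...")
        def refinement_converged(self):
            return True
        def relocalize(self):
            return "pose"
        def render_anchored(self, pose):
            print("rendering anchored at", pose)

    phase, session = Phase.GUIDED_CAPTURE, StubSession()
    for _ in range(8):
        phase = step(phase, session)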

...what did I read...