Over the past year, the task of novel view synthesis has seen rapid improvements to the state of the art with the introduction of 3D Gaussian Splatting (3DGS) based techniques, which leverage the Gaussian primitive's algebraic properties to evaluate its contribution along a ray in closed form and to optimize a scene in a fully differentiable manner.
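As an illustration of this closed-form property, a 3D Gaussian integrates along any line analytically; the identity below is the standard Gaussian line integral, included here only to make the claim concrete, with $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ denoting a primitive's mean and covariance and $\mathbf{o}$, $\mathbf{d}$ a ray origin and direction.

$$
\int_{-\infty}^{\infty} \exp\!\Big(-\tfrac{1}{2}\,(\mathbf{o}+t\mathbf{d}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{o}+t\mathbf{d}-\boldsymbol{\mu})\Big)\,dt
\;=\; \sqrt{\frac{2\pi}{a}}\;\exp\!\Big(-\tfrac{1}{2}\big(c - \tfrac{b^{2}}{a}\big)\Big),
$$

where $a=\mathbf{d}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{d}$, $b=\mathbf{d}^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{o}-\boldsymbol{\mu})$, and $c=(\mathbf{o}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{o}-\boldsymbol{\mu})$.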
Though 3DGS boasts rendering times more than an order of magnitude faster than alternatives such as NeRFs, training these models has mostly relied on complicated heuristics that require many views with known poses and intrinsics, as well as a sparse point cloud of the scene. These priors have traditionally been hard to obtain and must be estimated very accurately for training to succeed; however, recent work by Fan et al. (InstantSplat) has shown that by leveraging the dense point cloud and camera estimates produced by DUSt3R, splats can be trained from sparse views, with DUSt3R handling the prior generation.
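To make the role of these priors concrete, the sketch below shows one common way a dense point cloud and its colors can seed a splat model: one Gaussian per point, scales from nearest-neighbour distances, identity rotations, and a low initial opacity. This is a minimal illustration of a standard 3DGS-style initialization under those assumptions, not InstantSplat's actual code; the function name and parameter choices are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def init_gaussians_from_points(points, colors, k=3, init_opacity=0.1):
    """Seed per-Gaussian parameters from a dense point cloud.

    points : (N, 3) array of 3D positions, e.g. from a DUSt3R-style
             dense reconstruction; colors : (N, 3) array of RGB in [0, 1].
    """
    n = points.shape[0]

    # One Gaussian per point: means are simply the point positions.
    means = points.astype(np.float32)

    # Isotropic scales from the mean distance to the k nearest neighbours,
    # stored in log-space so the optimizer works on an unconstrained value.
    dists, _ = cKDTree(points).query(points, k=k + 1)  # column 0 is the point itself
    mean_nn = np.clip(dists[:, 1:].mean(axis=1), 1e-7, None)
    log_scales = np.log(mean_nn)[:, None].repeat(3, axis=1).astype(np.float32)

    # Identity rotations as (w, x, y, z) quaternions.
    quats = np.zeros((n, 4), dtype=np.float32)
    quats[:, 0] = 1.0

    # Low starting opacity stored through an inverse sigmoid.
    opacities = np.full((n, 1), np.log(init_opacity / (1.0 - init_opacity)),
                        dtype=np.float32)

    return {
        "means": means,
        "log_scales": log_scales,
        "quats": quats,
        "opacities": opacities,
        "colors": colors.astype(np.float32),
    }
```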
Our contribution is to extend these insights to the dynamic case by reimplementing the core ideas of InstantSplat on top of the pipeline proposed by Luiten et al. in "Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis", yielding, to our knowledge, the first end-to-end pipeline that can recover dynamic 3D Gaussian splat scenes from a small number of video streams without known intrinsics or extrinsics.
tbd