There are already networks that generate depth maps from 2D video, so hooking the output of one of those up to a simpler VR video encoder should probably work.
Will it be jank as hell? Oh yeah. Nothing is conveniently packaged in Local AI Land, unfortunately.
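
For what it's worth, the "depth map in, VR frame out" step could look roughly like the sketch below. It is a toy version of depth-based view synthesis, not how any particular tool does it: each pixel is shifted horizontally by an amount proportional to its depth to fake a left-eye and a right-eye view, and the two are packed side by side. The function names, the `max_shift` parameter, and the convention that depth is in [0, 1] with 1 meaning nearest are all assumptions made for illustration. Plain nested lists stand in for image arrays to keep it dependency-free; a real pipeline would use an array library and hand the packed frames to a video encoder.

```python
def shift_view(frame, depth, max_shift, direction):
    """Shift each pixel horizontally by a depth-scaled disparity.

    frame: list of rows of pixel values.
    depth: list of rows, same shape, values in [0, 1] (assumed: 1 = nearest).
    direction: +1 for the left-eye view, -1 for the right-eye view.
    """
    height, width = len(frame), len(frame[0])
    # Start from a copy so unfilled holes keep the original pixel (naive fill).
    view = [row[:] for row in frame]
    # Z-buffer so nearer pixels win when two sources land on the same target.
    zbuf = [[-1.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = depth[y][x]
            tx = x + direction * round(max_shift * d)
            if 0 <= tx < width and d > zbuf[y][tx]:
                view[y][tx] = frame[y][x]
                zbuf[y][tx] = d
    return view


def to_side_by_side(frame, depth, max_shift=2):
    """Build one side-by-side stereo frame (left half | right half)."""
    left = shift_view(frame, depth, max_shift, +1)
    right = shift_view(frame, depth, max_shift, -1)
    return [l_row + r_row for l_row, r_row in zip(left, right)]


if __name__ == "__main__":
    frame = [[10, 20, 30, 40]]
    depth = [[0, 0, 1, 0]]  # only the third pixel is "near"
    print(to_side_by_side(frame, depth, max_shift=1))
```

The hole filling here is about as crude as it gets (it just leaves the original pixel in place), which is exactly where the jank would show up around object edges.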