I work in an office with real estate developers who use SketchUp to create concepts and then Enscape with VR to immerse themselves in those concepts. This is now evolving into using VR to show potential investors what a development would look like and to test out modifications. I would really like to be able to create a composite video of the person using VR together with the render, which some people call "Mixed Reality". This is not to be confused with Windows MR headsets; it is a different thing. We use the Vive Pro.
In game development there is an SDK called LIV, which is essentially a Unity (or Unreal) asset that you add to your project. It automatically adds the cameras, makes them tracked, and connects to the LIV compositor. The most common setup is to take a camera, control its pose using an HTC Vive tracker, and then calibrate the transform from tracker pose to camera pose. This way, if the tracker moves, the camera follows. The compositor then has a "quadrant view" that splits the camera output into a few channels: one is the third-person background, another is the foreground, and then there is an alpha channel that tells the compositor which parts of the foreground should be transparent. Finally, the first-person view goes in the bottom-right quadrant in case you want a picture-in-picture with the first-person view.
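To make the tracker-to-camera calibration idea concrete, here is a minimal sketch of how I understand it: the camera's world pose is the tracker's world pose composed with a fixed offset found during calibration. This is not LIV's actual code or API; the function names and the numbers are hypothetical, and poses are assumed to be 4x4 row-major homogeneous matrices.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x, y, z):
    """Build a 4x4 pose with no rotation and the given position."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


def camera_pose(tracker_pose, tracker_to_camera):
    """Camera world pose = tracker world pose composed with the calibrated offset."""
    return mat_mul(tracker_pose, tracker_to_camera)


# Hypothetical values: tracker position in the room (metres) and a small
# fixed offset from the tracker to the camera lens found by calibration.
tracker = translation(1.0, 1.5, 2.0)
offset = translation(0.0, -0.05, 0.10)

cam = camera_pose(tracker, offset)
position = [row[3] for row in cam[:3]]  # translation column of the camera pose
print(position)
```

Because the offset is fixed, it only needs to be calibrated once; after that, every new tracker pose gives a new camera pose with a single matrix multiply.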
I think an Enscape integration could be similar, but with some differences, primarily because of scale. Buildings are large, so a 1:1 relationship between tracker location and camera location would not really give a good perspective. Instead there would need to be a "Virtual Zoom" component, where the tracker position dictates the "look at" position of the camera but the radial distance of the camera from the person can be some multiple of the real distance. We would then need to scale the image from the camera according to the virtual zoom.
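Here is a rough sketch of what I mean by "Virtual Zoom". It is only my interpretation, not anything Enscape provides: the virtual camera stays on the line from the person through the tracker, but its distance from the person is multiplied by a zoom factor. The function names are hypothetical, and scaling the real footage by 1/zoom assumes a simple pinhole camera where apparent size is inversely proportional to distance.

```python
def virtual_camera_position(person, tracker, zoom):
    """Push the camera out along the person-to-tracker direction by `zoom`.

    person, tracker: (x, y, z) positions in metres.
    zoom: multiple of the real person-to-tracker distance (1.0 = no zoom).
    """
    return tuple(p + zoom * (t - p) for p, t in zip(person, tracker))


def footage_scale(zoom):
    """Scale factor for the real camera image of the person.

    Assumption: pinhole model, so a camera `zoom` times farther away
    sees the person `zoom` times smaller.
    """
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    return 1.0 / zoom


# Hypothetical example: tracker 2.5 m from the person, camera pushed out 10x.
person = (0.0, 0.0, 0.0)
tracker = (2.0, 1.5, 0.0)
zoom = 10.0

print(virtual_camera_position(person, tracker, zoom))
print(footage_scale(zoom))
```

With zoom set to 1.0 this reduces to the usual 1:1 tracker-to-camera setup, so the same code path could cover both room-scale and building-scale scenes.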
In any case, as VR integration moves forward, I think this would be a very useful feature because it lets observers who are not in VR have more empathy for the experience.
By the way, in the short term all of the features above are really nice-to-haves, even using the tracker. Even if there were just a third-person camera that always "looked at" the person in VR and was controlled by the keyboard, that would be minimum viable. The quadrant view is also optional; for models of this scale I think having just the background image would be MVP.
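For the minimum viable version, something like the following sketch is all I have in mind: a camera position nudged by key presses, with yaw and pitch recomputed so it always faces the person in VR. The key bindings, step size, and axis convention (y up, +z forward) are my own assumptions for illustration and not Enscape behaviour.

```python
import math

# Hypothetical key bindings: each key nudges the camera by half a metre.
KEY_STEPS = {
    "a": (-0.5, 0.0, 0.0), "d": (0.5, 0.0, 0.0),   # left / right
    "q": (0.0, -0.5, 0.0), "e": (0.0, 0.5, 0.0),   # down / up
    "s": (0.0, 0.0, -0.5), "w": (0.0, 0.0, 0.5),   # back / forward
}


def move_camera(camera, key):
    """Return the camera position after one key press (unknown keys do nothing)."""
    step = KEY_STEPS.get(key, (0.0, 0.0, 0.0))
    return tuple(c + s for c, s in zip(camera, step))


def look_at_angles(camera, target):
    """Yaw and pitch (radians) that point the camera at the target.

    Assumed convention: y is up, yaw 0 faces +z, positive pitch looks up.
    """
    dx, dy, dz = (t - c for c, t in zip(camera, target))
    yaw = math.atan2(dx, dz)
    pitch = math.atan2(dy, math.hypot(dx, dz))
    return yaw, pitch


# Hypothetical usage: move the camera, then re-aim it at the person's head.
camera = (0.0, 1.6, -5.0)
person = (0.0, 1.6, 0.0)
camera = move_camera(camera, "d")
print(look_at_angles(camera, person))
```

Recomputing the angles after every move keeps the person centred in frame without needing a tracker at all.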