Immersive Media and AR/VR

Contact person: Prof. Dr. Sebastian Knorr

Contact details:

Tel.: +49 3641 205 739


Web: Homepage


360-degree video, also called live-action virtual reality (VR), is one of the latest and most powerful trends in immersive media, with increasing potential for the coming decades. In particular, head-mounted display (HMD) technology such as the HTC Vive, Oculus Rift and Samsung Gear VR is maturing and entering professional and consumer markets. On the capture side, devices such as Facebook’s Surround 360 camera, Nokia Ozo and Google Odyssey are among the latest technologies for capturing 360-degree video in stereoscopic 3D (S3D).

However, capturing 360-degree video is not an easy task, as many physical limitations need to be overcome, especially for capturing and post-processing in S3D. In general, such limitations result in artifacts which cause visual discomfort when watching the content with an HMD. The artifacts or issues can be divided into three categories: binocular rivalry issues, conflicts of depth cues, and artifacts which occur in both monoscopic and stereoscopic 360-degree content production. Issues of the first two categories have already been investigated for standard S3D content, e.g. for cinema screens and 3D-TV. The third category consists of typical artifacts which only occur in multi-camera systems used for panorama capturing. As native S3D 360-degree video production is still very error-prone, especially with respect to binocular rivalry issues, many high-end S3D productions are shot in 2D 360-degree and post-converted to S3D.

Within the project QualityVR, we are working on video analysis tools to detect, assess and partly correct artifacts which occur in stereoscopic 360-degree video production, in particular conflicts of depth cues and binocular rivalry issues.

Contact:  Prof. Dr. Sebastian Knorr


Methods of storytelling in cinema have well-established conventions that have been built over the course of its history and the development of the format. In 360° film, many of the techniques that form part of this cinematic language or visual narrative are not easily applied, or are not applicable at all, because the image is not contained within the border of a screen. In this paper, we analyze how end-users view 360° video in the presence of directional cues and evaluate whether they are able to follow the actual story of narrative 360° films. We first let filmmakers create an intended scan-path, the so-called director’s cut, by setting position markers in the equirectangular representation of the omnidirectional content for eight short 360° films. Alongside this, the filmmakers provided additional information regarding directional cues and plot points. Then, we performed a subjective test in which 20 participants watched the films with a head-mounted display while we recorded the center position of the viewports. The resulting scan-paths of the participants are then compared against the director’s cut using different scan-path similarity measures. In order to better visualize the similarity between the scan-paths, we introduce a new metric which measures and visualizes the viewport overlap between the participants’ scan-paths and the director’s cut. Finally, the entire dataset, i.e. the director’s cuts including the directional cues and plot points as well as the scan-paths of the test subjects, is publicly available with this paper.
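As a rough illustration of how two scan-paths could be compared, the sketch below converts equirectangular marker positions to viewing directions and measures the per-frame angle between them. It is not the metric from the paper: the function names, the 90° field of view and the linear overlap proxy are assumptions made purely for illustration.

```python
import numpy as np

def equirect_to_unit_vector(u, v, width, height):
    """Map equirectangular pixel coordinates to unit viewing directions."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    lon = (u / width) * 2.0 * np.pi - np.pi    # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi   # latitude in [-pi/2, pi/2]
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)

def angular_distance_deg(path_a, path_b):
    """Per-frame great-circle angle (degrees) between two scan-paths."""
    dots = np.clip(np.sum(path_a * path_b, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(dots))

def overlap_proxy(path_a, path_b, fov_deg=90.0):
    """Hypothetical overlap score: 1 if the viewport centres coincide,
    falling linearly to 0 once they are fov_deg or more apart."""
    angle = angular_distance_deg(path_a, path_b)
    return np.clip(1.0 - angle / fov_deg, 0.0, 1.0)

# Two frames of a 3840x1920 video: the viewer follows the director's cut
# in the first frame and looks 90 degrees to the side in the second.
directors_cut = equirect_to_unit_vector([1920, 1920], [960, 960], 3840, 1920)
participant = equirect_to_unit_vector([1920, 2880], [960, 960], 3840, 1920)
print(angular_distance_deg(directors_cut, participant))
print(overlap_proxy(directors_cut, participant))
```

Working with unit vectors rather than pixel coordinates avoids the wrap-around at the left and right image borders and the stretching near the poles of the equirectangular projection.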



CVMP Paper

Contact:  Prof. Dr. Sebastian Knorr


We introduce a novel interactive depth map creation approach for image sequences which uses depth scribbles as input at user-defined keyframes. These scribbled depth values are then propagated within these keyframes and across the entire sequence using a 3-dimensional geodesic distance transform (3D-GDT). In order to further improve the depth estimation of the intermediate frames, we make use of a convolutional neural network (CNN) in an unconventional manner. Our process is based on online learning, which allows us to train a disposable network for each sequence individually, using the user-generated depth at keyframes along with the corresponding RGB images as training pairs. Thus, we actually take advantage of one of the most common issues in deep learning: over-fitting. Furthermore, we integrated this approach into a professional interactive depth map creation application and compared our results against the state of the art in interactive depth map creation.
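The idea of propagating scribbled depth values along geodesic distances can be sketched on a single grayscale frame. This is a simplified 2D illustration only, not the 3D-GDT implementation described above: the Dijkstra-based solver, the function name and the colour_weight parameter are assumptions. A 3D variant would additionally connect each pixel to its neighbours in the previous and next frame.

```python
import heapq
import numpy as np

def propagate_scribbles(gray, scribbles, colour_weight=10.0):
    """Assign each pixel the depth of its geodesically nearest scribble.

    gray:      2D float array with intensities in [0, 1]
    scribbles: dict mapping (row, col) -> depth value drawn by the user
    """
    h, w = gray.shape
    dist = np.full((h, w), np.inf)
    depth = np.zeros((h, w))
    heap = []
    for (r, c), d in scribbles.items():
        dist[r, c] = 0.0
        depth[r, c] = d
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        cost, r, c = heapq.heappop(heap)
        if cost > dist[r, c]:
            continue  # stale queue entry, a shorter path was found already
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # A step is cheap inside uniform regions, expensive across edges.
                step = 1.0 + colour_weight * abs(gray[nr, nc] - gray[r, c])
                new_cost = cost + step
                if new_cost < dist[nr, nc]:
                    dist[nr, nc] = new_cost
                    depth[nr, nc] = depth[r, c]
                    heapq.heappush(heap, (new_cost, nr, nc))
    return depth, dist

# Toy frame: dark object on the left, bright background on the right,
# with one scribble on each side.
frame = np.zeros((4, 6))
frame[:, 3:] = 1.0
depth, _ = propagate_scribbles(frame, {(0, 0): 0.2, (0, 5): 0.9})
print(depth)
```

Because crossing the intensity edge is costly, each scribble fills its own region up to the object boundary, which is the behaviour one wants from scribble-based depth propagation.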


DeepStereoBrush: Interactive Depth Map Creation

Contact:  Prof. Dr. Sebastian Knorr


The concept of 6 degrees of freedom (6DOF) video content has recently emerged with the goal of enabling an immersive experience in terms of free roaming, i.e. allowing the scene to be viewed from any viewpoint and direction in space. However, no real-life full 6DOF light field capturing solution exists so far. Light field cameras have been designed to record the orientations of light rays, and hence to sample the plenoptic function in all directions, thus enabling view synthesis for perspective shift and scene navigation. Several camera designs have been proposed for capturing light fields, ranging from uniform arrays of pinholes placed in front of the sensor to arrays of micro-lenses placed between the main lens and the sensor, arrays of cameras, and coded attenuation masks. However, these light field cameras have a limited field of view. On the other hand, omni-directional cameras allow capturing a panoramic scene with a 360° field of view but do not record information on the orientation of light rays emitted by the scene.

Neural Radiance Fields (NeRF) have been introduced as an implicit scene representation that allows rendering all light field views with high quality. NeRF models the scene as a continuous function parameterized by a multi-layer perceptron (MLP). The function maps the 5D spatial and angular coordinates of light rays emitted by the scene to their three RGB color components and a volume density measure.
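To make the role of colour and volume density concrete, the following sketch renders a single ray with the standard volume rendering quadrature used in the NeRF literature. The trained MLP is replaced by a hypothetical stand-in (toy_field, an opaque red unit sphere), the viewing direction is passed as a unit vector instead of two angles, and all names and sample counts are assumptions for illustration.

```python
import numpy as np

def toy_field(points, view_dirs):
    """Hypothetical stand-in for the trained MLP: an opaque red unit sphere.

    points:    (N, 3) sample positions
    view_dirs: (N, 3) unit viewing directions (unused here; a real NeRF
               uses them to model view-dependent colour)
    """
    inside = np.linalg.norm(points, axis=-1) <= 1.0
    sigma = np.where(inside, 50.0, 0.0)                # volume density
    rgb = np.tile([1.0, 0.0, 0.0], (len(points), 1))   # constant colour
    return rgb, sigma

def render_ray(origin, direction, near=0.0, far=6.0, n_samples=128):
    """Composite colour along one ray by numerical quadrature."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    t_vals = np.linspace(near, far, n_samples)
    points = origin + t_vals[:, None] * direction
    dirs = np.broadcast_to(direction, points.shape)
    rgb, sigma = toy_field(points, dirs)
    # Distance between adjacent samples; the last interval is open-ended.
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * deltas)              # opacity per sample
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)

print(render_ray([0.0, 0.0, -3.0], [0.0, 0.0, 1.0]))   # ray through the sphere
print(render_ray([0.0, 0.0, -3.0], [0.0, 1.0, 0.0]))   # ray that misses it
```

In an actual NeRF, toy_field would be the MLP, and its weights would be optimized so that rays rendered this way reproduce the captured images.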

NeRF is capable of modeling complex, large-scale and even unbounded scenes. With a proper parameterization of the coordinates and a well-designed foreground-background architecture, NeRF++ is capable of modeling scenes having a large depth, with satisfying resolution in both the near and far fields.
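The coordinate parameterization used by NeRF++ for the background is, to our understanding, an inverted sphere parameterization: a point outside the unit sphere is represented by its direction on the sphere plus the inverse of its distance, so that arbitrarily distant points stay within a bounded range. A minimal sketch, with the function name chosen here for illustration:

```python
import numpy as np

def inverted_sphere(points):
    """Map 3D points outside the unit sphere to (x/r, y/r, z/r, 1/r)."""
    points = np.asarray(points, dtype=float)
    r = np.linalg.norm(points, axis=-1, keepdims=True)
    return np.concatenate([points / r, 1.0 / r], axis=-1)

# A point 2 units away and a point 1000 units away along the same direction
# differ only in the last, bounded coordinate.
print(inverted_sphere([[0.0, 0.0, 2.0], [0.0, 0.0, 1000.0]]))
```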

Our motivation here is to be able to capture or reconstruct light fields with a very large field of view, in particular 360°. We focus on the question: how do we extract omni-directional information and potentially benefit from it when reconstructing a spherical light field of a large-scale scene with a non-converged camera setup?


Omni-NeRF: Neural Radiance Field from 360° image captures

Contact:  Prof. Dr. Sebastian Knorr


Colour transfer is an important pre-processing step in many applications, including stereo vision, surface reconstruction and image stitching. It can also be applied to images and videos as a post-processing step to create interesting special effects and change their tone or feel. While many software tools are available to professionals for editing the colours and tone of an image, bringing this type of technology into the hands of everyday users, with an interface that is intuitive and easy to use, has generated a lot of interest in recent years.

One approach often used for colour transfer is to let the user provide a reference image which has the desired colour distribution, and to use it to transfer the desired colour feel to the original target image. Once the reference has been chosen, this approach generates the colour transfer result without the need for any further user interaction.
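One of the simplest reference-based schemes matches the per-channel mean and standard deviation of the target to those of the reference. The sketch below shows this classic statistics-matching idea directly on RGB values; it is not the method developed in our project, and the function name and the [0, 1] value range are assumptions. Since it only needs arrays of shape (..., 3), the same code also accepts the per-point colours of a point cloud.

```python
import numpy as np

def transfer_colour_statistics(target, reference, eps=1e-8):
    """Match per-channel mean and standard deviation of target to reference.

    target, reference: float arrays of shape (..., 3) with values in [0, 1],
    e.g. an (H, W, 3) image or the (N, 3) colours of a point cloud.
    """
    t = target.reshape(-1, 3)
    r = reference.reshape(-1, 3)
    t_mean, t_std = t.mean(axis=0), t.std(axis=0)
    r_mean, r_std = r.mean(axis=0), r.std(axis=0)
    # Normalize the target colours, then rescale to the reference statistics.
    out = (target - t_mean) / (t_std + eps) * r_std + r_mean
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
dark_image = rng.uniform(0.2, 0.4, size=(32, 32, 3))
bright_reference = rng.uniform(0.5, 0.7, size=(16, 16, 3))
result = transfer_colour_statistics(dark_image, bright_reference)
print(result.reshape(-1, 3).mean(axis=0))
```

Matching only the first two moments cannot reproduce multi-modal colour distributions, which is one reason more elaborate colour transfer methods exist.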

In our project, the main focus is the colour transfer from a reference image to a 3D point cloud and the colour transfer between two 3D point clouds captured under different lighting conditions.

Contact:  Prof. Dr. Sebastian Knorr

Immersive Media Showreel

DeepStereoBrush - Interactive 2D-to-3D Conversion