Real-time immersive video has been an area of interest for the research and standardization community over the past few years, and it has gained particular importance in the era of 5G services. 3GPP (Third Generation Partnership Project) Release 17 includes immersive teleconferencing for Multimedia Telephony services over the IP Multimedia Subsystem. Viewport-dependent delivery (VDD) saves bandwidth by assigning more bits to the part of the 360-degree video currently in the viewport and fewer or no bits to the areas that the user is not currently viewing. The technique has been studied extensively in the context of adaptive HTTP streaming of immersive video, where the 360-degree video is pre-encoded in different versions and the viewer’s client application selects and requests the appropriate version according to the current viewport orientation. Conversational video, however, has more stringent latency requirements than HTTP streaming; for a good user experience, the time from capture to rendering is on the order of a few milliseconds. Unlike pre-encoded streaming, conversational immersive video also allows the encoding to be adapted during the media delivery session according to the user's viewport and the network bandwidth characteristics. This paper explores tile-based and viewport region encoding for VDD and how they apply to conversational video. In particular, we explore the challenges related to the absence of periodic intra-coded frames in conversational video. Based on simulation results, we show that VDD methods may not always be a performant alternative to viewport-independent delivery in conversational use cases. We also discuss possible encoder optimizations that would improve VDD performance.
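To make the basic idea of tile-based VDD concrete, the following is a minimal sketch of viewport-weighted bit allocation. It is not the method evaluated in the paper. It assumes an equirectangular frame split into a uniform grid of tiles and a rectangular viewport described by yaw, pitch, and field of view. All names (`Viewport`, `tiles_in_viewport`, `allocate_bits`, `background_weight`) are hypothetical and chosen for illustration. A real encoder would typically steer per-tile quality through its rate control rather than through a fixed bit split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Hypothetical viewport description; all angles are in degrees."""
    yaw: float    # centre of the viewport, in [-180, 180)
    pitch: float  # centre of the viewport, in [-90, 90]
    hfov: float   # horizontal field of view
    vfov: float   # vertical field of view


def _yaw_overlap(tile_lo, tile_hi, centre, fov):
    """Return True if the tile's yaw range intersects the viewport.

    The comparison uses the shortest angular distance between the two
    centres, so a viewport straddling the +/-180 degree seam is handled.
    """
    tile_centre = (tile_lo + tile_hi) / 2.0
    tile_half = (tile_hi - tile_lo) / 2.0
    diff = (tile_centre - centre + 180.0) % 360.0 - 180.0
    return abs(diff) < tile_half + fov / 2.0


def tiles_in_viewport(vp, cols, rows):
    """Return the set of (row, col) tiles that overlap the viewport."""
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows
    visible = set()
    for r in range(rows):
        p_lo = -90.0 + r * tile_h
        p_hi = p_lo + tile_h
        pitch_hit = (p_lo < vp.pitch + vp.vfov / 2.0
                     and p_hi > vp.pitch - vp.vfov / 2.0)
        if not pitch_hit:
            continue
        for c in range(cols):
            y_lo = -180.0 + c * tile_w
            if _yaw_overlap(y_lo, y_lo + tile_w, vp.yaw, vp.hfov):
                visible.add((r, c))
    return visible


def allocate_bits(vp, cols, rows, total_bits, background_weight=0.1):
    """Split a frame's bit budget across tiles, favouring the viewport.

    Viewport tiles get weight 1.0 and all other tiles get
    `background_weight` (0.0 means background tiles receive no bits).
    """
    visible = tiles_in_viewport(vp, cols, rows)
    weights = {
        (r, c): 1.0 if (r, c) in visible else background_weight
        for r in range(rows)
        for c in range(cols)
    }
    total_weight = sum(weights.values())
    return {tile: total_bits * w / total_weight for tile, w in weights.items()}
```

For example, `allocate_bits(Viewport(0, 0, 90, 90), cols=8, rows=4, total_bits=1_000_000)` gives each of the four tiles overlapping the viewport ten times the bits of each background tile. Because a conversational sender can call such a function on every frame with the latest viewport feedback, the allocation follows the user's head motion without pre-encoded versions.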
               