DSP

Optimal design is an art form in which the designer selects from a myriad of alternatives to bring the 'optimum' choice to the user. In many complex systems the notion of 'optimum' is difficult to define; indeed, the users themselves will not agree. So the 'best' system is simply the one in which the designer evaluates the options and takes the responsibility!

IP

Image processing in the context of imaging science: the rate of discovery in the natural sciences has been accelerated by powerful computer-based imaging methods, which enable scientists to map certain object properties into an 'image space'. These methods also support the goals of medicine, education, art, commerce, and the media. Every digital imaging method involves a sequence of interdependent steps: image-data acquisition, reconstruction and processing, recording and distribution, display/visualization, observation and analysis, and a criterion for image evaluation. When viewed within the context of imaging science, 'optimal' image processing must take account of how the image data are acquired and of the intervening steps leading to how the images are to be evaluated, based on a well-defined goal and criterion.
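To make this pipeline concrete, here is a small, self-contained Python sketch (the toy scene, box-filter "reconstruction", and MSE criterion are illustrative stand-ins, not taken from any particular imaging toolbox) walking through acquisition, reconstruction/processing, and evaluation against a well-defined criterion:

```python
# Toy imaging pipeline: acquisition -> reconstruction/processing -> evaluation.
# Only a few of the stages listed above are sketched; all values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def acquire(scene, noise_sigma=0.05):
    """Image-data acquisition: the sensor sees the scene plus noise."""
    return scene + rng.normal(0.0, noise_sigma, scene.shape)

def reconstruct(raw, kernel=3):
    """Reconstruction/processing: a simple box-filter denoiser as a stand-in."""
    pad = kernel // 2
    padded = np.pad(raw, pad, mode="edge")
    out = np.zeros_like(raw)
    for dy in range(kernel):
        for dx in range(kernel):
            out += padded[dy:dy + raw.shape[0], dx:dx + raw.shape[1]]
    return out / (kernel * kernel)

def evaluate(image, reference):
    """Criterion: mean-squared error against the known reference scene."""
    return float(np.mean((image - reference) ** 2))

scene = np.zeros((64, 64))
scene[16:48, 16:48] = 1.0                      # ground-truth 'object'
raw = acquire(scene)
processed = reconstruct(raw)
print("MSE raw      :", evaluate(raw, scene))
print("MSE processed:", evaluate(processed, scene))   # processing judged end to end
```

The point of the sketch is the last line: whether the processing was 'optimal' can only be judged against the chosen goal and criterion, not in isolation.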

Multimedia science & art: 

1. experimental (extended) media, 2. multi (cross/trans) media(matics), 3. new (digital) media, 4. post (meta) media

Immersive multimedia

The quest for a truly immersive media experience is almost as old as the history of media technology itself. From the earliest days of recorded sound right up to the present day, we have sought to create an experience 'just like being there'. Curiously, the desire for richer and more immersive productions has spawned technologies that help us in other areas as well.

FROM CLICKABLE PAGES TO WALKABLE SPACES

  • Immersive media = G2G formats @ E2E eco-systems (ARCore, ARKit, MRms).
  • 3D spatial acquisition and immersive (immergere) reconstruction/reproduction = hyper-realistic media + personalized virtual walk + involving interactivity.
  • Imaging modalities = omnidirectional / depth-enhanced / point cloud / light field / holographic.
  • AR/VR = mix of video/graphical formats + 3D registration + interactivity.
  • ARFrame @ FaceTrackingSession = tracking (position, orientation, distances) + scene understanding (planes, hit testing, lights) + rendering (viewport).
  • Head-motion parallax is the displacement or difference in the apparent position of an object viewed from different viewing positions or viewing orientations (a small numeric sketch follows this list).
  • Motion-to-high-quality latency is the time it takes between a head motion and the display of content at high quality in a head-mounted device.
  • Interactivity = precise motion + minimal latency + tracking + natural UI.
  • Ecosystems = 3D graphics/animation, visual FX, cinematic VR, gaming.
  • Architecture = the difference is the size of the viewing zone/volume (sitting/standing vs. a few steps); the amount of data needed scales proportionally with the actual size of the display.
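As a numeric illustration of head-motion parallax under a simple pinhole-camera model (the focal length, head shift, and distances below are made-up values, not from any standard), the apparent on-screen shift of an object grows with head translation and shrinks with object distance:

```python
# Hypothetical numbers for illustration: how far an object appears to shift
# (in pixels) when the head translates sideways, under a pinhole-camera model.
def parallax_pixels(head_shift_m, object_dist_m, focal_px):
    """Apparent on-screen displacement ~ focal * baseline / distance (small angles)."""
    return focal_px * head_shift_m / object_dist_m

focal_px = 1000.0            # assumed focal length of the rendering camera, in pixels
for dist in (0.5, 2.0, 10.0):
    shift = parallax_pixels(head_shift_m=0.05, object_dist_m=dist, focal_px=focal_px)
    print(f"object at {dist:4.1f} m -> {shift:6.1f} px shift for a 5 cm head move")
# Nearby objects shift far more than distant ones, which is why content with
# close foreground objects stresses parallax rendering.
```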
Immersion impossible? How immersive is enough? Enough to trigger emotions -> change the user's focus and perception (attention is driven by contrast and patterns).
  • VR = immersion + presence (3D tracking & mapping -> real-time graphics engine).
  • AR = overlay of content on the real world, but that content is NOT anchored (just aligned) to, or part of, the real world.
  • MR = overlay of synthetic content on the real world that is anchored to, and interacts with, the real world.

Volumetric video

Volumetric video here refers to having a number of cameras capturing the scene, out of which any other viewpoint of the scene can be synthesized so that the viewer has the feeling of being immersed in the scene. MPEG-I immersive media enables coding of a volumetric scene captured from a multitude of cameras (inward-looking and outward-looking setups) in a regular or irregular arrangement. Inter-camera baseline distances can vary and may be narrow or wide. During exploration experiments, MPEG has realized that there is a lack of test material for video experiments, especially for the 6DoF and Dense Light Field processing and coding experiments. Understanding this need, we hereby solicit new test material, in particular using non-parallel camera setups. Furthermore, in order to prepare for more advanced features of next-generation Immersive Video, capture/creation of non-Lambertian content with specular reflections and transparent objects is also encouraged. Content may be provided as computer-generated/synthetic 3D models of dynamic scenes, as this material can be used for rendering various viewpoints with computer graphics techniques, creating the video footage required in all experiments and comparative studies. MPEG is also calling for natural content, both indoor and outdoor, directly captured with camera rigs. Content with objects close to the camera is also requested, since this will challenge the proposed technologies for parallax rendering, e.g. heavy motion parallax for nearby foreground objects. Content should be provided in any image-based representation format, e.g. lenslet format or multiview+depth. Content with the following properties is highly encouraged for preparing the second-generation MIV and 6DoF experiments:
  • any projection type corresponding to a physical camera rig (e.g. perspective, fisheye).
  • fine geometry.
  • dynamic scenes with larger viewing volumes and more cameras.
  • complex light interactions within the scene.
  • scenes with basic transparencies introduced as RGBA images completed by fully opaque views to be used as reference source views.
  • deep image, with multiple attributes per pixel beyond geometry and texture, such as reflection, refraction, transparency, object, and material ID.
  • moving cameras with step-in/step-out motion and/or change in view orientation.
  • scenes with particle media such as fog, clouds, or water.
  • non-static content with biological entities (e.g. people, cats, dogs, grass, hair/fur).
  • camera intrinsics and extrinsics should be provided (a minimal projection sketch follows this list).
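To illustrate why the intrinsics and extrinsics matter for multi-camera view synthesis, here is a minimal sketch of the standard pinhole projection of a world point into one camera of a rig (the matrix values are made-up and do not come from any MPEG test sequence):

```python
# Minimal pinhole projection sketch: world point -> pixel, given extrinsics [R|t]
# and intrinsics K. All numbers below are illustrative only.
import numpy as np

K = np.array([[1000.0,    0.0, 960.0],     # fx, skew, cx  (intrinsics)
              [   0.0, 1000.0, 540.0],     #      fy, cy
              [   0.0,    0.0,   1.0]])

R = np.eye(3)                               # extrinsics: rotation (identity here)
t = np.array([0.1, 0.0, 0.0])               # and a 10 cm sideways camera offset

def project(point_world):
    """Map a 3D world point to pixel coordinates for this camera."""
    p_cam = R @ point_world + t             # world -> camera coordinates
    uvw = K @ p_cam                         # camera -> homogeneous image coordinates
    return uvw[:2] / uvw[2]                 # perspective divide -> (u, v) in pixels

print(project(np.array([0.0, 0.0, 2.0])))   # a point 2 m in front of the rig
```

Given such parameters for every camera in the rig, the views can be related geometrically, which is what depth estimation and viewpoint synthesis rely on.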

Immersive video

MPEG-I MIV is the standard for coding of immersive video (2D+D) content that provides 6DoF rendering capability, allowing the user to see the scene from different angles and to move within the scene, albeit typically in a limited space. The term 3DoF+ is used for a viewing space that enables head rotation but little navigation. Immersive video is defined as content synchronously captured by an array of cameras pointing towards a scene. The cameras may be converging, parallel, or diverging, and the scene may be captured or computer generated. Along with the camera views, depth information for each camera is estimated and transmitted. Notice that the source data is not assumed to be clean: camera calibration can be accurate, but depth maps may contain significant estimation and quantization errors. Besides noise, the geometry information of immersive video can also have a much larger range, since it describes a scene instead of a single object.
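To make the multiview+depth idea concrete, here is a toy sketch of depth-image-based rendering (resolution, focal length, baseline, and depth values are made-up; nearest-neighbour splatting, no z-test and no hole filling), warping one view to a laterally shifted virtual viewpoint using its depth map:

```python
# Toy depth-image-based rendering: warp a source view to a laterally shifted
# virtual view using per-pixel depth. Parameters and sizes are illustrative only.
import numpy as np

H, W = 4, 6
focal_px = 50.0
baseline_m = 0.05                                # virtual camera 5 cm to the right

src_color = np.arange(H * W, dtype=float).reshape(H, W)   # stand-in texture
src_depth = np.full((H, W), 2.0)                 # flat background 2 m away
src_depth[:, 3:] = 0.5                           # a nearby object on the right

dst_color = np.full((H, W), np.nan)              # NaN marks disocclusion holes
for v in range(H):
    for u in range(W):
        disparity = focal_px * baseline_m / src_depth[v, u]   # pixels of shift
        u_dst = int(round(u - disparity))
        if 0 <= u_dst < W:
            dst_color[v, u_dst] = src_color[v, u]             # nearest-neighbour splat

print(dst_color)   # NaNs appear where no source pixel maps -> holes to in-paint
```

The NaN entries mark disocclusions a real renderer would in-paint, and the depth-map errors mentioned above translate directly into misplaced pixels in the warped view.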

Point cloud

An MPEG-I V-PCC point cloud is a set of points in 3D space, possibly carrying attribute information such as color/texture, reflectance, and normal vectors, among others. Lately, entertainment applications have used point clouds to create volumetric models in real time with realistic appearance. Generally, the acquisition of point clouds involves recording a subject with several passive and/or active cameras pointing inwards. Post-processing of the recorded videos generates an accurate 3D representation. The user can visualize such models from any viewpoint, that is, the viewer has six degrees of freedom (6DoF).

A voxelized point cloud is a set of points constrained to lie on a regular 3D grid, which, without loss of generality, may be assumed to be the integer lattice. The coordinates may be interpreted as the address of a volumetric element, or voxel. A voxel whose address is in the set is said to be occupied; otherwise it is unoccupied. Each occupied voxel may have attributes, such as colour, transparency, normals, curvature, and specularity. A voxelized point cloud captured at one instant of time is a frame. A dynamic voxelized point cloud is represented as a sequence of frames.

With the current MPEG-I V-PCC encoder implementation providing a compression of 125:1, a dynamic point cloud of 1 million points could be encoded at 8 Mbit/s with good perceptual quality. For the second approach, MPEG-I G-PCC, the current implementation provides lossless, intra-frame coding with a compression ratio up to 10:1 and acceptable-quality lossy coding with a ratio up to 35:1.
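A small sketch of the voxelized-point-cloud idea described above (the dictionary layout, voxel size, and attribute handling are illustrative choices, not part of V-PCC or G-PCC): occupied voxels are integer-lattice addresses, each carrying attributes such as colour.

```python
# Voxelized point cloud: a set of occupied integer-lattice addresses, each with
# attributes (here just an RGB colour). Structure and values are illustrative only.
import numpy as np

def voxelize(points, colors, voxel_size):
    """Quantize continuous 3D points to the integer lattice; keep one colour per voxel."""
    cloud = {}
    for p, c in zip(points, colors):
        addr = tuple(np.floor(p / voxel_size).astype(int))   # voxel address (x, y, z)
        cloud.setdefault(addr, c)      # occupied voxel; first colour wins (no averaging)
    return cloud

rng = np.random.default_rng(1)
points = rng.uniform(0.0, 1.0, size=(1000, 3))                # toy 'scan'
colors = rng.integers(0, 256, size=(1000, 3))
frame = voxelize(points, colors, voxel_size=1.0 / 64)          # one frame of a dynamic cloud

print(len(frame), "occupied voxels out of", 64 ** 3, "lattice cells")
print((10, 20, 30) in frame)                                   # occupancy test by address
```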

3D Mesh dynamic sequence

Polygonal 3D meshes represent the surfaces of 3D objects with 3D points and connectivity data. Dynamic point clouds are sets of 3D points representing surfaces of objects that may vary temporally. Each point (vertex) in a 3D point cloud includes a geometric position, represented by a 3-tuple (X, Y, Z) specifying coordinate values, as well as one or more attributes such as color, reflectance, intensity, surface normal, etc. Triangular mesh representation is also commonly used in point cloud applications. In addition to the position and attributes of vertices, the mesh representation also includes connectivity information between vertices.

Triangle meshes (polyhedra) are the de facto standard for exchanging and viewing 3D data sets. A triangle mesh may be represented by its vertex data and by its connectivity. Vertex data comprise the coordinates of all the vertices and optionally the coordinates of the associated normal vectors and textures. In its simplest form, connectivity captures the incidence relation between the triangles of the mesh and their bounding vertices. A mesh sequence is a set of mesh files, with each frame consisting of an .obj file representing geometry, an .mtl file representing material, and a .png file representing texture. Each .obj file contains position information for each vertex, connectivity information for each triangle, and UV coordinate information mapping the vertices to the texture map.
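As a concrete illustration of the geometry/UV/connectivity split in such a mesh frame, here is a minimal sketch of a Wavefront .obj reader (it handles only the v, vt, and f records, assumes purely triangular faces, and omits error handling, normals, and .mtl parsing; the file names are hypothetical):

```python
# Minimal Wavefront .obj reader for one mesh frame: vertex positions (v),
# texture coordinates (vt), and triangle connectivity (f). Illustrative sketch only.
def read_obj(path):
    vertices, uvs, triangles = [], [], []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue
            if parts[0] == "v":                   # geometry: x y z
                vertices.append(tuple(map(float, parts[1:4])))
            elif parts[0] == "vt":                # UV coordinate into the .png texture
                uvs.append(tuple(map(float, parts[1:3])))
            elif parts[0] == "f":                 # connectivity: v/vt[/vn] per corner
                corners = [p.split("/") for p in parts[1:4]]
                triangles.append(tuple(int(c[0]) - 1 for c in corners))  # 1-based -> 0-based
    return vertices, uvs, triangles

# A mesh sequence is then simply one such file per frame, e.g.:
# frames = [read_obj(f"frame_{i:04d}.obj") for i in range(300)]
```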

Immersive communication

Communication, like the atmosphere itself, is ubiquitous and essential for humans, and, with the development of new technologies such as ubiquitous networks, big data, 3D printing, virtual reality, and artificial intelligence, it has become almost impossible to live without it. In addition, the means of communication have changed immeasurably. A new research paradigm incorporates new features and factors of communication into a new theoretical framework named “immersive communication.” Pointing out that communication today has moved beyond the bidirectional mass communication of “the second media age” to ubiquitous, immersive communication in “the third media age,” the author discusses the definition, characteristics, information structure, and models of immersive communication. We feel that communication is becoming more and more ubiquitous and omnipresent; we are more and more closely integrated with the media, and we recognize that a new mode of communication is greatly changing our way of life. It seems that we are returning to the original state of human existence, going in a circle from the end back to the starting point. We have gone from direct, face-to-face information and emotional exchange between individuals, to using media as a communication interface, to an invisible and even disappearing interface, and now we return to the original starting point of “direct” communication.

Immersive video technologies 2022

In the past decades, digital video experiences have been continuously evolving towards a higher degree of immersion and realism. This development has been possible thanks to a number of technological advances. Get a broad overview of the different modalities of immersive video technologies, from omnidirectional video to light fields (LF) and volumetric video (VV), from a multimedia processing perspective.
  • From capture to representation, coding, and display, video technologies have been evolving significantly and in many different directions over the last few decades, with the ultimate goal of providing a truly immersive experience to users.
  • After setting up a common background for these technologies, based on the plenoptic function theoretical concept, Immersive Video Technologies offers a comprehensive overview of the leading technologies enabling visual immersion, including omnidirectional (360 degrees) video, light fields, and volumetric video. 
  • Following the critical components of the typical content production and delivery pipeline, we present acquisition, representation, coding, rendering, and quality assessment approaches for each immersive video modality. We also review current standardization efforts and explore new research directions.
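For reference, the plenoptic function mentioned above is commonly written (following Adelson and Bergen) as the radiance observed at every position and direction, for every wavelength and time; each immersive modality can be seen as sampling a different subset of these seven dimensions:

```latex
% 7D plenoptic function: radiance P seen from position (x, y, z),
% in direction (\theta, \phi), at wavelength \lambda and time t.
P = P(x,\, y,\, z,\, \theta,\, \phi,\, \lambda,\, t)
```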

MPEG Immersive video technologies 2022

  • MIV MPEG Immersive Video ISO/IEC 23090-12 Information technology — Coded representation of immersive media — Part 12: MPEG immersive video
  • V3C Visual volumetric video-based coding ISO/IEC DIS 23090-5:2021/DAmd1 (2E) Information technology — Coded representation of immersive media — Part 5: Visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC) — Amendment 1: V3C extension mechanism and payload type
  • V-PCC Video-based point cloud compression ISO/IEC 23090-5:2021(1E) Information technology — Coded representation of immersive media — Part 5: Video-based point cloud compression (V-PCC) 
  • G-PCC Geometry-based point cloud compression ISO/IEC FDIS 23090-9:2020 Information technology — Coded representation of immersive media — Part 9: Geometry-based point cloud compression

Metaverse

Metaverse, a combination of the prefix meta (implying transcending) with the word universe, describes a hypothetical synthetic environment linked to the physical world. The journey starts with a piece of speculative fiction named Snow Crash, written by Neal Stephenson in 1992. In this novel, Stephenson defines the metaverse as a massive virtual environment parallel to the physical world, in which users interact through digital avatars. Since this first appearance, the metaverse as a computer-generated universe has been defined through vastly diversified concepts, such as lifelogging, collective space in virtuality, embodied Internet / spatial Internet, a mirror world, and an omniverse: a venue of simulation and collaboration.

The metaverse is a virtual environment blending the physical and the digital, facilitated by the convergence between Internet and Web technologies and Extended Reality (XR). According to Milgram and Kishino’s Reality-Virtuality Continuum, XR integrates the digital and the physical to various degrees, e.g., augmented reality (AR), mixed reality (MR), and virtual reality (VR). Similarly, the metaverse scene in Snow Crash projects the duality of the real world and a copy of digital environments. In the metaverse, all individual users own their respective avatars, in analogy to the user’s physical self, to experience an alternate life in a virtuality that is a metaphor of the user’s real world. To achieve such duality, the development of the metaverse has to go through three sequential stages, namely digital twins, digital natives, and eventually the co-existence of physical and virtual reality, or the surreality.