Reconstruction of real-world scenes from a set of multiple images is a topic in Computer Vision and 3D Computer Graphics with many interesting applications. A powerful algorithm for shape reconstruction from arbitrary viewpoints exists, called Space Carving. However, it is computationally expensive and hence cannot be used in applications such as 3D video, CSCW, or interactive 3D model creation. Attempts have been made to achieve real-time frame rates using PC cluster systems. While these provide sufficient performance, they are also expensive and less flexible. Approaches that use GPU hardware acceleration on single workstations achieve interactive frame rates for novel-view synthesis, but do not provide an explicit volumetric representation of the whole scene. This work presents a GPU hardware-accelerated framework for obtaining the volumetric photo hull of a dynamic 3D scene as seen from multiple calibrated cameras. High performance is achieved by first employing a shape-from-silhouette technique to obtain a tight initial volume for Space Carving; several additional speed-up techniques further increase efficiency. Since the entire processing is done on a single PC, the framework can be applied to mobile setups, enabling a wide range of further applications. The approach is implemented on the programmable vertex and fragment processors of current graphics hardware and compared against highly optimized CPU implementations. It is shown that the new approach can outperform the latter by more than one order of magnitude.
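As an orientation, the following C++ sketch (not the thesis' actual code, which runs on the GPU) outlines the two-stage pipeline described above: a shape-from-silhouette pass carves everything outside the silhouettes to obtain a tight initial volume, and Space Carving then iteratively removes photo-inconsistent voxels. The per-voxel tests are passed in as callbacks, and the sketch omits the surface-voxel restriction and visibility ordering of the full algorithm.

```cpp
// Structural sketch of the two-stage reconstruction: Shape from
// Silhouette yields a tight start volume, Space Carving refines it.
// All names are illustrative assumptions, not the thesis' interface.
#include <cstdint>
#include <functional>
#include <vector>

struct Voxel { int x, y, z; };

// Callers supply the two per-voxel tests; both are assumptions here.
using SilhouetteTest  = std::function<bool(const Voxel&)>; // inside all silhouettes?
using ConsistencyTest = std::function<bool(const Voxel&)>; // photo-consistent?

std::vector<uint8_t> reconstruct(int nx, int ny, int nz,
                                 const SilhouetteTest& inHull,
                                 const ConsistencyTest& consistent) {
    std::vector<uint8_t> solid(size_t(nx) * ny * nz, 1); // 1 = solid, 0 = carved
    auto idx = [&](int x, int y, int z) { return (size_t(z) * ny + y) * nx + x; };

    // Stage 1: visual hull approximation -- carve voxels outside any silhouette.
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x)
                if (!inHull({x, y, z})) solid[idx(x, y, z)] = 0;

    // Stage 2: space carving -- repeatedly remove photo-inconsistent voxels
    // until the volume is stable (an approximation of the photo hull).
    for (bool carved = true; carved;) {
        carved = false;
        for (int z = 0; z < nz; ++z)
            for (int y = 0; y < ny; ++y)
                for (int x = 0; x < nx; ++x)
                    if (solid[idx(x, y, z)] && !consistent({x, y, z})) {
                        solid[idx(x, y, z)] = 0;
                        carved = true;
                    }
    }
    return solid;
}
```

Restricting the second stage to voxels that survived the first is what makes the tight initial volume pay off: far fewer voxels ever reach the expensive consistency test.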
Table of Contents
1 Introduction
1.1 Application
1.2 Classification
1.3 Performance
1.4 Contribution
1.5 Overview
2 Related Work
2.1 Shape from Silhouette
2.1.1 Image Segmentation
2.1.2 Foundations
2.1.3 Performance of View-Independent Reconstruction
2.1.3.1 CPU
2.1.3.2 GPU Acceleration
2.1.4 Performance of View-Dependent Reconstruction
2.1.4.1 CPU
2.1.4.2 GPU Acceleration
2.1.5 Conclusion
2.2 Shape from Photo-Consistency
2.2.1 Foundations
2.2.2 Performance of View-Independent Reconstruction
2.2.2.1 CPU
2.2.2.2 GPU Acceleration
2.2.3 Performance of View-Dependent Reconstruction
2.2.3.1 CPU
2.2.3.2 GPU Acceleration
2.2.4 Conclusion
3 Fundamentals
3.1 Camera Geometry
3.1.1 Pinhole Camera Model
3.1.2 Camera Parameters
3.1.2.1 Intrinsic Parameters
3.1.2.2 Extrinsic Parameters
3.1.2.3 Radial Lens Distortion
3.1.3 Camera Calibration
3.2 Light and Color
3.2.1 Light in Space
3.2.1.1 Radiance
3.2.2 Light at a Surface
3.2.2.1 Irradiance
3.2.2.2 Radiosity
3.2.2.3 Lambertian and Specular Surfaces
3.2.3 Occlusion and Shadows
3.2.4 Light at a Camera
3.2.5 Color
3.2.6 Color Representation
3.2.6.1 Linear Color Spaces
3.2.6.2 Non-linear Color Spaces
3.2.6.3 Color Metric
3.2.7 CCD Camera Color Imaging
3.3 3D Reconstruction from Multiple Views
3.3.1 Visual Hull Reconstruction by Shape from Silhouette
3.3.1.1 Shape from Silhouette
3.3.1.2 Discussion
3.3.1.3 The Visual Hull
3.3.1.4 Silhouette-Equivalency
3.3.1.5 Number of Viewpoints
3.3.1.6 Conclusion
3.3.2 Photo Hull Reconstruction by Shape from Photo-Consistency
3.3.2.1 Shape from Photo-Consistency
3.3.2.2 Discussion
3.3.2.3 Photo-Consistency
3.3.2.4 The Photo Hull
3.3.2.5 The Space Carving Algorithm
3.3.2.6 Voxel Visibility
3.3.2.7 Conclusion
4 Basic Algorithm
4.1 Data
4.1.1 Camera Parameters
4.1.2 Image Data
4.2 Reconstruction
4.2.1 3D Data Representation
4.2.2 Volumetric Bounding Box
4.2.3 Maximal Volume Intersection
4.2.4 Visual Hull Approximation
4.2.5 Photo-Consistent Surface
4.2.5.1 Active Source Camera Test
4.2.5.2 Photo-Consistency Test
5 Advanced Algorithm
5.1 Overview
5.1.1 Deployment
5.1.2 Process Flow
5.2 Texture Processing
5.2.1 Lookup Table for Projection Coordinates
5.2.2 Mapping Image Data into Textures
5.2.3 Texture Upload and Processing Performance
5.2.4 GPU Image Processing
5.3 Destination Cameras
5.3.1 Discussion
5.3.1.1 Ray Casting vs. Multi Plane Sweeping
5.3.1.2 Virtual vs. Natural Views
5.3.2 Interleaved Depth Sampling
5.3.3 Active Destination Cameras
5.3.3.1 Source Camera Viewing Ray
5.3.3.2 Intersection of Volume and Source Camera Viewing Ray
5.3.3.3 Activity Decision
5.4 Reconstruction
5.4.1 Vertex Data
5.4.2 Vertex Shader Visual Hull Approximation
5.4.2.1 Decreasing the Sampling Error for Interleaved Sampling
5.4.2.2 Early Ray Carving
5.4.3 Fragment Shader Photo-Consistent Surface
5.4.3.1 Filling Holes
5.4.3.2 Modified Active Source Camera Decision
5.4.4 Fragment Shader Color Blending
5.4.5 Fragment Shader Render to Texture
5.5 Postprocessing
5.5.1 Extracting Texture Data
5.5.2 Filling Interior Volume Data
5.5.2.1 Ambiguities
5.5.2.2 Performance
6 Experiments
6.1 System Setup
6.2 Implementation
6.3 Datasets
6.4 Performance
6.4.1 Abstract Data Performance Experiments
6.4.1.1 CPU-GPU Texture Upload
6.4.1.2 Interleaved Sampling
6.4.1.3 Early Ray Carving
6.4.1.4 Fragment Shader CIELab-RGB Conversion
6.4.1.5 Porting all Load to the Fragment Processor
6.4.1.6 GPU-CPU Texture Read-back
6.4.1.7 FBO Texture Size
6.4.1.8 Impact of CPU-GPU Texture Upload on overall Performance
6.4.1.9 Number of Source Cameras
6.4.1.10 Number of Destination Cameras
6.4.2 Concrete Data Performance Experiments
6.4.2.1 Algorithmic Features
6.4.2.2 Destination Cameras
6.4.2.3 Volumetric Resolution
6.4.2.4 Volumetric Bounding Box
6.4.2.5 PCS Increments
6.4.3 Conclusion
6.4.3.1 Algorithmic Features
6.4.3.2 Parameters
6.4.3.3 GPU/CPU Comparison
6.5 Quality
6.5.1 Concrete Data Quality Experiments
6.5.1.1 Volumetric Resolution
6.5.1.2 Volumetric Bounding Box
6.5.1.3 PCS Increments
6.5.2 Visual Experiments
6.5.2.1 Image Segmentation
6.5.2.2 Interleaved Sampling and MVI
6.5.2.3 Camera Viewing Cone Intersection
6.5.2.4 Reconstruction of VHA and PCS
6.5.2.5 Volumetric Resolution
6.5.2.6 PCS Increments
6.5.2.7 Geometrical Score for Active Source Camera Computation
6.5.2.8 Range of Color Distances for Active Source Camera Computation
6.5.2.9 Labeling of Interior Space
6.5.3 Conclusion
6.5.3.1 Image Segmentation
6.5.3.2 BB, MVI and Viewing Cone Intersection
6.5.3.3 VHA and PCS
6.5.3.4 PCS Parameters
6.5.3.5 Labeling of Interior Space
7 Discussion and Enhancements
7.1 Summary
7.2 Limitations
7.3 Future Work
7.3.1 Online System
7.3.2 Performance
7.3.3 Quality
7.4 Annotation
Objectives & Themes
This work aims to develop a hardware-accelerated framework for real-time 3D reconstruction of dynamic real-world scenes using multiple calibrated cameras on a single workstation. The primary research focus is to achieve a volumetric representation of a scene by overcoming the high computational cost typically associated with traditional Space Carving algorithms through GPU-based parallel processing.
- Real-time 3D reconstruction using consumer-grade graphics hardware (GPU).
- Hybrid algorithm combining Shape from Silhouette and Shape from Photo-Consistency.
- Optimization of the reconstruction pipeline through interleaved sampling and active source camera selection.
- Elimination of high-cost PC cluster requirements by mapping image processing and reconstruction to a single PC.
- Integration of implicit visibility handling to improve performance and quality.
Excerpt from the Book
1.1 Application
The ability to reconstruct dynamic real-world scenes from multiple video streams enables a wide range of applications. 3D video extends common 2D video by being view-independent: the decision about a fixed camera from which the scene is viewed is shifted from the time and place of recording to the time and place of consumption. 3D video can be used in the context of personal and social human activities like entertainment (e.g. 3D games, 3D events [24], 3D TV [37], 3D video recorder [70]) and education (e.g. 3D books, 3D-based training) or for the preservation of common knowledge (e.g. ethnic dancing performances [39], special skills).
3D video is obtained by analyzing properties like scene geometry and color from multi-viewpoint video streams. This generally involves a high computational effort. Nevertheless, if the reconstruction can be computed in real time, a broad field of further applications is facilitated. In the field of CSCW, more realistic and immersive telepresence and conferencing systems are created by using 3D cues [44]. In this context, PRINCE et al. [52] propose a framework where a user can interact with a dynamic model of a remotely captured collaborator, overlaid onto the real world in a video see-through HMD. Instead of inserting a dynamic scene into a real environment (AR), it is also possible to combine it with a virtual world (MR). This enables seamless integration of real and virtual content with appropriate visual cues like reflections, shading, shadowing, occlusion and collision detection. Interactive systems for the reconstruction and insertion of actors into virtual worlds are proposed by HASENFRATZ et al. [21] and, more recently, POMI and SLUSALLEK [49], contributing to the field of virtual television studio techniques.
Dynamic 3D data may also be used as an intermediate result for further processing and information extraction. Scene analysis allows for object recognition or motion tracking; the latter is often applied as a non-intrusive approach to human motion capture [43].
Summary of Chapters
1 Introduction: Provides an overview of the convergence of computer graphics and computer vision, identifying the potential of real-time 3D reconstruction and defining the scope and contributions of the thesis.
2 Related Work: Surveys existing research on Shape from Silhouette and Shape from Photo-Consistency, analyzing current GPU and CPU-based performance benchmarks.
3 Fundamentals: Establishes the theoretical framework, including camera geometry, radiometry, color representation, and the mathematical principles behind Shape from Silhouette and Shape from Photo-Consistency.
4 Basic Algorithm: Defines the core reconstruction approach independently of hardware, introducing the concepts of nested volumes, image segmentation, and photo-consistency testing.
5 Advanced Algorithm: Describes the hardware-accelerated mapping of the algorithm onto the GPU, detailing texture processing, destination camera rendering, and the optimized reconstruction pipeline.
6 Experiments: Presents quantitative and qualitative evaluations of the implemented system using various datasets to demonstrate performance gains and the impact of algorithmic parameters.
7 Discussion and Enhancements: Summarizes the thesis, discusses existing limitations, and suggests future improvements for online systems and performance optimizations.
Keywords
3D Reconstruction, Real-time Processing, GPU Acceleration, Space Carving, Shape from Silhouette, Photo-Consistency, Visual Hull, Photo Hull, Computer Vision, Volumetric Modeling, Image Segmentation, Voxel Visibility, Shader Programming, Interactive 3D, Multi-viewpoint Video
Frequently Asked Questions
What is the core purpose of this research?
The research aims to develop a real-time, hardware-accelerated framework for reconstructing 3D models of dynamic scenes from multiple calibrated video streams on a single standard workstation.
What are the primary methodologies used for 3D reconstruction?
The system utilizes a hybrid approach, combining Shape from Silhouette to determine an initial volume (the Visual Hull) and Shape from Photo-Consistency to refine this volume into a Photo Hull.
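A minimal sketch of the consistency criterion at the heart of that refinement, assuming a simple RGB deviation test; the thesis itself works with a perceptually motivated CIELab-based color metric (cf. sections 3.2.6.3 and 6.4.1.4), so plain RGB is a stand-in:

```cpp
// A voxel is kept if the colors it projects to, across all cameras that
// see it, deviate little from their mean. RGB deviation here is a
// simplification of the thesis' CIELab color metric.
#include <cmath>
#include <vector>

struct Rgb { float r, g, b; };

bool photoConsistent(const std::vector<Rgb>& samples, float threshold) {
    if (samples.size() < 2) return true; // one view can never disagree
    Rgb mean{0, 0, 0};
    for (const Rgb& s : samples) {
        mean.r += s.r; mean.g += s.g; mean.b += s.b;
    }
    const float n = float(samples.size());
    mean.r /= n; mean.g /= n; mean.b /= n;

    // Root-mean-square deviation of the samples from their mean color.
    float var = 0;
    for (const Rgb& s : samples) {
        var += (s.r - mean.r) * (s.r - mean.r)
             + (s.g - mean.g) * (s.g - mean.g)
             + (s.b - mean.b) * (s.b - mean.b);
    }
    return std::sqrt(var / n) <= threshold;
}
```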
What is the role of the GPU in this framework?
The GPU serves as the main processing engine, handling image segmentation, ray-based reconstruction, and color blending, which significantly outperforms traditional CPU-based cluster approaches.
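For illustration, a CPU-side sketch of one common segmentation scheme, background subtraction against a reference image, is given below; the thesis performs segmentation in a fragment program, and the exact method and metric used there may differ.

```cpp
// Per-pixel segmentation sketch: pixels far enough from a reference
// background image are labeled foreground (silhouette). The RGB
// threshold and all names are assumptions for illustration only.
#include <cmath>
#include <cstdint>
#include <vector>

struct Image {
    int w, h;
    std::vector<float> rgb; // interleaved, 3 floats per pixel
};

std::vector<uint8_t> segment(const Image& frame, const Image& background,
                             float threshold) {
    std::vector<uint8_t> mask(size_t(frame.w) * frame.h, 0);
    for (size_t p = 0; p < mask.size(); ++p) {
        float d2 = 0;
        for (int c = 0; c < 3; ++c) {
            const float diff = frame.rgb[3 * p + c] - background.rgb[3 * p + c];
            d2 += diff * diff;
        }
        if (std::sqrt(d2) > threshold) mask[p] = 1; // foreground / silhouette
    }
    return mask;
}
```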
What makes this framework different from existing solutions?
Unlike many previous methods that rely on expensive PC clusters, this framework enables real-time volumetric reconstruction on a single PC by mapping rendering-based algorithms to the GPU.
How is visibility handled in the system?
The system uses an implicit visibility approach by selecting "active" source cameras based on color distance heuristics, avoiding the computational overhead of explicit ray-tracing or complex visibility data structures.
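A hedged sketch of such a selection heuristic: cameras whose color samples for a voxel mutually agree are assumed to actually see it, while outliers are treated as occluded. The anchor-and-range rule below is an illustrative stand-in for the geometrical score and color-distance range mentioned in sections 6.5.2.7 and 6.5.2.8.

```cpp
// Implicit visibility via "active" source cameras: pick the two
// mutually closest color samples as a consensus anchor, then activate
// every camera whose sample lies within `range` of that anchor.
#include <cmath>
#include <vector>

struct Sample { int cameraId; float r, g, b; };

static float dist(const Sample& a, const Sample& b) {
    return std::sqrt((a.r - b.r) * (a.r - b.r) +
                     (a.g - b.g) * (a.g - b.g) +
                     (a.b - b.b) * (a.b - b.b));
}

std::vector<int> activeCameras(const std::vector<Sample>& samples, float range) {
    std::vector<int> active;
    if (samples.size() < 2) { // trivially keep the only candidate, if any
        for (const Sample& s : samples) active.push_back(s.cameraId);
        return active;
    }
    // Find the two mutually closest samples as the consensus anchor.
    size_t bi = 0, bj = 1;
    for (size_t i = 0; i < samples.size(); ++i)
        for (size_t j = i + 1; j < samples.size(); ++j)
            if (dist(samples[i], samples[j]) < dist(samples[bi], samples[bj])) {
                bi = i; bj = j;
            }
    // Cameras with outlier colors are assumed occluded and stay inactive.
    for (const Sample& s : samples)
        if (dist(s, samples[bi]) <= range) active.push_back(s.cameraId);
    return active;
}
```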
Which key metrics define the performance of the proposed method?
Performance is evaluated through frame rates (fps), system response times for texture uploads, the impact of varying volumetric resolutions, and the efficacy of algorithmic features like Early Ray Carving.
How does the "Interleaved Sampling" technique benefit the reconstruction?
It allows for increasing the sampling distance by a factor of 3, which significantly improves rendering performance while keeping the spatial sampling error constant in unoccluded areas.
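As a rough sketch of the idea, neighboring rays can start at depth offsets staggered with period 3, so that together they cover depth as densely as a single ray with one third of the step size; the concrete phase pattern below is an assumption, not necessarily the layout used in the thesis.

```cpp
// Interleaved depth sampling sketch: each ray's start depth is offset
// by a phase derived from its pixel position (3-way pattern, matching
// the factor-3 claim above). Names and pattern are assumptions.
#include <vector>

struct DepthSamples { std::vector<float> depths; };

DepthSamples raySamples(int px, int py, float nearZ, float farZ, float step) {
    // Phase 0, 1, or 2 from the pixel position: a 3-way interleave.
    const int phase = (px + py) % 3;
    const float start = nearZ + float(phase) * (step / 3.0f);

    DepthSamples s;
    for (float d = start; d <= farZ; d += step)
        s.depths.push_back(d); // this ray's coarse samples, offset by phase
    return s;
}
```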
What are the main limitations identified for the system?
Limitations include the reliance on high-bandwidth texture uploads for real-time performance and inherent ambiguities in labeling interior space, especially when objects are occluded in virtual views.
Quote paper
Christian Nitschke (Author), 2006, A Framework for Real-time 3D Reconstruction by Space Carving using Graphics Hardware, Munich, GRIN Verlag, https://www.grin.com/document/69735