Processing

Point cloud

Any of the depth estimation methods discussed previously that involve a camera sensor produces a depth map. Using the camera’s calibration parameters, this depth map is easily re-projected into 3D space with the pinhole camera model.

Camera equation
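
As a minimal sketch of this re-projection, assume an undistorted pinhole camera with focal lengths fx, fy and principal point (cx, cy), all in pixels. A pixel (u, v) with depth z along the optical axis then maps to X = (u - cx) z / fx, Y = (v - cy) z / fy, Z = z. The function name `depth_to_points` and the convention that invalid pixels hold a depth of 0 are assumptions made for illustration, not part of any particular library.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map to an N x 3 point cloud (camera frame)."""
    h, w = depth.shape
    # u runs along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Assumption: pixels without a valid measurement are stored as 0
    return points[points[:, 2] > 0]
```

The intrinsics would normally come from the calibration step; the resulting points are expressed in the camera frame, in the same unit as the depth map.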

If the lens distortion is significant, it can be corrected first to get back to a pinhole model. There are common frameworks for calibrating the optical parameters of the camera. In many cases, a separate color camera is available to provide color, but warping is necessary to align its image with the depth map.

When no camera is involved (e.g. LiDAR or a digitizer), each point is a single measurement. In all cases, the geometrical information is just as important as the measurement itself, and point cloud processing is still an active research field.

Registration

When merging several scans or modalities, finding the spatial transformation to apply to each scan is a major challenge.

Marker-based: the scene includes easy-to-detect marker points, with at least 3 visible per scan. The markers can either be simple checkerboards or encode information (coded targets, ArUco markers). The latter is more robust, as it makes each marker unique. Neural networks can lower the variance of the estimated marker location.

aruco
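
Once at least three non-collinear markers have been matched between two scans, the rigid transformation between them has a closed-form least-squares solution, known as the Kabsch algorithm. The sketch below assumes that detection and matching are already done, so that row i of both arrays is the same marker; `rigid_transform` and its argument names are illustrative.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~= dst[i] (Kabsch algorithm).

    src, dst: N x 3 arrays of matched marker positions, N >= 3, not collinear.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3 x 3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Applying `src @ R.T + t` brings the first scan into the frame of the second. With more than three markers, the extra correspondences average out part of the localization noise.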

Point cloud registration: in the general case, the transform that best aligns several overlapping point clouds can be retrieved through iterative methods such as Iterative Closest Point (ICP). For partial overlap or noisy scans, a Gaussian mixture model has been proposed to handle the local ambiguity.

ICP
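
To make the iteration concrete, here is a minimal point-to-point ICP sketch. It alternates between matching each source point to its closest target point and solving for the rigid transform that best fits those matches. It is a simplified illustration, not a production implementation: it assumes a reasonable initial alignment and full overlap, uses a brute-force nearest-neighbour search, and runs for a fixed number of iterations. The function name `icp` and its parameters are illustrative.

```python
import numpy as np

def icp(src, dst, iterations=30):
    """Point-to-point ICP: returns R, t aligning src (N x 3) onto dst (M x 3)."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        moved = src @ R.T + t
        # 1. Match each moved source point to its closest target point
        #    (brute force; real scans would need a spatial index such as a k-d tree).
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]
        # 2. Closed-form least-squares rigid fit for the current matches.
        mc, qc = moved.mean(axis=0), matched.mean(axis=0)
        U, _, Vt = np.linalg.svd((moved - mc).T @ (matched - qc))
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
        dR = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        # 3. Compose the incremental update with the running estimate.
        R, t = dR @ R, dR @ t + (qc - dR @ mc)
    return R, t
```

Because every source point is forced to match some target point, this basic form degrades when the overlap is only partial, which is the situation that the more robust formulations address.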

Noise removal

Any 3D scan will exhibit some form of noise, which can be handled in several ways:

Outlier removal: remove points that are too distant or suspiciously noisy compared to the local area.
Denoising: displace points based on an estimate of the local noise.
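
As an illustration of outlier removal, one common heuristic is statistical: compute each point's mean distance to its k nearest neighbours and discard the points for which this distance is unusually large compared to the rest of the cloud. The sketch below is one possible version of that idea; the name `remove_outliers` and the default values of `k` and `std_ratio` are arbitrary choices, and the brute-force distance matrix only suits small clouds.

```python
import numpy as np

def remove_outliers(points, k=8, std_ratio=2.0):
    """Keep points whose mean distance to their k nearest neighbours is not unusually large."""
    # N x N pairwise distances (brute force, O(N^2) memory)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, 1:k + 1]       # skip column 0: distance to itself
    mean_d = knn.mean(axis=1)
    keep = mean_d <= mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]
```

A lower `std_ratio` removes more points; the right value depends on the sensor and on how uniform the sampling density is.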

Denoising makes use of the sparsity of the underlying data in some natural feature space. This requires knowledge about natural data; most recent approaches embed this knowledge in a deep learning model.

Here is the previous example of a ToF point cloud before denoising:

ToF scan - person by pythonzen on Sketchfab

Here it is after denoising with a residual neural network on the depth map:

ToF scan denoised - person by pythonzen on Sketchfab

Voxels

Voxels are the 3D equivalent of 2D pixels. In an occupancy map, each voxel contains a binary value (occupied or empty). More complex scanning can produce floating point or vector values for each voxel: this is commonly the case with medical imagery. A binary occupancy map can be obtained by thresholding these with specific criteria.
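
The two ways of obtaining an occupancy map mentioned above can be sketched as follows: binning a point cloud into a regular grid, and thresholding a scalar volume. The function names, the grid parameters and the single-level threshold are assumptions made for illustration; real criteria for medical volumes are usually more specific.

```python
import numpy as np

def voxelize(points, voxel_size, origin, shape):
    """Binary occupancy grid: a voxel is True if at least one point falls inside it."""
    grid = np.zeros(shape, dtype=bool)
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Ignore points that fall outside the grid
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    grid[tuple(idx[inside].T)] = True
    return grid

def threshold_volume(volume, level):
    """Occupancy from a scalar volume: occupied where the value reaches the level."""
    return volume >= level
```

The voxel size trades memory for detail: halving it multiplies the number of voxels by eight.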

Kinect Fusion uses voxels as an intermediate representation when merging estimates from several ToF cameras (or a single moving one).

Meshes

Meshes are by far the most popular representation for 3D models in 3D scanning, 3D printing and gaming.

The key challenge in scanning workflows often lies in mesh construction and repair. Proprietary software such as Agisoft Metashape provides many meshing algorithms and repair/denoising tools. There are some open source alternatives:

MeshLab

There are also simpler tools targeting printability:

Netfabb
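
To give an idea of what a basic repair check looks like, consider holes. In a closed (watertight) triangle mesh, every edge is shared by exactly two faces, so edges used by a single face outline the holes. The sketch below assumes that a mesh is stored as a list of vertex-index triples; `boundary_edges` is an illustrative helper and does not reflect how the tools above are implemented.

```python
from collections import Counter

def boundary_edges(faces):
    """Return the edges used by exactly one triangle; an empty list means no holes."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1   # store edges as undirected
    return sorted(e for e, n in counts.items() if n == 1)

# A tetrahedron is closed; dropping one face opens a triangular hole.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(boundary_edges(tetra))       # []
print(boundary_edges(tetra[:3]))   # [(0, 2), (0, 3), (2, 3)]
```

This only detects holes; other defects such as non-manifold edges (shared by more than two faces) or flipped normals need their own checks.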

More details on mesh processing are available here.