Abstract
Three-dimensional reconstruction based on hyperspectral data can be applied in numerous fields. In recent years, the 3D Gaussian Splatting (3DGS) method has garnered widespread interest for RGB images due to its fast training and rendering speeds. Directly extending 3DGS to hyperspectral images faces challenges such as reduced training and rendering speeds and an increased demand for computational resources, owing to the large number of channels in hyperspectral images. This paper proposes a faster, smaller, and noise-robust hyperspectral 3DGS method based on feature-dimension compression. The method leverages the point cloud representation in 3DGS to decouple spatial and spectral features. By employing a two-stage training approach and incorporating high-frequency region refinement, the method significantly reduces training time on hyperspectral data, improves rendering speed, reduces the required storage space, and achieves higher robustness to noise, while delivering rendering results comparable to the original 3DGS method. The PSNR of the rendered images is 33.9 and the training time is only 0.55 hours. The method is well suited to tasks such as three-dimensional reconstruction and novel view synthesis of hyperspectral data when computational resources are limited.
1 Introduction
Hyperspectral data, with its diverse and abundant spectral information, has garnered widespread attention [1]. Consequently, hyperspectral data has been applied in various fields such as artificial intelligence [2, 3], remote sensing [4, 5], and detection [6, 7]. Combining hyperspectral technology with 3D reconstruction can generate hyperspectral images of target scenes or objects from any perspective, providing a richer data source for various potential applications. A promising direction is to combine hyperspectral imaging with image-based 3D reconstruction and novel view synthesis techniques such as NeRF [8], in order to reconstruct the hyperspectral characteristics of an object from multiple hyperspectral images taken from different viewpoints.
Although previous image-based 3D reconstruction methods, such as NeRF [8], have achieved some early progress [9,10,11,12,13], they still face problems such as slow training and rendering speeds, along with a high demand for computational resources. These problems have become significant barriers to the practical application of hyperspectral 3D reconstruction. The recently proposed 3DGS [14] technique has significant advantages in training and rendering speed. Therefore, introducing 3DGS into the field of hyperspectral 3D reconstruction has become an important research direction.
3DGS can be adapted to the hyperspectral field by directly modifying the number of channels of the Gaussians. However, because hyperspectral data has far more channels than RGB data, the training and rendering speeds of hyperspectral 3DGS are greatly affected. Additionally, since hyperspectral data is much larger than ordinary photographs, the computational resources required for training also increase. Therefore, it is necessary to propose a novel view synthesis method for hyperspectral data that improves training and rendering speed while reducing computational requirements, without compromising rendering quality.
This paper proposes a faster, smaller, and more noise-robust hyperspectral 3DGS method based on hyperspectral feature-dimension compression. The method first extracts the luminance information from hyperspectral images, separating the original hyperspectral images into a luminance channel and luminance-normalized feature channels. Then, principal component analysis (PCA), a popular dimensionality reduction method, is performed on the luminance-normalized feature channels to compress the spectral dimension of the hyperspectral images. The proposed method adopts a two-stage training approach: the 3D representation is trained with the luminance channel, and the spectral feature representation is trained with the compressed feature channels. Combined with the high-frequency region enhancement method, this approach significantly reduces training time, enhances rendering speed, and decreases training memory requirements on hyperspectral data, while achieving rendering results comparable to the original 3DGS method with improved noise robustness.
The primary contribution of this paper is the introduction of a two-stage training framework. This approach innovatively decouples hyperspectral information processing into two distinct phases. The first phase trains 3D geometric properties of the Gaussian point cloud (e.g., position, scale, and orientation). The second phase exclusively optimizes spectral features without modifying the established 3D structure. Compared to the original 3D Gaussian Splatting (3DGS) method, our first stage requires only a small subset of data (e.g., luminance channel) to achieve comparable 3D reconstruction quality. The second stage eliminates the need for backpropagation steps on 3D parameters, significantly reducing computational costs. The conventional 3DGS jointly optimizes 3D geometry and spectral features in a single training loop. This simultaneous optimization consumes significantly more time and memory per iteration than our decoupled approach.
In summary, hyperspectral 3D reconstruction presents several challenges, including slow training and rendering speeds, high computational demands, and the inefficiency of directly adapting existing RGB-based 3DGS models to hyperspectral data with significantly higher spectral dimensions. This study proceeds under the assumption that hyperspectral information can be effectively represented through a compressed spectral feature space without sacrificing rendering quality. To address the above constraints, we propose a two-stage hyperspectral 3DGS framework based on spectral feature-dimension compression, which decouples 3D geometry learning and spectral feature optimization. This approach substantially reduces training time, computational cost, and memory requirements while maintaining comparable rendering performance and improving noise robustness.
2 Related work
Research on hyperspectral images
Hyperspectral images have been widely applied in various fields, including remote sensing and artificial intelligence. In the remote sensing domain, hyperspectral images are highly valued and play a crucial role in different detection tasks [4, 5]. In the field of artificial intelligence, hyperspectral images find applications in various areas. Hyperspectral images can be acquired by hyperspectral cameras with filters [15], which scan each channel of the spectrum and concatenate the channels together. Scanning-based hyperspectral imaging takes minutes to capture one image, hundreds of times slower than RGB cameras, since the spectrum is scanned channel by channel. A more advanced approach is compressive imaging [16,17,18], in which the camera takes a snapshot and the hyperspectral image is recovered algorithmically afterward. Although snapshot hyperspectral imaging greatly improves imaging speed, recovering the hyperspectral image from the snapshot still takes a significant amount of time, and the quality of the recovered images remains lower than that of scanning hyperspectral cameras. Another related field is active 3D hyperspectral imaging. This method requires additional equipment to project onto the target, enabling the hyperspectral camera to acquire three-dimensional information, as discussed by J. Luo et al. [19] and others. While this method allows end-to-end acquisition of three-dimensional information, it has higher equipment requirements and stricter usage conditions than fully image-based methods.
3D Research based on images
Research on image-based 3D reconstruction consists mainly of two parts. One part involves more traditional feature-based 3D reconstruction methods, such as Structure from Motion (SfM) [20]. SfM generates the 3D information of objects from different perspectives, commonly using feature point identification and matching for 3D reconstruction. Traditional feature point identification methods like LIFT [21], SIFT [22], etc., have been tested for many years. With the increasing application of deep learning methods, feature point identification based on deep learning has gained attention in recent years, with notable contributions from methods like SuperPoint [23] and HSSPN [24]. The other part involves methods based on neural radiance fields.
NeRF [8] (Neural Radiance Fields) is an emerging neural network-based 3D perspective synthesis method. It can generate high-quality new perspective images using only images from different viewpoints for training. By training on multiple viewpoint images, the neural network can internally construct an implicit 3D representation, establishing the three-dimensional volume density distribution characteristics of objects at each spatial point. Rendering pixel by pixel using this information, NeRF can generate images from arbitrary angles. This method surpasses traditional methods like Structure from Motion (SfM) [20].
However, the NeRF method still suffers from slow rendering and training speeds and high computational resource costs. 3DGS [14] has received a great deal of attention since it was introduced; it addresses the slow rendering of NeRF and can achieve real-time rendering in the 3D reconstruction of RGB images. Its training speed is also significantly faster, and the quality of the rendered images is high. Thus, there has been a great deal of work on improving 3DGS.
Many researchers [25,26,27,28,29] have applied the 3DGS method to SLAM (Simultaneous Localization and Mapping), leveraging the speed advantage of 3DGS to create high-precision, visually convincing maps while performing localization. Other researchers [30,31,32,33,34] have combined 3DGS with generative models such as diffusion models to generate 3D scenes from images or text. Additionally, improving the surface generation accuracy of 3DGS has garnered significant attention. 3DGS uses point clouds to represent object surfaces; however, in practice, Gaussians sometimes fail to form a complete plane. In this regard, works such as 2DGS [35] have achieved more precise surface convergence and better rendering effects. Moreover, 3DGS has been applied to real-time dynamic scene rendering: 4D Gaussian Splatting (4D-GS) [36] was proposed as a holistic representation for dynamic scenes rather than applying 3DGS to each individual frame.
The 3DGS method can be integrated with LiDAR point clouds to achieve enhanced accuracy and efficiency [37]. Additionally, 3DGS shows potential for target reconstruction in short-range detection scenarios, addressing stringent requirements for information authenticity and timeliness in military or surveillance reconnaissance platforms [38].
Some groups have also focused on further enhancing the speed of 3DGS and reducing its storage volume. Bernhard Kerbl et al. [39] introduced a hierarchy of 3D Gaussians that preserves visual quality for very large scenes while offering an efficient Level-of-Detail (LOD) solution for rendering distant content, with effective level selection and smooth transitions between levels. These efforts have significantly reduced the time and space costs of rendering. Although these methods could also be applied to hyperspectral data, we are not aware of any work addressing this task. The method proposed in this paper can also be combined with the aforementioned methods to achieve faster and better results.
3 Method
This section first introduces the overall process of the 3DGS algorithm based on hyperspectral data dimension compression, followed by an explanation of the algorithm details.
3.1 Overview of the proposed method
Although the original 3DGS is designed for RGB images, it can be easily extended to the hyperspectral field. However, because the number of channels in hyperspectral images is much greater than in RGB images (the hyperspectral images used in this paper have 34 channels, which is eleven times that of RGB images), directly extending 3DGS to hyperspectral images results in long training times, slower rendering speeds, and increased memory usage. The direct extension of the original 3DGS demands high computational resources, making it infeasible to conduct related work when resources are insufficient. The method proposed in this paper aims to significantly reduce the training time and computational resource requirements of 3DGS in the hyperspectral field, making hyperspectral 3DGS applicable in more common scenarios.
The method proposed in this paper is a 3DGS method based on hyperspectral data with spectrum dimension compression. It achieves comparable rendering results compared with the original 3DGS while significantly improving training speed, rendering speed, and reducing memory usage. The comparison between overall process of the proposed method and the original 3DGS training process is shown in Fig. 1.
Comparison of the 3DGS based on the spectrum compression and the original 3DGS
The original 3DGS simultaneously trains the 3D pose and color information of Gaussian point clouds. By simultaneously backpropagating the error to the pose and color of the Gaussian points, the 3D point cloud gradually converges and eventually completes the training, as shown in the lower part of Fig. 1. The upper part of Fig. 1 shows the main process of the proposed method, which is a 3DGS method based on hyperspectral channel compressed data. It first extracts the luminance channel from the hyperspectral data, then performs spectrum compression by PCA (Principal Component Analysis) on the remaining luminance-normalized feature channels. After obtaining a few principal channels from PCA, the 3DGS training is conducted on the compressed feature channels.
The proposed method uses a two-stage training approach. In the first stage, the 3D features are trained using the luminance channel. After a 3D point cloud with a relatively accurate pose is obtained, the well-trained point cloud pose information is used as a prior to assist the subsequent feature channel training. This training approach has two benefits. First, only one channel is trained at any given time, which improves training speed; second, since the luminance channel images already contain most of the 3D information needed for training, using the point cloud with a well-trained pose reduces the difficulty of training the subsequent feature channels. Due to the loss of information during compression and the typical noise level of hyperspectral images, the signal-to-noise ratio of the luminance image is relatively low. This leads to blur in the high-frequency areas of the training results, because not enough small Gaussian points are generated during training to represent them. To solve this problem, after a period of training in the first stage that yields an initial rough point cloud representation, a high-frequency area enhancement method is applied to improve the representation ability of the Gaussian point cloud in high-frequency regions, thereby enhancing the final result.
The images rendered by the model are compressed images, which can be restored to full-spectrum images through the inverse process of compression. In the proposed method, the training and rendering parts remain consistent with the original 3DGS in terms of methodology.
3.2 Implementation of the proposed method
The overall process of the proposed method is to first compress the full-spectrum hyperspectral image into a luminance-normalized channel-compressed image with fewer channels and a luminance channel, which is a grayscale image, and then to train the model. The rendered image is also a channel-compressed image, which is then restored to the full-spectrum image through an inverse recovery process. The overall process is shown in Fig. 2:
Compressing, training and restoring, the overall process of the 3DGS method based on spectrum compression
Starting on the left side of Fig. 2, the full-spectrum image is first compressed and the luminance channel is extracted. The luminance channel is used as the data source for 3D structure training. The basic structure of the 3D point cloud is constructed through coarse training and refinement. Using the well-trained point cloud structure as a foundation, the feature channels are then added sequentially for training. After training is completed, the images generated by the model are channel-compressed images composed of several (fewer than 10) feature channels. These images are then restored to the original full-spectrum images through the inverse of the compression process. Our method significantly improves training speed, enhances rendering speed under some conditions, and reduces storage volume.
3.2.1 Spectrum dimension compression of hyperspectral images
In this section, details of the spectral dimension compression method for hyperspectral images are described. A hyperspectral image is separated into two parts: a luminance channel and luminance-normalized feature channels. Principal Component Analysis (PCA) is then applied exclusively to the luminance-normalized feature channels. Further elaboration on these steps is provided in the following subsections.
First, let \(\hat{C}\) represent the matrix composed of pixels from all hyperspectral images. For n pixels in total and m channels per pixel:

\[ \hat{C} = \begin{pmatrix} \vec{C}_{1}^{\,T} \\ \vdots \\ \vec{C}_{n}^{\,T} \end{pmatrix} \in \mathbb{R}^{n \times m}, \]

where each row is the spectral vector of a pixel, and the columns correspond to the spectral channels.
The hyperspectral channel compression method used in this paper is shown in Fig. 3.
Details of the compression process of the 3DGS method based on spectrum compression
In Fig. 3, the original hyperspectral image is first separated into a luminance channel, obtained by extracting the brightness, and a luminance-normalized feature image. The normalized feature image is then projected onto the principal vectors. The projected images and the corresponding eigenvectors are shown on the right of Fig. 3.
Dimensionality reduction is performed via principal component analysis (PCA). The multichannel hyperspectral data is projected onto a few principal components according to the data’s distribution, which greatly compresses the number of channels with minimal information loss. Due to the sparsity of the hyperspectral data distribution, only the first few feature channels are needed to represent most of the spectral information effectively. Moreover, because the spectrum is continuous, increasing the spectral resolution (and hence the number of channels) does not significantly increase the required representation dimensions, further improving the compression rate.
Before dimensionality reduction, luminance normalization is performed on each pixel, separating the full-spectrum hyperspectral image into a luminance channel and feature channels:

\[ B = \sum_{n=1}^{m} channel_{n}, \qquad \vec{F} = \vec{C} / B, \]

where \(\hat{C}\) is the matrix of pixels composed of all hyperspectral images, \(\vec{C}\) is the spectral vector of any selected pixel in the selected hyperspectral image, \(B\) is the luminance value of the pixel, \(\vec{F}\) is the vector of the pixel consisting of luminance-normalized features, and \(channel_{n}\) is the intensity value of the pixel at channel \(n\). This approach is taken for two reasons:
1.
The luminance of an image contains most of the 3D information and can therefore be used as the data source for training the pose of the Gaussian point cloud, including position, orientation, and scale. Thus, the extracted luminance channel serves as the data source for the first stage of training.
2.
All spectral features of objects in hyperspectral images are modulated by the intensity of the light. In the feature space, the distribution of pixels along the light intensity direction is more significant than along other spectral directions. If PCA is performed directly, the features of the first few principal component channels will be contaminated by the light source spectrum, causing their eigenvectors to resemble that of the light source and thus reducing the other spectral features represented. This means more feature channels (greater than 10) are needed to restore the spectral information at the same quality, slowing down training and degrading the final result. Experiments showed that training on hyperspectral data with direct PCA and no luminance normalization resulted in images with a Peak Signal-to-Noise Ratio (PSNR) of less than 30, which is worse than the current method. The related ablation study is in Section 4.2.6 of this paper.
To ensure that all the images in the training set have consistent spectral representation bases and to reduce PCA computation time, the proposed method first uniformly samples pixels from all training set images to calculate the correlation matrix, which is then decomposed to obtain a series of orthogonal bases.
Let \(\hat{F_s}\) be the subset of luminance-normalized spectral vectors sampled from \(\hat{F}\).
The features of the sampled pixels are then normalized, i.e., the mean value of each channel is subtracted.
After normalization, the correlation coefficient matrix of the spectral dimensions is calculated and then decomposed to obtain the vector of eigenvalues \(\vec {V_{eigen}}\) and the eigenvector matrix \(\hat{M_{eigen}}\).
New spectral bases are generated via eigendecomposition, represented as column vectors in the matrix \(\hat{M_{eigen}}\). The spectral vectors of pixels in the hyperspectral image are represented under the new basis by projecting them onto these basis vectors. Specifically, for a pixel with a spectral vector \(\vec {F_0}\) and a basis \(\vec {M_0}\), the projection is computed as follows:
The result of projecting a spectral vector onto a single basis vector is a scalar value. Projecting the spectral vector onto multiple basis vectors generated by PCA produces a new feature vector.
Projecting the spectral vectors of all pixels onto the selected orthogonal basis yields the values of the pixel spectra on the new, compressed basis.
In Formula 11, each column in \(\hat{M_{eigen}}\) represents a basis generated by PCA. These basis vectors must be normalized as specified in Formula 9, resulting in column-wise normalization of the matrix.
Performing the operations above on each pixel produces spectrum-compressed images, as shown on the right side of Fig. 3. The right side of each channel image displays the corresponding eigenvector for that channel.
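The compression pipeline described above (luminance extraction, normalization, pixel sampling, eigendecomposition of the channel correlation matrix, and projection) can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation; the function name, the choice of summing channels for luminance, and the sampling size are assumptions.

```python
import numpy as np

def compress_hyperspectral(cube, n_components=8, n_samples=50_000, seed=0):
    """Sketch of the spectral compression pipeline of Section 3.2.1.

    cube: (H, W, m) hyperspectral image. Returns the luminance map,
    the compressed feature image, the PCA basis, and the channel means.
    """
    h, w, m = cube.shape
    pixels = cube.reshape(-1, m).astype(np.float64)        # C-hat: n x m

    # 1) Luminance extraction and normalization (assumed: B = sum over channels).
    luminance = pixels.sum(axis=1, keepdims=True)          # B per pixel
    features = pixels / np.maximum(luminance, 1e-8)        # F = C / B

    # 2) Uniformly sample pixels to keep PCA cheap.
    rng = np.random.default_rng(seed)
    idx = rng.choice(features.shape[0],
                     size=min(n_samples, features.shape[0]), replace=False)
    sample = features[idx]

    # 3) Mean-center and decompose the channel correlation matrix.
    mean = sample.mean(axis=0)
    centered = sample - mean
    corr = centered.T @ centered / centered.shape[0]       # m x m, symmetric
    eigvals, eigvecs = np.linalg.eigh(corr)                # ascending order
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order[:n_components]]               # m x k, unit columns

    # 4) Project every pixel's normalized spectrum onto the basis.
    compressed = (features - mean) @ basis                 # n x k
    return (luminance.reshape(h, w),
            compressed.reshape(h, w, n_components),
            basis, mean)
```

When `n_components` equals the original channel count, the projection is lossless; with fewer components, only the variance captured by the leading eigenvectors is retained.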
The eigenvectors are not sorted by eigenvalue magnitude because luminance normalization is performed prior to PCA. Consequently, eigenvectors associated with large eigenvalues contribute significantly to the luminance-normalized hyperspectral image but do not necessarily contribute proportionally to the original image. Therefore, the eigenvectors are reordered based on their contribution to the Peak Signal-to-Noise Ratio (PSNR) of the restored full-spectrum image. This is achieved by restoring the hyperspectral image from each PCA-generated channel and identifying which channel maximizes the PSNR improvement. Specifically, during the first channel selection, the hyperspectral image is restored using each individual PCA-generated channel, and the channel yielding the highest PSNR is selected as the first component. During the second channel selection, each remaining candidate is combined with the previously selected first channel to restore the hyperspectral image, and the channel producing the largest PSNR increase is chosen as the second component. This procedure is repeated, progressively combining the selected channels with the remaining candidates and choosing the one that maximizes the PSNR, until all channels are ordered. The final order of the channels is determined accordingly.
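The greedy PSNR-based reordering can be sketched as below. This is a minimal illustration under the assumption that restoration is the linear back-projection described in the text; the function names and the `1e-12` numerical guard are our own.

```python
import numpy as np

def psnr(ref, est, peak=None):
    """PSNR in dB; a small epsilon guards against zero MSE."""
    peak = ref.max() if peak is None else peak
    mse = np.mean((ref - est) ** 2) + 1e-12
    return 10.0 * np.log10(peak ** 2 / mse)

def order_channels_by_psnr(features, compressed, basis, mean):
    """Greedy reordering of PCA channels by PSNR contribution.

    features:   (n, m) luminance-normalized spectra (ground truth)
    compressed: (n, k) projections; basis: (m, k); mean: (m,)
    Returns the channel indices in selection order.
    """
    k = compressed.shape[1]
    remaining = list(range(k))
    selected = []
    while remaining:
        best, best_psnr = None, -np.inf
        for c in remaining:
            cand = selected + [c]
            # Restore using only the candidate subset of channels.
            restored = compressed[:, cand] @ basis[:, cand].T + mean
            p = psnr(features, restored)
            if p > best_psnr:
                best, best_psnr = c, p
        selected.append(best)
        remaining.remove(best)
    return selected
```

The loop evaluates each remaining channel jointly with all previously chosen ones, matching the progressive selection described above.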
The spectral restoration process is achieved through the linear superposition of feature vectors, i.e., transforming the compressed spectral basis back to the original spectral basis. First, restore the luminance-normalized hyperspectral image, then multiply it by the luminance value to recover the original spectral image.
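The restoration step is a linear superposition of the eigenvectors followed by luminance re-scaling, and can be written compactly. The function name and argument shapes below are illustrative assumptions.

```python
import numpy as np

def restore_full_spectrum(compressed, basis, mean, luminance):
    """Inverse of the compression: project back to the original spectral
    basis, then undo the luminance normalization (an illustrative sketch).

    compressed: (H, W, k) feature image; basis: (m, k); mean: (m,);
    luminance: (H, W). Returns an (H, W, m) full-spectrum image.
    """
    features = compressed @ basis.T + mean        # back to normalized spectra
    return features * luminance[..., None]        # multiply by luminance B
```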
Since each scene has a distinct spectral distribution, PCA yields different bases, and therefore the bases used for compression cannot be shared across scenes. For each scene, PCA must be performed to obtain an appropriate basis. As this work employs sampling and other acceleration strategies for PCA, the overall efficiency during practical use is not affected.
3.2.2 High-frequency information enhancement
The refinement process employed in this paper is based on high-frequency information enhancement. Although the luminance channel contains most of the object’s three-dimensional information and can in principle guide 3DGS to reconstruct a relatively precise 3D point cloud, it has a low signal-to-noise ratio due to noise in each channel of the original spectrum. In this context, although the main structure of the 3D object can be effectively reconstructed, the Gaussian point cloud cannot accurately restore the detailed information of the target object because of the noise. This usually appears as blurring of high-frequency information in the rendered test-set images. To overcome this issue, this paper proposes a method to enhance high-frequency detail in the rendered images via high-frequency region point cloud enhancement, as shown in Fig. 4.
Process of high-frequency information enhancement. The image on the left is the high-frequency mask for selecting Gaussian points inside the high frequency area. The middle image is rendered by Gaussian points selected by the high frequency mask before refinement. The image on the right is rendered by Gaussian points selected by the mask after refinement
The high-frequency information enhancement algorithm first extracts the high-frequency regions of the training set images through the Fourier transform and Gaussian filtering, combined with binarization and a morphological closing operation, to form a high-frequency mask, as shown on the left in Fig. 4. Specifically, for the luminance channel image of a given view, a fast Fourier transform is performed to obtain the frequency map of the image. A Gaussian filter is then used to suppress the areas of the spectrum within 50 pixels of the center, which correspond to the low-frequency components of the image. An inverse fast Fourier transform then yields the high-frequency areas. The result of the inverse Fourier transform is converted into a binary mask containing only 0 and 1 values through a binarization method (e.g., Otsu’s). During training, the rendered image is multiplied by the high-frequency mask, and the loss is calculated against an all-zero image, so that only the Gaussian points contributing to the high-frequency areas within the mask receive gradients. Backpropagation is used to determine which Gaussian points are located within the mask range. Within this range, the size and gradient thresholds of the Gaussian points are dynamically reduced, making the Gaussian points in the high-frequency region smaller.
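The mask extraction can be sketched as follows. This is an illustrative sketch: a Gaussian high-pass with a 50-pixel cutoff stands in for the filtering step, and a small self-contained Otsu threshold replaces a library call; the morphological closing mentioned in the text (e.g., `scipy.ndimage.binary_closing`) is omitted for brevity.

```python
import numpy as np

def high_frequency_mask(lum, cutoff=50):
    """Sketch of the high-frequency mask extraction (cf. Fig. 4).

    lum: (H, W) luminance image. A Gaussian high-pass filter in the
    Fourier domain suppresses frequencies within `cutoff` pixels of the
    centered spectrum; the magnitude is then binarized with Otsu's method.
    """
    h, w = lum.shape
    F = np.fft.fftshift(np.fft.fft2(lum))

    # Gaussian high-pass: 1 - exp(-d^2 / (2 * cutoff^2)).
    yy, xx = np.mgrid[0:h, 0:w]
    d2 = (yy - h / 2) ** 2 + (xx - w / 2) ** 2
    hp = 1.0 - np.exp(-d2 / (2.0 * cutoff ** 2))

    high = np.abs(np.fft.ifft2(np.fft.ifftshift(F * hp)))

    # Otsu's threshold on the high-frequency magnitude (256-bin histogram).
    hist, edges = np.histogram(high, bins=256)
    centers = (edges[:-1] + edges[1:]) / 2
    total = high.size
    sum_all = (hist * centers).sum()
    best_t, best_var, w0, sum0 = centers[0], -1.0, 0.0, 0.0
    for i in range(256):
        w0 += hist[i]
        if w0 == 0 or w0 == total:
            continue
        sum0 += hist[i] * centers[i]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, centers[i]
    return (high > best_t).astype(np.uint8)
```

During training, multiplying the rendered image by this mask and comparing against zeros restricts gradients to the masked Gaussians, as described above.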
Since the Fourier transform of a Gaussian function remains a Gaussian function, reducing its scale in the spatial domain means it occupies a higher frequency in the frequency domain. Therefore, after high-frequency information enhancement, the spatial representation within this region has high-frequency characteristics.
Although in theory lowering the global Gaussian point size threshold could also achieve a high-frequency representation, the 3DGS method requires the gradient to exceed the gradient threshold before performing split and clone operations, and high-frequency regions, although lacking detail, are often not the areas with the highest gradients. A low global gradient threshold would produce too many Gaussian points in low-frequency areas, increasing the total number of Gaussian points and slowing down training and rendering without improving the overall rendering quality. In our tests, the model without high-frequency region enhancement achieves a maximum PSNR of 32.5 in the final rendering, while the enhanced result reaches approximately 34, as shown in the results section.
3.2.3 Training process
In this section, the details of the two-stage training method are described. The procedure consists of two distinct stages. The first stage initializes the Gaussian point cloud using only the luminance channel extracted in Section 3.2.1. During the second stage, the feature dimension of the pre-trained Gaussian points is progressively expanded by adding one dimension at a time. Each newly added dimension is trained using the corresponding channel of the luminance-normalized feature channels (introduced in Section 3.2.1), while the 3D spatial properties of the Gaussian points remain fixed. For each channel, a fine-tuning step is applied by slightly increasing the density of Gaussian points that are uniquely visible in that channel. The 3D spatial parameters of these fine-tuned points are exclusively optimized using pixel data from the corresponding luminance-normalized feature channel. Details of training and a flow chart of training are illustrated below.
The flow chart of training conducted in this paper is shown in Fig. 5.
The process of training for the 3DGS method based on spectrum compression. The training starts at the top left and moves clockwise until it reaches the bottom left, indicating that training is complete
The training process starts at the top left corner of Fig. 5. The first stage uses the luminance information to train the three-dimensional features of the Gaussian point cloud, such as position, orientation, and scale. Luminance channel images from different viewpoints form the training set. An initialization process of approximately 10,000-20,000 iterations is performed, after which the Gaussian point cloud has a relatively coarse three-dimensional representation. Then, through the high-frequency information enhancement method, the model refines its high-frequency representation capability, obtaining richer detail. After these two steps, the first stage of training is complete. The main purpose of the first stage is to obtain a reasonable three-dimensional spatial representation of the Gaussian point cloud, including the position distribution, orientation, and scale of the points. In hyperspectral images, since all feature channel information originates from the same object, all channels should share the same three-dimensional representation. Therefore, letting all spectral channels share the three-dimensional representation trained in the first stage reduces the training difficulty of each channel while preserving training effectiveness, thereby reducing the number of training iterations required and improving the overall training speed.
The right side of Fig. 5 shows the second stage of training. The main purpose of the second stage is to train the spectral data of each feature channel on top of the existing Gaussian point cloud with known 3D information. After initialization, the feature channel dimension of the Gaussian points is one, corresponding to the luminance channel. The dimension of the feature channels of the pre-trained Gaussian points increases by one each time a dim-upgrade is conducted, as shown by the arrows in Fig. 5. The feature channels newly added to the Gaussian points are untrained and are initialized with random values. A few new Gaussian points are gradually added via a gradient-threshold operation, as in the original 3DGS, to refine the representation of each feature channel. When a new feature channel is added, the 3D information of the well-trained point cloud (position, orientation, and scale) is fixed, and the spectral features of the new channel are trained on the image of the corresponding channel. The previously trained feature channels are also fixed, and only the new feature channel is trained, which speeds up the process. Since the first stage already yields a relatively precise three-dimensional point cloud representation, each channel in the second stage requires relatively few additional Gaussian points and fewer training iterations (about 5,000 per channel) to achieve good results. When a channel is considered converged or reaches the preset iteration limit, the next channel is added, until the whole training process is complete. When the PSNR gain from adding a new channel falls below a certain threshold (e.g., 0.25), training is regarded as converged: further increasing the number of channels will not yield significant improvements in rendering quality, and training can be terminated at this point.
During the second stage of training, a new feature channel does not reuse the point clouds added during previous channels' training; it uses only the Gaussian point cloud trained in the first stage with the luminance channel, as shown in Fig. 5. This strategy reduces the number of points each subsequent channel needs to train and prevents the point count from accumulating with each added channel, which would gradually slow down training. In PCA, the bases of different feature dimensions are orthogonal, so the spectral information represented by different compressed feature channels is completely different; the positions and feature values of the points needed by each channel therefore also differ. Experiments have shown that reusing the Gaussian points added by other feature channels does not improve training results, so the strategy adopted in this paper is a reasonable way to further improve training speed.
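As a minimal sketch of the dim-upgrade mechanics described above (placeholder arrays stand in for a real optimizer; the point count and the number of compressed channels are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (sketch): geometry is trained on the luminance channel only.
# Here 1000 Gaussians with position/scale/rotation stand in for the
# trained point cloud; the values are placeholders, not trained results.
n_points = 1000
geometry = {
    "position": rng.normal(size=(n_points, 3)),   # frozen after stage 1
    "scale": rng.random((n_points, 3)),
    "rotation": rng.normal(size=(n_points, 4)),
}
features = rng.random((n_points, 1))              # channel 0: luminance

def dim_upgrade(features, rng):
    """Append one untrained feature channel, initialized with random
    values; geometry and all earlier channels stay fixed while only
    the new column would be optimized."""
    new_col = rng.random((features.shape[0], 1))
    return np.concatenate([features, new_col], axis=1)

n_compressed = 8          # assumed number of compressed channels
for _ in range(n_compressed - 1):
    features = dim_upgrade(features, rng)

print(features.shape)     # (1000, 8): luminance plus 7 added channels
```

Only the newly appended column participates in each round of training, which is why the per-channel cost stays roughly constant as channels are added.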
4 Experiments & results
4.1 Experimental design
This paper employs hyperspectral data feature dimension compression to significantly reduce the training time and memory requirements for applying hyperspectral data in 3DGS. Although dimensionality reduction can effectively lower storage volume and improve training speed, it also leads to information loss. This paper addresses this issue through high-frequency information enhancement. The experimental results section will first demonstrate the method’s effectiveness, showing that training with compressed data can yield results similar to those obtained with full-spectrum data. Then, the paper will elaborate on the improvements in training speed, the reduction in required memory capacity, and the robustness to noise brought by the proposed method.
The loss function used for both 3DGS and 3DGS with compression is:

\[ \mathcal{L} = (1-\lambda )L_1 + \lambda \left( 1-SSIM \right) \quad (12) \]

In Formula 12, the L1 norm (\(L_1\)) represents the mean absolute difference between corresponding pixels, the Structural Similarity Index Measure (SSIM) is a commonly used metric for evaluating structural similarity between two images, and \(\lambda\) balances the two terms.
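A runnable NumPy sketch of this loss, using a simplified single-window SSIM rather than the windowed SSIM of the original 3DGS implementation, and assuming \(\lambda = 0.2\) as in the original 3DGS code (both simplifications are assumptions here):

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified single-window SSIM over the whole image (values in [0, 1])."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def gs_loss(rendered, gt, lam=0.2):
    """Formula 12: (1 - lam) * L1 + lam * (1 - SSIM)."""
    l1 = np.abs(rendered - gt).mean()
    return (1 - lam) * l1 + lam * (1 - ssim_global(rendered, gt))

img = np.random.default_rng(1).random((480, 640))
print(gs_loss(img, img))   # identical images give zero loss
```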
In the experiments, the dataset comprises a total of 48 images captured from different viewpoints around the target object, with angular intervals of 6 to 12 degrees. The training set consists of 42 hyperspectral images from varying viewpoints, while the test set includes 6 evenly sampled images from distinct viewpoints. The hyperspectral images used in this study contain 34 spectral channels, with wavelengths spanning 420 nm to 750 nm and a spectral resolution of 10 nm. The spatial resolution of the hyperspectral images is 640 \(\times\) 480 pixels.
The proposed method utilizes 50,000 iterations for the first training stage and 5,000 iterations per additional feature in the second stage. In comparison, the original 3DGS method employs a maximum of 30,000 iterations with early stopping. All experiments were conducted using an NVIDIA RTX 2080 Ti GPU (11 GB VRAM) and an NVIDIA RTX 2080 Super GPU (8 GB VRAM). We ensured that all compared results were obtained on the same computing device. Unless otherwise specified, the subsequent experiments were conducted on an NVIDIA 2080 Super GPU.
The PSNR metric is applied in this paper; it is calculated by the following formula:

\[ PSNR = 10\cdot \log_{10}\left( \frac{MAX_I^2}{MSE}\right) \quad (13) \]

In Formula 13, \(MAX_I\) is the maximum value of the hyperspectral image, which is 1 in our case, and MSE is the mean square error, as shown in Formula 14:

\[ MSE = \frac{1}{N}\sum_{i=1}^{N}\left( R_i - I_i\right)^2 \quad (14) \]

In Formula 14, \(R_i\) is the i-th pixel of the rendered image and \(I_i\) is the i-th pixel of the original image. A higher similarity between the rendered and original images corresponds to a higher PSNR value.
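Formulas 13 and 14 translate directly into a few lines of NumPy (the test images below are synthetic placeholders):

```python
import numpy as np

def psnr(rendered, gt, max_i=1.0):
    """Formula 13 with the MSE of Formula 14; max_i = 1 for our data."""
    mse = np.mean((rendered - gt) ** 2)
    return 10.0 * np.log10(max_i ** 2 / mse)

rng = np.random.default_rng(0)
gt = rng.random((480, 640))
rendered = np.clip(gt + rng.normal(scale=0.01, size=gt.shape), 0.0, 1.0)
print(psnr(rendered, gt))   # roughly 40 dB for noise with sigma = 0.01
```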
4.2 Results
4.2.1 Rendering quality evaluation
The method proposed in this paper first leverages the richness of luminance information in representing 3D structures to train a relatively precise Gaussian point cloud distribution. It then exploits the sparsity of hyperspectral information in feature space by applying PCA to the luminance-normalized hyperspectral data to achieve channel compression. Trained with these low-dimensional images from different perspectives, the dimensionality-reduced 3DGS renders results similar to those obtained with full-data training and with Hyperspectral NeRF methods. Figure 6 below shows hyperspectral images generated by the proposed method.
Illustrations of hyperspectral images rendered by the proposed method. The image at the top left is an RGB image synthesized from one of the ground-truth images. The images on the right are rendered by the proposed method, with all 34 channels illustrated. The graph at the bottom left compares the spectral curves of a typical pixel in the rendered image and the ground truth
The top left image in Fig. 6 is an RGB image synthesized from the ground-truth hyperspectral images in the training set. The bottom left graph compares the spectral curve of a representative pixel generated by our method with that of the ground-truth image. Despite the information loss due to dimensionality reduction, the spectral curve generated by our method shows little error compared to the original. The images on the right show the hyperspectral test-set images generated by our method, covering 34 spectral channels from 420 nm to 750 nm at 10 nm intervals. Each image in Fig. 6 displays the grayscale of the corresponding channel, where each pixel represents the spectral intensity at that channel. The images generated by our method have relatively high spectral accuracy: regions of different colors in the RGB image exhibit high brightness in their corresponding spectral channels, while the brightness is lower in other channels.
3DGS is one of the best methods for 3D rendering and reconstruction introduced in recent years. Compared to previous hyperspectral neural rendering methods such as NeRF, a well-trained 3DGS usually renders images with richer details. However, due to its point-cloud-based 3D representation, it has stricter 3D constraints and is less capable of handling 3D spatial inconsistencies. When applied to hyperspectral datasets, 3DGS retains its advantage in high-definition detail compared to previous methods. PSNR (Peak Signal-to-Noise Ratio) is a commonly used metric for measuring the consistency between a generated image and the original. To quantitatively describe the quality of the generated images, Fig. 7 below shows the average test-set PSNR for Hyperspectral NeRF, the original 3DGS, and the proposed method.
Comparison of PSNR (Peak Signal Noise Ratio) values between the proposed method, the original 3DGS and the hyperspectral NeRF
As shown in Fig. 7, the PSNR of images generated by our method does not significantly differ from those generated by 3DGS using full data, while the PSNR of images generated by NeRF is slightly lower than the other two methods. However, PSNR alone cannot fully convey image quality. Figure 8 below provides 600nm spectral images of the test set generated by the three different methods and their corresponding ground truth, along with magnified local details. This spectral band is chosen because its higher brightness makes details more visible.
Illustration of rendered images (in the test set) obtained with the proposed method, the original 3DGS and the hyperspectral NeRF
From Fig. 8, it is evident that the images with the best detail are those generated by 3DGS trained with full spectral information, particularly in regions with text and other local details, where this method achieves higher contrast. However, because hyperspectral data involves numerous channels, inconsistencies and noise between channels lead to observable errors in some regions’ 3D reconstruction accuracy, resulting in floating artifacts in certain views. The selected views represent typical scenarios to illustrate this.
NeRF-generated images lack details. Due to the smoothness of neural networks, NeRF tends to filter out many high-frequency details. Although it can effectively handle minor noise due to its flexible neural network representation of 3D information, it still exhibits some floating artifacts. NeRF’s slow training and rendering speeds significantly hinder its practical application.
Our method, though inferior to full-data 3DGS in terms of detail, retains high PSNR values. Information loss from dimensionality reduction lowers the contrast of the luminance channel compared to the full spectrum, which results in blurred details. Additionally, our method employs a high-frequency information enhancement algorithm to refine the 3D information of the point cloud, ensuring high spatial resolution in boundary and high-frequency areas by generating small Gaussian points. Even if noise and inconsistencies affect certain positions, they only impact the Gaussian points along the corresponding sightlines across different perspectives. Since these points are usually small, they do not significantly affect the overall 3D structure of the target, though they may cause grainy distortions in details. The advantage of this approach is that it avoids the large rendering errors due to incorrect 3D structures seen in the original 3DGS. Thus, despite our method's details being inferior to those of full-spectrum 3DGS, its average PSNR remains high.
4.2.2 Training and rendering acceleration
One of the goals of this method is to reduce the training and rendering time of hyperspectral 3DGS. By applying dimensionality reduction to hyperspectral data and employing a two-stage training method that separates 3D information and spectral information training, this method only requires training one feature dimension at a time. In contrast, 3DGS using the full spectral information needs to train 34 feature dimensions simultaneously, corresponding to 34 spectral channels. This significantly increases the training speed per iteration, thereby greatly reducing the overall training time required. The differences in training and rendering times among the three methods are shown in Fig. 9 below.
Comparison between the proposed method, original 3DGS and hyperspectral NeRF on training speed and rendering speed
One of the primary issues with the NeRF method is its slow speed, which worsens further when hyperspectral data is introduced, as reflected in Fig. 9. On an RTX 2080 Ti, a NeRF with a network width of 256 requires 12 hours to complete training. In contrast, 3DGS using the full spectral information trains much faster than NeRF, taking approximately 4.3 hours. The proposed method, trained with compressed spectral information, requires only 35-40 minutes, a roughly sevenfold speedup over full-spectrum 3DGS.
In terms of rendering speed, 3DGS has an even greater advantage. On an RTX 2080 Ti, a NeRF with a network width of 256 takes about 13 seconds to render a single image, whereas both 3DGS methods render much faster. To measure rendering speed accurately, this paper uses the time taken to render 100 frames. Due to the fewer channels required for rendering, the proposed method is faster than full-spectrum 3DGS.
The original full-spectrum 3DGS requires only one rendering step and can be fully parallelized; for instance, channel-wise parallel computation can be applied, which is not applicable in our case (only pixel-wise parallelism is used). The proposed method first renders with compressed features and then performs a matrix multiplication on each pixel to recover the uncompressed spectrum. Therefore, when enough parallel computation units are available, the original 3DGS can be further accelerated and may even surpass the proposed method, since data transfer between RAM and the GPU introduces overhead in the recovery step of our method.
This does not imply that the proposed method is inferior to the original 3DGS. There are two key advantages of the proposed method:
Post-training flexibility
If parallel acceleration is feasible, our method can restore full-spectrum Gaussian points from compressed data post-training and render full-spectrum images directly. This achieves rendering speeds comparable to the original 3DGS when computational resources are sufficient.
Scalability for large-scale data
Hyperspectral datasets often exceed the 34 spectral channels and 640 \(\times\) 480 pixel resolution used in this study. As spectral channels (C) and image resolution (m, n) scale, the required Gaussian points (G) grow significantly. Accelerating the original 3DGS rendering under these conditions becomes computationally prohibitive due to its time complexity \(O(m\cdot n \cdot C \cdot G)\), where C is the number of spectral channels. In contrast, our method reduces complexity to \(O(m\cdot n \cdot C_{comp} \cdot G)+O(m\cdot n \cdot C_{comp} \cdot C)\), where \(C \ll G\) and \(C_{comp} < C\) (\(C_{comp}\) is the number of compressed channels). Our approach achieves faster rendering unless both pixel-wise and channel-wise parallel accelerations are applied.
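Plugging in the paper's image size and channel counts makes the comparison concrete (the Gaussian count G and the compressed channel count below are illustrative assumptions, not values from the paper):

```python
# Cost model from the stated complexities, counted in feature
# multiply-accumulates per rendered frame.
m, n = 640, 480            # image resolution
C, C_comp = 34, 7          # full vs. compressed channels (C_comp assumed)
G = 200_000                # assumed number of Gaussians, with C << G

full_3dgs = m * n * C * G                          # O(m*n*C*G)
ours = m * n * C_comp * G + m * n * C_comp * C     # render + spectrum recovery

print(full_3dgs / ours)    # close to C / C_comp: the recovery term is negligible
```

Because \(C \ll G\), the recovery term \(O(m\cdot n \cdot C_{comp} \cdot C)\) barely changes the ratio, which stays near \(C / C_{comp}\).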
To sum up, when parallel acceleration is limited, the proposed method outperforms full-spectrum 3DGS in rendering and reconstruction speed. When parallel acceleration is available, our method can recover full-spectrum Gaussian points post-training and match the original 3DGS in rendering speed.
4.2.3 Training efficiency analysis
To better demonstrate the effectiveness of the proposed training method, this section compares the convergence speed of the proposed method and the full-spectrum 3DGS during the training process.
PSNR and SSIM curves during the training with the proposed method and the original 3DGS. The x axis is time (seconds)
In Fig. 10, the left side shows the PSNR values of the two methods during the first 2,500 seconds of training, and the right side shows the SSIM values during the same period. The proposed method converges faster in both PSNR and SSIM (Structural Similarity Index Measure). The curves for the proposed method appear to start higher because the first recorded point occurs after a short interval of training; initially, the proposed method trains only on the luminance channel, which is faster than full-spectrum training, hence the higher starting point. The proposed method continues single-channel training on the luminance channel for the first 1,000 seconds, and each subsequent dip in the PSNR curve corresponds to a channel expansion by the algorithm. Around 2,500 seconds, the proposed method completes training, reaching a maximum PSNR of approximately 34 and an SSIM of 93.2 on the test set. In contrast, the full-spectrum 3DGS takes about 16,000 seconds to complete training, achieving a maximum PSNR of 33.6 and a maximum SSIM of 93.0 on the test set.
4.2.4 Reduction in memory usage for 3DGS based on spectrum compression
Another goal of the proposed method is to reduce the GPU memory required for training and the storage space needed for the point cloud. Figure 11 clearly shows how the proposed method saves storage space.
The comparison of the two methods on size of memory usage
For 3DGS, the most space-consuming part of the Gaussian point cloud is the feature section. With the proposed method, as the spectral information is compressed from 34 dimensions to no more than 10, the required storage space for the features is reduced, theoretically, to about one-third of that of the original method.
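A back-of-envelope estimate in float32 values per Gaussian illustrates the ratio (the geometric attribute layout below — position, scale, rotation, opacity — is an assumed simplification of the actual point cloud format):

```python
# Per-Gaussian storage estimate, counted in float32 values.
geom = 3 + 3 + 4 + 1                 # position, scale, rotation, opacity (assumed)
full_channels, comp_channels = 34, 10

feature_ratio = comp_channels / full_channels                     # features only
total_ratio = (geom + comp_channels) / (geom + full_channels)     # whole point

print(round(feature_ratio, 2), round(total_ratio, 2))   # 0.29 and 0.47
```

The feature section shrinks to roughly one-third, while the whole point cloud (including the geometric attributes that both methods share) shrinks to a bit under half.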
Similarly, for the same reason, the proposed method can also reduce memory usage during training.
The GPU memory usage during the training with the two methods
Figure 12 records the memory usage of the two methods. The "Total Memory Usage" row represents the total GPU memory usage range of the two algorithms in their final stages. "Common Memory Usage" indicates the fixed memory usage when loading hyperspectral images and other data into GPU memory before the training starts, which represents a fixed memory usage unrelated to the training process and remains constant for the same dataset. "Gaussian Memory Usage" represents the memory usage during the training process. The memory usage due to Gaussian point cloud data is reduced from about 3GB to around 1.5GB.
Since memory usage is influenced by various 3DGS parameters, the recorded data here corresponds to each method at the point where they achieve their best PSNR values.
4.2.5 Robustness to Noise of 3DGS based on spectrum compression
Hyperspectral data often faces noise interference due to the large number of channels. Therefore, robustness to noise is also necessary for 3D reconstruction methods. In the results section, the proposed method demonstrated a strong ability to reconstruct accurate 3D structures in the presence of noise. This section studies the robustness of the proposed method to noise. Hyperspectral images have many channels, and most current hyperspectral image acquisition methods capture images sequentially channel by channel. This results in each channel’s spectral image being susceptible to independent random noise interference. To simulate this process, the research method in this section involves adding independent and identically distributed Gaussian random noise to each channel of the original hyperspectral images. The signal-to-noise ratio of the images is adjusted by varying the standard deviation of the Gaussian noise. The pixel value range of the images is 0-1, and the noise standard deviations are set to 0.05, 0.1, and 0.2.
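The noise simulation described above can be sketched as follows (the toy cube is a synthetic placeholder with the paper's channel count and resolution):

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.random((34, 480, 640))   # toy hyperspectral cube, values in [0, 1]

def add_channel_noise(cube, sigma, rng):
    """Add i.i.d. Gaussian noise drawn independently for every channel,
    mimicking channel-by-channel acquisition, then clip back to [0, 1]."""
    return np.clip(cube + rng.normal(scale=sigma, size=cube.shape), 0.0, 1.0)

for sigma in (0.05, 0.1, 0.2):      # the noise levels used in this section
    noisy = add_channel_noise(cube, sigma, rng)
    print(sigma, float(np.abs(noisy - cube).mean()))
```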
Comparison of the noise robustness between the two 3DGS methods. The images are rendered under 600nm spectrum
After adding different levels of noise, the test-set images rendered by the two methods are shown in Fig. 13, and the differences between them are clearly visible. For the proposed method, the added noise manifests in the test-set images as a decrease in contrast, but apart from the detail loss caused by this contrast reduction, there are no other significant differences. For the original full-spectrum 3DGS, however, unexpected stripes appear on the object's surface, extending from the surface and the edges. This indicates that during training, the shapes of some Gaussian points on the surface were overly stretched and their positions became erroneous. Such noise greatly affects the visual quality, as reflected by the SSIM metric. The SSIM of images rendered at different noise levels is shown in Fig. 14.
Comparison of SSIM between the two methods when noise with various standard deviations is added
From Fig. 14, when the noise is minimal, such as with a standard deviation of 0.01, the SSIM of images rendered by both methods is relatively high, with almost no difference from the noise-free condition. As the noise increases, the SSIM of images generated by the proposed method remains higher than that of images generated by the full-spectrum 3DGS. Although the SSIM of the proposed method is slightly lower than that of the full-spectrum 3DGS when the noise standard deviation reaches 0.2, visual observation suggests that this is because the images generated by the proposed method appear darker due to noise interference in the brightness channel, leading to a significant drop in SSIM. Although the image generated by the original method maintains normal brightness, it introduces a lot of linear noise in the details and the visual disturbance caused by this noise is greater (cf. the right sub-figures of Fig. 13).
The above comparison shows that the proposed method is more robust to noise than the original method. Two factors contribute to this robustness. First, the proposed method employs PCA for dimensionality reduction. When independent and identically distributed Gaussian noise is added to the full-spectrum image, it is distributed uniformly across all directions, so projecting it onto the principal components spreads it evenly across all components. Since the subsequent training only uses the leading principal components, the noise in the remaining components is filtered out. Second, the high-frequency information enhancement method used during training gives the model more small, high-frequency Gaussian points. When affected by noise, these points may float into the air and render as small noise dots; since they are usually small, the impact is limited. The original method, on the other hand, has many larger Gaussian points, and noise can significantly alter their shape and position, leading to more substantial degradation.
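The first factor can be checked numerically: when spectra lie in a low-dimensional subspace and the noise is isotropic, keeping only the leading principal components discards most of the noise energy. The subspace rank, noise level, and component count below are assumptions for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, c, k = 20_000, 34, 7       # pixels, channels, kept components

# Clean spectra lie in a 5-dimensional subspace; the added noise is
# isotropic, so it spreads evenly over all 34 principal directions.
clean = rng.random((n_pixels, 5)) @ rng.random((5, c))
noisy = clean + rng.normal(scale=0.1, size=clean.shape)

mean = noisy.mean(axis=0)
_, _, vt = np.linalg.svd(noisy - mean, full_matrices=False)
basis = vt[:k]                       # leading k principal components
denoised = (noisy - mean) @ basis.T @ basis + mean

err_noisy = np.mean((noisy - clean) ** 2)
err_pca = np.mean((denoised - clean) ** 2)
print(err_pca < err_noisy)           # True: roughly k/c of the noise energy remains
```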
However, the proposed method requires extracting the luminance channel, which is not affected by the noise suppression of principal component analysis. Therefore, excessive noise in the luminance channel can cause overall brightness rendering differences, leading to a decrease in image contrast. Adding noise with a standard deviation equal to 20% of the image pixel value range is an extreme case that significantly reduces the signal-to-noise ratio, making the image almost unusable. Under normal circumstances, the proposed method is more robust to noise compared to the original 3DGS.
4.2.6 Ablation study
To better support our claims, an ablation study is carried out. In the ablation study, three spectrum-compressed 3DGS variants: (1) PCA applied directly to the spectrum data, (2) PCA based on luminance-normalized spectrum data, and (3) PCA with luminance normalization and high-frequency enhancement, are compared with hyperspectral 3DGS without compression. Details of the ablation study are shown in Fig. 15.
Ablation study. Comparison of 3DGS variants: full spectrum, PCA-only, PCA with luminance normalization, and PCA with luminance normalization and high-frequency enhancement
Compressed 3DGS using PCA directly on spectrum data results in a PSNR below 30, as shown in the second row ("Only PCA"). The third row illustrates compressed 3DGS using PCA based on luminance normalization. A significant PSNR improvement is observed after applying luminance normalization, demonstrating its critical role. High-frequency enhancement further improves PSNR, as shown in the fourth row. Finally, the fifth row shows that 3DGS compressed via PCA with luminance normalization and high-frequency enhancement achieves comparable PSNR to uncompressed 3DGS.
The ablation study confirms the effectiveness of the luminance normalization method and high-frequency enhancement method proposed in this work. The compressed 3DGS with these methods achieves similar PSNR to full-spectrum 3DGS while significantly accelerating training speed.
4.3 Discussion
This paper proposes a method to accelerate 3DGS training based on hyperspectral image spectrum compression, by PCA dimensionality reduction. The proposed method also reduces the GPU memory requirement for training. As demonstrated in the results section, our method significantly shortens training and rendering times and reduces memory usage, achieving results comparable to the original 3DGS, thus meeting the goals of this method.
The luminance channel, which indicates how bright the images are, is used as the information source for the 3D reconstruction of the Gaussian point cloud. Under natural lighting conditions, most 3D structures are reflected in the brightness and shadows of the image, which are preserved in the luminance channel. Therefore, using the luminance channel for 3D structure training enables the point cloud to represent all the main 3D structures of the target object.
The PCA dimensionality reduction method used in this paper compresses the spectrum by projecting it onto a few principal components, which is a lossy process. Ideally, given the sparsity of hyperspectral information, a lossless compression method would be preferable. There are several reasons for choosing this lossy method nonetheless. First, 3DGS itself cannot perfectly restore the 3D scene: the PSNR of its rendered images is generally below 34 (33.6 being the highest recorded in our experiments). By performing PCA on the spectrum after luminance normalization, a PSNR above 38 can be reached when directly restoring full-spectrum images using only the luminance channel and the first few principal feature channels. The results section shows that, after this lossy channel compression, the rendered image quality is similar to that of the original full-spectrum 3DGS. Second, 3DGS requires high consistency in the same area across different images; otherwise, the point cloud cannot accurately restore 3D features. Training requires dozens of viewing angles (42 training angles and 6 testing angles in this paper), and the pixels' spectral distribution varies from image to image. If each image underwent lossless dimensionality reduction separately, each image might be lossless, but the selected basis vectors would be independent and not necessarily related by known linear transformations, failing to ensure consistency across images. Furthermore, if all images' pixels were used as constraints for dimensionality reduction, the problem becomes expensive: most hyperspectral compression methods are based on iterative optimization or matrix eigenvalue operations, whose time complexity grows rapidly with the number of points, causing compression time to exceed the subsequent 3DGS training time.
By using the PCA method and sampling uniformly across all images to calculate the principal component vectors, we can control the time complexity and ensure consistency since the basis is the same among all images. Lastly, PCA-compressed images can be conveniently restored to the original full-spectrum images.
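The pipeline of luminance normalization, shared-basis PCA from uniformly sampled pixels, and full-spectrum restoration can be sketched as follows (toy sizes, a rank-5 synthetic cube, and 7 kept components are assumptions for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_views, h, w, c = 5, 32, 32, 34     # toy stand-in for 48 views of 640x480x34
latent = rng.random((n_views, h, w, 5))
cube = latent @ rng.random((5, c)) + 0.01 * rng.random((n_views, h, w, c))

# Luminance normalization: divide each pixel spectrum by its mean intensity.
lum = cube.mean(axis=-1, keepdims=True)
normed = (cube / lum).reshape(-1, c)

# One shared PCA basis from pixels sampled uniformly across ALL views,
# so every image is compressed in the same coordinate system.
sample = normed[rng.choice(len(normed), 4000, replace=False)]
mean = sample.mean(axis=0)
_, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
basis = vt[:7]                       # leading 7 components (assumed count)

# Compress, then restore the full spectrum with one matrix multiplication.
coeffs = (normed - mean) @ basis.T
restored = ((coeffs @ basis) + mean).reshape(cube.shape) * lum

print(np.abs(restored - cube).mean())   # small error despite 34 -> 7 channels
```

Sampling bounds the SVD cost regardless of how many views are used, and because the basis is shared, the restoration step is the same matrix multiplication for every image.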
Overall, our proposed method has made progress in significantly improving training speed, rendering speed, and reducing storage space usage, while the generated images are comparable to those obtained by the original 3DGS using the full spectrum.
5 Conclusion
In this paper we have presented a faster, smaller, and more noise-robust hyperspectral 3DGS method. Spectrum compression of hyperspectral images has been employed based on luminance-normalized sampling principal component analysis, utilizing a two-stage training approach with the luminance channel for 3D representation training and feature channels for spectral feature representation training. Combined with a method for enhancing high-frequency regions, this approach significantly reduces the training time for 3DGS with hyperspectral data, improves rendering speed, and decreases required storage space, while achieving rendering results similar to the original 3DGS method, with a better noise robustness. This method can better accomplish tasks such as 3D reconstruction and new viewpoint synthesis of hyperspectral data under limited computational resources.
Future work can further explore integrating deep learning-based spectral compression techniques into the proposed hyperspectral 3DGS framework to achieve higher compression ratios and faster processing of large-scale hyperspectral datasets. In addition, adaptive or task-oriented compression strategies could be investigated to dynamically balance reconstruction accuracy and computational efficiency. Another promising direction is extending the proposed method to real-time or outdoor hyperspectral 3D reconstruction scenarios, where lighting variations and noise conditions are more complex. These improvements could enhance the practicality and scalability of hyperspectral 3DGS in real-world applications such as remote sensing, precision agriculture, and camouflaged object detection.
Materials Availability
There are no materials associated with this paper.
Data Availability
Raw data from the hyperspectral image dataset are not publicly available as the data also forms part of an ongoing study. Data requests can be made to the corresponding author via email: "sailing@kth.se".
Code Availability
Code requests can be made to the corresponding author via email: "sailing@kth.se".
References
Zhu H, Luo J, Liao J, He S (2023) High-Accuracy Rapid Identification and Classification of Mixed Bacteria Using Hyperspectral Transmission Microscopic Imaging and Machine Learning. Progress In Electromagnetics Research 178:49–62
Gong D, Ma T, Evans J et al (2021) Deep neural networks for image super-resolution in optical microscopy by using modified hybrid task cascade u-net[J]. Progress In Electromagnetics Research 171:185–199
Wang H, Gong D, Cheng G et al (2023) Detecting Temperature Anomaly at the Key Parts of Power Transmission and Transformation Equipment Using Infrared Imaging Based on Segformer[J]. Progress In Electromagnetics Research M 119:117–128
Lu B, Dao PD, Liu J et al (2020) Recent advances of hyperspectral imaging technology and applications in agriculture[J]. Remote Sensing 12(16):2659
Shippert P (2004) Why use hyperspectral imagery?[J]. Photogramm Eng Remote Sens 70(4):377–396
Mehdorn M, Kohler H, Rabe SM et al (2020) Hyperspectral imaging (HSI) in acute mesenteric ischemia to detect intestinal perfusion deficits[J]. J Surg Res 254:7–15
Behmann J, Steinrucken J, Plumer L (2014) Detection of early plant stress responses in hyperspectral images[J]. ISPRS J Photogramm Remote Sens 93:98–111
Mildenhall B, Srinivasan PP, Tancik M et al (2021) Nerf: Representing scenes as neural radiance fields for view synthesis[J]. Commun ACM 65(1):99–106
Ma R, Ma T, Guo D et al (2024) Novel view synthesis and dataset augmentation for hyperspectral data using NeRF[J]. IEEE Access 12:45331–45341
Ma R, He S (2024) Hyperspectral Neural Radiance Field Method Based on Reference Spectrum. IEEE Access 12:133018–133029. https://doi.org/10.1109/ACCESS.2024.3459917
Ma R, He S (2025) Multi-channel volume density neural radiance field for hyperspectral imaging[J]. Sci Rep 15(1):16253
Chen G, Narayanan SK, Ottou TG et al (2024) Hyperspectral neural radiance fields[J]. arXiv preprint arXiv:2403.14839
Feng Y, Ding X, Dai W et al (2024) Hyperspectral 3D reconstruction using neural radiance fields[C]//AOPC 2024: Optical Spectroscopy and Applications. SPIE 13494:54–59
Kerbl B, Kopanas G, Leimkuhler T et al (2023) 3D Gaussian Splatting for Real-Time Radiance Field Rendering[J]. ACM Trans Graph 42(4):139:1–139:14
Lu G, Fei B (2014) Medical hyperspectral imaging: a review[J]. J Biomed Opt 19(1):010901–010901
Yuan X, Brady DJ, Katsaggelos AK (2021) Snapshot Compressive Imaging: Theory, Algorithms, and Applications. IEEE Signal Process Mag 38(2):65–88
Si Y, He S (2025) CTISNeRF: Efficient Four-Dimensional Hyperspectral Scene Rendering and Generation with Computed Tomography Imaging Spectrometer[J]. IEEE Sens J
Si Y, Lin Z, Wang X et al (2025) A new hyperspectral reconstruction method with conditional diffusion model for snapshot spectral compressive imaging[J]. IEEE Trans Instrum Meas
Luo J, Lin Z, Xing Y et al (2022) Portable 4D Snapshot Hyperspectral Imager for Fastspectral and Surface Morphology Measurements[J]. Prog In Electromag Res 173:25–36
Furukawa Y, Hernandez C (2015) Multi-view stereo: A tutorial[J]. Found Trends Comput Graph Vis 9(1–2):1–148
Yi KM, Trulls E, Lepetit V et al (2016) LIFT: Learned invariant feature transform[C]//Computer Vision-ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI. Springer International Publishing, pp 467-483
Lowe DG (2004) Distinctive image features from scale-invariant keypoints[J]. Int J Comput Vision 60:91–110
DeTone D, Malisiewicz T, Rabinovich A (2018) Superpoint: Self-supervised interest point detection and description[C]//Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp 224–236
Ma T, Xing Y, Gong D et al (2022) A Deep Learning-Based Hyperspectral Keypoint Representation Method and Its Application for 3D Reconstruction[J]. IEEE Access 10:85266–85277
Matsuki H, Murai R, Kelly PHJ et al (2024) Gaussian splatting slam[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp 18039–18048
Zhu S, Qin R, Wang G et al (2024) Semgauss-slam: Dense semantic gaussian splatting slam[J]. arXiv preprint arXiv:2403.07494
Li M, Liu S, Zhou H (2024) Sgs-slam: Semantic gaussian splatting for neural dense slam[J]. arXiv preprint arXiv:2402.03246
Yan C, Qu D, Xu D et al (2024) Gs-slam: Dense visual slam with 3d gaussian splatting[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 19595–19604
Tosi F, Zhang Y, Gong Z et al (2024) How nerfs and 3d gaussian splatting are reshaping slam: a survey[J]. arXiv preprint arXiv:2402.13255
Chen Z, Wang F, Wang Y et al (2024) Text-to-3d using gaussian splatting[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 21401–21412
Yi T, Fang J, Wang J et al (2024) Gaussiandreamer: Fast generation from text to 3d gaussians by bridging 2d and 3d diffusion models[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 6796–6807
Li X, Wang H, Tseng KK (2023) Gaussiandiffusion: 3d gaussian splatting for denoising diffusion probabilistic models with structured noise[J]. arXiv preprint arXiv:2311.11221
Mu Y, Zuo X, Guo C et al (2024) GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction[J]. arXiv preprint arXiv:2407.04237
Yu Z, Wang H, Yang J et al (2024) SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior[J]. arXiv preprint arXiv:2403.20079
Huang B, Yu Z, Chen A et al (2024) 2d gaussian splatting for geometrically accurate radiance fields[C]//ACM SIGGRAPH. Conference Papers 2024:1–11
Wu G, Yi T, Fang J et al (2024) 4d gaussian splatting for real-time dynamic scene rendering[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp 20310–20320
Chen W, Zhong R, Wang K et al (2025) Li-GS: a fast 3D Gaussian reconstruction method assisted by LiDAR point clouds[J]. Big Earth Data, pp 1–25
Lou W, Li C, Feng H et al (2024) Application of 3D Gaussian splatting target reconstruction in short-range detection[C]//Journal of Physics: Conference Series. IOP Publishing 2891(15):152010
Kerbl B, Meuleman A, Kopanas G et al (2024) A hierarchical 3d gaussian representation for real-time rendering of very large datasets[J]. ACM Trans Graph (TOG) 43(4):1–15
Acknowledgements
The authors gratefully thank Dr. Julian Evans of Zhejiang University for helpful discussion and the Special Development Fund of Shanghai Zhangjiang Science City.
Funding
Open access funding provided by Royal Institute of Technology. This work is partially supported by the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (Nos. 2023C03002, 2023C03083 and 2023C03135), the National Key Research and Development Program of China (Nos. 2022YFC2010000 and 2022YFC3601000) and the National Natural Science Foundation of China (No. W2412107).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Ethics approval and consent to participate are not applicable to this paper.
Conflict of interest
The authors declare that there is no conflict of interest regarding this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ma, R., Chen, T. & He, S. Fast, small and robust hyperspectral 3DGS based on spectral compression. Appl Intell 56, 14 (2026). https://doi.org/10.1007/s10489-025-06983-4