Remote Sensing, Vol. 18, Pages 1886: Bathymetric Inversion of Tibetan Plateau Lakes Using Hyperspectral Imagery and ICESat-2 Data

Prometheus Redaktion

Highlights

What are the main findings?
- A hyperspectral–ICESat-2 joint framework was proposed for Tibetan Plateau lake bathymetric inversion.
- The method achieved more accurate and spatially consistent bathymetric reconstruction than representative baseline methods.

What are the implications of the main findings?
- Multi-source constraints can improve bathymetric inversion in plateau lakes with limited in situ measurements.
- The framework is useful for lake basin mapping and water storage studies in high-altitude inland environments.

Abstract

Lake depth is a fundamental parameter for estimating lake storage, analyzing basin morphology, and understanding the evolution of plateau lakes. Compared with typical shallow lakes, Tibetan Plateau lakes are characterized by high elevation, strong radiation, pronounced inter-lake and inter-annual variability, and in some cases considerable basin depth, which limits the accuracy, stability, and generalization ability of existing bathymetric inversion methods based on single-source optical imagery. Meanwhile, although ICESat-2 can provide sparse but high-precision along-track bathymetric constraints, a unified framework suitable for plateau-lake scenarios is still lacking. To address this issue, this study proposes TabKAN, a bathymetric inversion framework for Tibetan Plateau lakes under joint constraints from hyperspectral imagery and ICESat-2 data. TabKAN constructs tabular input features from hyperspectral reflectance, water indices, imaging geometry, and environmental variables; employs TabNet for feature selection and encoding; and introduces a KAN regression head to enhance nonlinear bathymetric mapping. A joint-supervision and bias-correction mechanism is further designed to incorporate ICESat-2 samples, thereby improving model robustness across lakes and acquisition dates.
To enhance the temporal coverage of training samples, multi-year sample expansion based on stereo-mapping data is introduced, and a stripe-aware self-supervised learning strategy is developed for hyperspectral image restoration and pretraining. Experiments on five Tibetan Plateau lakes, including Anglaren Co, Caiduo Chaka, Cuoe, Geren Co, and Qixiang Co, show that the proposed method outperforms benchmark methods in both overall accuracy and depth-stratified evaluation, while providing more stable recovery of basin morphology and depth gradients. These results demonstrate that combining hyperspectral information, ICESat-2 laser constraints, and stripe-aware pretraining can effectively improve the accuracy and robustness of bathymetric inversion for Tibetan Plateau lakes and provide a new technical route for storage estimation and change monitoring of cold inland lakes.

1. Introduction

It is worth noting that the value of hyperspectral imagery does not simply lie in “having more bands”; the real challenge is how to extract the most bathymetrically informative signals from high-dimensional, redundant observations that are susceptible to noise. Previous studies have pointed out that deep-learning-based hyperspectral bathymetric inversion is still at a relatively early stage. On the one hand, high-dimensional spectral information is not yet fully exploited; on the other hand, environmental variation, imaging-geometry differences, and dispersed key-band importance can substantially weaken model stability. This issue is even more pronounced for Tibetan Plateau lakes, because the same depth may correspond to different spectral responses across lakes, years, and observation conditions. Therefore, one major difficulty is whether high-dimensional spectral information, environmental variables, and cross-scene variability can be handled simultaneously in a unified framework.
For point-sample-driven bathymetric inversion, the task is essentially closer to a continuous regression problem driven by multi-source heterogeneous tabular features rather than a simple image-spectrum fitting problem. TabNet and KAN provide useful ideas for model design in this context [ 12, 13]. Complementary to passive optical sensing, ICESat-2 satellite laser altimetry provides sparse but accurate along-track observations. ICESat-2 carries a 532 nm green photon-counting laser altimeter, which has shown the potential to retrieve underwater information in clear lakes and shallow-water environments [ 14]. Previous studies have shown that ICESat-2 can be used not only for bathymetric reconstruction of Tibetan Plateau lakes [ 1], but also in combination with Sentinel-2 and other optical imagery to generate continuous bathymetric maps [ 11, 15]. At the same time, active–passive fusion studies have pointed out that, although ICESat-2 provides valuable high-accuracy constraints, its sparse along-track geometry, environmentally dependent errors, and variable transferability across study areas limit its direct use as ground truth. Therefore, how to take advantage of the high accuracy of ICESat-2 while avoiding the direct injection of its systematic bias into the main optical inversion relationship remains a key gap in current optical–laser bathymetric inversion research. Motivated by the above observations, this study proposes a unified deep-learning framework, namely TabKAN, for hyperspectral lake bathymetric inversion over the Tibetan Plateau. Rather than simply using satellite observations as additional data sources, the proposed framework is designed to reduce the major error sources discussed above. 
Hyperspectral redundancy and stripe-related noise are addressed through adaptive feature selection and stripe-aware self-supervised pretraining; cross-scene spectral variability is alleviated by incorporating water indices, imaging geometry, and environmental variables; complex nonlinear relationships between reflectance and depth are represented by the KAN regression head; and potential ICESat-2 label bias is handled by an explicit environmental bias branch. The framework formulates lake-depth inversion as a continuous regression problem driven by multi-source heterogeneous tabular features. At the input level, hyperspectral reflectance, water indices, imaging geometry, and environmental variables are fused in a unified representation. At the model level, TabNet is used to adaptively select informative variables, while KAN is introduced to enhance the expressive power for complex nonlinear relationships. At the supervision level, a bias subnet driven only by environmental features is introduced to explicitly characterize the environment-dependent bias of ICESat-2 labels relative to measured samples. In addition, multi-year sample expansion using stereo mapping data and stripe-aware self-supervised pretraining are incorporated to improve hyperspectral data quality and encoder stability.

The main contributions of this study are summarized as follows:
- A multi-source tabular modeling strategy is proposed for bathymetric inversion of Tibetan Plateau lakes, in which hyperspectral reflectance, water indices, imaging geometry, and environmental variables are organized within a unified regression framework.
- A TabKAN backbone that integrates TabNet and KAN is developed for lake bathymetric inversion, enhancing nonlinear representation while retaining interpretable feature selection.
- A joint-supervision and bias-correction strategy based on measured samples and ICESat-2 samples is proposed, turning ICESat-2 from a set of additional training points into an explicitly constrained high-precision auxiliary reference.
- A combination of multi-year sample expansion, self-supervised pretraining, and stripe suppression is introduced to improve the stability and generalization ability of hyperspectral lake bathymetric inversion across multiple acquisition dates.

2. Related Work

Beyond empirical and semi-empirical methods, semi-analytical and physical models attempt to improve physical consistency by explicitly describing radiative transfer in the water column. The HOPE framework proposed by Lee et al. is a representative contribution in this direction. Its core idea is to jointly estimate water depth, water optical properties, and bottom reflectance so that hyperspectral reflectance can be interpreted within a unified model [7]. Later studies showed that these methods can balance bathymetric inversion and water-property estimation when water transparency is high and the model parameters are reasonably constrained. However, they also suffer from large parameter spaces, complex optimization, and sensitivity to initialization. For lake environments with strong variability, especially where bottom type, water transparency, and imaging conditions co-vary, it is difficult to stably describe the complex spectrum–depth relationship using a fixed radiative-transfer structure. Thus, although physical models provide an important theoretical basis for bathymetric inversion, their practical usability in complex scenarios remains limited. With the development of hyperspectral remote sensing, researchers began to exploit contiguous narrow-band spectral information to improve bathymetric inversion.
Compared with conventional multispectral imagery, hyperspectral data are theoretically better suited to characterizing water absorption, scattering, and bottom differences, and are therefore widely considered promising for improving bathymetric accuracy. Earlier work verified the feasibility of hyperspectral imagery for depth retrieval in optically shallow water [ 8]. Subsequent studies summarized the major models and application conditions of satellite hyperspectral bathymetry in shallow seas and further demonstrated the practical value of domestic hyperspectral sensors and attention-based deep learning for this task [ 9]. At the same time, recent studies have emphasized that hyperspectral data are not simply “multispectral data with more bands”; rather, they bring higher redundancy, greater sensitivity to noise, and more demanding requirements for model design. In other words, the key issue is not whether hyperspectral data are available, but whether the most bathymetrically effective information can be extracted from high-dimensional and environmentally sensitive observations. Although progress has been made, most existing studies still focus on clear shallow marine waters, and systematic investigations for inland plateau lakes remain limited. The introduction of machine learning and deep learning has brought a methodological shift to bathymetric inversion. Compared with empirical models and shallow statistical regressions, machine learning and deep learning are better at learning complex nonlinear mappings from multi-band observations, and therefore often achieve stronger fitting ability and better local accuracy in both multispectral and hyperspectral scenarios. Recent studies have begun to use random forests, support vector machines, convolutional neural networks, and hybrid physics-guided networks for bathymetric inversion. 
For example, a fast feature cascade learning model and a dual-physics-guided deep learning framework have both demonstrated that data-driven models can significantly improve bathymetric performance in complex shallow-water environments [ 20, 21]. However, most existing methods still focus primarily on the imagery itself, typically treating bathymetric inversion as a single image-regression problem and placing emphasis on feature extraction from reflectance and a few handcrafted indices, while rarely accounting systematically for imaging geometry, environmental metadata, time information, and label-source differences. This is particularly important for Tibetan Plateau lakes because their spectral response is controlled not only by depth, but also by lake-level variation, transparency, acquisition time, and local environmental conditions. As a result, models based solely on image spectra or local spatial context may perform well in local tasks but still lack transferability across lakes, years, and sensors. Complementary to passive optical sensing, ICESat-2 satellite laser altimetry has become an important source of auxiliary information. Since launch, ICESat-2 has been widely used in cryospheric, oceanic, and inland-water applications. Its 532 nm green laser can, under favorable water conditions, penetrate water and support underwater observations [ 14]. In the context of lakes, Han et al. demonstrated the potential of ICESat-2 for reconstructing bathymetry and estimating storage of clear Tibetan Plateau lakes [ 1]. This means that ICESat-2 can contribute not only water-surface elevation but also direct subsurface information under suitable conditions. However, repeated studies have shown that the usefulness of ICESat-2 depends strongly on water transparency, noise-photon removal, refraction correction, and track coverage. 
In deep lakes, turbid waters, or complex environments, the sparsity of along-track observations substantially limits continuous bathymetric reconstruction. Therefore, the real advantage of ICESat-2 is better understood as high-precision but sparse constraint information rather than a continuous truth surface that can directly replace in situ measurements. Under this background, active–passive fusion has become an important direction in bathymetric research. A number of studies have attempted to combine ICESat-2 with optical imagery to generate continuous bathymetric maps. ICESat-2 point-depth observations have been used together with Sentinel-2 imagery to provide new high-precision samples in the absence of large-scale field data [ 11], and cloud-native regional active–passive bathymetric frameworks have also been reported [ 15]. Later work incorporated time series, error reduction, and machine learning into such fusion frameworks to further improve the accuracy and stability of continuous bathymetric maps [ 22, 23]. These studies clearly show that active–passive fusion is an effective route for improving optical bathymetry. Nevertheless, recent reviews have also emphasized that ICESat-2 bathymetry itself involves multiple stages, including water-surface detection, underwater-photon identification, refraction correction, and precision control, and that its error sources are strongly environment dependent [ 24]. This implies that, if ICESat-2 samples are directly treated as labels equivalent to measured samples in a deep-learning framework, the model may inadvertently mix laser-label bias into the true optical spectrum–depth mapping, thereby degrading cross-scene generalization. Although the value of ICESat-2 has been demonstrated, explicit modeling of such heterogeneous label bias remains insufficient. 
Another often underestimated aspect of point-sample-based bathymetric inversion is that the task is fundamentally closer to multi-source heterogeneous tabular regression than to end-to-end image segmentation. Each training sample usually includes not only high-dimensional spectral reflectance, but also water indices, imaging geometry, temporal information, and environmental variables. Therefore, high-dimensional feature selection, heterogeneous variable fusion, and interpretable modeling within a unified framework are central to model performance. In the tabular deep learning literature, TabNet introduced explicit feature selection through sequential attention and provides a framework that balances predictive performance and interpretability for heterogeneous tabular data [ 12]. Subsequent work has shown that TabNet-like structures have considerable potential for hyperspectral tasks in terms of band selection and representation learning [ 25]. Meanwhile, KAN replaces conventional linear weights with learnable one-dimensional functions and has demonstrated strong nonlinear expressiveness and favorable interpretability in scientific regression problems [ 13]. Although applications of these two model families to bathymetric inversion are still rare, they directly address two critical gaps in current research: how to identify truly informative variables from high-dimensional heterogeneous inputs, and how to represent complex yet interpretable nonlinear relationships in a unified framework. For the sample structure considered here, namely “hyperspectral reflectance + water indices + imaging geometry + environmental variables + heterogeneous supervision”, tabular deep learning is particularly well suited. To make the methodological landscape clearer, existing bathymetric inversion methods can be summarized according to their core models and parameterization strategies. 
Empirical and semi-empirical methods, such as log-linear and band-ratio models, usually rely on a small number of fitted coefficients and are easy to implement, but their parameters are strongly scene dependent and their transferability is limited. Semi-analytical and physical methods explicitly parameterize water-column optical properties, bottom reflectance, and depth, providing stronger physical interpretability; however, they often require more prior information and are sensitive to initialization and water-optical variability. Traditional machine-learning methods, including instance-based and tree-ensemble models, learn nonlinear mappings from spectral or tabular features to depth and are effective for structured samples, but their extrapolation ability is still constrained by the distribution and representativeness of training samples. Deep-learning methods further enhance nonlinear representation and feature learning, but they usually require larger and more diverse training data and may be affected by heterogeneous label sources and sensor noise. Active–passive fusion methods introduce ICESat-2 or other laser-derived depth constraints to compensate for sparse field measurements, but they must handle track sparsity, water-clarity dependence, and possible laser-label bias. Compared with these approaches, the proposed TabKAN framework combines hyperspectral feature selection, heterogeneous tabular feature fusion, KAN-based nonlinear regression, ICESat-2 bias-aware auxiliary supervision, multi-year sample expansion, and stripe-aware self-supervised pretraining within a unified framework, aiming to improve both accuracy and robustness for Tibetan Plateau lake bathymetry.

3. Study Area and Data

3.1. Study Area

This study focuses on five representative lakes on the Tibetan Plateau: Anglaren Co, Caiduo Chaka, Cuoe, Geren Co, and Qixiang Co. The geographic locations of these lakes are presented in Figure 1.
The Tibetan Plateau, characterized by high mean elevation, complex climatic conditions, and dense lake distribution, hosts the largest concentration of cold inland lakes in the world. It is also a key region for investigating lake morphology, water-storage changes, and their climatic responses. The selected lakes are all located in high-elevation cold-arid environments and share several common characteristics, including broad water surfaces, complex observation conditions, and pronounced inter-annual water-level fluctuations. At the same time, they differ in basin morphology, shoreline sinuosity, nearshore slope, and the potential existence of deep troughs, providing a favorable test bed for assessing model adaptability and robustness across different lake settings. These environmental and geomorphological characteristics directly affect bathymetric inversion. Strong radiation and variable imaging geometry may change the observed water-leaving reflectance, while inter-annual water-level fluctuations alter the relationship between shoreline position and local water depth. Differences in basin slope, nearshore morphology, water clarity, and bottom conditions can also cause the same depth to show different spectral responses across lakes and acquisition dates. Therefore, bathymetric inversion for Tibetan Plateau lakes requires not only hyperspectral information, but also auxiliary variables and constraints describing acquisition conditions, environmental variability, and basin morphology. In this study, these factors are considered through water indices, imaging-geometry and environmental variables, multi-year lake-level-constrained sample augmentation, ICESat-2 joint supervision, and depth-stratified evaluation.

3.2. Datasets

The data used in this study span the period 2020–2025 and mainly comprise four categories: hyperspectral satellite imagery, ICESat-2 laser altimetry data, ZY-3 and GF-7 stereo mapping data, and in situ bathymetric measurements.
These data sources serve different purposes in the proposed framework. Hyperspectral imagery provides continuous areal spectral observations, ICESat-2 supplies sparse but high-precision along-track bathymetric constraints, ZY-3 and GF-7 are used to estimate inter-annual water-level differences and support sample expansion, and in situ depth measurements serve as the main reference for model training and evaluation. The hyperspectral data are mainly based on the Resource-1 02E AHSI sensor (developed by the Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, China), and the dataset was sourced from the China Center for Resources Satellite Data and Application (Beijing, China). This sensor covers the visible, near-infrared, and shortwave infrared spectral ranges, provides 166 bands, has a spatial resolution of 30 m, and a swath width of approximately 60 km. Compared with conventional multispectral imagery, AHSI offers richer spectral information and is therefore more suitable for capturing subtle variations related to water depth, water properties, and bottom conditions. ICESat-2 data were obtained from the ATL03 photon product. Compared with higher-level derived products, ATL03 preserves more complete original photon information and is therefore more suitable for lake applications such as water-surface detection, underwater-photon extraction, and refraction correction, which are necessary to construct along-track depth-constrained samples. ICESat-2 carries the ATLAS photon-counting green laser altimeter, featuring six beams, meter-scale footprints, and high pulse repetition, and has a certain capability for underwater penetration in clear lakes. In this study, ICESat-2 is not treated as a continuous truth surface; rather, it is used as a sparse, high-precision auxiliary supervision source to strengthen model adaptability across times and lakes. 
To support sample augmentation, multi-year stereo-mapping data from ZY-3 and GF-7 were also introduced. ZY-3 provides three-line-array stereo imaging, and GF-7 offers higher spatial resolution and stronger mapping capability. Both are used here to estimate lake-level changes between different years. The in situ bathymetric data were collected by boat-borne single-beam sounding surveys. The sounding device was an Airmar SS60 dual-frequency single-beam echosounder (Airmar Technology Corporation, Milford, NH, USA), which supports both 50 kHz and 200 kHz operating frequencies. The 200 kHz signal is suitable for shallow-water and local detail measurements, whereas the 50 kHz signal is more stable for deeper waters. The field surveys used as dense a line layout as possible, with additional cross-check lines in key areas to improve basin-shape constraints and quality control. These measured depth points serve as the primary training labels and the main benchmark for accuracy assessment. Figure 2 illustrates a schematic diagram of the in situ bathymetric points and ICESat-2 laser altimetry data over Anglaren Co.

3.3. Sample Augmentation

Because of the logistical challenges of field surveys in plateau environments, in situ bathymetric samples usually cover only a limited number of dates and a limited spatial extent, which directly constrains the sample size and temporal generalization ability of deep learning models. To alleviate this problem, we introduce a sample augmentation strategy constrained by multi-year lake-level differences. The strategy expands measured samples from a single survey date to multiple years, thereby increasing both the temporal span of the training set and the range of water-level conditions represented in the samples.
The basic idea is that, under the assumption that lake-bottom morphology remains relatively stable at the inter-annual scale for the high-altitude inland lakes considered in this study, depth changes at the same location between different years are primarily controlled by changes in lake-surface elevation. This assumption is physically equivalent to treating the lake basin as a stable container during the short observation period. For a fixed point, water depth can be expressed as

$$D_t = H_t - Z_{b,t},$$

where $H_t$ is the lake-surface elevation and $Z_{b,t}$ is the lake-bed elevation. The migrated label from a reference year can therefore be written as

$$D_{\mathrm{ref}} + (H_t - H_{\mathrm{ref}}) = H_t - Z_{b,\mathrm{ref}},$$

indicating that the systematic error introduced by the stability assumption is $Z_{b,t} - Z_{b,\mathrm{ref}}$, namely the actual inter-annual change in lake-bed elevation. Available sedimentological evidence suggests that this term is very small for typical Tibetan Plateau lakes. Modern sedimentation rates derived from $^{137}$Cs and $^{210}$Pb dating have been reported to be only 0.45–2.05 mm yr$^{-1}$ in Qinghai Lake [28], and sedimentary records over the past century indicate rates of approximately 1.0–1.59 mm yr$^{-1}$ [29]. Similar bathymetric and sedimentation investigations of high-altitude Tibetan lakes also support the generally stable morphology of plateau lake basins under natural conditions [30]. Even using the upper-bound sedimentation rate, the cumulative bed-elevation change during 2020–2025 is only at the centimeter scale, approximately 1–2 cm, which is orders of magnitude smaller than the meter-level uncertainty of satellite bathymetric inversion. Therefore, the influence of natural lake-bed deformation on the migrated depth labels is expected to be negligible in this study.
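The label-migration relation and the sedimentation bound above can be illustrated with a few lines of Python. This is only a minimal sketch: the function names and the depth and lake-level values are hypothetical, chosen for illustration, and only the 2.05 mm per year upper-bound rate comes from the text.

```python
def migrate_depth(depth_ref_m, level_ref_m, level_target_m):
    """Shift a reference-year depth label by the lake-level difference.

    Implements D_ref + (H_t - H_ref), which equals H_t - Z_b,ref when the
    lake bed is assumed stable between the two dates.
    """
    return depth_ref_m + (level_target_m - level_ref_m)


def bed_change_bound_m(rate_mm_per_yr, years):
    """Upper bound on the bed-elevation change ignored by the stability assumption."""
    return rate_mm_per_yr * years / 1000.0


# Hypothetical sounding point: 12.40 m deep in the reference year, with the
# lake surface 0.85 m higher in the target year (values are illustrative).
depth_target = migrate_depth(12.40, level_ref_m=4700.00, level_target_m=4700.85)

# Upper-bound sedimentation rate quoted in the text, over a 5-year span.
bound = bed_change_bound_m(2.05, years=5)

print(f"migrated depth: {depth_target:.2f} m")
print(f"neglected bed change: at most {bound * 100:.1f} cm")
```

In this sketch the neglected term stays at about one centimeter, which is the comparison the text draws against the meter-level uncertainty of satellite bathymetric inversion.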
This treatment is also consistent with common satellite-hydrology studies of Tibetan Plateau lake storage and volume changes, in which lake basins are commonly regarded as stable containers over multi-year to decadal periods [ 31, 32, 33, 34]. Therefore, if lake-level differences between years can be estimated from ZY-3 and GF-7 stereo mapping data, the measured depth samples from a reference year can be vertically shifted to generate equivalent depth samples for other years. In other words, the method does not simply duplicate existing samples; instead, it uses independently derived stereo-elevation information to perform physically constrained temporal migration of the labels. The resulting samples preserve the spatial positions and hyperspectral feature expressions of the measured samples while explicitly introducing depth changes caused by multi-year water-level variability. In practice, a year with in situ sounding data is first selected as the reference period, and hyperspectral reflectance, water indices, and environmental variables are extracted for the corresponding sample locations to form the original sample set. Next, lake-level differences between years are estimated for each lake using multi-temporal stereo mapping data from ZY-3 and GF-7. Finally, these water-level differences are added to the measured depths of the reference year to create cross-year equivalent labels, which are matched with the pixels at the same spatial locations in hyperspectral imagery from the corresponding years. This procedure substantially expands the temporal coverage and water-level range of the training data without requiring additional large-scale field sounding. In addition to the cross-year expanded samples, along-track bathymetric samples derived from ICESat-2 are introduced as a second type of auxiliary sample. Compared with measured samples, ICESat-2 samples are spatially sparser but temporally more flexible and usually have high vertical precision. 
In this study, they are not treated as equivalent to measured truth; rather, they are used as auxiliary supervision samples carrying environment-dependent bias and are incorporated into the joint-supervision framework during training. The final sample system therefore includes three label sources: measured samples, ICESat-2 samples, and cross-year expanded samples. These play the roles of main supervision, auxiliary supervision, and temporal expansion, respectively, and together form the basis for multi-temporal hyperspectral bathymetric inversion. After the above processing and quality control, the final point-sample dataset used in the experiments contained 104,616 valid tabular records. Here, a sample denotes a point/pixel-level record obtained by collocating a depth label with the corresponding hyperspectral reflectance, water indices, imaging geometry, and environmental variables, rather than an independent lake object. Compared with a dataset based solely on single-date field measurements, this sample system represents a substantial expansion not only in sample number but also in diversity with respect to acquisition year, water-level state, and lake condition, thereby providing a stronger foundation for building a more robust and generalizable model.

3.4. Data Preprocessing

To ensure that multi-source data could participate in the model within a unified framework, systematic preprocessing was conducted for hyperspectral imagery, ICESat-2 data, stereo-mapping data, and measured bathymetric data. The overall workflow included spatial-reference harmonization, image–point co-registration, outlier removal, auxiliary-variable calculation, and final construction of tabular samples. For hyperspectral imagery, preprocessing mainly included image selection, radiometric correction, atmospheric correction, geometric correction, and cropping to the study area.
Because hyperspectral imagery is susceptible to stripe noise and invalid bands, obviously corrupted bands, low-SNR bands, and local outliers were additionally filtered. At later stages, data stability was further enhanced through self-supervised reconstruction and stripe suppression. For multiple scenes covering the same lake, a unified geometric reference and mosaicking procedure were also applied to ensure spatial consistency during sample extraction. For ICESat-2 ATL03 data, preprocessing focused on track selection, noise-photon removal, water-surface detection, underwater-photon extraction, and refraction correction. Effective tracks crossing the study lakes were first extracted according to lake boundaries. The original photon clouds were then denoised to separate water-surface and underwater photons. Finally, underwater-photon elevations were corrected for refraction at the air–water interface, producing along-track bathymetric samples. To reduce the influence of noise photons and anomalous returns, obviously unreasonable outliers were removed before sample construction, and non-lake photons were filtered using lake-boundary constraints. For ZY-3 and GF-7 stereo mapping data, preprocessing concentrated on stereo registration and estimation of lake-level change. Original stereo images were first processed by block adjustment and stereo mapping. Lake-surface elevations were then extracted from the resulting elevation products, and inter-annual lake-level differences were calculated. Because the purpose of the stereo data here was not to derive a lake-bottom DEM directly, but rather to provide lake-level constraints for cross-year sample expansion, preprocessing emphasized relative elevation consistency among dates rather than absolute terrain-detail recovery. 
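For the ICESat-2 refraction step mentioned earlier in this subsection, the following Python sketch shows one common first-order correction. It is an assumption, not the procedure reported by the authors: it treats the beam as near-nadir, ignores the horizontal photon offset and surface slope, and uses illustrative refractive indices (`N_WATER` in particular is an assumed freshwater value). The function name is hypothetical.

```python
N_AIR = 1.00029   # assumed refractive index of air
N_WATER = 1.334   # assumed refractive index of fresh water near 532 nm


def correct_refraction(photon_height_m, surface_height_m):
    """Return a refraction-corrected photon elevation (near-nadir approximation).

    Photon heights computed with the speed of light in air place underwater
    returns too deep, so the apparent depth is scaled by N_AIR / N_WATER.
    Photons at or above the water surface are returned unchanged.
    """
    apparent_depth = surface_height_m - photon_height_m
    if apparent_depth <= 0.0:
        return photon_height_m
    corrected_depth = apparent_depth * N_AIR / N_WATER
    return surface_height_m - corrected_depth


# Hypothetical photon 10 m below a lake surface at 4700 m elevation.
corrected = correct_refraction(4690.0, 4700.0)
print(f"corrected depth: {4700.0 - corrected:.2f} m")
```

Under these assumptions the corrected depth is roughly three quarters of the apparent depth. A production workflow would normally also account for beam pointing angle and water-surface geometry.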
For in situ bathymetric data, preprocessing included the co-registration of sounding records with GNSS positions, coordinate-system unification, vertical-datum unification, and inspection of anomalous points. Field quality control had already been performed through dense survey lines and cross-check lines. During office processing, obviously anomalous points were further removed, and the bathymetric points were spatially overlaid with hyperspectral pixels and stereo-elevation results under a unified coordinate and vertical reference. The resulting measured depth data could then serve both as the main supervision for model training and as the reference standard for ICESat-2 bias analysis and overall accuracy evaluation. Based on the above preprocessing, all multi-source observations were finally organized into a point-sample-oriented tabular dataset. Each sample corresponded to a spatial point and contained four types of information: (1) hyperspectral features, namely multi-band reflectance extracted from the corresponding pixel; (2) index features, such as water-related indices including NDWI and MNDWI; (3) environmental and geometric variables, such as acquisition date, coordinates, solar elevation angle, viewing geometry, and elevation-related information; and (4) labels, including measured depth, ICESat-2-derived depth, or cross-year expanded depth. Through this organization, multi-source observations with different origins and formats were unified into heterogeneous tabular samples that can be directly input into the model, thereby laying the data foundation for subsequent TabKAN modeling.

4. Methodology

4.1. Overall Data Flow of the Proposed Framework

To provide a clearer description of the proposed framework, Figure 3 illustrates the complete data flow of TabKAN from multi-source data preprocessing to full-scene bathymetric prediction.
The workflow consists of four main stages: multi-source preprocessing, tabular sample construction, model pretraining and joint-supervised training, and final bathymetric inference. First, four types of data are preprocessed separately. For AHSI hyperspectral imagery, radiometric correction, atmospheric correction, geometric correction, invalid-band removal, and stripe-mask extraction are performed to obtain reliable spectral observations. For ICESat-2 ATL03 data, photon denoising, water-surface detection, underwater-photon extraction, and refraction correction are conducted to generate along-track auxiliary depth samples. For ZY-3 and GF-7 stereo mapping data, lake-surface elevations from different years are extracted to estimate interannual lake-level differences. For in situ sounding data, coordinate unification, vertical-datum correction, and outlier removal are performed to produce the primary measured depth labels. Second, the preprocessed multi-source observations are converted into point-sample-oriented tabular records. Each sample contains four groups of input features: hyperspectral reflectance, water indices, imaging-geometry variables, and environmental variables. The corresponding labels come from three sources: original measured sounding samples, cross-year expanded samples derived from lake-level migration, and ICESat-2 auxiliary samples. The measured and cross-year expanded samples are used to constrain the main bathymetric mapping, whereas ICESat-2 samples are introduced as auxiliary supervision through the bias-aware learning mechanism. Third, model training contains two closely related parts. The stripe-aware self-supervised pretraining module learns spectral representations from non-striped hyperspectral regions and restores stripe-contaminated spectra. The pretrained TabNet encoder is then transferred to the downstream TabKAN model. 
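The tabular sample construction in the second stage can be sketched as follows. This is an illustrative layout rather than the authors' schema: the column names, the `band_index` mapping, and the example values are assumptions, while the index formulas are the standard definitions NDWI = (Green − NIR) / (Green + NIR) and MNDWI = (Green − SWIR) / (Green + SWIR).

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None


def build_sample(reflectance, band_index, geometry, environment,
                 label=None, label_source=None):
    """Assemble one point sample as a flat feature dictionary.

    reflectance : per-band reflectance values of the pixel
    band_index  : positions of the 'green', 'nir', and 'swir' bands
                  (hypothetical; real positions depend on the sensor)
    geometry    : imaging-geometry variables
    environment : environmental variables
    """
    green = reflectance[band_index["green"]]
    nir = reflectance[band_index["nir"]]
    swir = reflectance[band_index["swir"]]

    row = {f"band_{i:03d}": r for i, r in enumerate(reflectance)}
    row["ndwi"] = normalized_difference(green, nir)
    row["mndwi"] = normalized_difference(green, swir)
    row.update(geometry)
    row.update(environment)
    row["label_depth_m"] = label
    row["label_source"] = label_source  # 'measured', 'cross_year', or 'icesat2'
    return row


# Illustrative four-band pixel; a real AHSI pixel has many more bands.
sample = build_sample(
    reflectance=[0.06, 0.08, 0.02, 0.01],
    band_index={"green": 1, "nir": 2, "swir": 3},
    geometry={"solar_elevation_deg": 55.0, "view_zenith_deg": 3.0},
    environment={"lake_level_m": 4500.0},
    label=12.3,
    label_source="measured",
)
```

Keeping the label source as an explicit column makes it straightforward to route measured and cross-year samples to the main supervision and ICESat-2 samples to the auxiliary supervision during training.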
In the supervised training stage, the TabNet encoder and KAN regression head form the main depth prediction branch. Meanwhile, ICESat-2 samples are used together with an environmental bias subnet to provide additional along-track constraints without forcing the main branch to directly absorb ICESat-2-specific label bias. Finally, during full-scene inference, each water pixel is converted into the same tabular feature format as the training samples and then fed into the trained TabKAN model. The final bathymetric map is generated using only the main depth prediction branch. The ICESat-2 bias subnet is used to improve training through auxiliary supervision, but it is not added to the final prediction because the target of inference is the true water depth rather than the reproduction of ICESat-2 label bias.

4.2. TabKAN Architecture

As shown in Figure 4a, the TabNet encoder adopts a sequential feature-selection mechanism with multiple decision steps. The raw input features are first batch-normalized and then processed through multiple decision steps. In each step, the feature transformer performs nonlinear mapping on the current input and extracts an intermediate representation. A split operation then separates the features into the part used for decision output and the part used for attention assignment. The attentive transformer generates a sparse feature mask according to the prior information from the previous step, thereby determining which input variables should be emphasized at the next step. Some feature-transformer modules share parameters across steps, whereas others are specific to the current decision step. This design allows the model to maintain a consistent representation capability while progressively learning more refined discriminative information. The outputs of different decision steps are aggregated to form the final encoded representation. This TabNet structure is particularly suitable for the task considered in this study.
On the one hand, the input features include hyperspectral reflectance, water indices, imaging geometry, and environmental variables, which are high-dimensional, heterogeneous, and strongly correlated. On the other hand, the contributions of different bands and auxiliary variables to bathymetric inversion are highly unbalanced. Through the stepwise feature-selection mechanism implemented by the attentive transformer and the masks, the model can automatically highlight the key bands and environmental information most relevant to water depth while suppressing redundant variables and noisy features, thereby improving the stability of downstream regression.

After obtaining the encoded representation from TabNet, the final depth prediction is carried out by the KAN regression head. As shown in Figure 4b, the key characteristic of KAN is that, unlike conventional multilayer perceptrons that mainly rely on learnable weights between nodes, KAN defines learnable nonlinear functions on the edges of the network and performs inter-layer aggregation through node summation. For a complex regression task such as lake bathymetry, which is influenced jointly by spectral response, water conditions, and environmental information, this design is theoretically better suited to expressing complicated nonlinear mappings.

4.3. Joint Supervision and Bias-Aware Learning with ICESat-2

Although ICESat-2 can provide sparse and high-precision along-track bathymetric constraints, its derived depths cannot be treated as identical to true depths. On the one hand, ICESat-2 depth estimation is affected by multiple factors, including water transparency, noise photons, air–water refraction correction, observation geometry, and bottom-return identification errors. On the other hand, these errors usually do not manifest as a constant offset but instead show clear environmental dependence.
If ICESat-2 samples were treated as labels equivalent to measured samples and used directly in a unified training set, the model might erroneously absorb systematic differences in the laser-derived labels into the main optical mapping, thereby weakening the physical consistency and cross-scene robustness of the primary regression relationship. Based on this observation, we introduce a joint-supervision and bias-aware learning mechanism on top of the TabKAN backbone, so that measured samples and ICESat-2 samples can participate jointly in training while their systematic discrepancy is explicitly modeled. As illustrated in Figure 5, the model output is decomposed into two parts. One is the main depth term produced by the shared backbone network, which can be understood as the shared depth estimate determined by hyperspectral and related features under ideal conditions. The other is an environmental bias term produced by a bias subnet, which is used to describe the systematic discrepancy of ICESat-2 labels relative to the shared depth estimate under specific environmental conditions.

For samples with measured labels, the full input feature vector is denoted by $x$ and the corresponding label by $y$. The sample is fed into the shared TabNet encoder and KAN regression head to produce the main depth prediction:

$$p = f_{\mathrm{main}}(x), \tag{1}$$

where $f_{\mathrm{main}}(\cdot)$ denotes the main mapping composed of the shared encoder and the main regression head. Because measured samples most directly reflect true water depth, the error between $p$ and $y$ is used directly as the supervision signal to constrain the shared backbone to learn the common mapping between hyperspectral/environmental features and true depth.

For ICESat-2 samples, the full input feature vector is denoted by $x_{\mathrm{ICESat}}$ and the corresponding ICESat-2 label by $y_{\mathrm{ICESat}}$. As with measured samples, $x_{\mathrm{ICESat}}$ is passed through the shared backbone to produce the main depth term $f_{\mathrm{main}}(x_{\mathrm{ICESat}})$.
Meanwhile, a feature subset containing only imaging-geometry and environmental variables is extracted from the ICESat-2 sample and denoted by $x'_{\mathrm{ICESat}}$, which is then fed into the KAN bias subnet to estimate the environment-driven systematic bias:

$$b_{\mathrm{env}} = f_{\mathrm{bias}}(x'_{\mathrm{ICESat}}), \tag{2}$$

where $f_{\mathrm{bias}}(\cdot)$ denotes the bias-modeling function. The final prediction for the ICESat-2 label is then written as

$$p_{\mathrm{ICESat}} = f_{\mathrm{main}}(x_{\mathrm{ICESat}}) + b_{\mathrm{env}}. \tag{3}$$

The key point is that the bias subnet does not use hyperspectral reflectance directly, but only geometry and environmental variables. This design aims to constrain the systematic discrepancy between ICESat-2 and measured samples to an “environment-driven bias term” rather than letting the bias branch learn an independent spectrum–depth relationship. In other words, the main regression head learns the shared bathymetric mapping, whereas the bias subnet absorbs the additional deviation caused by observation conditions, water environment, and label-source differences.

Two regression losses are defined separately for measured samples and ICESat-2 samples. For the set of measured samples $D_m$, a mean-squared-error loss is used to constrain the main regression term:

$$L_m = \frac{1}{|D_m|} \sum_{(x,\, y) \in D_m} \left( f_{\mathrm{main}}(x) - y \right)^2. \tag{4}$$

Here, the measured samples include both the original field bathymetry and the cross-year expanded samples that are treated as field-equivalent samples in this study, and together they provide the main supervision for learning the primary mapping. For the ICESat-2 sample set $D_i$, the auxiliary supervision loss is defined as

$$L_i = \frac{1}{|D_i|} \sum_{(x_{\mathrm{ICESat}},\, x'_{\mathrm{ICESat}},\, y_{\mathrm{ICESat}}) \in D_i} \left( f_{\mathrm{main}}(x_{\mathrm{ICESat}}) + f_{\mathrm{bias}}(x'_{\mathrm{ICESat}}) - y_{\mathrm{ICESat}} \right)^2. \tag{5}$$

The purpose of this loss is not to replace true depth with ICESat-2 labels, but to let ICESat-2 provide additional high-precision constraints through the combination of a main depth term and an environmental bias term without destroying the physical meaning of the main mapping. The total objective of the model is therefore

$$L = L_m + L_i. \tag{6}$$

This formulation is concise and physically meaningful. $L_m$ constrains the main regression branch to learn the shared mapping between hyperspectral/environmental features and true depth, whereas $L_i$ guides the model to explicitly perceive and absorb the environment-dependent bias in ICESat-2 labels during joint training. Through this combined loss, the model can simultaneously benefit from the authenticity of measured samples and the high precision of ICESat-2 samples, while maintaining a reasonable balance between shared mapping and bias correction.

Training is conducted in a staged manner. First, the TabNet encoder and KAN main regression head are pre-trained using only measured samples so that the main mapping can converge to a stable solution under measured supervision. At this stage, only $L_m$ is optimized, and the goal is to let the model first learn the basic depth relationship supported by hyperspectral and environmental information. Then, while keeping the overall backbone stable, ICESat-2 samples are introduced, the bias subnet is activated, and the shared encoder and main head are jointly fine-tuned with a small learning rate while the bias subnet is trained. In this way, the model gradually learns the systematic offset of ICESat-2 labels relative to the shared mapping under different environmental conditions. Finally, all samples are used in joint optimization until the main depth term and the environmental bias term converge under the unified loss. This staged strategy is motivated by two considerations.
First, if measured and ICESat-2 samples are introduced simultaneously from the very beginning, the main regression head may be disturbed by heterogeneous label differences before a stable mapping is formed. Second, learning the main mapping from measured samples first and then performing bias-aware fine-tuning with ICESat-2 samples is more consistent with the logic of “first learn the true depth relationship, then incorporate auxiliary constraints”. This allows the model to fully exploit both sample types while keeping a clear division between the main branch and the bias branch. It should also be emphasized that, although ICESat-2 samples participate in joint supervision through the bias branch during training, the final full-scene prediction targeting true depth mainly uses the output of the main regression branch f m a i n x as the final bathymetric result. In other words, the bias subnet mainly plays its role during training and fine-tuning, helping the model identify and absorb environment-dependent systematic differences in ICESat-2 labels, rather than directly being added to the full-scene prediction during inference. In this way, the auxiliary information provided by ICESat-2 can be fully utilized while preserving the physical interpretability of the final output to the greatest extent. Overall, the joint-supervision and bias-aware learning mechanism proposed in this section does not merely address the question of “how to use one more type of sample”. Rather, it addresses how to preserve the physical consistency of the main mapping under multi-source labels while making full use of the high precision of ICESat-2. By explicitly decomposing the environment-dependent error of ICESat-2 into a bias term, TabKAN becomes not just a hyperspectral regression model, but a multi-source supervision framework capable of simultaneously learning the shared depth mapping and absorbing heterogeneous-label bias. 
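The joint objective in Equations (4) to (6) and the main-branch-only inference rule can be illustrated with a small sketch. The two toy functions below merely stand in for the TabNet+KAN main branch and the KAN bias subnet; their form, the sample values, and the helper names `joint_loss` and `predict_depth` are assumptions for illustration only.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def joint_loss(f_main, f_bias, measured, icesat2):
    """Return (L, L_m, L_i) with L = L_m + L_i.

    measured : list of (x, y) pairs with field or cross-year labels
    icesat2  : list of (x, x_env, y) triples, where x_env holds only the
               geometry/environment subset of the features
    """
    # L_m: measured labels supervise the main branch alone.
    l_m = mse([f_main(x) for x, _ in measured], [y for _, y in measured])
    # L_i: ICESat-2 labels supervise main depth plus environmental bias.
    l_i = mse([f_main(x) + f_bias(x_env) for x, x_env, _ in icesat2],
              [y for _, _, y in icesat2])
    return l_m + l_i, l_m, l_i


def predict_depth(f_main, x):
    """Full-scene inference: main branch only, bias term left out."""
    return f_main(x)


def toy_main(x):
    """Toy stand-in for the shared encoder and main regression head."""
    return 2.0 * x[0]


def toy_bias(x_env):
    """Toy stand-in for the environment-driven bias subnet."""
    return 0.5 * x_env[0]


measured = [([1.0, 0.2], 2.0), ([3.0, 0.4], 7.0)]
icesat2 = [([2.0, 1.0], [1.0], 4.5), ([4.0, 2.0], [2.0], 8.0)]
total, l_m, l_i = joint_loss(toy_main, toy_bias, measured, icesat2)
```

In the staged schedule described above, the first stage would minimize only `l_m`, and the later stages would minimize `total`; in both cases the bathymetric map comes from `predict_depth`, which never calls the bias function.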
This also lays the foundation for further improving model robustness through stripe suppression and self-supervised pretraining.

4.4. Stripe-Aware Self-Supervised Pretraining and Destriping

Hyperspectral satellite imagery is often affected by the push-broom imaging mechanism, detector-response inconsistency, and instrument instability, which may produce obvious stripe noise in some bands or local regions. For the Resource-1 02E AHSI imagery used in this study, such stripe noise not only degrades visual quality, but more importantly damages radiometric consistency across bands and spectral-curve shapes, thereby directly affecting sample feature extraction, model training, and the stability of full-scene depth prediction. In point-sample-driven hyperspectral bathymetry, if training samples happen to fall within stripe-contaminated regions, the model may incorrectly learn stripe artifacts as bathymetrically informative features, which in turn weakens cross-temporal and cross-lake generalization. Therefore, stripe processing is not treated here as a simple preprocessing step isolated from the downstream task. Instead, it is jointly designed together with self-supervised representation learning and bathymetric regression in a stripe-aware self-supervised pretraining and destriping strategy.

4.4.1. Stripe Mask Extraction

Stripe noise usually appears as line-shaped radiometric anomalies elongated along the scanning direction, and its response varies across bands. To automatically identify such anomalous regions as much as possible, each AHSI band is first analyzed statistically, and a stripe mask is constructed by combining bandwise statistics and local spatial consistency. Let the image of the $b$-th band be denoted by $I_b$. For each band, the column-wise mean, column-wise standard deviation, and the difference response with neighboring columns are calculated to obtain a stripe-strength indicator $s_b(j)$.
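How the three column statistics are combined into the indicator is not specified here. The sketch below shows one possible combination for a single band: the departure of each column mean from the median of its neighbouring column means, scaled by the median column standard deviation. The window size, the scaling, and the threshold are illustrative assumptions, not the authors' settings.

```python
from statistics import mean, median, pstdev


def column_stats(band):
    """Column-wise mean and standard deviation of one band (list of rows)."""
    columns = list(zip(*band))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def stripe_strength(band, half_window=2):
    """Score each column by how far its mean departs from nearby columns.

    Score = |column mean - median of neighbouring column means|, divided
    by the median column standard deviation (assumed combination).
    """
    col_mean, col_std = column_stats(band)
    scale = median(col_std) or 1.0  # avoid dividing by zero on flat bands
    scores = []
    for j, m in enumerate(col_mean):
        lo = max(0, j - half_window)
        hi = min(len(col_mean), j + half_window + 1)
        neighbours = [col_mean[k] for k in range(lo, hi) if k != j]
        scores.append(abs(m - median(neighbours)) / scale)
    return scores


# Toy 4 x 7 band with an additive offset in column 3.
band = [
    [0.10, 0.11, 0.10, 0.30, 0.10, 0.11, 0.10],
    [0.12, 0.11, 0.12, 0.32, 0.12, 0.11, 0.12],
    [0.10, 0.11, 0.10, 0.30, 0.10, 0.11, 0.10],
    [0.12, 0.11, 0.12, 0.32, 0.12, 0.11, 0.12],
]
scores = stripe_strength(band)
# Hypothetical threshold; the toy data flags only the offset column.
stripe_columns = [j for j, s in enumerate(scores) if s > 3.0]
```

Using the median of neighbouring columns rather than their mean keeps a single bright stripe from contaminating the local background estimate of the columns next to it, although very narrow windows at the image edges can still be affected.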
When a column deviates significantly from its local background in radiometric statistics and the anomaly shows strong consistency along the row direction, it is marked as a potential stripe region. The detection results of all bands are then aggregated spectrally to generate a stripe confidence map, which is thresholded to obtain the final stripe mask:

$$M_{\mathrm{stripe}} \in \{0, 1\}^{H \times W \times B}, \tag{7}$$

where $M_{\mathrm{stripe}} = 1$ indicates that the corresponding spatial location and band are judged to be stripe-contaminated, whereas $M_{\mathrm{stripe}} = 0$ indicates that the position is considered relatively reliable. It should be emphasized that the purpose of constructing $M_{\mathrm{stripe}}$ is not to obtain absolutely precise noise labels, but rather to identify regions that should not be directly used as supervis

www.mdpi.com
