
3D Karst Cave Identification Using UKAN-CBAM in Seismic Images of Fractured-Vuggy Reservoir

Prometheus Editorial Team

Open Access Article

Binpeng Yan *, Haobo Gao, Rui Pan, and Yongliang Wang

Department of Petroleum, China University of Petroleum-Beijing at Karamay, Karamay 834000, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5765; https://doi.org/10.3390/app16125765 (registering DOI)
Submission received: 29 April 2026 / Revised: 28 May 2026 / Accepted: 5 June 2026 / Published: 8 June 2026

Abstract

Accurate identification of karst caves from seismic data is crucial for carbonate reservoir characterization, as these caves often serve as primary hydrocarbon storage spaces and migration pathways. However, it remains challenging due to the highly nonlinear relationship between seismic waveforms and cave geometries, as well as the noise propagation in skip connections inherent to U-Net-based methods. To address these limitations, this paper proposes UKAN-CBAM, a novel 3D network that synergistically integrates Tokenized Kolmogorov–Arnold Network (Tok-KAN) modules and Convolutional Block Attention Modules (CBAM) within a U-shaped encoder–decoder architecture. Unlike U-Net, which relies on linear convolutional kernels, the Tok-KAN modules employ learnable spline-based activation functions to better capture the nonlinear relationships between seismic waveforms and cave geometries. Furthermore, CBAM embedded in each skip connection adaptively recalibrates features along the channel and spatial dimensions, thereby suppressing noise and sharpening cave boundaries. Trained on synthetic data and validated on physical modeling data from the Sichuan Basin and field data from the Tarim Basin, UKAN-CBAM consistently outperforms U-Net, ResUNet, UNet-CBAM, and coherence attributes across multiple evaluation metrics.
The proposed network delineates caves with improved continuity and sharper boundaries while reducing false positives, demonstrating strong generalization capability. The results indicate that the synergistic design of KAN’s nonlinear modeling and CBAM’s attention mechanism effectively mitigates the limitations of traditional approaches for karst cave identification.

Keywords: karst cave detection; Kolmogorov–Arnold Network; CBAM; deep learning

1. Introduction

Traditional methods for cave identification have largely relied on seismic attribute analysis, such as coherence [4, 5], structural curvature [6], spectral decomposition [7], multidirectional second-order gradient attributes [8], and fractal analysis [9, 10]. These attributes are designed to highlight discontinuities or anomalies in seismic data that may be associated with cave development. Although these attributes can effectively highlight anomalies associated with cave development, they still suffer from several inherent limitations. Different subsurface features, such as caves, faults, or lithological variations, may produce similar attribute responses, leading to ambiguous interpretations that depend heavily on the interpreter’s expertise [11]. Furthermore, conventional seismic attributes, which rely on local linear statistics, fail to adequately capture the nonlinear characteristics inherent in the bead-string reflections of cave systems [12]. These limitations have motivated the adoption of deep learning-based approaches for cave identification, which can learn complex patterns directly from data without explicit feature engineering.

U-Net and its variants, as an important branch of convolutional neural networks (CNNs), have been widely adopted for seismic cave identification in recent years due to their ability to adaptively extract features, effectively mitigating the interpretation ambiguity inherent in traditional seismic interpretation methods. The pioneering work by Wu et al. [13] applied a 3D U-Net to paleokarst characterization, providing the first demonstration of the significant advantages of end-to-end deep learning segmentation over conventional seismic attributes. To address prediction uncertainty in deeply buried cave identification, Zhang et al. [14] integrated Bayesian inference with an encoder–decoder architecture, enabling simultaneous prediction of cave locations and their confidence intervals to support risk-aware decision-making. To tackle the class imbalance problem, Huang et al. [15] introduced a class-balanced loss function into a 2D U-Net framework, effectively mitigating training bias when learning from bead-string reflections. More recently, to overcome the shielding effect of strong reflection layers, Cao et al. [16] proposed a wavefield-separation-driven method that decouples strong and weak reflection wavefields before feeding them into a 3D CNN. This approach enhances the network’s sensitivity to subtle cave information within weak signals and improves cave identification accuracy beneath strong reflectors.

Despite the aforementioned advances, standard U-Net architectures remain limited in their ability to capture the complex nonlinear relationships between seismic waveforms and cave geometries due to their reliance on linear convolutional kernels [17]. Another limitation arises from their skip connections, which may propagate redundant or noisy features [18], thereby compromising the precision of cave boundary delineation. To address these challenges, this paper proposes a novel network, termed UKAN-CBAM, which builds upon the U-Net backbone by incorporating Tok-KAN modules to enhance standard convolutions and integrating a CBAM into the skip connections. We apply the UKAN-CBAM framework for the first time to seismic karst cave identification.
Specifically, Tok-KAN modules replace conventional linear weight matrices with learnable nonlinear activation functions, significantly enhancing the model’s capacity to represent the complex nonlinear relationships between seismic waveforms and cave geometries. Tok-KAN modules were originally developed for medical image segmentation [19] and subsequently extended to seismic facies analysis [20], and their effectiveness in capturing intricate nonlinear features has been well established [21]. Meanwhile, the CBAM integrated into the skip connections enables adaptive feature recalibration along both channel and spatial dimensions prior to feature fusion [22], effectively suppressing redundant information and noise propagation [23], thereby improving the accuracy of cave boundary delineation. Through the synergistic design of these two components, UKAN-CBAM, as a holistic framework, simultaneously improves the modeling capacity for nonlinear features and the precision of boundary segmentation.

2. Generating Synthetic Datasets

To construct a labeled dataset for training the proposed UKAN-CBAM network toward karst cave identification, we adopt a reflectivity-series-based forward modeling strategy following the methodology of Wu et al. [8]. This approach is widely used in seismic interpretation studies because it allows for the generation of large-scale, diverse, and accurately labeled synthetic data at a relatively low computational cost, which is essential for training deep learning models.

To simulate realistic karst caves, we embed irregularly shaped cave geometries directly into the horizontally layered reflectivity model. The caves are approximated by three-dimensional vertically elongated ellipsoids with random perturbations at their boundaries.
The indicator function for the interior of an ellipsoid is defined as:

$$f(x, y, z) = \frac{(x - c_x)^2}{a^2} + \frac{(y - c_y)^2}{b^2} + \frac{(z - c_z)^2}{c^2} \tag{1}$$

where $\mathbf{c} = (c_x, c_y, c_z)$ is the ellipsoid center, and $a$, $b$, $c$ are the semi-axis lengths along the $x$, $y$, $z$ directions, respectively. To generate diverse cave sizes, $a$ and $b$ are randomly selected from the interval [10, 120] m, and $c$ from [4, 80] m, reflecting the typical elongation of karst caves along the vertical direction due to preferential dissolution along fracture networks.

To simulate boundary irregularities, a random perturbation $\delta$, following a Gaussian distribution with zero mean and standard deviation $\sigma = 0.1$, is introduced. This perturbation adds realism to the cave boundaries, mimicking the effect of heterogeneous dissolution and collapse. The cave is then defined as:

$$\begin{cases} f(x, y, z) \le 1 + \delta: & \text{inside cave} \\ f(x, y, z) > 1 + \delta: & \text{outside cave} \end{cases} \tag{2}$$

To further simulate arbitrary orientations of caves in space, a rotation transformation is applied to the ellipsoid coordinates. The rotation matrix $R$ is defined as the product of two elementary rotations: first a rotation about the $y$-axis by angle $\beta$, followed by a rotation about the $x$-axis by angle $\alpha$. This allows caves to be oriented in any direction, reflecting the structural complexity of fractured carbonate reservoirs.

$$R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \tag{3}$$

The rotation angles $\alpha$ and $\beta$ are randomly selected from the interval [−12°, 12°]. Let the resulting reflectivity model be denoted as $r(x, y, z)$.

To further simulate realistic subsurface structures, we introduce folds and formation dips into the cave-embedded reflectivity model. A vertical displacement field method is adopted to generate folds. A 3D displacement field $S(x, y, z)$ is defined to represent the vertical shift at each point.
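The cave geometry of Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' generator: the function names, the grid spacing argument, the choice to apply $R$ to the offsets from the cave center, and the choice to draw $\delta$ independently for every voxel are all assumptions, since the text does not specify them.

```python
import numpy as np

def rotation_matrix(alpha_deg, beta_deg):
    """R = R_x(alpha) @ R_y(beta), the product written in Eq. (3)."""
    a, b = np.deg2rad(alpha_deg), np.deg2rad(beta_deg)
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(a), -np.sin(a)],
                   [0.0, np.sin(a), np.cos(a)]])
    ry = np.array([[np.cos(b), 0.0, np.sin(b)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(b), 0.0, np.cos(b)]])
    return rx @ ry

def cave_mask(shape, spacing, center, semi_axes,
              alpha_deg, beta_deg, sigma=0.1, rng=None):
    """Binary cave label on a regular grid (hypothetical helper).

    shape     : (nx, ny, nz) number of voxels
    spacing   : grid step in metres (assumed equal along all axes)
    center    : (c_x, c_y, c_z) in metres
    semi_axes : (a, b, c) in metres
    """
    rng = np.random.default_rng() if rng is None else rng
    axes = [np.arange(n) * spacing for n in shape]
    x, y, z = np.meshgrid(*axes, indexing="ij")
    offsets = np.stack([x - center[0], y - center[1], z - center[2]], axis=-1)
    # Apply R to every offset vector (assumption: rotate coordinates, then test Eq. (1)).
    local = offsets @ rotation_matrix(alpha_deg, beta_deg).T
    f = np.sum((local / np.asarray(semi_axes)) ** 2, axis=-1)   # Eq. (1)
    # Assumption: one independent Gaussian perturbation per voxel.
    delta = rng.normal(0.0, sigma, size=shape) if sigma > 0 else 0.0
    return f <= 1.0 + delta                                      # Eq. (2)
```

Setting `sigma=0.0` reproduces a smooth ellipsoid, which is convenient for checking the geometry before the boundary perturbation is switched on.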
The displacement field $S$ is composed of a Gaussian-type local uplift term $S_G(x, y, z)$ and a linear tilt term $S_L(x, y, z)$:

$$S(x, y, z) = S_G(x, y, z) + S_L(x, y, z) \tag{4}$$

where $S_G$ employs a two-dimensional Gaussian function to simulate local fold uplift:

$$S_G(x, y, z) = \sum_{k=1}^{N} A_k \exp\left( -\frac{(x - x_k)^2}{2\sigma_{x,k}^2} - \frac{(y - y_k)^2}{2\sigma_{y,k}^2} \right) \tag{5}$$

where $A_k$ is the fold amplitude, $(x_k, y_k)$ is the fold center, and $\sigma_{x,k}$, $\sigma_{y,k}$ control the lateral extent. These parameters are randomly selected from predefined geologically reasonable ranges to ensure diversity in the training set. The linear term $S_L$ is used to generate regional stratigraphic dips:

$$S_L(x, y, z) = p x + q y + l \tag{6}$$

where $p$ and $q$ control the dips in the $x$ and $y$ directions, respectively, and $l$ is an overall vertical shift constant. Under the action of the displacement field, the folded reflectivity $r_f(x, y, z)$ is obtained by coordinate mapping of the cave-embedded model:

$$r_f(x, y, z) = r\big(x, y, z + S(x, y, z)\big) \tag{7}$$

After completing the above modeling steps, the final 3D reflectivity model is obtained. Convolving this model with a 35 Hz Ricker wavelet generates the synthetic seismic data volume. The Ricker wavelet is commonly employed in seismic forward modeling due to its zero-phase property and realistic frequency content. Correspondingly, a binary label volume is created by assigning 1 to voxels inside caves and 0 to those outside. Following this workflow, as illustrated in Figure 1, we produce 120 training data pairs, each with seismic data and label volumes of dimensions 256 × 256 × 256. This dataset provides sufficient diversity in cave geometry, orientation, and structural context to train a deep learning model with strong generalization capability.

3. Network Structure

3.1. UKAN-CBAM Architecture

Figure 2 illustrates the proposed UKAN-CBAM architecture.
This architecture adopts a 3D U-shaped encoder–decoder as its backbone and integrates KAN modules with the CBAM attention mechanism to achieve end-to-end identification of subsurface karst caves.

In the encoder path, the network extracts multiscale features through progressive downsampling. The first three encoding stages employ 3D convolutional modules, while Tok-KAN modules are introduced in the two deepest encoding stages to replace traditional convolutional layers. Shallow stages are designed to capture local textural and edge information, which is generally less nonlinear. Leveraging the kernel transformation characteristics of the KAN architecture, the Tok-KAN module performs high-order nonlinear mapping on the feature maps generated by the convolutional modules, thereby enhancing the discriminative representation for complex karst morphologies.

In the decoder path, feature maps are progressively upsampled to restore the original resolution. The first two decoding stages utilize Tok-KAN modules to reconstruct deep semantic features, while the subsequent three stages are completed by convolutional modules to progressively recover spatial details, thereby ensuring boundary precision in the segmentation results.

A CBAM is embedded in each skip connection. This module adaptively recalibrates feature responses through channel and spatial attention mechanisms and suppresses noise and background interference from non-target geological structures, so that mainly relevant information contributes to the final segmentation. This improves the network’s discriminative capability and overall identification accuracy.

3.2. Tokenized KAN Module Architecture

As shown in Figure 3a, the Tok-KAN module consists of four sequential components: tokenization, a KAN Layer, a depthwise convolution (DwConv), and layer normalization, with a residual connection wrapping the combination of the KAN Layer and DwConv.
This architecture is designed to balance global nonlinear modeling capacity with local spatial detail preservation.

The tokenization operation converts the spatial feature map from the encoder into a sequence of patch embeddings. Given an input feature $X \in \mathbb{R}^{C \times D \times H \times W}$ (with channels $C$, depth $D$, height $H$, and width $W$), the tokenization operation first partitions the 3D feature map into non-overlapping patches of size $P \times P \times P$. A trainable linear projection $E \in \mathbb{R}^{(P^2 C) \times D}$, implemented using a convolutional layer with a kernel size of 3 × 3 × 3, is then applied to map each patch into a $D$-dimensional embedding space:

$$Z_0 = [X_1 E; X_2 E; \ldots; X_N E] \in \mathbb{R}^{N \times D} \tag{8}$$

where $N = (D/P) \times (H/P) \times (W/P)$ is the number of patches. This tokenization step converts the spatial feature map into a sequence of tokens, enabling subsequent global interactions.

As illustrated in Figure 3b, the KAN Layer replaces the conventional multilayer perceptron (MLP) with a learnable nonlinear function matrix. Formally, a KAN Layer $R$ with $n_{in}$ inputs and $n_{out}$ outputs comprises $n_{in} \times n_{out}$ learnable activation functions $r_{q,p}(\cdot)$. Unlike standard MLPs that apply fixed nonlinearities, KANs learn the shape of each activation function from data. Each $r_{q,p}$ is implemented as a linear combination of B-splines, which are locally supported, enabling flexible approximation of nonlinear relationships. The output $Z_{out} \in \mathbb{R}^{n_{out}}$ of this layer is obtained by summing the outputs of the functions connected to each input:

$$Z_{out}[q] = \sum_{p=1}^{n_{in}} r_{q,p}\big(Z_{in}[p]\big), \quad q = 1, \ldots, n_{out} \tag{9}$$

Although KAN Layers excel at modeling global nonlinear interactions, they operate on tokenized sequences and may lose local spatial coherence. This is particularly problematic for seismic interpretation, where fine-scale textural details are critical for cave detection. To compensate for this, a DwConv is inserted after the KAN Layer.
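To make Eq. (9) concrete, the following NumPy sketch evaluates a single KAN layer whose edge functions $r_{q,p}$ are linear combinations of B-splines. It is an illustration under stated assumptions, not the paper's PyTorch module: the names `bspline_basis` and `KANLayer`, the uniform knot grid on [−1, 1), the cubic degree, the grid size of 5, and the random coefficient initialization are all hypothetical choices. Practical KAN implementations commonly add a fixed base activation branch next to the spline term, which is omitted here.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """All B-spline basis functions at points x via the Cox-de Boor recursion.

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)[:, None]
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    basis = ((x >= t[:-1]) & (x < t[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - t[:-(d + 1)]) / (t[d:-1] - t[:-(d + 1)])
        right = (t[d + 1:] - x) / (t[d + 1:] - t[1:-d])
        basis = left * basis[:, :-1] + right * basis[:, 1:]
    return basis

class KANLayer:
    """One KAN layer: n_in * n_out learnable spline functions r_{q,p}."""

    def __init__(self, n_in, n_out, grid_size=5, degree=3,
                 x_range=(-1.0, 1.0), rng=None):
        rng = np.random.default_rng() if rng is None else rng
        h = (x_range[1] - x_range[0]) / grid_size
        self.degree = degree
        # Uniform knots, extended by `degree` intervals on both sides.
        self.knots = x_range[0] + h * np.arange(-degree, grid_size + degree + 1)
        # coef[q, p, k] is the k-th spline coefficient of r_{q,p}.
        self.coef = rng.normal(0.0, 0.1, size=(n_out, n_in, grid_size + degree))

    def __call__(self, z_in):
        basis = bspline_basis(z_in, self.knots, self.degree)   # (n_in, n_basis)
        # r_{q,p}(z_in[p]) = sum_k coef[q,p,k] * B_k(z_in[p]); then sum over p, Eq. (9).
        return np.einsum("qpk,pk->q", self.coef, basis)
```

Because the B-spline basis sums to one inside the grid, setting every coefficient to 1 makes each $r_{q,p}$ the constant function 1, so every output equals $n_{in}$. This gives a quick sanity check on the basis construction.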
The DwConv operates on the feature map reshaped back to a 3D grid and enhances local feature details. To stabilize training and improve gradient flow, a residual connection is added around the combination of the KAN Layer and the DwConv. This design mitigates the vanishing gradient problem and enables deeper networks. Layer normalization is applied after the residual addition to normalize the activations across feature channels, accelerating convergence. The complete operation of the Tok-KAN module can be formulated as:

$$Z_m = \mathrm{LN}\big(Z_{m-1} + \mathrm{DwConv}(\mathrm{KAN}(Z_{m-1}))\big) \tag{10}$$

where $Z_m$ denotes the output of the $m$-th block. In our UKAN-CBAM architecture, we stack two such Tok-KAN modules in each of the two deepest encoding stages.

3.3. Convolutional Block Attention Module (CBAM)

As shown in Figure 4a, CBAM refines the input feature map sequentially, first along the channel dimension and then along the spatial dimensions. Given an intermediate feature map $F \in \mathbb{R}^{C \times D \times H \times W}$ transmitted from the encoder layer, CBAM sequentially generates a channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1 \times 1}$ and a spatial attention map $M_s \in \mathbb{R}^{1 \times D \times H \times W}$. The entire feature refinement process can be formulated as:

$$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F' \tag{11}$$

Figure 4b illustrates the channel attention module. It first aggregates spatial information through average-pooling and max-pooling operations, generating two spatial context descriptors, $F_{avg}$ and $F_{max}$, respectively. Average-pooling captures global statistical information about the feature distribution, whereas max-pooling highlights the most salient local features. These two descriptors are then fed into a parameter-shared MLP for processing.
Finally, the channel attention weights are generated through element-wise summation followed by a Sigmoid activation function:

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) \tag{12}$$

where $\sigma$ denotes the Sigmoid activation function, and the weights of the MLP are shared across different inputs to reduce the number of parameters. The simultaneous use of average-pooling and max-pooling enables the module to capture both global statistical information and salient local features.

Figure 4c illustrates the spatial attention module, which first performs average-pooling and max-pooling along the channel axis, generating two 3D feature maps: $F^s_{avg} \in \mathbb{R}^{1 \times D \times H \times W}$ and $F^s_{max} \in \mathbb{R}^{1 \times D \times H \times W}$. These two feature maps are then concatenated along the channel dimension, resulting in a 2-channel feature map that encodes both global and locally salient spatial information. This concatenated feature map is passed through a 7 × 7 × 7 convolutional layer to generate the spatial attention map. The large kernel size of 7 × 7 × 7 enlarges the receptive field, enabling the module to capture broader contextual relationships, which is particularly beneficial for distinguishing karst caves from other types of discontinuities. The spatial attention map is computed as:

$$M_s(F) = \sigma\big(f^{7 \times 7 \times 7}([F^s_{avg}; F^s_{max}])\big) \tag{13}$$

where $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation operation along the channel dimension, and $f^{7 \times 7 \times 7}$ represents a convolutional layer with a kernel size of 7 × 7 × 7.

In the context of karst cave identification, the integration of CBAM into skip connections provides two critical benefits. First, it suppresses background noise and irrelevant reflections that would otherwise be propagated to the decoder, reducing false positives.
Second, it enhances the responses at cave boundaries by emphasizing spatial locations where abrupt reflectivity changes occur, leading to sharper delineation.

4. Training and Model Evaluation

4.1. Training and Validation

The algorithm presented in this paper was implemented using PyTorch 2.7.1. The network was trained with the AdamW optimizer, with an initial learning rate of $1.5 \times 10^{-4}$, a cosine annealing learning rate scheduler, and a batch size of 2. The loss function is a composite loss combining Dice and Focal losses. The hardware platform utilized is an NVIDIA GeForce RTX 4070 Laptop GPU. Early stopping was applied based on the validation Dice coefficient, with a patience of 30 epochs.

The training and validation data used in this study are derived from the synthetic data generated by the workflow described in Section 2. This dataset contains 120 pairs of seismic data and their corresponding karst cave labels, each with the dimensions 256 × 256 × 256. The dataset was randomly split with a ratio of 5:1, resulting in 100 pairs for training and 20 pairs for validation. To enhance the generalization capability of the network, data augmentation was introduced during training by rotating each sample around the vertical axis by 0°, 90°, 180°, and 270°, effectively expanding the training dataset fourfold and improving the network’s robustness to karst caves with different orientations.

4.2. Loss Function

The Dice loss optimizes the model from the perspective of regional overlap and is insensitive to the disparity in voxel counts between foreground and background [24], making it suitable for the segmentation of small targets such as subsurface karst caves. However, although the Dice loss effectively mitigates class imbalance, the model still requires stronger supervisory signals for challenging samples near cave boundaries and those affected by noise.
The Focal loss introduces a modulation factor to the standard cross-entropy loss, encouraging the model to focus more on hard-to-classify samples during training [25]. To integrate the global advantage of the Dice loss in optimizing regional overlap with the targeted advantage of the Focal loss in emphasizing hard samples, this paper employs a weighted combination of both [26], enabling the model to accurately segment karst cave regions as a whole while finely delineating their boundary details.

$$L_{Dice} = 1 - \frac{2 \sum y\,p + \varepsilon}{\sum y + \sum p + \varepsilon} \tag{14}$$

$$L_{Focal} = -\alpha (1 - p)^{\gamma}\, y \log p - (1 - \alpha)\, p^{\gamma} (1 - y) \log(1 - p) \tag{15}$$

where $y$ denotes the ground-truth binary mask, $p$ represents the predicted probability map, $\varepsilon$ is a small constant introduced to ensure numerical stability, $\alpha$ adjusts the weights of positive and negative samples, and $\gamma$ controls the down-weighting of easily classified samples. In our implementation, we set $\alpha = 0.85$ to give higher weight to the cave class, which represents only a small fraction of voxels, and $\gamma = 2.0$ to down-weight background voxels with high confidence and focus training on boundary voxels or those with ambiguous seismic responses. The total loss is defined as:

$$L_{Total} = L_{Dice} + L_{Focal} \tag{16}$$

4.3. Evaluation Metrics

To quantitatively evaluate the segmentation performance of the proposed network, we adopted Accuracy, Recall, Dice, and F1 score as evaluation metrics. These metrics offer a comprehensive evaluation of the model’s performance in segmentation tasks.
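As a concrete illustration of the four metrics named above (Eqs. (17)–(20)), the sketch below computes them from voxel-wise confusion counts on binary masks. The function name and the zero-division guards are assumptions made for this example, and Precision is taken in its standard form TP / (TP + FP), which the text uses but does not write out.

```python
import numpy as np

def segmentation_metrics(pred, label):
    """Accuracy, Recall, Precision, Dice, and F1 for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = int(np.sum(pred & label))      # cave voxels predicted as cave
    tn = int(np.sum(~pred & ~label))    # background predicted as background
    fp = int(np.sum(pred & ~label))     # background predicted as cave
    fn = int(np.sum(~pred & label))     # cave voxels predicted as background

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "dice": dice, "f1": f1}
```

Two points are worth keeping in mind when reading the reported curves. For hard binary masks, substituting the definitions of Precision and Recall into the F1 formula gives 2TP / (2TP + FP + FN), so Dice and F1 coincide numerically. Also, because caves occupy only a small fraction of voxels, a prediction that labels everything as background can still reach a high Accuracy while its Recall and Dice are zero.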
The four metrics are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{17}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{18}$$

$$\mathrm{Dice} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{19}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{20}$$

where TP (true positive) denotes the number of cave voxels correctly classified as cave, TN (true negative) the number of background voxels correctly classified as background, FP (false positive) the number of background voxels incorrectly classified as cave, and FN (false negative) the number of cave voxels incorrectly classified as background.

4.4. Contrast Experiments for Training and Validation

To evaluate the effectiveness of UKAN-CBAM for underground karst cave identification, we conducted quantitative comparisons against three baseline models: U-Net (standard 3D U-Net with four encoder stages), ResUNet (U-Net with residual blocks in each encoder/decoder stage), and UNet-CBAM (U-Net with CBAM inserted in skip connections but without Tok-KAN modules). All baseline models were trained on the same synthetic dataset using identical data augmentation, optimizer, learning rate schedule, and loss function combinations to ensure a fair comparison. We adopted Loss, Accuracy, Recall, Dice, and F1 score as evaluation metrics.

As shown in Figure 5a, the loss convergence curves demonstrate that UKAN-CBAM exhibits the best convergence behavior, achieving both the fastest descent rate and the smallest final loss. Figure 5b,c show the training and validation accuracy curves, respectively. UKAN-CBAM achieved a final validation accuracy of 98.7%, outperforming ResUNet, UNet-CBAM, and U-Net. The relatively small gap between training and validation accuracy suggests that the model does not suffer from overfitting, thanks to the regularization effects of dropout in the KAN layers and the diversity of the synthetic dataset. Figure 5d–f present the F1 score, Dice coefficient, and Recall over training epochs.
These curves consistently indicate that UKAN-CBAM attains the highest scores and the fastest convergence across all metrics, supporting its overall performance advantage. Notably, the other three models exhibit a clear performance gradient: ResUNet ranks the highest, UNet-CBAM falls in the middle, and U-Net performs the weakest. This ordering reveals the differential effectiveness of various network optimization strategies for the karst cave identification task. The superiority of UNet-CBAM over U-Net validates the role of the CBAM attention mechanism in focusing on critical features; the superiority of ResUNet over UNet-CBAM further suggests that, for data characterized by low signal-to-noise ratios and blurred boundaries, such as seismic data containing underground karst caves, the training stability gains provided by residual structures are more fundamental than those offered by attention mechanisms.

The advantage of UKAN-CBAM stems from its synergistic design: on one hand, it inherits the strong nonlinear feature fitting capability of the Tok-KAN architecture by replacing fixed activation functions with learnable spline basis functions; on the other hand, it further strengthens spatial attention on karst cave regions through the CBAM modules integrated at the skip connections. This dual-gain mechanism enables UKAN-CBAM to achieve the best performance across all evaluation metrics, validating the effectiveness of the proposed network for underground karst cave identification.

5. Application

To evaluate the recognition performance and generalization ability of UKAN-CBAM, we applied it to 3D physical modeling data and 3D field seismic data, and conducted a comparative analysis with the coherence attribute, U-Net, ResUNet, and UNet-CBAM. All deep learning models were used in a zero-shot manner, without any fine-tuning on the physical or field data, to strictly test their generalization capability.

5.1. Comparisons with the 3D Physical Model Data

To further evaluate the performance of the proposed network, we conducted experiments using three-dimensional physical simulation data acquired from a cave-rich region in the Sichuan Basin, Southwest China. Physical modeling data are generated by constructing a scaled physical model of the subsurface and acquiring reflections in a laboratory setting. This type of data provides a realistic representation of wave propagation effects, including multiples, diffractions, and attenuation, while still having known positions and geometries of caves in the physical model [27]. The dataset has dimensions of 500 (time samples) × 580 (crossline) × 600 (inline), with a temporal sampling interval of 1 ms and a trace spacing of 25 m.

Figures 6 and 7 present the 3D seismic data volumes and corresponding karst cave identification results for two representative time slices. In each figure, (a) shows the original seismic amplitude, (b) displays the cave detection results of the coherence attribute, and (c)–(f) correspond to the identification results obtained from U-Net, ResUNet, UNet-CBAM, and UKAN-CBAM, respectively. From panel (b) to panel (f), as the methodology progresses from traditional attribute analysis to deep learning models and further to the progressive optimization of networks, the identified karst cave features exhibit a progressive improvement in both completeness and boundary accuracy, particularly in the areas indicated by the two red rectangles and the red arrow. The caves appear as compact, well-connected bodies with sharp boundaries that closely match the known cave geometries in the physical model. These results indicate the decisive role of model architecture optimization in enhancing feature extraction capability.
Comparative analysis demonstrates that UKAN-CBAM achieves the best overall performance, extracting karst cave structures with optimal spatial continuity while significantly reducing false predictions in background regions, particularly in the areas indicated by the two red rectangles. In these rectangles, the baseline deep learning models produce fragmented or overly extended predictions. Meanwhile, in Figure 8b, the coherence attribute exhibits a clear false positive at 1.0 s, whereas neither the original seismic amplitude nor any of the deep learning models produce a misjudgment at this location. This observation underscores that deep learning models, when properly designed, can learn to distinguish between true cave reflections and spurious attribute responses.

The performance advantage of UKAN-CBAM can be attributed to its synergistic design. The adaptive spline basis functions in the KAN layers flexibly fit the nonlinear characteristics of the bead-string reflections, which often exhibit amplitude and phase variations that are difficult to model with fixed linear kernels. The CBAM modules integrated at the skip connections guide the network to focus on critical spatial locations corresponding to karst caves, effectively suppressing interference from non-target regions such as continuous strong reflectors or fault planes.

5.2. Evaluations on Field Seismic Data from Shunbei Oilfield

To comprehensively assess the generalization capability and practicality of the proposed model under complex geological conditions, we further validated it using 3D seismic data from the Shunbei Oilfield in the Tarim Basin. The Shunbei area is known for its deeply buried carbonate reservoirs, where karst caves are the primary storage spaces. The seismic data exhibit strong background noise, weak cave reflections due to deep burial, and interference from overlying thick salt layers, making this a particularly challenging test for any interpretation method.
The dataset has dimensions of 200 (time samples) × 600 (crossline) × 565 (inline), with a temporal sampling interval of 2 ms. Figure 9 presents a 3D seismic data volume at a representative time slice and the corresponding karst cave identification results. Figure 9a shows the original seismic amplitude slice, where karst caves appear as weak reflection anomalies. Figure 9b shows the coherence attribute slice, where low-coherence values (dark zones) highlight potential cave locations. Figure 9c–f display the identification results obtained by U-Net, ResUNet, UNet-CBAM, and UKAN-CBAM, respectively. Among them, UKAN-CBAM demonstrates superior robustness and recognition accuracy, accurately delineating the main cave zone with clear boundaries, especially in the areas indicated by the red circles and arrows in the figure. In contrast, U-Net, ResUNet, and UNet-CBAM all miss some weak cave reflections and produce a few sporadic false detections.

The excellent performance of UKAN-CBAM on field data can be attributed to two factors. First, the learnable spline basis functions in the KAN layers can more flexibly fit the non-stationary and morphologically variable nonlinear reflection patterns of karst caves. Second, the CBAM modules integrated at the skip connections adaptively recalibrate spatial and channel features, thereby enhancing the network’s ability to distinguish weak cave signals from strong background interference.

6. Conclusions

This paper proposes UKAN-CBAM, a 3D deep learning network for karst cave identification that synergistically integrates Tok-KAN modules and CBAM into a U-shaped encoder–decoder architecture. The Tok-KAN modules, with learnable spline-based activation functions, effectively capture the complex nonlinear relationships between seismic waveforms and cave geometries, overcoming the limitations of conventional linear convolutions.
The CBAM modules embedded in skip connections adaptively recalibrate channel and spatial features, suppressing background noise and sharpening cave boundaries during feature fusion. Extensive experiments were conducted on synthetic data, physical modeling data from the Sichuan Basin, and field seismic data from the Tarim Basin. Quantitative and qualitative comparisons demonstrate that UKAN-CBAM consistently outperforms U-Net, ResUNet, UNet-CBAM, and conventional coherence attributes across multiple metrics including Accuracy, Recall, Dice, and F1 score. The proposed network delineates karst caves with improved spatial continuity and clearer boundaries while significantly reducing false positives and false negatives. Furthermore, UKAN-CBAM exhibits strong generalization capability across different geological settings without additional fine-tuning, as validated on both physical models and real field data. These results confirm that jointly addressing nonlinear response modeling and noisy skip-connection propagation is key to advancing automated cave interpretation.

Nevertheless, the proposed UKAN-CBAM method has several limitations. The synthetic training data are generated using a reflectivity convolution approximation rather than full waveform modeling. While this approach is computationally efficient, it does not account for key wave propagation effects such as multiples, diffractions, attenuation, and mode conversions, which may affect generalization to field data. Additionally, the field data validation lacks well log control. The UKAN-CBAM architecture also has inherent limitations compared to other networks: (1) it incurs higher computational cost and GPU memory usage due to the Tok-KAN and CBAM modules; (2) its high expressive power increases the risk of overfitting on small datasets; and (3) it is more sensitive to hyperparameter tuning (e.g., B-spline grid size).
Future work will address both the data-related and architectural limitations by incorporating full-waveform simulations, well-based quantitative validation, and extension to fault and channel detection, as well as exploring lightweight KAN variants, adaptive regularization, and multi-scale fusion.

Author Contributions

Conceptualization, B.Y.; methodology, H.G.; software, R.P.; validation, H.G.; formal analysis, H.G. and B.Y.; resources, B.Y.; data curation, Y.W.; writing—original draft preparation, H.G.; writing—review and editing, B.Y.; visualization, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation, grant number 42564005, and Provincial Key Research and Development Plan of Xinjiang Uygur Autonomous Region, grant number 2024B01016.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw field seismic data underlying the conclusions of this study are available from the authors upon reasonable request.

Acknowledgments

Thanks to Wu et al. for assisting with the methods of generating synthetic datasets, which greatly helped this study and significantly contributed to the intelligent interpretation of seismic data.

Conflicts of Interest

The authors declare no conflicts of interest.

Disclaimer/Publisher’s Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite

MDPI and ACS Style: Yan, B.; Gao, H.; Pan, R.; Wang, Y. 3D Karst Cave Identification Using UKAN-CBAM in Seismic Images of Fractured-Vuggy Reservoir. Appl. Sci. 2026, 16, 5765. https://doi.org/10.3390/app16125765

AMA Style: Yan B, Gao H, Pan R, Wang Y. 3D Karst Cave Identification Using UKAN-CBAM in Seismic Images of Fractured-Vuggy Reservoir. Applied Sciences. 2026; 16(12):5765. https://doi.org/10.3390/app16125765

Chicago/Turabian Style: Yan, Binpeng, Haobo Gao, Rui Pan, and Yongliang Wang. 2026. "3D Karst Cave Identification Using UKAN-CBAM in Seismic Images of Fractured-Vuggy Reservoir" Applied Sciences 16, no. 12: 5765. https://doi.org/10.3390/app16125765

APA Style: Yan, B., Gao, H., Pan, R., & Wang, Y. (2026). 3D Karst Cave Identification Using UKAN-CBAM in Seismic Images of Fractured-Vuggy Reservoir. Applied Sciences, 16(12), 5765. https://doi.org/10.3390/app16125765

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
