Abstract To address the significant performance degradation of traditional direction-of-arrival (DOA) estimation algorithms under low-signal-to-noise ratio (SNR) conditions, this paper proposes a deep learning-based super-resolution DOA estimation network (DSDE-Net). Unlike classification modeling constrained by angular discretization, this approach formulates the DOA estimation problem as a regression task, thereby effectively improving angular estimation accuracy. DSDE-Net extracts features from both the spatial and temporal dimensions of the signal. A residual denoising module is incorporated during spatial feature extraction to suppress noise interference in covariance features. Building upon this foundation, a dynamic gated feature fusion network is designed. By learning dimension-wise fusion weights, it achieves an adaptive weighted combination of spatiotemporal features, enabling the network to dynamically adjust the contribution of various features under different SNR conditions. Experimental results demonstrate that compared to traditional algorithms and existing deep learning methods, it exhibits the smallest estimation error and overall standard deviation, delivering superior estimation accuracy and robustness in low-SNR environments. 1. Introduction Direction-of-arrival (DOA) estimation refers to the process of determining the direction from which a signal originates using data collected by multiple receiving antennas in an array. It is a core problem in array signal processing and has extensive applications in radar, sonar, communications, electronic countermeasures, and speech processing [ 1, 2, 3]. In recent years, with the rapid advancement of Advanced Driver Assistance Systems (ADASs) and autonomous driving technologies, the importance of in-vehicle perception systems has become increasingly prominent. 
Modern ADAS platforms typically integrate a variety of sensors, including cameras, optical sensors, LiDAR, and millimeter-wave radar, as illustrated in Figure 1. Among these, radar plays a crucial role in mainstream ADAS functions such as Adaptive Cruise Control (ACC), forward collision warning, lane change assist, and obstacle avoidance assist, as it can directly and accurately measure the distance, relative velocity, and azimuth of multiple targets [ 4, 5]. Although cameras and lidar systems offer high-resolution target imaging, their performance is susceptible to complex weather conditions such as fog, rain, and snow, limiting their environmental adaptability [ 6]. Additionally, while cameras can capture a wealth of visual information, they cannot directly provide data on an object’s distance, speed, or direction. Such information typically requires subsequent algorithms to infer. In contrast, millimeter-wave radar maintains stable operation even in harsh environments, offering advantages such as lower cost and high reliability. Owing to these characteristics, it is expected to continue playing an irreplaceable role in both existing ADASs and future autonomous driving systems [ 7]. In the evolution of DOA estimation methods, subspace-based approaches were among the earliest widely adopted techniques. Representative algorithms include the Multiple Signal Classification (MUSIC) [ 8] algorithm and the Estimating Signal Parameter via Rotational Invariance Techniques (ESPRIT) [ 9] algorithm. These methods, with their high-resolution characteristics, have significantly advanced the theoretical development of DOA estimation. As research deepens, various improved forms of subspace-based methods have emerged, such as root-MUSIC [ 10], LS-ESPRIT [ 11], and ROOT-MUSIC-HDAPA [ 12], further enhancing estimation accuracy. In addition, the maximum likelihood (ML) [ 13, 14, 15] estimation algorithm is another important method in DOA estimation. 
Reference [ 16] proposed a computationally efficient method for joint angle and distance estimation in a dual-base system based on the subspace principle. Reference [ 17] first used maximum likelihood estimation to estimate the target amplitude and structure, and then used the projected gradient descent method to estimate the target angle coordinates. Compared to subspace-based methods, ML algorithms maintain relatively stable angle measurement performance under adverse conditions such as signal coherence and limited snapshot counts, while offering superior super-resolution capabilities. However, their primary drawback lies in high computational complexity, typically requiring multidimensional searches that struggle to meet real-time requirements. With the emergence of compressed sensing [ 18] (CS) and sparse reconstruction, spatial spectral estimation algorithms [ 19] based on sparse reconstruction have gradually gained prominence. These methods can effectively exploit the sparsity of signals in the angular domain to achieve high-resolution DOA estimation. However, they require multiple optimization iterations to achieve highly accurate estimates, leading to high computational complexity. In recent years, the continuous advancement of deep learning theory and methods has opened new avenues for research in DOA estimation [ 20, 21, 22]. By learning the mapping rules between array data and signal angles, data-driven approaches not only eliminate the reliance of traditional algorithms on precise physical models but also demonstrate superior adaptability in complex and dynamic environments. Papageorgiou [ 23] applied convolutional neural networks to DOA estimation, improving estimation accuracy. Alizadeh et al. [ 24] introduced a channel attention mechanism into the residual network architecture, enhancing the network’s robustness to covariance-matrix features under conditions of interference and array errors. 
Elbir [25] combined MUSIC with deep learning to enhance estimation performance, while Kase [26] improved the model's estimation accuracy by constructing training samples tailored to the targeted application scenarios. Barthelme [27] combined neural networks with gradient updates of the likelihood function to improve performance under low-sampling-rate conditions. Zheng et al. [28] enhanced performance in multi-source scenarios by integrating classification and regression. Although the aforementioned methods perform well at high SNR, their performance often deteriorates sharply in low-SNR environments. Lin et al. [29] proposed a dual-branch neural network that combines classification and regression in parallel, which significantly improves the model's accuracy under defect conditions. To address the degradation in low-SNR environments, this paper proposes a feature-enhanced dual-branch network specifically designed for such conditions. The network extracts spatial and temporal features separately, incorporates a covariance residual denoising mechanism, and employs a dynamic gating fusion strategy. These enhancements improve angular measurement stability and estimation accuracy under noisy conditions. The main contributions of this paper are as follows: Modeling DOA estimation as a regression task rather than a multi-label classification problem. The network output directly corresponds to the signal's angle of incidence, which removes the accuracy limit imposed by grid partitioning and enables truly gridless estimation. Proposing a denoising module based on residual learning. This module reshapes the vectorized input covariance into a two-dimensional matrix, employs a residual learning strategy to predict the noise component, and denoises by subtracting that prediction from the input.
Simultaneously, global residual connections are introduced to prevent excessive denoising from degrading array structural information, thereby enhancing feature fidelity. Designing a dynamic gating fusion network. By constructing a gating fusion module, it adaptively integrates temporal and spatial signal features. This module generates candidate feature representations through a nonlinear enhancement branch and utilizes a gating network to learn dimension-wise fusion weights. This enables adaptive weighted combinations of different features, enhancing the model's representational capabilities across diverse scenarios.

2. Signal Model

Consider a uniform linear array (ULA) composed of $N$ array elements, where the spacing between the elements is $d = 0.5$ m. Suppose $K$ incoherent far-field narrowband signal sources impinge on the array, with angles of incidence $\theta_1, \theta_2, \ldots, \theta_K$. The received signal can then be expressed as

$$y(t) = A(\theta)\, s(t) + n(t), \quad t = 1, \ldots, L \tag{1}$$

where $L$ is the number of snapshots, and $n(t)$ represents the noise. The signal source vector and the array manifold matrix are respectively defined as

$$s(t) = [s_1(t), s_2(t), \ldots, s_K(t)]^T \tag{2}$$

$$A(\theta) = [a(\theta_1), \ldots, a(\theta_K)] \in \mathbb{C}^{N \times K} \tag{3}$$

Here, $a(\theta_k)$ is the steering vector corresponding to the signal incident from direction $\theta_k$, $k = 1, \ldots, K$, specifically given by

$$a(\theta_k) = \left[1,\; e^{j 2\pi \frac{d}{\lambda} \sin(\theta_k)},\; \ldots,\; e^{j 2\pi \frac{d}{\lambda} (N-1) \sin(\theta_k)}\right]^T \tag{4}$$

The covariance matrix of the received signal can be expressed as

$$R_y = E\left[y(t)\, y^H(t)\right] \tag{5}$$

Assuming the noise $n(t)$ is zero-mean complex white Gaussian noise uncorrelated with the signals, with noise power $\sigma_n^2$, the covariance matrix can be written as

$$R_y = A(\theta)\, R_s\, A^H(\theta) + \sigma_n^2 I_N \tag{6}$$

where $R_s$ is the signal covariance matrix, and $I_N$ is the $N \times N$ identity matrix. In practical applications, the covariance matrix is usually estimated from a finite number of snapshots.
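The signal model of Equations (1)-(6) can be sketched numerically as follows. This is a minimal NumPy illustration rather than the authors' simulation code: the function names, the unit-power circular Gaussian sources, and the choice $R_s = I$ are assumptions made only for the example.

```python
import numpy as np

def steering_matrix(theta_deg, n_elements, d=0.5, wavelength=1.0):
    """Array manifold A(theta) of Eqs. (3)-(4) for a ULA; angles in degrees."""
    theta = np.deg2rad(np.atleast_1d(np.asarray(theta_deg, dtype=float)))
    n = np.arange(n_elements)[:, None]                  # element index 0..N-1
    phase = 2.0 * np.pi * d / wavelength * np.sin(theta)[None, :]
    return np.exp(1j * n * phase)                       # shape (N, K)

def complex_gaussian(rng, shape, power=1.0):
    """Zero-mean circular complex Gaussian samples with the given power."""
    return np.sqrt(power / 2.0) * (
        rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    )

def simulate_snapshots(theta_deg, n_elements, n_snapshots, snr_db, rng):
    """Stack L snapshots of Eq. (1) as columns of Y = A S + N.

    ASSUMPTION: sources have unit power, so the noise power is 10^(-SNR/10).
    """
    A = steering_matrix(theta_deg, n_elements)
    S = complex_gaussian(rng, (A.shape[1], n_snapshots))
    sigma2 = 10.0 ** (-snr_db / 10.0)
    Y = A @ S + complex_gaussian(rng, (n_elements, n_snapshots), sigma2)
    return Y, A, sigma2

rng = np.random.default_rng(0)
Y, A, sigma2 = simulate_snapshots([-10.0, 5.0], n_elements=12,
                                  n_snapshots=10, snr_db=0.0, rng=rng)
# Model covariance of Eq. (6) under the assumed R_s = I.
R_model = A @ A.conj().T + sigma2 * np.eye(12)
```

Each column of `Y` is one snapshot $y(t)$, and `R_model` is Hermitian by construction, as Equation (6) requires.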
The sample covariance matrix is defined as

$$\hat{R}_y = \frac{1}{L} \sum_{t=1}^{L} y(t)\, y^H(t) \tag{7}$$

3. Network Models and Training Methods

To address the significant degradation in DOA estimation accuracy under low-SNR conditions observed in existing methods, this paper proposes a deep learning-based super-resolution DOA estimation network, termed DSDE-Net, to enhance both performance and robustness in noisy environments. This model formulates DOA estimation as a regression task, directly outputting continuous angle values. DSDE-Net employs a dual-branch architecture to extract features from both the spatial and temporal domains. Inspired by the Denoising Convolutional Neural Network (DnCNN), this network incorporates a residual denoising module (RDM) to separate noise from the signal. The Space Correlation Extraction Module (SCEM) takes the covariance matrix as input and incorporates a residual learning mechanism to predict noise components, achieving adaptive denoising via subtraction. Additionally, skip connections are introduced to capture global features while preserving fine-grained structural information of the original signal. Meanwhile, the Time Correlation Extraction Module (TCEM) processes the raw signal and utilizes LSTM networks to capture temporal dependencies. To further enhance feature discriminability, a dynamic gated fusion mechanism is designed to adaptively weight and integrate the features of the two branches. The final angle estimate is produced by a multi-layer fully connected mapping. Figure 2 illustrates the overall architecture of the proposed network.

3.1. SCEM Architecture

To enhance the network's ability to model array spatial correlation features under conditions of low SNR and finite snapshots, this paper introduces a residual denoising module based on 2D convolution into the spatial information extraction network. The specific structure is shown in Figure 3.
This module takes the covariance matrix as input and adds a channel dimension to obtain $X_1$:

$$X_1 \in \mathbb{R}^{B \times 1 \times N \times N} \tag{8}$$

where $B$ is the batch dimension. Unlike traditional methods that directly learn the denoised signal, this module adopts a residual learning strategy. It uses a lightweight multi-layer convolutional network to model the noise component within the covariance matrix. The forward propagation process can be expressed as

$$H_1 = \mathrm{ReLU}\left(\mathrm{Conv2D}_{1 \to 64}(X_1)\right) \tag{9}$$

$$H_2 = \mathrm{ReLU}\left(\mathrm{Conv2D}_{64 \to 64}(H_1)\right) \tag{10}$$

$$H_3 = \mathrm{ReLU}\left(\mathrm{Conv2D}_{64 \to 64}(H_2)\right) \tag{11}$$

$$N_{\mathrm{pred}} = \mathrm{Conv2D}_{64 \to 1}(H_3) \tag{12}$$

Here, $H_1$, $H_2$, and $H_3$ denote the intermediate feature maps after the first, second, and third convolutional layers, respectively. Specifically, Equation (9) uses 64 filters of size $3 \times 3$ to transform the single-channel input into a 64-channel feature map $H_1$. Equation (10) applies another 64 filters to $H_1$, yielding $H_2$ with the same spatial dimensions, which further abstracts noise-related patterns. Equation (11) repeats this operation to generate $H_3$, which is then fed into a final $1 \times 1$ convolution to produce the predicted noise residual $N_{\mathrm{pred}}$. Adaptive denoising is then achieved via a subtraction operation between the input covariance representation and the estimated noise component:

$$Y_{2D} = X_{2D} - N_{\mathrm{pred}} \tag{13}$$

To prevent excessive denoising from degrading the intrinsic structural information of the array, global residual connections are introduced. The denoised output is combined with the original input through a skip connection, resulting in a higher-fidelity covariance feature representation:

$$Y_{out} = \mathrm{Flatten}(Y_{2D}) + R_y \tag{14}$$

This design enables the network to suppress spatial spectrum distortions caused by limited snapshots and noise while preserving as many of the effective array spatial features as possible. Consequently, it provides a more accurate and robust feature representation for subsequent DOA estimation.
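The forward pass of Equations (9)-(13) can be sketched as below. This is an illustrative NumPy version with random, untrained weights, not the authors' implementation (which is built in PyTorch). The helper names `conv2d_same`, `init_conv`, and `residual_denoise`, the zero "same" padding, the weight scale, and the real-valued placeholder input are all assumptions. The global skip of Equation (14) is not included.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Stride-1 2D convolution with zero padding that keeps the N x N size.

    x: (C_in, N, N), w: (C_out, C_in, k, k) with odd k, b: (C_out,).
    """
    k = w.shape[-1]
    p = k // 2
    n = x.shape[-1]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], n, n))
    for di in range(k):                      # accumulate one kernel tap at a time
        for dj in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj],
                             xp[:, di:di + n, dj:dj + n])
    return out + b[:, None, None]

def init_conv(rng, c_in, c_out, k, scale=0.05):
    """Random placeholder weights (ASSUMPTION: untrained, for shape checking)."""
    return scale * rng.standard_normal((c_out, c_in, k, k)), np.zeros(c_out)

def residual_denoise(x2d, params):
    """Eqs. (9)-(13): predict the noise map, then subtract it from the input."""
    h = x2d[None, :, :]                              # channel axis, cf. Eq. (8)
    for w, b in params[:-1]:
        h = np.maximum(conv2d_same(h, w, b), 0.0)    # Conv2D + ReLU
    w, b = params[-1]
    n_pred = conv2d_same(h, w, b)[0]                 # 64 -> 1 channel, Eq. (12)
    return x2d - n_pred, n_pred                      # Eq. (13)

rng = np.random.default_rng(0)
N = 12
params = [init_conv(rng, 1, 64, 3), init_conv(rng, 64, 64, 3),
          init_conv(rng, 64, 64, 3), init_conv(rng, 64, 1, 1)]
x2d = rng.standard_normal((N, N))        # placeholder real-valued covariance map
y2d, n_pred = residual_denoise(x2d, params)
```

Because the network predicts the noise rather than the clean map, `y2d + n_pred` reproduces the input exactly; training would shape `n_pred` toward the actual noise component.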
After obtaining the denoised output $Y_{out}$, we further enhance its representational capacity by passing it through two consecutive 2D convolutional layers to generate a refined feature map:

$$Y_t = \mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv}_{64 \times 64}\left(\mathrm{ReLU}\left(\mathrm{BN}\left(\mathrm{Conv}_{64 \times 64}(Y_{out})\right)\right)\right)\right)\right) \tag{15}$$

where BN denotes batch normalization. This additional CNN module extracts higher-level features from the already denoised covariance matrix, further suppressing residual noise and enriching the angular information. Then, a skip connection adds this refined feature map back to the original $Y_{out}$ via element-wise addition, producing the final output:

$$Y_{final} = Y_{out} + Y_t \tag{16}$$

Since the fusion is performed on feature maps with identical dimensions, both structural information and high-level representations are preserved. This design improves the fidelity of covariance features and enhances the model's ability to discriminate fine angular details. Finally, the enhanced spatial feature map $Y_{final}$ is flattened into a vector:

$$x_s = \mathrm{Flatten}(Y_{final}) \tag{17}$$

This vector $x_s$ is then ready to be concatenated or fused with the output of the temporal feature branch, enabling joint spatiotemporal information processing for accurate DOA estimation.

3.2. TCEM Architecture

In this module, the input consists of discrete received signals sampled at multiple time instants. To fully exploit the information correlation across all time samples, a Long Short-Term Memory (LSTM) network is employed for temporal modeling of the received data. First, to meet the requirement of real-valued inputs for deep neural networks, the real and imaginary parts of the original complex array observation sequence $X_2 \in \mathbb{C}^{L \times N}$ are concatenated to form a real-valued input sequence:

$$\tilde{X} = \left[\Re(X_2);\, \Im(X_2)\right] \tag{18}$$

Subsequently, a multi-layer LSTM network is used to capture the short-term temporal correlations in the signal sequence.
The feature extraction process at each layer can be expressed as

$$H_i = \mathrm{LSTM}_i(\tilde{X}) \tag{19}$$

where $i$ denotes the LSTM layer index. A ReLU activation is then applied to enhance the feature representation, and the resulting feature vector is denoted $x_t$. Through this design, the LSTM models short-term dependencies in the signal sequence, thereby improving the representation power of temporal features.

3.3. Dynamic Gating Fusion Network

To achieve efficient collaborative modeling of temporal and spatial correlation features, this section designs a dynamic gated feature fusion network. The specific structure is illustrated in Figure 4. This module takes the concatenated vector of the two feature streams as input, denoted as

$$x = [x_t;\, x_s] \in \mathbb{R}^{D} \tag{20}$$

where $x_t$ and $x_s$ represent the time-domain and space-domain features, respectively. First, the input features are mapped through a nonlinear transformation branch to generate a candidate enhanced representation:

$$\tilde{x} = f(x) \tag{21}$$

Here, $f(\cdot)$ denotes a feature enhancement mapping composed of multiple fully connected layers and nonlinear activation functions, which is used to reshape and compensate the original fused features. Subsequently, a dynamic gating branch is constructed to jointly model the original features and enhanced features, and the per-dimension gating coefficients are learned via the sigmoid function:

$$g = \sigma\left(h([x;\, \tilde{x}])\right) \tag{22}$$

where $h(\cdot)$ denotes the gating network, and $g \in (0, 1)^{D}$ is the vector of gating coefficients. Based on these gating coefficients, the fused features are obtained by element-wise weighting:

$$z = g \odot x + (1 - g) \odot \tilde{x} \tag{23}$$

This enables the network to adaptively adjust the contribution ratios of temporal and spatial features across different dimensions, depending on conditions such as varying SNR.
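A minimal NumPy sketch of the gated fusion in Equations (20)-(23) is given below to make the data flow concrete. The layer widths, the two-layer form of $f(\cdot)$, the single linear layer used for $h(\cdot)$, and the random weights are assumptions for illustration; the paper does not specify these details here.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def dense(rng, d_in, d_out):
    """Random placeholder weights for one fully connected layer (untrained)."""
    return rng.standard_normal((d_out, d_in)) / np.sqrt(d_in), np.zeros(d_out)

def gated_fusion(x_t, x_s, f_layers, h_layer):
    """Fuse temporal and spatial features with per-dimension gates."""
    x = np.concatenate([x_t, x_s])                   # Eq. (20)
    x_tilde = x
    for i, (W, b) in enumerate(f_layers):            # Eq. (21): x_tilde = f(x)
        x_tilde = W @ x_tilde + b
        if i < len(f_layers) - 1:
            x_tilde = np.maximum(x_tilde, 0.0)       # ReLU between layers
    Wh, bh = h_layer
    g = sigmoid(Wh @ np.concatenate([x, x_tilde]) + bh)   # Eq. (22)
    z = g * x + (1.0 - g) * x_tilde                  # Eq. (23)
    return z, g

rng = np.random.default_rng(0)
x_t = rng.standard_normal(8)             # hypothetical temporal feature size
x_s = rng.standard_normal(16)            # hypothetical spatial feature size
D = x_t.size + x_s.size
f_layers = [dense(rng, D, 2 * D), dense(rng, 2 * D, D)]   # f maps R^D -> R^D
h_layer = dense(rng, 2 * D, D)                            # h maps R^{2D} -> R^D
z, g = gated_fusion(x_t, x_s, f_layers, h_layer)
```

Each entry of `g` interpolates between the raw feature and its enhanced candidate: a gate near 1 passes the original dimension through, while a gate near 0 replaces it with the output of the enhancement branch.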
To highlight discriminative information more sensitive to DOA estimation, this paper introduces a vector-level Squeeze-and-Excitation (SE) attention mechanism on the fusion output to recalibrate the importance of feature dimensions. This mechanism is implemented through the following steps:

$$z_c = \frac{1}{D} \sum_{i=1}^{D} z_i \tag{24}$$

$$a = \mathrm{ReLU}(W_1 z_c + b_1), \quad a \in \mathbb{R}^{D/r} \tag{25}$$

$$s = \sigma(W_2 \cdot a + b_2), \quad s \in \mathbb{R}^{D} \tag{26}$$

$$z' = s \odot z \tag{27}$$

Here, $z_c$ denotes the compressed feature after global average pooling. The variable $a$ is the output of the first fully connected layer, representing the intermediate feature vector after linear transformation and ReLU activation, with a dimension of $D/r$, where $r$ is the compression ratio. $W_1$ and $W_2$ are learnable weight matrices. The bias terms $b_1$ and $b_2$ are learnable parameters that provide independent offsets to the neurons in each fully connected layer. They enhance the model's expressiveness, prevent the neurons from producing zero output in the absence of input signals, and help the activation functions operate more effectively. Subsequently, the second fully connected layer maps $a$ back to $D$ dimensions, and the sigmoid function generates the weight vector $s$. Finally, the original features are recalibrated via the element-wise multiplication $z' = s \odot z$, thereby suppressing redundant and noisy dimensions while enhancing the responses of features critical for DOA estimation. To match the input dimension required by the subsequent regression head, the SE-enhanced features $z'$ are passed through a dimension projection layer, a linear transformation that maps the $D$-dimensional features to the desired projected dimension $D_p$. This operation is expressed as

$$z_p = W_p z' + b_p, \quad z_p \in \mathbb{R}^{D_p} \tag{28}$$

Here, $W_p$ is a learnable weight matrix, and $b_p$ is a bias term.
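Equations (24)-(28) can be traced with the short NumPy sketch below. It follows the equations literally, so $z_c$ is a scalar and $W_1$ acts as a vector of length $D/r$. The sizes `D`, `r`, and `D_p` and the random weights are hypothetical values chosen only to check shapes.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def se_recalibrate(z, W1, b1, W2, b2):
    """Vector-level SE reweighting following Eqs. (24)-(27) as written."""
    z_c = z.mean()                            # Eq. (24): squeeze to a scalar
    a = np.maximum(W1 * z_c + b1, 0.0)        # Eq. (25): a has D/r entries
    s = sigmoid(W2 @ a + b2)                  # Eq. (26): one weight per dimension
    return s * z, s                           # Eq. (27)

def project(z_prime, Wp, bp):
    """Linear projection to D_p dimensions, Eq. (28); no activation."""
    return Wp @ z_prime + bp

rng = np.random.default_rng(0)
D, r, D_p = 24, 4, 8                          # hypothetical sizes
z = rng.standard_normal(D)
W1, b1 = rng.standard_normal(D // r), np.zeros(D // r)
W2, b2 = rng.standard_normal((D, D // r)), np.zeros(D)
Wp, bp = rng.standard_normal((D_p, D)) / np.sqrt(D), np.zeros(D_p)

z_prime, s = se_recalibrate(z, W1, b1, W2, b2)
z_p = project(z_prime, Wp, bp)
```

Since every entry of `s` lies between 0 and 1, the recalibration can only attenuate a dimension, never amplify it. The final projection, in turn, is purely linear.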
This layer performs only a linear mapping without introducing nonlinear activation, thereby preserving the original scale information of the features. The projected features are then fed into the final angle regression network. Through the above dynamic gated fusion and attention reweighting strategy, the complementarity of spatiotemporal features is exploited and the robustness of the fused features is improved. This provides a more stable and discriminative feature representation for the subsequent angle regression network.

3.4. Dataset and Network Training

This paper employs a uniform linear array with $N = 12$ elements and $L = 10$ snapshots for experiments. The signal wavelength is $\lambda = 1$ m, and the element spacing is half a wavelength, 0.5 m. A two-source scenario is considered, where the angles of incidence of the signal sources range from −60° to 60°. Since DSDE-Net is a gridless DOA estimator, the signal angles take continuous values within the search range, resulting in an enormous potential sample space. To reduce the complexity of sample generation, this paper adopts the following data generation strategy: let the angular interval between the two sources be $\Delta\theta$, where $\Delta\theta \in [1^\circ, 15^\circ]$. The SNR range is set to $[-10\ \mathrm{dB}, 10\ \mathrm{dB}]$ with an interval of 4 dB, giving a total of 6 SNR conditions. Under each SNR condition, we generate 500,000 training samples. Thus, we obtain a dataset containing $D = 500{,}000 \times 6 = 3{,}000{,}000$ samples. The detailed data generation process is described in Algorithm 1.

Algorithm 1: Generation of Training Samples
Input: the number of array elements $N$, the number of snapshots $L$, the element spacing $d$, $\mathrm{SNR} = \{-10, -6, -2, 2, 6, 10\}$ dB, angle range $[-60^\circ, 60^\circ]$, angle separation $\Delta\theta \in [1^\circ, 15^\circ]$
Output: snapshot data data1, covariance feature data data2, DOA labels $\theta = [\theta_1, \theta_2]$.
Initiation:
1. Initialize the random seed and set the array parameters $N$, $d$, $L$ and the number of sources $K$;
2. Select one SNR value from the predefined SNR set;
3. Uniformly sample the angle separation $\Delta\theta$ from $[1^\circ, 15^\circ]$;
4. Randomly generate the first source angle under the angle constraint: $\theta_1 \sim U(-60^\circ, 60^\circ - \Delta\theta)$;
5. Compute the second source angle: $\theta_2 = \theta_1 + \Delta\theta$;
6. Sort the two angles in ascending order to obtain the DOA label: $\theta = [\theta_1, \theta_2]$;
7. Independently generate the amplitude and initial phase for each source: $a_i \sim U(0.5, 1.5)$, $\phi_i \sim U(0, 2\pi)$;
8. Construct the complex source signals: $s_i = a_i e^{j\phi_i}$;
9. Construct the array steering matrix based on the DOA labels: $A = [a(\theta_1), a(\theta_2)]$;
10. Generate complex Gaussian white noise according to the current SNR: $N_w = \frac{1}{10^{\mathrm{SNR}/20}}(N_r + jN_i)$, where $N_r$ and $N_i$ follow standard normal distributions;
11. Construct the noisy received signal: $X = AS + N_w$;
12. Normalize the received signal using max-absolute normalization: $X_{norm} = \frac{X}{\max(\max|X|,\, \varepsilon)}$, $\varepsilon = 10^{-8}$;
13. Save the normalized snapshot data: data1 $= X_{norm}$;
14. Compute the sample covariance matrix from the normalized data: data2 $= \frac{1}{L} X_{norm} X_{norm}^H$;
15. After traversing all SNR conditions, merge the generated data and save as $\{$data1, data2, $\theta\}$.

In terms of specific network parameters, the SCEM first uses three convolutional denoising modules for denoising, then applies two consecutive skip convolutional layers for feature enhancement, and finally outputs spatial information, while the TCEM adopts an LSTM encoding architecture. The LSTM part consists of five layers, with 32, 64, 128, 64, and 32 hidden units, respectively. All data are randomly split into a training set and a validation set at an 8:2 ratio. During training, the Adam optimizer is used with a batch size of 128, combined with a cosine annealing learning rate schedule.
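For reference, one pass of Algorithm 1 can be sketched in NumPy as follows. This is a hedged illustration, not the authors' generator. The function names are hypothetical, and redrawing the source amplitude and phase for every snapshot is an assumption, since Algorithm 1 does not state how $s_i$ varies across the $L$ snapshots.

```python
import numpy as np

def steering_matrix(theta_deg, n_elements, d=0.5, wavelength=1.0):
    """ULA steering matrix A = [a(theta_1), a(theta_2)]; angles in degrees."""
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    n = np.arange(n_elements)[:, None]
    return np.exp(1j * 2.0 * np.pi * d / wavelength * n * np.sin(theta)[None, :])

def generate_sample(snr_db, rng, n_elements=12, n_snapshots=10, eps=1e-8):
    """Generate one (data1, data2, theta) triple following Algorithm 1."""
    delta = rng.uniform(1.0, 15.0)                       # angle separation
    theta1 = rng.uniform(-60.0, 60.0 - delta)            # first source angle
    theta = np.sort(np.array([theta1, theta1 + delta]))  # ascending DOA label
    # ASSUMPTION: amplitude and phase are redrawn independently per snapshot.
    amp = rng.uniform(0.5, 1.5, size=(2, n_snapshots))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=(2, n_snapshots))
    S = amp * np.exp(1j * phi)                           # complex source signals
    A = steering_matrix(theta, n_elements)
    shape = (n_elements, n_snapshots)
    noise = 10.0 ** (-snr_db / 20.0) * (
        rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    )
    X = A @ S + noise                                    # noisy received signal
    X_norm = X / max(np.abs(X).max(), eps)               # max-absolute scaling
    data1 = X_norm                                       # snapshot data
    data2 = X_norm @ X_norm.conj().T / n_snapshots       # sample covariance
    return data1, data2, theta

rng = np.random.default_rng(0)
snr_set = [-10, -6, -2, 2, 6, 10]
samples = [generate_sample(snr, rng) for snr in snr_set]
```

Calling `generate_sample` repeatedly for each SNR value and stacking the results would yield the merged dataset of the final step.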
The initial learning rate is set to 0.0001, determined through grid search within the range $[10^{-5}, 10^{-3}]$, and the minimum learning rate is set to $1 \times 10^{-6}$; the learning rate decays smoothly from the initial value to the minimum according to a cosine function as the training epochs progress. An early stopping strategy is employed: training is terminated when the validation loss shows no improvement for 20 consecutive epochs, and the best model is saved. Experiments are performed on a Windows 11 system. The model is built using Python 3.12 and PyTorch 2.3.0 (CUDA 13.1), and training and testing are completed on an NVIDIA GeForce RTX 4090 GPU.

4. Simulation Experiments and Results Analysis

4.1. DOA Estimation Error Analysis

This section compares the DSDE-Net algorithm with several representative approaches, including traditional model-based methods (the MUSIC, ESPRIT, and MVDR algorithms) and a typical data-driven method (the CNN of [23]). The training data generation process for the CNN follows the methodology described in [23]. To ensure fairness and reproducibility, all data-driven methods are trained and tested on datasets generated with identical parameters. Specifically, the training data are constructed using a unified data generation pipeline: the angle range is set to $[-60^\circ, 60^\circ]$, the angular separation between the two sources is randomly sampled within $[1^\circ, 15^\circ]$, and the SNR range is $[-10\ \mathrm{dB}, 10\ \mathrm{dB}]$, with an equal number of samples allocated to each SNR level. All methods share the same array configuration and the same noise model. Furthermore, both the CNN and DSDE-Net adopt identical data partitioning schemes and evaluation metrics during model training. These settings ensure that all algorithms are evaluated under consistent signal environments during both training and performance assessment, thereby avoiding any influence from discrepancies in external conditions.
During the testing phase, all DOA estimation methods operate on signals with 10 snapshots, with two incident sources set at a fixed angular separation of 3 ∘ . The incident angle of Target 1 varies from − 50.5 ∘ to 50.5 ∘ in steps of 1 ∘ , and the incident angle of Target 2 accordingly varies from − 47.5 ∘ to 53.5 ∘ , resulting in a total of 101 pairs of test samples. All evaluations are conducted at an SNR of 5 dB. The DOA estimation results of various methods are illustrated in Figure 5. Performance is quantitatively assessed using several metrics, including the range of maximum and minimum errors, mean absolute error (MAE), and overall standard deviation (STD). Detailed results are summarized in Table 1. Experimental results indicate that the performance of traditional algorithms deteriorates significantly at low SNR. The combination of low SNR and small angular separation leads to the loss or distortion of spatial spectrum peaks, which will seriously reduce the estimation accuracy of MUSIC and MVDR algorithms. Similarly, the ESPRIT algorithm struggles to effectively separate the corresponding eigenvalues under such conditions, resulting in distortions in signal subspace estimation and consequently large estimation errors. To ensure the consistency of evaluation criteria among different algorithms, the optimal pairing strategy is still employed to calculate the errors of the estimated results. In contrast to traditional approaches, deep learning-based methods exhibit superior performance, achieving substantially higher angular estimation accuracy. From the error analysis, DSDE-Net achieves correct estimation on all test samples, with the lowest mean MAE and STD, indicating that the network not only has the highest estimation accuracy but also produces the most stable results. 
Although the estimation error of the CNN was confined to the range [−3.5°, 3.5°], and it achieved correct estimation on a large number of samples, its results fluctuated strongly and its stability was much lower than that of DSDE-Net. By comparison, none of MUSIC, ESPRIT, and MVDR can guarantee accurate estimates. These traditional methods exhibit obvious estimation errors, rendering them incapable of meeting the requirements of practical applications. To further evaluate the performance of DSDE-Net, a second set of experiments was conducted under more challenging conditions, with the SNR fixed at 1 dB and the source separation set to 4.2°. The angular range of Source 1 remained unchanged, while the range of Source 2 was accordingly adjusted to [−46.3°, 54.7°]. The DOA estimation results of all methods are shown in Figure 6, and the detailed error metrics are listed in Table 2. Similar to the first set of experiments, neither the CNN nor the traditional algorithms achieved reliable estimation, and only DSDE-Net succeeded in producing correct estimates, further demonstrating its performance under low-SNR conditions. In summary, DSDE-Net achieves the most compact error distribution, the smallest MAE, and the lowest standard deviation under low-SNR conditions, demonstrating its strong robustness. Traditional methods such as MUSIC, ESPRIT, and MVDR are limited by noise and the number of snapshots, often producing large errors that deviate significantly from the true values, leading to extremely wide error distributions that are unsuitable for practical engineering needs. Although the CNN method outperforms traditional algorithms, its error distribution range and dispersion are still significantly higher than those of DSDE-Net, further verifying the superior performance of DSDE-Net under low-SNR conditions.

4.2.
Statistical Performance Analysis of DOA Estimation

To evaluate the statistical performance of the DSDE-Net algorithm for DOA estimation, Monte Carlo simulations are conducted to compare it with the traditional MUSIC, ESPRIT, and MVDR methods and with the CNN. The Root Mean Square Error (RMSE) is adopted as the performance metric, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{MK} \sum_{m=1}^{M} \sum_{k=1}^{K} \left(\hat{\theta}_k^{(m)} - \theta_k^{(m)}\right)^2} \tag{29}$$

where $K$ denotes the number of signal sources, $M$ is the number of test samples, and $\hat{\theta}_k^{(m)}$ and $\theta_k^{(m)}$ represent the estimated and true values of the target DOA, respectively. In the simulation experiment, the incident angles of the two signal sources are set to $\theta_1 = 3.18^\circ$ and $\theta_2 = 5.58^\circ$, and the sample covariance estimation is also performed using 10 snapshots. The SNR conditions are configured as −10 dB, −5 dB, 0 dB, 5 dB, and 10 dB, resulting in five scenarios. For each SNR condition, 1000 independent experiments are conducted to compute the RMSE, and the Cramér–Rao Lower Bound (CRLB) is provided as a performance benchmark. The results are illustrated in Figure 7. As the SNR gradually increases from −10 dB to 10 dB, the DOA estimation accuracy of traditional methods continuously improves. However, in low-SNR regions, their performance remains significantly constrained, exhibiting substantial errors that hinder stable and precise estimation. Due to its reliance on discrete angle grids, the CNN method exhibits a plateau in performance improvement after the SNR reaches 5 dB. In contrast, DSDE-Net adopts a regression framework for angle estimation, which frees its accuracy from the limitations of grid resolution. It achieves the smallest RMSE, and its estimation accuracy continuously improves with increasing SNR. Compared with the CNN, DSDE-Net not only attains higher estimation accuracy, but also demonstrates stronger adaptability over a wider range of SNR conditions.
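The RMSE of Equation (29) can be computed over a batch of Monte Carlo trials as in the sketch below. Pairing estimates with true angles by sorting both in ascending order is an assumption made for this example; for scalar angles this pairing minimizes the squared error, but the paper's own pairing procedure is not reproduced here. The function name `rmse_deg` and the toy estimates are hypothetical.

```python
import numpy as np

def rmse_deg(theta_hat, theta_true):
    """RMSE of Eq. (29) for arrays of shape (M, K), angles in degrees.

    ASSUMPTION: estimates are matched to true angles by ascending order.
    """
    theta_hat = np.sort(np.asarray(theta_hat, dtype=float), axis=1)
    theta_true = np.sort(np.asarray(theta_true, dtype=float), axis=1)
    return float(np.sqrt(np.mean((theta_hat - theta_true) ** 2)))

# Toy check with the source angles of the first experiment.
theta_true = np.tile([3.18, 5.58], (4, 1))          # M = 4 trials, K = 2 sources
theta_hat = theta_true + 0.5                        # constant 0.5 degree offset
offset_rmse = rmse_deg(theta_hat, theta_true)
swapped_rmse = rmse_deg(theta_hat[:, ::-1], theta_true)   # reversed output order
```

A constant offset of 0.5° on every estimate gives an RMSE of 0.5°, and reversing the order of the two estimates leaves the value unchanged because of the sort-based pairing.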
We further conducted a second set of experiments to compare the RMSE performance under different angular separations. In the experiments, the SNR was fixed at 3 dB , and the first source angle was set to θ 1 = − 2.62 ∘ . The second source angle θ 2 was set to − 1.62 ∘ , − 0.62 ∘ , 2.38 ∘ , 5.38 ∘ , and 7.38 ∘ , respectively, corresponding to angular separations of Δ θ = [ 1 ∘ , 2 ∘ , 5 ∘ , 8 ∘ , 10 ∘ ] . Each point was evaluated using 1000 Monte Carlo simulations to compute the RMSE, and the detailed experimental results are shown in Figure 8. The experimental results indicate that when the angular separation Δ θ is small, the estimation performance of traditional methods faces severe challenges, whereas deep learning-based methods exhibit significant performance advantages, with DSDE-Net outperforming the CNN method. As the angular separation gradually increases, the RMSE of traditional methods decreases markedly. The CNN method, which takes only the covariance matrix as input, is heavily affected by low SNR and shows limited sensitivity to changes in angular separation. DSDE-Net takes dual-source data as input and achieves effective noise suppression through the residual denoising module. Additionally, its network structure contains skip connections, which can retain the low-level features of the input information, thereby enabling better discrimination of adjacent targets. As a result, this model maintains the optimal RMSE performance throughout the entire angle interval range. 4.3. Computational Complexity In this section, we compare the model parameter scale and computation time of various DOA estimation methods. The experiment was repeated 20 times, with each method performing 5000 estimation operations per run on the same device. The average time per single estimation was calculated, and the detailed results are presented in Table 3. 
As shown in Table 3, DSDE-Net is substantially more compact than the CNN, with far fewer parameters (776,782 versus 62,267,257) and floating-point operations (13.045 M versus 113.537 M FLOPs). Although DSDE-Net's inference is slower than the CNN's (2.549 ms versus 0.860 ms per estimate), its estimation accuracy and stability are markedly superior, and its overall computational cost still meets the real-time requirements of practical array signal processing. Among the traditional subspace-based methods, ESPRIT eliminates the need for an angle search and is therefore considerably faster (0.124 ms) than both MUSIC (5.420 ms) and MVDR (4.983 ms). Nevertheless, its estimation accuracy remains inferior to that of DSDE-Net. Overall, DSDE-Net achieves an effective balance between computational efficiency and estimation accuracy.

4.4. Ablation Experiments

To verify the effectiveness of the proposed modules, ablation experiments were conducted. The proposed network contains two key modules: the residual denoising module and the dynamic gated fusion module. Four models were set up for comparison: Model 1 (without either module), Model 2 (with only the residual denoising module), Model 3 (with only the dynamic gated fusion module), and Model 4 (the complete model). All tests were conducted at an SNR of 2 dB with an angular separation of 3.6° between the two signal sources. The evaluation metrics are the range of the angle estimation errors, the RMSE, and the MAE. Detailed results are presented in Table 4.

Without either the denoising module or the fusion network, the model performs worst, with the highest mean RMSE of 0.9022°. When the residual denoising module (Model 2) or the dynamic gated fusion module (Model 3) is incorporated individually, the mean RMSE drops to 0.7344° and 0.7262°, respectively, and the mean MAE decreases accordingly.
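For readers who want to reproduce this kind of comparison, the three reported metrics and the relative RMSE reduction with respect to the baseline can be computed as in the sketch below. The helper names `error_metrics` and `relative_reduction` are hypothetical, the RMSE values are copied from Table 4, and the assumption that the metrics are computed directly from pooled per-trial errors may differ from the authors' exact averaging.

```python
import numpy as np

def error_metrics(errors):
    """Error range, RMSE and MAE of a set of angle errors (degrees)."""
    errors = np.asarray(errors, dtype=float)
    return {
        "range": (float(errors.min()), float(errors.max())),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mae": float(np.mean(np.abs(errors))),
    }

def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline

# Mean RMSE values (degrees) as reported in Table 4.
table4_rmse = {"Model 1": 0.9022, "Model 2": 0.7344,
               "Model 3": 0.7262, "Model 4": 0.6296}
reductions = {name: relative_reduction(table4_rmse["Model 1"], value)
              for name, value in table4_rmse.items()}

# Tiny synthetic example of the metric computation (not experimental data).
demo = error_metrics([-1.0, 0.5, 2.0])

for name, r in reductions.items():
    print(f"{name}: {100 * r:.1f}% lower RMSE than Model 1")
```

With the Table 4 values, each single module lowers the mean RMSE by a little under one fifth relative to Model 1.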
When all modules are integrated into the model (Model 4), both error metrics drop to their minimum values (mean RMSE 0.6296°, mean MAE 0.5133°), indicating the effectiveness of the overall architecture.

To further verify the effectiveness of the proposed modules, we conducted a second set of ablation experiments with SNR = 4 dB and $\Delta\theta = 2.8^\circ$. The detailed results are presented in Table 5. Each individual module again improves estimation performance, in agreement with the first ablation study. These results indicate that the proposed modules are complementary: combining them yields lower errors than either module alone, thereby enhancing DOA estimation accuracy under low-SNR conditions.

5. Conclusions

This paper addresses the performance degradation of DOA estimation under low-SNR conditions by constructing a spatial–temporal feature enhancement network that integrates residual denoising and a dynamic gating mechanism. In the spatial feature extraction stage, a residual denoising module based on two-dimensional convolution is introduced to suppress covariance estimation errors caused by noise interference and limited snapshots, thereby preserving the structural integrity of the array. Building upon this, a dynamic gating fusion network is designed to adaptively integrate spatial and temporal features through learnable weighting, allowing the model to adjust the contribution of each feature according to the SNR conditions. An attention re-calibration mechanism is also employed to enhance the discriminative information relevant to DOA estimation. Experimental results demonstrate that the proposed method effectively improves estimation accuracy and robustness under low-SNR conditions.

Author Contributions

Writing – original draft preparation, H.Z. and C.Z.; Writing – review and editing, J.G. and C.Z.; Visualization, H.Z. and C.Z.; Supervision, H.Z. and J.G.
All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62471484).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data can be obtained from the authors upon reasonable request, subject to limitations of use.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wan, L.; Sun, Y.; Sun, L.; Ning, Z.; Rodrigues, J.J.P.C. Deep learning based autonomous vehicle super-resolution DOA estimation for safety driving. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4301–4315.
2. Engels, F.; Heidenreich, P.; Zoubir, A.M.; Jondral, F.K.; Wintermantel, M. Advances in automotive radar: A framework on computationally efficient high-resolution frequency estimation. IEEE Signal Process. Mag. 2017, 34, 36–46.
3. Feng, R.; Uysal, F.; Aubry, P.; Yarovoy, A. MIMO–monopulse target localisation for automotive radar. IET Radar Sonar Navig. 2018, 12, 1131–1136.
4. Milanés, V.; Shladover, S.E.; Spring, J.; Nowakowski, C.; Kawazoe, H.; Nakamura, M. Cooperative adaptive cruise control in real traffic situations. IEEE Trans. Intell. Transp. Syst. 2014, 15, 296–305.
5. Wang, J.; Aubry, P.; Yarovoy, A. 3-D short-range imaging with irregular MIMO arrays using NUFFT-based range migration algorithm. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4730–4742.
6. Li, J.; Wang, Y.; Huang, Z.; Zheng, J.; Xian, K.; Cao, Z.; Zhang, J. Diffusion-Augmented Depth Prediction with Sparse Annotations. In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM 2023), Ottawa, ON, Canada, 29 October–3 November 2023; Association for Computing Machinery: New York, NY, USA; pp. 2865–2876.
7. Wu, Y.; Li, C.; Hou, Y.T.; Lou, W. Real-time DoA estimation for automotive radar. In Proceedings of the 18th European Radar Conference (EuRAD), London, UK, 5–7 April 2022; IEEE: New York, NY, USA, 2022; pp. 437–440.
8. Krim, H.; Viberg, M. Two decades of array signal processing research: The parametric approach. IEEE Signal Process. Mag. 1996, 13, 67–94.
9. Roy, R.; Paulraj, A.; Kailath, T. Estimation of signal parameters via rotational invariance techniques – ESPRIT. In Proceedings of MILCOM 1986 – IEEE Military Communications Conference, Monterey, CA, USA, 5–9 October 1986; IEEE: New York, NY, USA, 1986; pp. 41–48.
10. Wagner, M.; Park, Y.; Gerstoft, P. Gridless DOA estimation and Root-MUSIC for non-uniform linear arrays. IEEE Trans. Signal Process. 2021, 69, 2144–2157.
11. Zhang, Q.; Liu, Y.; Long, X.; Song, K.; He, X.; Ren, X.; Qiu, T. A cyclostationarity based ESPRIT algorithm for DOA estimation of uniform circular array. In Proceedings of the IEEE Statistical Signal Processing Workshop (SSP), Rio de Janeiro, Brazil, 11–14 July 2021; IEEE: New York, NY, USA, 2021; pp. 216–220.
12. Shu, F.; Qin, Y.; Liu, T.; Gui, L.; Zhang, Y.; Li, J.; Han, Z. Low-complexity and high-resolution DOA estimation for hybrid analog and digital massive MIMO receive array. IEEE Trans. Commun. 2018, 66, 2487–2501.
13. Wu, T.; Deng, Z.; Hu, X.; Li, A.; Xu, J. DOA estimation of incoherently distributed sources using importance sampling maximum likelihood. J. Syst. Eng. Electron. 2022, 33, 845–855.
14. Cheng, C.; Liu, S.; Wu, H.; Zhang, Y. An efficient maximum-likelihood-like algorithm for near-field coherent source localization. IEEE Trans. Antennas Propag. 2022, 70, 6111–6116.
15. Barat, M.; Karimi, M.; Masnadi-Shirazi, M.A. High-order maximum likelihood methods for direction of arrival estimation. IEEE Open J. Signal Process. 2021, 2, 359–369.
16. Fang, Y.; Zhu, S.; Li, B.X.; Liao, G. Target localization with bistatic MIMO and FDA-MIMO dual-mode radar. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 952–964.
17. Xu, Z.; Liu, W.; Chen, X.; Dai, H.; Liu, J.; Chen, H. Distributed target detection in compound-Gaussian clutter under steering vector uncertainty. IEEE Trans. Aerosp. Electron. Syst. 2026, 62, 26–40.
18. Uehashi, S.; Ogawa, Y.; Nishimura, T.; Ohgane, T. Prediction of time-varying multi-user MIMO channels based on DOA estimation using compressed sensing. IEEE Trans. Veh. Technol. 2019, 68, 565–577.
19. Wang, G.; He, M.; Yu, C.; Han, J.; Chen, C. Fast underdetermined DOA estimation based on generalized MRA via original covariance vector sparse reconstruction. IEEE Access 2021, 9, 66805–66815.
20. Fuchs, J.; Gardill, M.; Lübke, M.; Dubey, A.; Lurz, F. A machine learning perspective on automotive radar direction of arrival estimation. IEEE Access 2022, 10, 6775–6797.
21. Wu, X.; Yang, X.; Jia, X.; Tian, F. A gridless DOA estimation method based on convolutional neural network with Toeplitz prior. IEEE Signal Process. Lett. 2022, 29, 1247–1251.
22. Ahmed, Q.W.M.; Thanthrige, U.S.K.P.M.; Gamal, A.E.; Sezgin, A. Deep learning for DOA estimation in MIMO radar systems via emulation of large antenna arrays. IEEE Commun. Lett. 2021, 25, 1559–1563.
23. Papageorgiou, G.K.; Sellathurai, M.; Eldar, Y.C. Deep networks for direction-of-arrival estimation in low SNR. IEEE Trans. Signal Process. 2021, 69, 3714–3729.
24. Alizadeh, M.; Chavoshi, M.; Samir, A.; Hegazy, A.M.; Bahri, A.; Basha, M.; Safavi-Naeini, S. Experimental deep learning assisted super-resolution radar imaging. In Proceedings of the 18th European Radar Conference (EuRAD); IEEE: New York, NY, USA, 2022; pp. 153–156.
25. Elbir, A.M. DeepMUSIC: Multiple signal classification via deep learning. IEEE Sens. Lett. 2020, 4, 1–4.
26. Kase, Y.; Nishimura, T.; Ohgane, T. DOA estimation of two targets with deep learning. In Proceedings of the 15th Workshop on Positioning, Navigation and Communications (WPNC), Bremen, Germany, 25–26 October 2018; IEEE: New York, NY, USA, 2018; pp. 1–5.
27. Barthelme, A.; Utschick, W. A machine learning approach to DoA estimation and model order selection for antenna arrays with subarray sampling. IEEE Trans. Signal Process. 2021, 69, 3075–3087.
28. Zheng, S.; Yang, Z.; Shen, W.; Zhang, L.; Zhu, J.; Zhao, Z.; Yang, X. Deep learning-based DOA estimation. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 819–835.
29. Lin, L.; She, C.; Chen, Y.; Guo, Z.; Zeng, X. TB-NET: A Two-Branch Neural Network for Direction of Arrival Estimation under Model Imperfections. Electronics 2022, 11, 220.

Figure 1. Vehicle radar in autonomous driving.

Figure 2. Structure diagram of DSDE-Net.

Figure 3. Residual denoising module.

Figure 4. Dynamic gated fusion network architecture diagram.

Figure 5. The first group of experiments. Estimation results: (a) DSDE-Net, (c) CNN, (e) MUSIC, (g) ESPRIT, (i) MVDR. Estimation error distribution: (b) DSDE-Net, (d) CNN, (f) MUSIC, (h) ESPRIT, (j) MVDR.

Figure 6. The second group of experiments. Estimation results: (a) DSDE-Net, (c) CNN, (e) MUSIC, (g) ESPRIT, (i) MVDR. Estimation error distribution: (b) DSDE-Net, (d) CNN, (f) MUSIC, (h) ESPRIT, (j) MVDR.

Figure 7. Comparison of RMSE performance at different SNRs.

Figure 8. Comparison of RMSE performance at different angle separations.

Table 1. Various indicators of different algorithms in the first group of experiments.

| Algorithm | Distribution of Errors (°) | Mean MAE (°) | STD (°) |
|---|---|---|---|
| DSDE-Net | [−1.5570, 1.4447] | 0.3975 | 0.5103 |
| CNN | [−3.5000, 3.5000] | 0.7600 | 0.9600 |
| MUSIC | [−69.1612, 94.4760] | 6.8815 | 14.6241 |
| ESPRIT | [−58.8517, 39.6534] | 2.4651 | 6.5148 |
| MVDR | [−90.9000, 72.1000] | 9.3350 | 17.0580 |

Table 2. Various indicators of different algorithms in the second group of experiments.

| Algorithm | Distribution of Errors (°) | Mean MAE (°) | STD (°) |
|---|---|---|---|
| DSDE-Net | [−1.8532, 1.9912] | 0.6000 | 0.7269 |
| CNN | [−4.5000, 4.7000] | 1.0600 | 1.4500 |
| MUSIC | [−94.8451, 96.7935] | 11.4306 | 20.4795 |
| ESPRIT | [−106.3264, 89.3145] | 3.9048 | 11.1071 |
| MVDR | [−91.9200, 91.9800] | 16.1255 | 24.7031 |

Table 3. Analysis of model complexity.

| Method | DSDE-Net | CNN | MUSIC | ESPRIT | MVDR |
|---|---|---|---|---|---|
| Params | 776,782 | 62,267,257 | / | / | / |
| FLOPs (M) | 13.045 | 113.537 | / | / | / |
| Time (ms) | 2.549 | 0.860 | 5.420 | 0.124 | 4.983 |

Table 4. The results of the first group of ablation experiments.
| Algorithm | Distribution of Errors (°) | Mean RMSE (°) | Mean MAE (°) |
|---|---|---|---|
| Model 1 | [−2.5205, 2.8831] | 0.9022 | 0.7781 |
| Model 2 | [−2.1776, 1.6854] | 0.7344 | 0.5995 |
| Model 3 | [−1.9328, 1.8927] | 0.7262 | 0.5842 |
| Model 4 | [−1.681, 1.7005] | 0.6296 | 0.5133 |

Table 5. The results of the second group of ablation experiments.

| Algorithm | Distribution of Errors (°) | Mean RMSE (°) | Mean MAE (°) |
|---|---|---|---|
| Model 1 | [−1.9804, 2.5205] | 0.7936 | 0.6632 |
| Model 2 | [−1.7540, 1.3892] | 0.5934 | 0.4606 |
| Model 3 | [−1.7488, 1.5433] | 0.5991 | 0.4685 |
| Model 4 | [−1.7201, 1.3985] | 0.5184 | 0.4054 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Zhao, H.; Gong, J.; Zhou, C. A Deep Learning Approach for Direction-of-Arrival Estimation in Low-SNR Environments. Appl. Sci. 2026, 16, 5747. https://doi.org/10.3390/app16125747