Fault diagnosis of diesel engines is a critical task in the operation and maintenance of complex equipment. Deep-learning-based diesel engine fault diagnosis has developed rapidly because of its strong feature learning and fault classification capabilities. However, conventional data-driven deep learning models cannot explicitly represent the relationships between signals, which limits how much fault information they can capture. This paper therefore proposes a diesel-engine valve-clearance fault diagnosis method driven by a combination of knowledge and data. First, the original signals are converted into graph data whose topology follows the spatiotemporal relationships of the events occurring within the cylinder, which exposes the intrinsic structural information of the samples. Then, the graph is fed into a graph attention network (GAT) to extract features and learn fault patterns. Valve fault experiments were conducted on a diesel engine test bench. The results indicate that the proposed knowledge- and data-driven model achieves better diagnostic performance and clearer interpretability than conventional data-driven deep learning fault diagnosis models, and that it retains relatively high accuracy when training data are scarce.

1. Introduction

Over the past decade, the use of deep learning (DL) algorithms in mechanical fault diagnosis research has grown rapidly, owing to their high accuracy, efficient data representation, and automatic feature extraction and selection [7, 8, 9]. Unlike traditional machine learning, DL algorithms stack multiple layers to express complex objective functions, uncover intrinsic relationships between variables, and improve model generalization performance [10].
DL algorithms, represented by various neural networks, have powerful feature extraction capabilities, which allow automatic representation learning from large datasets and strong adaptability [11]. Deep Belief Networks (DBN), one of the classic DL models [12, 13], were applied to fault diagnosis early on [14, 15]. A DBN can extract effective fault features from feature sets constructed for specific tasks and, unlike traditional shallow machine learning algorithms, can learn high-level feature representations from large amounts of data [16, 17]. By directly using raw signals or spectra, end-to-end intelligent diagnostic models can be established, reducing reliance on expert experience and knowledge. However, the black-box nature of such models and their inability to explicitly model the relationships between signals motivated the exploration of structured modeling methods such as graph neural networks.

While these deep neural network (DNN) based methods effectively capture hidden features of conventional data (e.g., images, time series), most of them overlook the interdependencies between data from multiple sensors or various physical measurements [23, 24]. In recent years, graph neural networks (GNN) have gained significant attention from researchers because of their ability to establish associations between data [25]. By aggregating information from node neighbors at any depth, a GNN can extract and infer data relationships more effectively [26]. For instance, Li et al. combined the Horizontal Visibility Graph (HVG) with a GNN to build a bearing fault diagnosis model. The HVG algorithm converts time-series samples into graphs with specific conditional topologies, which provide additional information for classification compared with purely numerical data; the study showed that GNN models outperform RNN models in bearing fault diagnosis [27].
Graph Convolutional Networks (GCN) [28, 29, 30], a variant of GNN, use association graphs to establish relationships between data, which speeds up training and improves model performance. For instance, Zhou [31] proposed a GCN-based fault diagnosis method for rotating machinery using multi-sensor data. To diagnose wind turbine gearbox faults, Yu [32] proposed a Fast Deep Graph Convolutional Network (FDGCN), which efficiently and adaptively learns discriminative fault features from the initial graph inputs and then uses these features to classify the relevant fault types. Zhang [33] applied Deep Convolutional Graph Neural Networks (DGCN) to the acoustic fault diagnosis of roller bearings.

However, most existing GNN-based fault diagnosis methods construct graphs mainly from data-driven relationships, such as sample similarity, sensor correlation, visibility rules, or generic graph construction strategies including KNNGraph, RadiusGraph, and PathGraph [34, 35]. Xiao et al. proposed a GNN-based bearing fault detection method in which graph nodes and edges were constructed according to sample similarity, demonstrating the effectiveness of similarity-based graph modeling for bearing fault diagnosis [35]. Li et al. systematically reviewed GNN-based fault diagnosis methods and pointed out that most existing graph construction strategies rely on data similarity, sensor relationships, or predefined generic graph structures rather than explicit physical event mechanisms [36]. These methods are effective in representing non-Euclidean relationships in mechanical signals. Nevertheless, such graph structures are usually determined by statistical relationships among samples or sensors. Under small-sample conditions, similarity estimation may be unstable, and the resulting graph topology may not correspond to the actual physical mechanism of the monitored system.
To address the above issues, this paper proposes a prior-knowledge-guided Graph Attention Network (GAT) method for valve-clearance fault diagnosis. In the proposed method, the cylinder-head vibration signal is first synchronized in the crank-angle domain. Key event-centered vibration segments within one working cycle are then extracted as graph nodes, and directed edges are constructed according to the physical occurrence sequence of these events. In this way, the working-cycle event process is embedded into the graph topology. The constructed event-driven graph reduces the dependence on data-driven similarity estimation under small-sample conditions, preserves the physical correspondence between vibration segments and valve-related events, and provides a basis for interpreting the learned attention weights.

The main contributions of this work are summarized as follows:

(1) A prior-knowledge-guided directed graph is constructed by treating event-centered vibration windows as graph nodes and connecting them according to the physical occurrence sequence of the working cycle. This design preserves the physical correspondence between vibration segments and valve-related events.

(2) A GAT-based diagnostic model is developed to learn discriminative graph-level representations from the constructed event-driven graph samples. The learned attention weights are further visualized, providing an interpretable basis for analyzing event-to-event relationships related to valve-clearance faults.

(3) Fault simulation experiments are conducted on an eight-cylinder diesel engine test bench under full-load conditions. The results verify that the proposed event-driven graph modeling strategy achieves effective small-sample diagnostic performance and provides clearer physical interpretability.

3. The Proposed Diesel Fault Diagnosis Method Based on a GAT

3.1. Overview

The proposed GAT-based framework for diesel engine valve-clearance fault diagnosis, driven by the fusion of data and knowledge, is illustrated in Figure 2. It comprises two stages: graph construction based on prior knowledge and fault diagnosis based on the GAT. In the first stage, the original vibration signal is segmented at the event occurrence intervals given by the working mechanism of the diesel engine cylinder. This converts the Euclidean data (the original time-domain vibration signal) into non-Euclidean data (a graph) that carries prior knowledge of the cylinder. In the second stage, the constructed graph is used as the input to the GAT model to diagnose the health status of the diesel engine valve clearance.

3.2. Constructing Affinity Graphs from Time Series Based on Prior Knowledge

3.2.1. Analysis of Diesel Engine Vibration Signal Characteristics

To capture the impact-related vibration induced by valve events, a triaxial piezoelectric accelerometer (model 1A313E, Donghua Test) was mounted on the cylinder head close to the monitored cylinder. The sensor has a measurement range of ±100 g and a frequency response of 0.5–7000 Hz (±10%). The calibrated sensitivities at 160 Hz are approximately 4.0 mV/(m/s²) on all three axes. The sensor is powered by a constant-current source of 2–20 mA (18–30 VDC), with a DC bias voltage of 8–12 V, and is installed through an M5 threaded mounting hole to ensure reliable mechanical coupling. The collected continuous vibration signal was subsequently segmented according to the event-driven strategy described in this study.

To define the graph nodes and the graph structure, this study analyzed the valve phase data of diesel engine cylinder block A1, as shown in Table 1.
Taking a complete working cycle as a reference and following the phase sequence of the cylinder actions listed in Table 1, a complete working cycle can be divided into five main physical events [54]:

(1) Exhaust valve opening: Towards the end of the power stroke, the exhaust valve opens in advance, so that the high-temperature, high-pressure exhaust gas starts to be discharged. This provides favorable initial conditions for the subsequent exhaust stroke, reduces the pressure inside the cylinder, and lowers the resistance to the upward motion of the piston.

(2) Intake valve opening: Before the exhaust stroke is completed, the intake valve opens in advance, forming a valve overlap period with the exhaust valve. The negative pressure generated by the outflowing exhaust gas helps draw fresh air into the cylinder, which improves the gas exchange efficiency.

(3) Exhaust valve closure: After the intake stroke begins, the exhaust valve closes with a delay. This promotes a more complete discharge of exhaust gases and, together with the intake process, gives a more efficient air replacement effect, thereby improving the intake quality.

(4) Intake valve closure: The intake valve closes when the intake stroke is completed and the compression stroke is about to begin, slightly after bottom dead center. The delayed closure uses the intake inertia to increase the air volume in the cylinder, thereby improving the compression efficiency and combustion.

(5) Ignition: Before the end of the compression stroke, fuel is injected at high pressure and self-ignites, generating high-temperature, high-pressure gas that pushes the piston to do work. This is the key energy-conversion point of the four-stroke cycle and marks the beginning of the power stroke.

3.2.2. The Construction of Graph Based on Prior Knowledge

First, the original signal is sampled over one working cycle at a time, which divides it into numerous fixed-length samples. Each sample is then segmented according to the events of ignition, exhaust valve opening, intake valve opening, exhaust valve closing, and intake valve closing. For each cycle-based sample $X$, five event-centered signal segments are extracted according to the key physical events, as shown in the following equation:

$$X = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_m^1 \\ x_1^2 & x_2^2 & \cdots & x_m^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^i & x_2^i & \cdots & x_m^i \end{bmatrix} \tag{9}$$

Here, $x$ represents the original sampling points, $m$ is the sample length, and $i$ is the event index; $x^i$ denotes the signal segment corresponding to the $i$-th physical event, and $i = 1, 2, \ldots, 5$ represents ignition, exhaust valve opening, intake valve opening, exhaust valve closing, and intake valve closing, respectively.

Next, min-max normalization is applied separately to the original signal of each event:

$$x_i^{nol} = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \quad i = 1, 2, \ldots, m \tag{10}$$

Here, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the corresponding signal segment, $x_i^{nol}$ is the normalized value, and $m$ is the sample length. In Equation (10), $i$ indexes the sampling points within one segment.

Each event-centered signal segment is then regarded as a node in the graph, so that the signal contained in a node retains the essential characteristics of the event. The directed edges between the nodes follow the sequence of events occurring within the cylinder: ignition → exhaust valve opening → intake valve opening → exhaust valve closing → intake valve closing. In this way, the independent samples in Euclidean space are turned into a graph whose topology encodes prior knowledge. The graph is defined as $G = (V, E)$, where $V = \{v_1, v_2, v_3, \ldots, v_n\}$ is the set of vertices and $E$ is the set of edges connecting the vertices. The process of constructing the prior-knowledge-guided graph is shown in Figure 3.

3.3. Fault Diagnosis Based on the GAT

To identify abnormal valve clearance states in diesel engines efficiently, this paper designs a structured fault diagnosis model based on the Graph Attention Network (GAT). The model uses the mechanism knowledge of the intake and exhaust system to construct the graph structure, models the dependencies between vibration signal segments in a topology-aware manner, and strengthens the feature aggregation of key nodes through the attention mechanism, thereby achieving more robust classification performance under small-sample conditions. The overall architecture of the model is shown in Figure 4. It integrates a feature extraction module, a feature compression module, and a fault classification module.

(1) Feature extraction module

In the feature extraction stage, the model uses two graph attention convolution layers (GATConv) to mine the structural associations between nodes layer by layer. The first GAT layer receives the initial node features and adaptively assigns weight coefficients to adjacent nodes through the attention mechanism, which encodes the local topological relationships. Batch normalization (BatchNorm) and the Leaky ReLU activation function are added to enhance the nonlinear modeling ability. The second GAT layer extracts higher-order structural information from the output of the first layer. A residual connection improves the expression and propagation of deep features and alleviates the vanishing gradient problem. After the node-level features are encoded by the graph convolutions, global mean pooling compresses the representation vectors of all nodes into a single graph-level feature vector, which represents the structure of the entire signal sample.
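As a rough illustration of Sections 3.2.2 and 3.3, the sketch below builds the five-node directed event graph and applies one single-head attention aggregation followed by global mean pooling. It is written in plain Python rather than PyTorch Geometric and follows the standard GAT formulation (LeakyReLU-scored, softmax-normalized attention over incoming neighbours). Adding self-loops is an assumption about the configuration, not a detail stated in the text. The function names, the toy segment length of 8, the hidden size of 4, and the random weights are all hypothetical, and batch normalization and the residual connection are omitted.

```python
import math
import random

EVENTS = ["ignition", "exhaust valve opening", "intake valve opening",
          "exhaust valve closing", "intake valve closing"]

def min_max(segment):
    """Min-max normalize one event segment to [0, 1] (Eq. 10)."""
    lo, hi = min(segment), max(segment)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in segment]

def build_event_graph(segments):
    """Nodes: normalized event segments. Edges: directed chain in event order."""
    nodes = [min_max(s) for s in segments]
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]  # (source, target)
    return nodes, edges

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(nodes, edges, W, a):
    """One single-head attention layer (standard GAT form, self-loops assumed)."""
    # Project every node feature vector: z_i = W h_i
    z = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in nodes]
    # Each node attends to itself and to the sources of its incoming edges
    incoming = {i: [i] for i in range(len(nodes))}
    for src, dst in edges:
        incoming[dst].append(src)
    out, alpha = [], {}
    for i in range(len(nodes)):
        nbrs = incoming[i]
        # Unnormalized score e_ij = LeakyReLU(a . [z_i || z_j])
        scores = [leaky_relu(sum(c * v for c, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        for j, w in zip(nbrs, weights):
            alpha[(j, i)] = w  # attention from source j to target i
        # Attention-weighted sum of the neighbour projections
        out.append([sum(w * z[j][k] for j, w in zip(nbrs, weights))
                    for k in range(len(z[i]))])
    return out, alpha

def mean_pool(node_features):
    """Global mean pooling over all nodes."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]

# Hypothetical toy data: 5 event segments of length 8, projected to 4 dimensions
rng = random.Random(0)
F, HIDDEN = 8, 4
segments = [[rng.uniform(-1, 1) for _ in range(F)] for _ in EVENTS]
W = [[rng.uniform(-0.5, 0.5) for _ in range(F)] for _ in range(HIDDEN)]
a = [rng.uniform(-0.5, 0.5) for _ in range(2 * HIDDEN)]

nodes, edges = build_event_graph(segments)
h, alpha = gat_layer(nodes, edges, W, a)
h_graph = mean_pool(h)
print(edges)         # directed chain between consecutive events
print(len(h_graph))  # length of the graph-level vector
```

Because the chain is directed, the ignition node has no incoming edge and attends only to itself in this sketch, while every other node splits its attention between itself and the preceding event. The dictionary `alpha` is the kind of quantity that an attention-weight visualization would plot.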
The graph-level feature vector, as the core input for fault discrimination, is passed to the subsequent fully connected layers for classification. For each GAT layer, the attention coefficient $\alpha_{ij}$ between node $i$ and its neighbor node $j$ is calculated using Formula (7). The model uses two GAT layers in total, and their outputs are:

$$H^{(1)} = \mathrm{GAT}_1(H^{(0)}, A), \quad H^{(2)} = \mathrm{GAT}_2(H^{(1)}, A) \tag{11}$$

Finally, all node features are aggregated into a graph-level representation vector by average pooling:

$$h_{graph} = \frac{1}{N} \sum_{i=1}^{N} h_i^{(2)} \tag{12}$$

where $h_i^{(2)}$ denotes the representation of node $i$ after the second GAT layer, $N$ is the number of graph nodes, and $h_{graph}$ is the graph-level representation obtained by global mean pooling.

(2) Feature compression module

In the feature compression stage, the graph-level representation is first fed into a two-layer perceptron (MLP). The first linear layer compresses the features to a lower dimension and is followed by a ReLU activation. The second linear layer maps the features to the number of categories, with dropout applied to its input for feature perturbation and regularization.

$$z_1 = \mathrm{ReLU}(W_1 h_{graph} + b_1), \quad z_1' = \mathrm{Dropout}(z_1) \tag{13}$$

$$z_2 = W_2 z_1' + b_2 \tag{14}$$

where $z_1$ and $z_2$ denote the outputs of the first and second fully connected layers, respectively, and $W_1$, $W_2$, $b_1$ and $b_2$ are learnable parameters.

(3) Fault classification module

In the fault classification stage, the output vector is sent to the softmax layer to produce the predicted probabilities, which classify the valve clearance state into one of three states: "normal clearance", "abnormal intake valve clearance", and "abnormal exhaust valve clearance".
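The compression and classification stages described above can be sketched as a small inference-time function: a ReLU-activated linear layer, a second linear layer, and a softmax over the three states. The helper names and the toy parameter values are hypothetical, and dropout is treated as the identity because it is only active during training.

```python
import math

CLASSES = ["normal clearance", "abnormal intake valve clearance",
           "abnormal exhaust valve clearance"]

def linear(W, b, x):
    """Affine map W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h_graph, W1, b1, W2, b2):
    """Inference-time head: dropout is the identity, so z1' = z1."""
    z1 = relu(linear(W1, b1, h_graph))  # Eq. (13)
    z2 = linear(W2, b2, z1)             # Eq. (14)
    probs = softmax(z2)
    return probs, CLASSES[probs.index(max(probs))]

# Hypothetical toy parameters: 4-dim graph vector -> 2 hidden units -> 3 classes
h_graph = [0.2, -0.1, 0.4, 0.3]
W1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b2 = [0.0, 0.0, 0.0]

probs, label = classify(h_graph, W1, b1, W2, b2)
print(label, [round(p, 3) for p in probs])
```

With these toy weights the first logit is the largest, so the predicted label is the first class. In a trained model the weights would come from optimization rather than being written by hand.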
During training, the cross-entropy loss function is adopted as the optimization objective, and the parameters are updated with the Adam optimizer. The predicted probabilities are given by:

$$\check{y} = \mathrm{softmax}(z_2), \quad \check{y}_c = \frac{\exp(z_{2,c})}{\sum_{k=1}^{C} \exp(z_{2,k})} \tag{15}$$

where $\check{y}_c$ represents the predicted probability that the sample belongs to category $c$, and $C$ denotes the number of classes.

4. Experimental Validation

4.1. Experimental Setup

The main equipment of the test bench is an 8-cylinder V-type diesel engine with a firing sequence of 1-8-4-5-7-3-6-2. Acceleration sensors are used to record the vibration acceleration. Figure 5 shows a simplified schematic diagram of the test bench. The experimental steps are as follows. First, with the unit in the cold state (that is, before it has been started), the intake and exhaust pipes, the cylinder head, the fuel injection pipe, the return oil pipe, and the cylinder head of the fault-setting cylinder are disassembled in sequence. The adjusting bolt is then loosened, the intake and exhaust valve clearance of cylinder A4 is set to the target value with a feeler gauge, and the adjusting bolt is tightened again. In this study, the intake and exhaust valve clearances were artificially increased. According to the maintenance manual and engineering practice for this type of diesel engine, the normal valve clearance range in the cold state is 0.25 ± 0.02 mm. A clearance above 0.27 mm is considered abnormal and may lead to performance degradation. Accordingly, 0.25 mm was defined as the normal baseline, and the clearance was increased to 0.30 mm to simulate a typical abnormal condition caused by valve wear or improper adjustment. Performance tests were conducted at 1500 rpm under 100% load with clearances of 0.25 mm (normal) and 0.30 mm (abnormal), and the vibration signals of the diesel engine's cylinder head were recorded.
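The stated test speed fixes the time scale of one working cycle, which is what the cycle-based segmentation relies on. The short sketch below works through that arithmetic for a four-stroke engine, where one cycle spans 720° of crank angle. The sampling rate is not given in the text, so the value used here is purely hypothetical, as are the function names; the 90° firing interval assumes evenly spaced firing across the eight cylinders.

```python
DEG_PER_CYCLE = 720.0  # four-stroke engine: two crankshaft revolutions per cycle

def cycle_duration_s(rpm):
    """Duration of one working cycle in seconds at constant speed."""
    deg_per_second = rpm * 360.0 / 60.0
    return DEG_PER_CYCLE / deg_per_second

def samples_per_cycle(rpm, fs_hz):
    """Number of samples recorded during one working cycle."""
    return round(cycle_duration_s(rpm) * fs_hz)

def angle_to_sample(angle_deg, rpm, fs_hz):
    """Sample index of a crank angle measured from the start of the cycle."""
    fraction = (angle_deg % DEG_PER_CYCLE) / DEG_PER_CYCLE
    return round(fraction * samples_per_cycle(rpm, fs_hz))

RPM = 1500       # test speed stated in the text
FS_HZ = 25_600   # hypothetical sampling rate, not given in the text
N_CYLINDERS = 8

print(cycle_duration_s(RPM))               # 0.08 s per working cycle
print(DEG_PER_CYCLE / N_CYLINDERS)         # 90.0 deg between firings if evenly spaced
print(samples_per_cycle(RPM, FS_HZ))       # 2048 samples at the assumed rate
print(angle_to_sample(360.0, RPM, FS_HZ))  # 1024, the middle of the cycle
```

A conversion of this kind is what allows event angles from a valve phase table to be mapped to window centres in the sampled signal, provided the signal has been synchronized to a crank-angle reference.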
The experiments were implemented in Python 3.10 with PyTorch 2.9.1 and PyTorch Geometric 2.7.0 on a computer with an AMD Ryzen 9 3900X CPU, an NVIDIA GeForce RTX 3060 GPU, and 8 GB of RAM.

4.2. Data Description

The dataset contains 3000 samples in total, with 1000 samples for each health condition. The original vibration signals were synchronized to the crank-angle reference and then segmented into cycle-based graph samples. Each graph sample was constructed by extracting five non-overlapping event-centered signal windows corresponding to the key valve-related physical events. The three conditions were labeled as follows. When both the intake and exhaust valve clearances were 0.25 mm, the engine was regarded as normal and the samples were given label 0. When the intake valve clearance was 0.30 mm and the exhaust valve clearance was 0.25 mm, the samples represented an intake valve clearance fault and were given label 1. When the intake valve clearance was 0.25 mm and the exhaust valve clearance was 0.30 mm, the samples represented an exhaust valve clearance fault and were given label 2.

In the initial setup, 50 samples were randomly selected from each category to form the training set, and 400 samples were randomly selected from each category to form the testing set. The specific sample combinations are shown in Table 2. This split was designed to evaluate the diagnostic performance of the proposed method under small-sample training conditions, and the relatively large testing set provides a more stable estimate of model performance. The training and testing sets were generated after graph-sample construction, and no graph sample was included in both subsets. It should be noted that the split was performed at the graph-sample level under the same operating condition, rather than strictly at the level of independent acquisition runs.
Therefore, although no sample appears in both the training and testing sets, cycle samples collected under the same operating condition may still be temporally correlated.

4.3. Model Hyperparameter Settings

Each graph sample has n = 5 nodes with a node feature length of F = 256, and the GAT diagnostic model uses K = 1 attention head and two graph attention convolution layers. The specific network layers and structure are shown in Table 3, where GATConv and BatchNorm1d are the graph attention convolutional layer and the batch normalization layer, respectively, Linear is the fully connected layer, and ReLU is the activation function. The training hyperparameters are shown in Table 4.

4.4. Fault Classification Results

The model was trained with the hyperparameter settings described in Section 4.3. Figure 6 presents the evolution of the training loss and the test accuracy over 100 epochs. During the early stage of optimization, the loss decreases rapidly and then gradually stabilizes. After approximately 20 epochs, the test accuracy stabilizes at around 0.98, indicating that the proposed GAT model learns effective features and converges quickly.

The prediction results are shown in the confusion matrix in Figure 7. The prediction accuracy for each class is above 97%. Category 0 (normal valve clearance) and category 2 (exhaust valve clearance fault) are classified almost without error, demonstrating the robustness of the model to edge-class samples. Category 1 (intake valve clearance fault) shows a small amount of confusion with the adjacent categories, which may be due to ambiguous intervals in the temporal structure of such samples.
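Per-class accuracy figures of this kind are read from the rows of a confusion matrix. The sketch below shows one conventional way to compute per-class recall, overall accuracy, and macro-F1 from such a matrix. The counts are hypothetical placeholders for 400 test samples per class and are not the values in Figure 7; the function names are also illustrative.

```python
def per_class_metrics(cm):
    """Precision, recall and F1 per class; cm[t][p] counts true t predicted p."""
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        actual = sum(cm[c])                          # row sum: true class c
        predicted = sum(cm[r][c] for r in range(n))  # column sum: predicted c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics

def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = [m["f1"] for m in per_class_metrics(cm)]
    return sum(scores) / len(scores)

# Hypothetical counts for 400 test samples per class (NOT the paper's results)
cm = [
    [398, 2, 0],
    [6, 390, 4],
    [0, 3, 397],
]
for label, m in enumerate(per_class_metrics(cm)):
    print(label, round(m["recall"], 4))
print(round(accuracy(cm), 4), round(macro_f1(cm), 4))
```

Since the test set is balanced across the three classes, overall accuracy equals the mean of the per-class recalls, and macro-F1 stays close to it unless one class attracts many false positives.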
To evaluate the statistical stability of the proposed method, repeated experiments were conducted with 10 different random seeds. In each run, the random seed controlled the model initialization, the sample selection, and the mini-batch order. For each health condition, 50 samples were randomly selected for training and 400 samples for testing, so the repeated experiments cover both random initialization and different sample selections. The results are shown in Table 5. The proposed method achieved a mean best test accuracy of 97.45% ± 0.98% and a mean best macro-F1 score of 97.44% ± 0.98% over the 10 runs. The 95% confidence intervals were ±0.61% for both accuracy and macro-F1. The average convergence epoch was 10.2, where convergence was defined as the first epoch at which the test accuracy reached 95% of the best test accuracy in the corresponding run. These results indicate that the proposed method maintains stable diagnostic performance under different random initializations and sample selections.

Because diesel engines in actual operation often provide only scarce fault data, the classification performance of the GAT model was further tested with different training set sizes. While keeping the three classes balanced, five training sets were constructed with 60, 50, 40, 30 and 20 samples per class, and the same test set was used for evaluation. The results are shown in Table 6 and Figure 8. As the number of training samples decreases, the accuracy fluctuates slightly, but the overall decline is slow and the accuracy remains at a relatively high level.
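The repeated-run protocol above can be sketched in two parts: a seed-controlled, class-balanced split with disjoint training and testing indices, and the half-width of the 95% confidence interval. The text does not say how the interval was computed. The normal-approximation form 1.96·s/√n used below is an assumption that happens to reproduce the reported ±0.61% from the stated standard deviation and run count, whereas a Student-t interval with 9 degrees of freedom would be somewhat wider. All function names are hypothetical.

```python
import math
import random

def split_per_class(labels, n_train, n_test, seed):
    """Draw disjoint train/test index sets with equal counts per class."""
    rng = random.Random(seed)
    train, test = [], []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        train += idx[:n_train]
        test += idx[n_train:n_train + n_test]
    return train, test

def ci95_half_width(std, n, z=1.96):
    """Normal-approximation half-width of a 95% confidence interval."""
    return z * std / math.sqrt(n)

labels = [c for c in range(3) for _ in range(1000)]  # 1000 samples per condition
for seed in range(10):
    train_idx, test_idx = split_per_class(labels, 50, 400, seed)
    assert not set(train_idx) & set(test_idx)  # no sample in both subsets

# Half-width implied by the reported standard deviation (0.98) and 10 runs
print(round(ci95_half_width(0.98, 10), 2))  # 0.61
```

Note that this split is disjoint at the sample level only. It does not separate acquisition runs, which matches the limitation acknowledged in Section 4.2.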
With only 20 training samples per category (60 in total), the model still achieved a test accuracy of 94.58%, demonstrating good small-sample learning capability. Overall, the GAT model maintains high recognition accuracy while discriminating finely among the three states, and is suitable for health status discrimination of complex multi-state equipment.

6. Conclusions

This paper proposes a prior-knowledge-guided Graph Attention Network method for diesel-engine valve-clearance fault diagnosis. The method starts from the working mechanism of the diesel-engine intake and exhaust system and constructs an event-driven graph representation from cylinder-head vibration signals. Specifically, the vibration signal is synchronized in the crank-angle domain, and key event-centered vibration segments are extracted as graph nodes. Directed edges are then constructed according to the physical occurrence sequence of valve-related events within one working cycle, so that the graph topology explicitly reflects the event evolution process of the valve train. Based on the constructed event-driven graph samples, a GAT-based diagnostic model is developed to learn graph-level fault representations and classify three valve-clearance states: normal clearance, intake valve clearance fault, and exhaust valve clearance fault. The experimental results under full-load test-bench conditions show that the proposed method achieves high diagnostic accuracy and fast convergence under small-sample conditions. The comparison with CNN and LSTM baselines demonstrates the benefit of introducing graph-structured event representations, while the ablation experiments further indicate that the performance improvement mainly comes from the incorporation of mechanism knowledge into graph construction.
The attention-weight visualization also provides an interpretable basis for analyzing event-to-event relationships related to valve-clearance faults.

It should be noted that the experimental validation in this study was conducted under a controlled test-bench condition, with a fixed rotational speed, fixed load, fixed sensor installation position, and a single experimental platform. This setting reduces the interference caused by operating-condition variations and allows the proposed method to focus on discriminating valve-clearance-related vibration characteristics. However, it also limits the demonstrated generalization capability of the model. The reported results should therefore be interpreted as within-condition diagnostic performance rather than as evidence of robustness under varying industrial operating conditions. In future work, split-by-run validation, leave-one-recording-out validation, variable-speed and variable-load tests, different sensor positions, and cross-condition transfer validation will be investigated to evaluate the generalization capability of the proposed method more comprehensively.

Author Contributions

M.L.: conceptualized the study, conducted the experiments, performed data analysis, and drafted the manuscript; J.W.: supervised the research and provided academic guidance; X.Y.: provided resources and revised the manuscript; Y.H.: contributed to investigation and critically revised the manuscript; X.L.: contributed to data curation and analysis; Z.S.: assisted in experimental design and resources. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China (2022YFB3306301).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author upon reasonable request.
Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhan, X.; Bai, H.; Yan, H.; Wang, R.; Guo, C.; Jia, X. Diesel Engine Fault Diagnosis Method Based on Optimized VMD and Improved CNN. Processes 2022, 10, 2162.
2. Myung, C.L.; Choi, K.H.; Hwang, I.G.; Lee, K.H.; Park, S. Effects of valve timing and intake flow motion control on combustion and time-resolved HC & NOx formation characteristics. Int. J. Automot. Technol. 2009, 10, 161–166.
3. Karamangil, M.I.; Avci, A.; Bilal, H. Investigation of the effect of different carbon film thickness on the exhaust valve. Heat Mass Transf. 2008, 44, 587–598.
4. Forsberg, P.; Hollman, P.; Jacobson, S. Wear mechanism study of exhaust valve system in modern heavy duty combustion engines. Wear 2011, 271, 2477–2484.
5. Tharanga, K.L.P.; Liu, S.; Zhang, S.; Wang, Y. Diesel Engine Fault Diagnosis with Vibration Signal. J. Appl. Math. Phys. 2020, 8, 2031–2042.
6. Flett, J.; Bone, G.M. Fault detection and diagnosis of diesel engine valve trains. Mech. Syst. Signal Process. 2016, 72–73, 316–327.
7. El Aziz Ahmed, E.A.; Ibrahim, R.A.; Abdelsalam, A.K. A Comparative Analysis for Machine Learning-based Short-Term Load Forecasting Techniques. In Proceedings of the 2023 IEEE 6th International Electrical and Energy Conference (CIEEC), Hefei, China, 12–14 May 2023; pp. 1166–1171.
8. Mushtaq, S.; Islam, M.M.M.; Sohaib, M. Deep Learning Aided Data-Driven Fault Diagnosis of Rotatory Machine: A Comprehensive Review. Energies 2021, 14, 5150.
9. Yang, Y.; Haque, M.M.M.; Bai, D.; Tang, W. Fault Diagnosis of Electric Motors Using Deep Learning Algorithms and Its Application: A Review. Energies 2021, 14, 7017.
10. Attallah, O.; Ibrahim, R.A.; Zakzouk, N.E. A lightweight deep learning framework for transformer fault diagnosis in smart grids using multiple scale CNN features. Sci. Rep. 2025, 15, 14505.
11. Yuan, B.; Li, Y.; Chen, S. Efficient Gearbox Fault Diagnosis Based on Improved Multi-Scale CNN with Lightweight Convolutional Attention. Sensors 2025, 25, 2636.
12. Hinton, G.E.; Osindero, S.; Teh, Y.-W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554.
13. Chen, Z.; Li, W. Multisensor Feature Fusion for Bearing Fault Diagnosis Using Sparse Autoencoder and Deep Belief Network. IEEE Trans. Instrum. Meas. 2017, 66, 1693–1702.
14. Shao, H.; Jiang, H.; Zhang, X.; Niu, M. Rolling bearing fault diagnosis using an optimization deep belief network. Meas. Sci. Technol. 2015, 26, 115002.
15. Qin, S.J. Survey on data-driven industrial process monitoring and diagnosis. Annu. Rev. Control 2012, 36, 220–234.
16. Jia, F.; Lei, Y.; Lin, J.; Zhou, X.; Lu, N. Deep Neural Networks: A Promising Tool for Fault Characteristic Mining and Intelligent Diagnosis of Rotating Machinery with Massive Data. Mech. Syst. Signal Process. 2016, 72–73, 303–315.
17. Tamilselvan, P.; Wang, P. Failure diagnosis using deep belief learning based health state classification. Reliab. Eng. Syst. Saf. 2013, 115, 124–135.
18. Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587.
19. Zhang, W.; Peng, G.; Li, C.; Chen, Y.; Zhang, Z. A New Deep Learning Model for Fault Diagnosis with Good Anti-Noise and Domain Adaptation Ability on Raw Vibration Signals. Sensors 2017, 17, 425.
20. Wei, Q.; Tian, X.; Cui, L.; Zheng, F.; Liu, L. WSAFormer-DFFN: A model for rotating machinery fault diagnosis using 1D window-based multi-head self-attention and deep feature fusion network. Eng. Appl. Artif. Intell. 2023, 124, 106633.
21. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
22. Randall, R.B.; Antoni, J. Rolling element bearing diagnostics—A tutorial. Mech. Syst. Signal Process. 2011, 25, 485–520.
23. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
24. Patil, C.; Theotokatos, G.; Tsitsilonis, K. Data-driven model for marine engine fault diagnosis using in-cylinder pressure signals. J. Mar. Eng. Technol. 2025, 24, 70–82.
25. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric Deep Learning: Going beyond Euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42.
26. Wang, C.; Wang, Y.; Wang, Y.; Li, X.; Chen, Z. Richly connected spatial–temporal graph neural network for rotating machinery fault diagnosis with multi-sensor information fusion. Mech. Syst. Signal Process. 2025, 225, 112230.
27. Chen, Z.; Zeng, X.; Li, W.; Liao, G. Machine fault classification using deep belief network. In Proceedings of the 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings, Taipei, Taiwan, 23–26 May 2016; pp. 1–6.
28. Zhang, L.; Zhou, F.; Duan, P.; Yuan, X. Fault diagnosis of mobile robot based on dual-graph convolutional network with prior fault knowledge. Adv. Eng. Inform. 2024, 62, 102865.
29. Li, T.; Zhao, Z.; Sun, C.; Yan, R.; Chen, X. Multireceptive Field Graph Convolutional Networks for Machine Fault Diagnosis. IEEE Trans. Ind. Electron. 2021, 68, 12739–12749.
30. Jia, M.; Liu, Y.; Xu, D.; Yang, T.; Yao, Y. Topology-Informed Graph Convolutional Network for Fault Diagnosis. In Proceedings of the 2022 IEEE 11th Data Driven Control and Learning Systems Conference (DDCLS), Chengdu, China, 3–5 August 2022; pp. 595–599.
31. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
32. Yu, X.; Tang, B.; Zhang, K. Fault Diagnosis of Wind Turbine Gearbox Using a Novel Method of Fast Deep Graph Convolutional Networks. IEEE Trans. Instrum. Meas. 2021, 70, 6502714.
33. Zhang, D.; Stewart, E.; Entezami, M.; Roberts, C.; Yu, D. Intelligent acoustic-based fault diagnosis of roller bearings using a deep graph convolutional network. Measurement 2020, 156, 107585.
34. Gao, Y.; Zhong, Z.; Ma, M.; Zhang, Z.; Zhang, Y.; Wang, C.; Wang, Z. Physics-Embedded Recurrent Graph Neural Network for Fault Diagnosis of Complex Systems. IEEE Access 2024, 12, 122426–122436.
35. Xiao, L.; Yang, X.; Yang, X. A graph neural network-based bearing fault detection method. Sci. Rep. 2023, 13, 5286.
36. Li, T.; Zhou, Z.; Li, S.; Sun, C.; Yan, R.; Chen, X. The emerging graph neural networks for intelligent fault diagnostics and prognostics: A guideline and a benchmark study. Mech. Syst. Signal Process. 2022, 168, 108653.
37. Wu, W.; Song, C.; Zhao, J.; Xu, Z. Physics-Informed Gated Recurrent Graph Attention Unit Network for Anomaly Detection in Industrial Cyber-Physical Systems. Inf. Sci. 2023, 629, 618–633.
[ Google Scholar] [ CrossRef] Liu, J.; Chen, S.; Cai, M.; Shao, H.; Gui, W. Semi-Heterogeneous Graph-Perception Network With Gradient-Weighted Class Activation Mapping for Class-Incremental Industrial Fault Recognition and Root Cause Diagnosis. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 16825–16839. [ Google Scholar] [ CrossRef] Tao, L.; Liu, H.; Zhang, J.; Su, X.; Li, S.; Hao, J.; Lu, C.; Suo, M.; Wang, C. Associated Fault Diagnosis of Power Supply Systems Based on Graph Matching: A Knowledge and Data Fusion Approach. Mathematics 2022, 10, 4306. [ Google Scholar] [ CrossRef] Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [ Google Scholar] [ CrossRef] Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907. [ Google Scholar] [ CrossRef] Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. arXiv 2017. [ Google Scholar] [ CrossRef] Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv 2016, arXiv:1502.03044. [ Google Scholar] [ CrossRef] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762. [ Google Scholar] Yang, X.; Bi, F.; Cheng, J.; Tang, D.; Shen, P.; Bi, X. A Multiple Attention Convolutional Neural Networks for Diesel Engine Fault Diagnosis. Sensors 2024, 24, 2708. [ Google Scholar] [ CrossRef] Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903. [ Google Scholar] [ CrossRef] Jin, C.; Zhao, W.; Liu, Z.; Lee, J.; He, X. A vibration-based approach for diesel engine fault diagnosis. 
In Proceedings of the 2014 International Conference on Prognostics and Health Management, Cheney, WA, USA, 22–25 June 2014; pp. 1–9. [ Google Scholar] [ CrossRef] Hu, B.; Liu, J.; Liu, S.; Li, B.; Lei, X. Simultaneous multi-parameter identification algorithm for clearance-type nonlinearity. Mech. Syst. Signal Process. 2020, 139, 106423. [ Google Scholar] [ CrossRef] Fan, S.; Cai, Y.; Zhang, Z.; Wang, J.; Shi, Y.; Li, X. Adaptive Convolution Sparse Filtering Method for the Fault Diagnosis of an Engine Timing Gearbox. Sensors 2023, 24, 169. [ Google Scholar] [ CrossRef] Hu, Z.; Lv, C.; Hang, P.; Huang, C.; Xing, Y. Data-Driven Estimation of Driver Attention Using Calibration-Free Eye Gaze and Scene Features. IEEE Trans. Ind. Electron. 2022, 69, 1800–1808. [ Google Scholar] [ CrossRef] Badawi, B.; Shahin, M.; Kolosy, M.; Shedied, S.; Elmaihy, A. Identification of Diesel Engine Cycle Events using Measured Surface Vibration. In Small Engine Technology Conference & Exposition; SAE International: San Antonio, TX, USA, 2006. [ Google Scholar] [ CrossRef] Elamin, F.; Fan, Y.; Gu, F.; Ball, A. Diesel Engine Valve Clearance Detection Using Acoustic Emission. Adv. Mech. Eng. 2010, 2, 495741. [ Google Scholar] [ CrossRef] Jiang, Z.; Mao, Z.; Wang, Z.; Zhang, J. Fault Diagnosis of Internal Combustion Engine Valve Clearance Using the Impact Commencement Detection Method. Sensors 2017, 17, 2916. [ Google Scholar] [ CrossRef] Patel, S.; Torgal, S.; Purohit, T.; Kumar, R.; Singh, D.V.; Kanchan, S.; Soudagar, M.E.M.; Ahamad, T.; Kalam, M.; Patel, M. Impact of variable exhaust valve timing on diesel engine characteristics fueled with waste cooking oil biofuel blends: A numerical analysis. Proc. Inst. Mech. Eng. Part E J. Process Mech. Eng. 2025, 239, 1329–1352. [ Google Scholar] [ CrossRef] Dixit, S.; Verma, N.K. Intelligent Condition-Based Monitoring of Rotary Machines With Few Samples. IEEE Sens. J. 2020, 20, 14337–14346. 
56. Wang, Z.; Luo, Q.; Chen, H.; Zhao, J.; Yao, L.; Zhang, J.; Chu, F. A high-accuracy intelligent fault diagnosis method for aero-engine bearings with limited samples. Comput. Ind. 2024, 159–160, 104099.
57. Zhong, B.; Zhao, M.; Wang, L.; Fu, S.; Zhong, S. DCSN: Focusing on hard samples mining in small-sample fault diagnosis of marine engine. Measurement 2024, 235, 114929.
Figure 1. Schematic of the attention mechanism.
Figure 2. Overview of the proposed method.
Figure 3. Schematic diagram for the construction of graphs based on prior knowledge.
Figure 4. Overall architecture of the model.
Figure 5. Experimental bench system.
Figure 6. Training loss and test set accuracy curve.
Figure 7. Confusion matrix.
Figure 8. Confusion matrices with different numbers of training samples.
Figure 9. (a) Accuracy curves of different graph construction strategies; (b) Loss curves of different graph construction strategies.
Figure 10. (a) Test accuracy curves of the prior-knowledge-guided GAT, CNN, and LSTM; (b) Loss curves of the prior-knowledge-guided GAT, CNN, and LSTM.
Figure 11. Attention heatmaps under different fault conditions: (a) intake valve fault; (b) exhaust valve fault.
(Node 0: Ignition, Node 1: Exhaust Valve Open, Node 2: Intake Valve Open, Node 3: Exhaust Valve Close, Node 4: Intake Valve Close).

Table 1. Action phase table of A1 cylinder.
| Cylinder | Top Dead Center (deg) | Exhaust Valve Open (deg) | Intake Valve Open (deg) | Exhaust Valve Close (deg) | Intake Valve Close (deg) | Spark Plug Ignition (deg) |
|---|---|---|---|---|---|---|
| A1 | 0 | 122 | 326 | 386 | 580 | 713 |

Table 2. Sample composition used in the initial experiment.
| Name | Value |
|---|---|
| Complete dataset size | 3000 |
| Number of training samples | 150 |
| Number of testing samples | 1200 |
| Number of training samples for each type | 50 |
| Number of testing samples for each type | 400 |

Table 3. GAT structure and parameters.
| Layer Name | Input Shape | Output Shape |
|---|---|---|
| GATConv | [5, 256] | [5, 512] |
| BatchNorm1d | [5, 512] | [5, 512] |
| GATConv | [5, 512] | [5, 512] |
| BatchNorm1d | [5, 512] | [5, 512] |
| Readout | [5, 512] | [1, 512] |
| Linear + ReLU | [1, 512] | [1, 256] |
| Dropout | [1, 256] | [1, 256] |
| Linear | [1, 256] | [1, 3] |
| Softmax | [1, 3] | [1, 3] |

Table 4. Hyperparameters in GAT.
| Parameter | Configuration |
|---|---|
| Batch size | 32 |
| Train epoch | 100 |
| Learning rate | 1 × 10⁻⁴ |
| Optimizer | Adam |
| Loss function | Cross-entropy |

Table 5. Repeated experimental results under different random seeds.
| Experiment | Accuracy | F1 | Loss | Convergence Epoch |
|---|---|---|---|---|
| 1 | 0.9750 | 0.9749 | 0.0716 | 9 |
| 2 | 0.9650 | 0.9649 | 0.0987 | 7 |
| 3 | 0.9783 | 0.9783 | 0.0587 | 11 |
| 4 | 0.9850 | 0.9850 | 0.0515 | 11 |
| 5 | 0.9700 | 0.9699 | 0.0742 | 15 |
| 6 | 0.9917 | 0.9917 | 0.0360 | 9 |
| 7 | 0.9667 | 0.9666 | 0.1237 | 10 |
| 8 | 0.9817 | 0.9817 | 0.0621 | 9 |
| 9 | 0.9600 | 0.9598 | 0.1085 | 7 |
| 10 | 0.9717 | 0.9716 | 0.0721 | 14 |
| Average | 0.9745 | 0.9744 | 0.0757 | 10.2 |

Table 6. The test accuracy of the GAT model under different small sample sizes.
| Samples Per Class | Total Number of Training Samples | Test Accuracy |
|---|---|---|
| 60 | 180 | 0.9833 |
| 50 | 150 | 0.9800 |
| 40 | 120 | 0.9733 |
| 30 | 90 | 0.9617 |
| 20 | 60 | 0.9458 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Li, M.; Wen, J.; Yang, X.; Hu, Y.; Li, X.; Shi, Z. Prior-Knowledge-Guided Graph Attention Network for Fault Diagnosis of Engine Valve Clearance. Sensors 2026, 26, 3565. https://doi.org/10.3390/s26113565

3. The Proposed Diesel Fault Diagnosis Method Based on a GAT
3.1. Overview
The proposed framework for diesel engine valve-clearance fault diagnosis, driven by data and knowledge fusion based on GAT, is illustrated in Figure 2.
It comprises two stages: graph construction based on prior knowledge and fault diagnosis based on GAT. In the first stage, the original vibration signal is segmented at the event occurrence intervals dictated by the working mechanism of the diesel engine cylinder. This converts the Euclidean data (the original time-domain vibration signal) into non-Euclidean data (a graph) that embeds prior knowledge of the cylinder. In the second stage, the constructed graph is fed to the GAT model to diagnose the health status of the diesel engine valve clearance.
3.2. Constructing Affinity Graphs from Time Series Based on Prior Knowledge
3.2.1. Analysis of Diesel Engine Vibration Signal Characteristics
To capture the impact-related vibration induced by valve events, a triaxial piezoelectric accelerometer (model 1A313E, Donghua Test) was mounted on the cylinder head close to the monitored cylinder. The sensor has a measurement range of ±100 g and a frequency response of 0.5–7000 Hz (±10%). The calibrated sensitivity at 160 Hz is approximately 4.0 mV/(m/s²) on each of the three axes. The sensor is powered by a constant-current source of 2–20 mA (18–30 VDC), with a DC bias voltage of 8–12 V, and is installed through an M5 threaded mounting hole to ensure reliable mechanical coupling. The continuous vibration signal collected in this way was subsequently segmented according to the event-driven strategy described in this study. For node division and graph construction, this study analyzed the valve phase data of diesel engine cylinder A1, as shown in Table 1.
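As a rough illustration of the event-driven segmentation, the sketch below cuts one working cycle into five per-event segments using the phase angles in Table 1, ordered as the nodes in Figure 11, and resamples each segment to the 256-point node feature length listed in Table 3. Several details are assumptions made only for this example and are not taken from the paper: the cycle spans 720° of crank rotation, the input starts at top dead center and is uniformly sampled in crank angle, each segment runs from its event to the next event in the cycle, and linear interpolation is used for resampling. The name `segment_cycle` is hypothetical, and edge construction is not covered.

```python
import numpy as np

# Event angles for cylinder A1 from Table 1, ordered as the nodes in Figure 11.
EVENT_ANGLES_DEG = {
    "ignition": 713.0,
    "exhaust_valve_open": 122.0,
    "intake_valve_open": 326.0,
    "exhaust_valve_close": 386.0,
    "intake_valve_close": 580.0,
}
CYCLE_DEG = 720.0  # assumption: one working cycle spans 720 deg of crank angle
NODE_DIM = 256     # per-node feature length, matching the [5, 256] input in Table 3


def segment_cycle(cycle, node_dim=NODE_DIM):
    """Return (node names, [5, node_dim] array) for one working cycle.

    Assumes `cycle` starts at top dead center (0 deg) and is uniformly
    sampled in crank angle over exactly one working cycle.
    """
    cycle = np.asarray(cycle, dtype=float)
    # Crank angle of every sample in the cycle.
    sample_angles = np.arange(cycle.size) * CYCLE_DEG / cycle.size
    names = list(EVENT_ANGLES_DEG)
    starts = np.array([EVENT_ANGLES_DEG[name] for name in names])
    ordered = np.sort(starts)
    nodes = []
    for start in starts:
        # Assumption: a segment lasts until the next event in the cycle,
        # wrapping past 720 deg for the last event.
        later = ordered[ordered > start]
        end = later[0] if later.size else ordered[0] + CYCLE_DEG
        query = np.linspace(start, end, node_dim, endpoint=False)
        # Periodic linear interpolation handles the wrap-around segment.
        nodes.append(np.interp(query, sample_angles, cycle, period=CYCLE_DEG))
    return names, np.stack(nodes)
```

Calling `segment_cycle` on a single-cycle signal of any length returns the five node names and a `(5, 256)` feature matrix. Under variable engine speed, the uniform crank-angle assumption would need to be replaced by an angle reference such as an encoder or key-phase signal.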
Taking a complete working cycle as the reference and following the phase sequence of the cylinder actions in Table 1, the cycle can be divided into five main physical events [54]:
Exhaust valve opening: Towards the end of the power stroke, the exhaust valve opens early so that the high-temperature, high-pressure combustion gas starts to be discharged. This provides favorable initial conditions for the subsequent exhaust stroke, reduces the in-cylinder pressure, and lowers the resistance to the upward motion of the piston.
Intake valve opening: Before the exhaust stroke is complete, the intake valve opens early, forming a valve overlap period with the exhaust valve. The negative pressure generated by the outflowing exhaust gas helps draw fresh air into the cylinder, improving the gas-exchange efficiency.
Exhaust valve closure: After the intake stroke begins, the exhaust valve closes with a delay. This promotes more complete discharge of the exhaust gas and, together with the intake process, gives a more effective air replacement, thereby improving the intake quality.
Intake valve closure: The intake valve closes when the intake stroke is complete and the compression stroke is about to begin, slightly after bottom dead center. The delayed closure exploits the inertia of the intake flow to increase the air charge in the cylinder, thereby improving the compression efficiency and combustion.
Ignition: Before the end of the compression stroke, fuel is injected at high pressure and self-ignites, generating high-temperature and high-pressure g