Abstract
The integration of artificial intelligence (AI), machine learning (ML), and computational modeling with experimental catalysis is reshaping materials design and chemical process development. Tailored heterogeneous catalysts, including supported metals, zeolites, defect-engineered materials, and multi-element systems, exhibit enhanced activity, selectivity, and stability through engineered active sites and porosity. AI and ML approaches enable predictive modeling, high-throughput screening, mechanistic insight, and rational catalyst design by linking synthesis conditions, structural features, and performance metrics across scales. Applications span CO₂ conversion, methane reforming, hydrogen production, polymer recycling, and photocatalysis, with platforms such as PHOTOREAC, QMOF, and PhotoCatDB facilitating the translation from laboratory experiments to reactor-scale processes. Hybrid strategies that combine mechanistic understanding with data-driven models improve interpretability, predictive accuracy, and process optimization. These advances underscore a paradigm shift toward data-driven catalysis that accelerates discovery and supports sustainable chemical technologies, while highlighting the role of human expertise in guiding responsible AI deployment.
1. Introduction
The global energy crisis, accelerating climate change, and severe environmental pollution demand urgent solutions that reduce petroleum use, provide alternative chemicals and fuels, and establish sustainable chemical processes aimed at carbon neutrality. Catalytic processes are the foundation of the chemical process industry and are essential to modern society: catalysts and catalytic processes are estimated to be used in more than 90% of modern industrial chemical processes.
Core knowledge of chemistry, particularly organic synthesis and analytical chemistry, combined with the fundamentals of chemical engineering, enables the continuous development, optimization, and improvement of chemical processes. Since its inception in the late 19th century, chemical engineering has undergone several transformations, driven mainly by technological advances and societal needs. Today, the discipline is evolving further through the application of artificial intelligence. Given the scope of the problem and the broad role of chemical engineering in developing sustainable technologies, this work focuses on AI- and ML-driven strategies for catalyst design and the development of sustainable chemical processes. Artificial intelligence is a broad field that includes subfields such as machine learning and deep learning (DL). Although these concepts are related, they are sometimes mistakenly treated as synonyms, even though they rest on different principles (Figure 1). Understanding these distinctions therefore helps clarify the rapidly evolving AI landscape. This review is organized into several chapters. The introduction outlines the challenges in catalyst design and sustainable chemical process development arising from the energy crisis, climate change, and environmental pollution. The next chapter discusses the integration of AI and ML into catalytic research, with particular emphasis on their transformative impact on catalyst design, reaction optimization, and multi-scale process engineering. The subsequent chapter presents achievements in applying AI and ML, focusing on photocatalysis and the design of catalysts for CO₂ conversion. The following chapter addresses ethical issues, safety risks, and the responsible use of generative AI.
In the final part of the review, challenges and guidelines for future research are discussed, and key conclusions are drawn.
1.1. Advances and Challenges in Modern Catalysis
Heterogeneous catalysis continues to advance through the design of multifunctional materials, such as supported metals, molecular sieves, and oxides, with growing emphasis on activity, selectivity, and control of the reaction environment. Tailoring active sites and modifying porosity remain essential for processes such as dehydrogenation, hydroisomerization, epoxidation, and hydrogenation. Zeolites and bifunctional catalysts remain central because of their tunable acidity and their ability to regulate the properties of supported metals [1]. These developments illustrate how improvements in material design and control of catalytic environments continue to refine the performance of heterogeneous catalysts. They also highlight a broader point: catalysis underpins a wide range of chemical technologies, and designing effective catalytic systems requires navigating complex relationships between synthesis, structure, and function. This perspective motivates a closer examination of the central role of catalysis and the multifaceted challenges of catalyst design.
1.2. The Central Role of Catalysis and the Complexity of Catalyst Design
Catalysis research, spanning biocatalysis, homogeneous catalysis, and heterogeneous catalysis, is central to advances in sustainable energy, materials, and pharmaceuticals. The field is inherently interdisciplinary, requiring collaboration among chemists, physicists, engineers, and computational scientists. Efficient and sustainable catalytic processes require a combination of experimental techniques, theoretical models, and advanced instrumentation. To develop effective catalysts, it is essential to understand how the way a catalyst is prepared influences its behavior in a specific reaction. However, this relationship is complex.
A catalyst’s performance depends not only on its chemical composition but also on many interrelated preparation factors. The ability to measure and map how formulation affects performance is therefore crucial for designing improved catalysts [2]. Supervised machine learning is a promising tool for building such maps, but applying it to catalyst development raises questions specific to this field. To build meaningful models, it is necessary to understand how the nature of catalytic data affects the results. This matters because catalytic datasets have several constraints: they are typically small because experiments are costly, the choice and range of variables must be defined by the researcher, measurement errors are unavoidable, and many experimental variables are strongly correlated. These correlations can be simple or highly nonlinear, depending on the underlying catalytic processes [3]. Taken together, these considerations show that understanding catalytic performance requires more than identifying active materials; it requires a systematic ability to relate synthesis decisions to the structural and physicochemical features that govern reactivity. The multidimensional nature of catalyst formulation, however, makes these relationships difficult to disentangle through experimentation alone. This challenge creates an opportunity for data-driven methods, particularly supervised machine learning, which can help clarify how specific synthesis parameters shape catalyst properties and, ultimately, catalytic behavior.
1.3. Catalysis for Sustainable and Renewable Technologies
Sustainable catalytic applications are gaining momentum, with progress in carbon dioxide utilization, methane reforming, hydrogen production, and polymer recycling. Examples include copper-doped tin oxide for carbon dioxide reduction, bifunctional nickel–ceria (Ni–CeOx) catalysts for methane reforming, and mixed ionic and electronic conducting reactors for hydrogen generation [1].
Advances in catalytic depolymerization, glycolysis, and other recycling strategies show how kinetic modeling and experimental validation can support environmentally responsible and industrially viable technologies. Photocatalysis has recently attracted particular attention. It enables solar-driven chemical production, environmental remediation, and carbon-neutral strategies, but traditional design approaches suffer from low efficiency and high discovery costs. Integrating computational materials science with machine learning helps address these barriers, enabling the exploration of novel materials, microstructure optimization, and mechanistic understanding, and thereby improving photocatalyst performance and expanding practical applications. Advanced computational methods, including density functional theory (DFT) and molecular dynamics (MD), provide detailed insight into electronic structure, band gaps, surface reactions, and microstructure–performance relationships that are difficult to capture experimentally, while high-throughput screening (HTS) enables rapid evaluation of large candidate libraries and prioritizes promising materials for experimental validation [4]. HTS has accelerated the identification of promising photocatalysts, as demonstrated in studies of ternary organic heterojunction photocatalysts for hydrogen evolution [5], two-dimensional Janus heterostructures for solar energy applications [6], and tens of thousands of photocathode candidates for carbon dioxide reduction [7]. Deep learning further extends HTS by predicting key material properties, such as band gaps, charge separation efficiency, and light absorption, from large experimental and computational datasets.
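The screening logic described above, in which cheaply predicted properties are used to shortlist candidates before experimental validation, can be sketched as a simple two-stage funnel. This is a minimal illustration under stated assumptions, not the workflow of any cited study: the `Candidate` fields, the candidate labels, their property values, and the band-gap window of 1.6 to 3.0 eV are all hypothetical. Only the 1.23 eV thermodynamic minimum for overall water splitting is a physical constraint.

```python
from dataclasses import dataclass

# Thermodynamic minimum band gap for overall water splitting (1.23 eV).
WATER_SPLITTING_MIN_EV = 1.23


@dataclass
class Candidate:
    name: str               # hypothetical material label
    band_gap_ev: float      # e.g., predicted by DFT or a surrogate ML model
    stability_score: float  # hypothetical 0-1 stability estimate


def screen(candidates, gap_window=(1.6, 3.0), min_stability=0.5, top_k=3):
    """Two-stage funnel: cheap property filters first, then ranking of survivors.

    The default window is an assumption: its lower bound adds a margin above
    1.23 eV, and its upper bound keeps absorption roughly in the visible range.
    """
    lo, hi = gap_window
    if lo <= WATER_SPLITTING_MIN_EV:
        raise ValueError("band-gap window must lie above the thermodynamic minimum")
    # Stage 1: drop candidates outside the band-gap window or below the stability cut-off.
    passed = [c for c in candidates
              if lo <= c.band_gap_ev <= hi and c.stability_score >= min_stability]
    # Stage 2: rank survivors (here simply by stability) for experimental validation.
    return sorted(passed, key=lambda c: c.stability_score, reverse=True)[:top_k]


# Hypothetical library; real values would come from computation or an ML model.
library = [
    Candidate("A", 2.1, 0.8),
    Candidate("B", 0.9, 0.9),  # gap too small to drive water splitting
    Candidate("C", 3.4, 0.7),  # gap too large for visible-light absorption
    Candidate("D", 2.6, 0.4),  # fails the stability cut-off
    Candidate("E", 1.8, 0.6),
]

shortlist = screen(library)
print([c.name for c in shortlist])  # ['A', 'E']
```

In a real campaign, the first stage would typically combine several predicted properties (for example, band-edge positions as well as the gap), and the ranking key would reflect the target application.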
Neural network-based models, combined with feature selection and regression techniques, enable rapid identification of high-performance photocatalysts and reduce reliance on trial-and-error approaches. Notable applications include the design of perovskite oxides for photocatalytic water splitting, where models have predicted hydrogen production rates and optimal band gaps. Integrating deep learning with computational modeling and HTS establishes a dynamic, data-driven framework that accelerates the discovery, optimization, and mechanistic understanding of photocatalysts, supporting the development of next-generation materials for renewable energy and environmental applications.
1.4. Computational, Digital, and Data Infrastructure for Modern Catalysis
Supervised machine learning can identify which synthesis variables matter most, making it a valuable tool for guiding catalyst design. In practice, researchers often include intermediate properties that link formulation to performance; when these properties are clearly related to catalytic behavior, they are called catalyst descriptors. As several researchers have pointed out [11, 12, 13, 14], catalyst descriptors are numerical representations of catalyst, reactant, and reaction properties that enable data-driven models to predict catalytic performance (activity, selectivity, durability). According to the kind of information they encode, catalyst descriptors can be grouped into geometric (structural) descriptors [15], electronic descriptors [16], thermodynamic and kinetic descriptors [17], composition and materials descriptors, reaction or environment descriptors, data-driven or learned descriptors [18], and emerging spectroscopic descriptors [16]. Combined with ML, descriptors play a central role in optimizing catalyst performance, elucidating the origin of catalytic activity, and predicting more efficient catalysts [19].
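To make the idea of identifying the most important synthesis variables concrete, the sketch below fits a linear model to a small synthetic dataset and ranks the inputs by permutation importance, that is, the increase in prediction error when one input column is randomly shuffled. The variable names (calcination temperature, metal loading, stirring time), the data-generating rule, and the choice of a linear model are assumptions made purely for illustration. In practice a nonlinear model (for example, a random forest) and held-out data would usually be used; scikit-learn provides a comparable utility in `sklearn.inspection.permutation_importance`.

```python
import random


def fit_ols(X, y):
    """Ordinary least squares with intercept via the normal equations (small problems only)."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]  # A^T A
    b = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]          # A^T y
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(M[i][j] * w[j] for j in range(i + 1, p))) / M[i][i]
    return w  # w[0] is the intercept, w[1:] the feature coefficients


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(w, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each input column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    baseline = mse(y, predict(w, X))
    scores = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, col)]
            total += mse(y, predict(w, X_perm)) - baseline
        scores.append(total / n_repeats)
    return scores


# Hypothetical, scaled synthesis variables.
names = ["calcination_T", "metal_loading", "stirring_time"]
rng = random.Random(1)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(40)]
# Assumed ground truth: strong temperature effect, weak loading effect, no stirring effect.
y = [3.0 * x[0] + 0.5 * x[1] + rng.gauss(0.0, 0.05) for x in X]

w = fit_ols(X, y)
importance = permutation_importance(w, X, y)
for name, score in sorted(zip(names, importance), key=lambda t: -t[1]):
    print(f"{name:15s} {score:.4f}")
```

Because importance is measured as lost predictive accuracy, it depends on the fitted model; with small, correlated catalytic datasets, correlated inputs can share or mask each other's importance.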
However, selecting the optimal input variables is one of the most challenging steps in developing accurate and interpretable ML models [20, 21]. Descriptor choice must also be specific to the mechanism and the application: the most reliable design insights come from descriptors linked to the elementary step, the active site, or the stability constraint that matters most in the target catalytic system [22, 23]. Figure 2 gives a schematic representation of the multidimensional spaces involved in ML-driven rational catalyst design, showing the relationship between the features of catalytic materials, their quantifiable descriptors, and performance metrics [24]. For new catalytic systems, computational screening is usually the most efficient way to identify such descriptors, whereas for well-established reactions the key properties are often already known. Computational methods, including DFT and multi-scale modeling, deepen the understanding of adsorption, reaction energetics, surface intermediates, and transport phenomena, supporting reactor design and process optimization [1]. Machine learning and AI have further accelerated catalyst design by enabling predictive modeling, rapid materials screening, and analysis of complex reaction systems, bridging experimental observations and theoretical insights. These approaches are increasingly applied to model gas–solid flows, reaction kinetics, and mesoscale phenomena, supporting scale-up and process optimization [1]. The digitalization of research complements these computational advances by facilitating collaboration and data sharing among scientists. Online platforms and databases enhance transparency, reproducibility, and access to knowledge, fostering a global, interconnected community of catalysis researchers and accelerating discovery. In Germany, the National Research Data Infrastructure (NRDI) initiative [25] provides a standardized framework for research data management.
Within this initiative, NFDI4Cat [26], the catalysis-focused consortium, promotes the adoption of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles [27]. Through partnerships with organizations such as Chemistry Europe, NFDI4Cat supports the integration of FAIR practices and fosters a culture of digitalization and robust data stewardship [28]. The consortium employs advanced algorithms and standards, including the Resource Description Framework (RDF), to ensure machine readability and data quality. Its infrastructure includes a central repository, automated curation tools, and visualization interfaces, which together improve interoperability and accelerate catalysis research [29]. Despite this progress, establishing a standardized, FAIR-compliant representation of catalysis data remains a significant challenge. Common data and metadata standards such as DataCite [30], PREMIS [31], CodeMeta [32], ExptML, and EngMeta provide useful foundations but often fail to capture the complexity of chemical workflows. XML-based schemas are similarly restrictive, given the diversity of methods and instrumentation used in chemistry. RDF offers a more flexible alternative: it structures information as triples of the form subject, predicate, object, allowing rich, machine-interpretable descriptions [33]. In catalysis, the subject might be a research step, measurement, substance, mixture, sample, or method, and the object can be another resource or a literal value. Predicates such as “has part”, “has numerical value”, or “is used in” express the relationships between them, enabling interoperable data models suited to complex chemical research. High-quality, diverse datasets are essential for training robust machine learning models, and understanding dataset diversity, novelty, and redundancy guides efficient model development [34].
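The RDF triple model described earlier in this subsection can be illustrated in plain Python. The sketch below stores a few catalysis statements as (subject, predicate, object) tuples, answers a wildcard query, and serializes the statements in the N-Triples line format. The `http://example.org/cat/` namespace, the resource names, and the predicate IRIs (`hasPart`, `isUsedIn`, `hasNumericalValue`, `hasUnit`) are hypothetical stand-ins, not terms from NFDI4Cat or any published ontology; production code would more likely use an RDF library such as rdflib together with an established vocabulary.

```python
EX = "http://example.org/cat/"  # hypothetical namespace
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Each statement is a (subject, predicate, object) triple; objects may be IRIs or literals.
triples = [
    (EX + "step/impregnation-1", EX + "hasPart", EX + "sample/Ni-CeO2-01"),
    (EX + "sample/Ni-CeO2-01", EX + "isUsedIn", EX + "measurement/conversion-01"),
    (EX + "measurement/conversion-01", EX + "hasNumericalValue", 0.42),
    (EX + "measurement/conversion-01", EX + "hasUnit", "fraction"),
]


def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def term(x):
    """Render one term in N-Triples syntax (IRI, typed double literal, or plain string)."""
    if isinstance(x, float):
        return f'"{x}"^^<{XSD_DOUBLE}>'
    if isinstance(x, str) and x.startswith(("http://", "https://")):
        return f"<{x}>"
    # Minimal escaping of backslashes and quotes for plain string literals.
    escaped = str(x).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(store):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in store)


# Which sample feeds measurement conversion-01?
print(match(triples, p=EX + "isUsedIn", o=EX + "measurement/conversion-01"))
print(to_ntriples(triples))
```

Because every statement has the same three-part shape, new kinds of resources and relationships can be added without changing a schema, which is the flexibility that motivates RDF over fixed XML layouts.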
In low-data regimes, carefully designed descriptors remain practical, while multi-level learning, delta learning, and physics-inspired inductive biases help reduce data demands. Sampling methods, such as entropic sampling and self-learning population annealing, enable efficient exploration of chemical space [35]. Accessibility is equally important: databases must be usable by non-experts to broaden participation and impact. For example, the QM9 dataset contains over 100,000 molecules with computed energies, but deriving meaningful chemical properties from it requires domain knowledge, which presents a barrier for AI practitioners. Intuitive web interfaces can lower such barriers [36]. Reliable metadata and provenance tracking are also essential; tools such as AiiDA [37] and NoMaD [38] support reproducibility in materials simulations. Distinguishing small, high-accuracy datasets from large benchmark datasets helps prevent overfitting and promotes practical model deployment. Community-driven curation further enhances data quality and reliability, ensuring that datasets remain valuable resources for developing accurate and generalizable machine learning models.
2. Artificial Intelligence in Chemical Discovery and Engineering
The integration of AI and ML into catalysis research has fundamentally transformed catalyst discovery and process optimization. Traditionally, catalyst development relied on empirical, trial-and-error methods, which were time-consuming and limited in their ability to explore vast chemical design spaces. Recent advances in ML have enabled the extraction of complex structure–activity relationships, supporting predictive modeling of catalytic performance and the rational design of novel materials [39, 40].
At the same time, the growing focus on sustainability has driven the use of AI not only in catalyst discovery but also in optimizing chemical processes for energy efficiency, emission reduction, and resource utilization [41]. Günay and Yıldırım [41] provide an excellent overview of AI and ML applications in analyzing catalytic performance, characterizing structures from spectroscopic data, developing kinetic and mechanistic models, and addressing transport limitations. Despite rapid progress, however, the field remains fragmented, with significant challenges in data availability, model generalizability, and experimental validation. Artificial intelligence is increasingly addressing challenges in retrosynthetic planning, catalyst design, reaction optimization, and autonomous experimentation. In retrosynthetic analysis, ML models, particularly transformer-based and generative models, have demonstrated the ability to predict reaction outcomes and propose synthetic routes with high accuracy [41]. In catalyst design, AI methods have been applied to uncover complex structure–property relationships and accelerate the discovery of new catalytic materials. Reaction optimization has similarly benefited from AI approaches; techniques such as Bayesian optimization navigate multidimensional parameter spaces efficiently to identify optimal reaction conditions. Finally, self-driving laboratories, which integrate robotics with machine learning, have enabled fully autonomous experimental workflows, exemplified by systems such as ChemOS [42], which iteratively design, execute, and analyze experiments without human intervention. These tools help chemists navigate complex chemical spaces, refine reaction conditions, and discover new reactivity with unprecedented speed [43]. As pointed out by Li et al. [44], the laboratory of the future is envisioned as the integration of testing modules into automated workflows that provide real-time feedback on experimental results in significantly less time, while improving the accuracy and efficiency of the processes studied. AI now supports route planning for simple molecules and provides insights into complex natural products. Virtual library screening is promising but remains limited by scaffold diversity. AI has enhanced reaction optimization and is beginning to support autonomous experimentation using flow chemistry and robotics. Remaining challenges include limited domain knowledge in current models, high automation costs, and the need for chemical expertise to interpret outcomes. Future progress will depend on high-quality databases, intuitive tools, and strong collaboration among chemistry, computer science, and engineering. Recent advances have produced chemistry-specific AI agents that augment large language models (LLMs) with domain tools and structured workflows. Kangyong Ma [45] developed a chemical intelligent assistant by fine-tuning eight open-source LLMs on 1.7 million chemistry instructions, achieving strong performance with Mistral NeMo. The system integrates molecular visualization, SMILES processing, and literature retrieval, and improves through continuous feedback. Other specialized agents extend these capabilities. ChemCrow, developed by Kevin Maik Jablonka and colleagues [46], integrates GPT-4 with expert chemical tools for organic synthesis, drug discovery, and materials design. Coscientist, created by Daniil A. Boiko and collaborators [47], autonomously plans and performs scientific experiments through web search, retrieval, code execution, and robotic automation.
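The design, execute, and analyze cycle that underlies Bayesian reaction optimization and self-driving laboratories can be sketched as a closed loop: a Gaussian-process surrogate is fitted to the experiments run so far, an expected-improvement acquisition function selects the next condition, and the "experiment" is run. This is a minimal one-dimensional sketch under stated assumptions and is not how ChemOS or Coscientist are implemented. The simulated yield function, the normalized temperature axis, the fixed RBF kernel hyperparameters, the initial design, and the candidate grid are all hypothetical. Real campaigns would typically use a dedicated library (for example, BoTorch or scikit-optimize), fit the kernel hyperparameters, and optimize several variables at once.

```python
import math


def rbf(a, b, length=0.15, variance=1.0):
    """Squared-exponential (RBF) kernel; hyperparameters are fixed assumptions."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution for L x = b."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def gp_posterior(X, y, x_star, noise=1e-4):
    """Posterior mean and std of a zero-mean GP at x_star (y should be centred)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    z = solve_lower(L, y)                            # L^-1 y
    v = solve_lower(L, [rbf(a, x_star) for a in X])  # L^-1 k*
    mean = sum(vi * zi for vi, zi in zip(v, z))      # k*^T K^-1 y
    var = max(rbf(x_star, x_star) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement for maximization under a Gaussian predictive distribution."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def simulated_yield(x):
    """Hypothetical stand-in for a real experiment: yield peaks at x = 0.7."""
    return math.exp(-((x - 0.7) / 0.15) ** 2)


def run_campaign(experiment, n_iter=8):
    grid = [i / 100 for i in range(101)]  # normalized candidate conditions
    X = [0.1, 0.5, 0.9]                   # initial design (assumed)
    y = [experiment(x) for x in X]
    for _ in range(n_iter):
        y_mean = sum(y) / len(y)
        y_c = [v - y_mean for v in y]     # centre data for the zero-mean prior
        best = max(y_c)
        untested = [x for x in grid if x not in X]
        # Design: choose the untested condition with the highest expected improvement.
        x_next = max(untested,
                     key=lambda x: expected_improvement(*gp_posterior(X, y_c, x), best))
        # Execute and analyze: run the (simulated) experiment and add the result.
        X.append(x_next)
        y.append(experiment(x_next))
    i_best = max(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]


x_best, y_best = run_campaign(simulated_yield)
print(f"best condition {x_best:.2f}, yield {y_best:.3f}")
```

The acquisition function is what makes the loop sample-efficient: it trades off exploiting conditions the surrogate already predicts to be good against exploring conditions where the surrogate is still uncertain.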
ChemCrow and Google’s AI co-scientist have shown promising results but limited reliability: ChemCrow has had tool-integration problems and brittle behavior in some settings, while Google’s AI co-scientist is framed more as a hypothesis-generation aid than as a fully autonomous, experimentally validated system [48]. CACTUS (Chemistry Agent Connecting Tool Usage to Science), introduced by Andrew D. McNaughton and his team [49], enhances reasoning and discovery by integrating LLMs with cheminformatics resources. Together, these systems show how tailored AI agents expand the usefulness of LLMs in chemistry, broadening their practical applications and enabling more sophisticated problem-solving and molecular discovery. Artificial intelligence is increasingly shaping chemical engineering in areas such as monitoring, process control, catalyst discovery, and product design. Hybrid models that combine data-driven and physics-based elements improve accuracy and interpretability. Recent developments include reinforcement learning for process optimization, Bayesian tuning strategies, transformer-based reactor control, and multimodal data fusion techniques [50]. Responsible use of generative AI in chemical engineering must follow established engineering ethics, including honesty, integrity, respect for life, public safety, and environmental protection [51]. The inherent risks of chemical processes demand rigorous oversight; although large language models include basic safeguards, they require additional frameworks to ensure transparency, reliability, and safe use. Generative AI offers opportunities to automate flowsheet design, advance material and electrode development, and accelerate green technologies such as fuel cells and batteries. Challenges include limited machine-readable datasets, insufficient integration of chemical knowledge, and the risk of generating unsafe or impractical designs.
Human expertise and regulatory structures remain essential to ensure responsible adoption.
2.1. Advances in AI and ML for Catalyst Design, Reaction Optimization, and Multi-Scale Process Engineering
PHOTOREAC, a MATLAB-based tool introduced by Acosta-Herazo et al. (2020), provides an accessible platform for modeling slurry solar photocatalytic reactors, offering radiation-field calculations and kinetic modeling capabilities [52]. Although simplified, it helps bridge the gap between laboratory experiments and engineering-scale design. Other advances include defect-engineered iron oxide catalysts for the oxidation of volatile organic compounds, machine learning-guided discovery of multi-element reverse water–gas shift catalysts, artificial intelligence methods in geoscience, and data-driven design of single-atom catalysts [53]. Large datasets, such as the oxidative coupling of methane database compiled by Mine et al. (2021), further demonstrate how machine learning can uncover key descriptors and enable extrapolative catalyst discovery [54]. In recent years, ML has moved beyond specialized fields to become integrated into everyday applications and scientific research, including chemistry and physics. Its potential in catalysis is particularly promising, given the complexity of catalytic systems, which span from atomic-level active sites to large-scale industrial reactors and underpin the production of over 90% of industrial chemicals. Traditional approaches, including mechanistic studies, empirical exploration, and first-principles modeling, have provided valuable insights but often struggle with the high dimensionality, nonlinearity, and multi-scale interactions of real-world systems.
Machine learning offers a complementary strategy, enabling robust predictions without complete mechanistic knowledge and facilitating tasks such as estimating adsorption energies and reaction barriers, optimizing operating conditions, and designing reactors. Recent developments include surrogate models trained on DFT data for catalyst screening, graph-based learning for reaction network exploration, reinforcement learning for process optimization, and physics-informed machine learning and neural networks that embed fundamental scientific laws into model architectures. These hybrid approaches enable reliable, physically consistent predictions, accelerating catalyst discovery and process optimization while generating interpretable insights that bridge data-driven methods and chemical theory [55]. Machine learning is a multidisciplinary, rapidly evolving field that develops algorithms capable of learning from data without being explicitly programmed, drawing on expertise from computer science, statistics, mathematics, engineering, physics, chemistry, and neuroscience. Based on how algorithms learn from data, ML is broadly divided into supervised, unsupervised, and reinforcement learning (Figure 3). Supervised learning, in which models are trained on labeled datasets (input–output pairs), is the most widely used category in catalysis. Unsupervised learning operates on unlabeled data to identify hidden patterns, structures, or clusters. Reinforcement learning (RL) is an approach in which an agent learns to make decisions by interacting with an environment and receiving feedback on its actions. In catalyst development, ML enables researchers to analyze large datasets, identify key descriptors (such as material composition, structure, synthesis method, and physical properties), and predict target properties such as activity, selectivity, and stability.
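Unsupervised learning, as described above, finds structure in unlabeled data. A minimal k-means sketch (Lloyd's algorithm) shows the idea: catalysts described by two descriptors are grouped into clusters without any activity labels. The two descriptors, their values, the choice of k = 2, and the deterministic initialization (the first k points, rather than the k-means++ seeding commonly used in practice) are assumptions for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialization (assumed)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical, scaled descriptors for six unlabeled catalysts:
# (adsorption-energy descriptor, normalized surface area).
catalysts = [(0.1, 0.2), (0.9, 1.0), (0.0, 0.1), (1.1, 0.9), (0.2, 0.0), (1.0, 1.1)]
labels, centroids = kmeans(catalysts, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The resulting clusters carry no chemical meaning by themselves; interpreting them (for example, as families of catalysts with similar binding behavior) still requires domain knowledge.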
By revealing complex relationships between features and catalytic performance, ML supports more efficient catalyst design and materials screening. However, careful attention to best practices and potential pitfalls is essential to ensure reliable and accurate predictions [56]. Table 1 summarizes the key advantages, disadvantages, and critical perspectives of the major AI/ML techniques. The field of chemical engineering has continually evolved, driven by technological advances and societal needs, progressing from traditional unit operations to molecular simulation, nanotechnology, catalysis, and sustainability. Artificial intelligence and machine learning (AI/ML) are now driving another major transformation, reshaping how chemical engineers approach complex problems [50]. AI/ML applications already provide tangible benefits, including enhanced process monitoring and control, accelerated drug and catalyst design, optimization of industrial processes, and the development of products with tailored properties. Despite these advances, challenges remain: process data are often limited and noisy, model errors can have safety and regulatory implications, and interpretable models that integrate domain knowledge are needed. Contributions in the Special Issue [50] show how AI/ML is being applied across multiple scales, from atomic and molecular systems to process and systems engineering, enabling fundamentally new approaches to both long-standing and emerging challenges in chemical engineering. Key themes emerging from these studies include the integration of mechanistic insights into AI/ML models and the development of hybrid modeling strategies that combine first-principles understanding with data-driven discovery.
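The hybrid modeling strategy mentioned above, which combines first-principles understanding with data-driven learning, can be illustrated with a deliberately small "physics backbone plus learned correction" model. An Arrhenius expression supplies the mechanistic part, and a least-squares correction learns the residual effect of a variable the physics term ignores. The rate data, the 80 kJ/mol activation energy, and the promoter-loading effect are synthetic assumptions, and the straight-line correction could equally be replaced by a neural network or Gaussian process in a more realistic setting.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_hybrid(T, loading, k_obs):
    """Arrhenius backbone fitted on ln k vs 1/T, plus a learned residual correction."""
    inv_T = [1.0 / t for t in T]
    ln_k = [math.log(k) for k in k_obs]
    ln_A, slope = linear_fit(inv_T, ln_k)    # physics: ln k = ln A - Ea / (R T)
    Ea = -slope * R
    residuals = [lk - (ln_A + slope * it) for lk, it in zip(ln_k, inv_T)]
    c0, c1 = linear_fit(loading, residuals)  # data-driven: what the Arrhenius term misses

    def predict(t, load):
        return math.exp(ln_A + slope / t + c0 + c1 * load)

    return Ea, predict


# Synthetic data from an assumed "true" system with a promoter effect on ln k.
T_data, load_data, k_data = [], [], []
for t in (500.0, 520.0, 540.0, 560.0):
    for load in (0.0, 1.0):  # balanced design: loading varied at every temperature
        T_data.append(t)
        load_data.append(load)
        k_data.append(1e6 * math.exp(-80e3 / (R * t) + 0.5 * load))

Ea, predict = fit_hybrid(T_data, load_data, k_data)
print(f"Ea = {Ea / 1000:.1f} kJ/mol")  # Ea = 80.0 kJ/mol
```

The balanced design matters here: because loading is varied at every temperature, the Arrhenius fit is not biased by the promoter effect, and the correction term only has to capture what the physics leaves unexplained.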
Examples of such hybrid and mechanism-aware approaches include convolutional neural networks enhanced with mechanistic knowledge for predicting gas adsorption in metal–organic frameworks, physics-informed transfer learning to reduce data requirements in process control, and symbolic regression methods for generating interpretable mathematical models. Other advances demonstrate the use of AI/ML for process design, optimization, and control, such as reinforcement learning for bilevel optimization, Bayesian optimization for autotuning controllers, and transformer-based models for reactor operation. In addition, multimodal and multigranularity data fusion techniques allow models to integrate diverse datasets of varying quality effectively. Overall, these works show that AI/ML not only enhances predictive capabilities but also provides deeper scientific insight and enables more efficient, sustainable, and innovative solutions in chemical engineering and the chemical process industry [50]. In many domains, AI/ML methods can significantly outperform classical methods. As illustrated in Table 2, reported improvements include catalyst screening that is 100 to 1000 times faster, substantial reductions in the number of required experiments and resources, higher prediction accuracy, shorter experimentation time, and improved yield and/or selectivity [58]. Despite these advantages, the lack of transparency and interpretability of “black-box” models can lead to serious mistakes and dangerous decisions and can hinder regulatory trust [14, 59]. Black-box AI/ML models have serious limitations in interpretability, causality, reproducibility, and generalization, and offer limited mechanistic understanding. Nevertheless, promising strategies exist to address these problems, including explainable AI (XAI), physics-informed machine learning, hybrid quantum chemistry–ML approaches, symbolic regression, and uncertainty quantification, among others [60].
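Uncertainty quantification, one of the strategies listed above, can be approximated cheaply with a bootstrap ensemble: the same model is refitted on resampled copies of the training data, and the spread of the ensemble's predictions serves as an uncertainty estimate. The sketch below uses a straight-line model and synthetic data (both assumptions) to show the behavior that matters most in practice: the estimated uncertainty grows when the model is asked to extrapolate beyond its training range. It illustrates one generic UQ technique, not the specific methods reviewed in [60].

```python
import random
import statistics


def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_predict(x, y, x_query, n_models=200, seed=0):
    """Mean and standard deviation of predictions from a bootstrap ensemble of fits."""
    rng = random.Random(seed)
    n = len(x)
    preds = []
    while len(preds) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: slope undefined
        a, b = linear_fit(xs, ys)
        preds.append(a + b * x_query)
    return statistics.mean(preds), statistics.stdev(preds)


# Synthetic training data on a descriptor range of [0, 1] (assumed).
rng = random.Random(42)
x_train = [i / 19 for i in range(20)]
y_train = [2.0 * xi + 1.0 + rng.gauss(0.0, 0.1) for xi in x_train]

for xq in (0.5, 3.0):  # interpolation vs. extrapolation
    mean, std = bootstrap_predict(x_train, y_train, xq)
    print(f"x = {xq}: prediction {mean:.2f} +/- {std:.2f}")
```

A caveat: the bootstrap spread reflects uncertainty caused by limited data, not errors caused by a wrong model form, so it complements, rather than replaces, experimental validation.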
However, the most scientifically valuable future direction will probably be a balanced integration of AI prediction, mechanistic chemistry, physical theory, and experimental validation. The need to validate AI-based models deserves particular emphasis. Myllyaho et al. [61] presented a systematic literature review of validation methods used to ensure the dependability and trustworthiness of practical AI systems, based on 90 primary studies. They concluded that AI-based models are generally validated in a limited but useful way: they often perform well on benchmark data, simulations, or controlled trials, but fewer studies report rigorous validation in realistic industrial settings through experiments, external datasets, or continuous post-deployment monitoring. The agreement between AI models and experiments is often good at the level of statistical prediction but weak when judged against causal, mechanistic, or operational experimental evidence. In other words, AI-based models are predictively validated much more often than they are experimentally validated. Table 3 summarizes methods for photocatalyst discovery, optimization, and reactor modeling. It highlights tools such as PHOTOREAC, derivative-free sparse identification of nonlinear dynamics (DF-SINDy), and ML frameworks applied to photocatalytic CO₂ reduction, multicomponent reactions, and catalyst screening. The main findings concern accelerated prediction, mechanistic insight, and experimental efficiency. Key challenges include limited datasets, limited generalizability, model interpretability, and the assumptions made in reactor modeling. Table 4 focuses on catalyst development and reaction optimization for applications such as volatile organic compound (VOC) oxidation, oxidative coupling of methane (OCM), the reverse water–gas shift reaction, CO₂ reduction and H₂ evolution, single-atom catalysts (SACs), and hydrodesulfurization.
Methods include ML models (version 0.23.1), such as XGBoost (version 1.2.1) and Random Forest, as well as ML potentials, structural engineering, and high-throughput AI experiments. The main findings show that AI/ML can identify new catalysts, predict activity, and reveal structure–property relationships. Limitations include dataset quality, transferability, stability, and integration with sustainability metrics. Table 5 covers materials discovery and process optimization using databases, ML, LLMs, and generative AI. Applications include MOF electronic property prediction, CO2 capture materials, MOF synthesis optimization, and chemical process design. As an example of the successful application of ML in the development of advanced materials, Huang et al. [70] propose a data-driven MOF design strategy that links adsorption conditions, pore structure, site chemistry, and phosphate speciation to guide efficient phosphate removal and resource recovery. Overall, the findings summarized in Table 5 demonstrate that AI/ML enables accelerated materials discovery, property prediction, inverse design, and process automation. Challenges include dataset diversity, interpretability, integration with experiments, and safety concerns in generative outputs. Acosta-Herazo and colleagues (2020) [52] present PHOTOREAC, a MATLAB-based graphical application developed to model and simulate large-scale slurry solar photocatalytic reactors for water-treatment applications. The software integrates two core modules: (i) a photon absorption–scattering module that computes the radiation field (using a six-flux model and a variant coupled with the Henyey–Greenstein scattering phase function) and (ii) a kinetic modeling module that fits experimental photodegradation data with multiple kinetic expressions. The application comes pre-loaded with a database of 26 experimental datasets (covering different pollutants, catalyst concentrations, and reactor types) and allows users to import their own data.
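PHOTOREAC's six-flux variant is described as being coupled with the Henyey–Greenstein (HG) phase function, which gives the angular distribution of scattered photons. The HG function itself is standard; the sketch below only evaluates it and numerically checks two of its textbook properties (it integrates to 1 over the full sphere, and its mean scattering cosine equals the asymmetry factor g). How PHOTOREAC maps the phase function onto the six flux directions is not reproduced here, and the function names are hypothetical.

```python
import math


def henyey_greenstein(mu, g):
    """Henyey–Greenstein phase function p(mu), normalized so that its
    integral over the full solid angle equals 1.

    mu: cosine of the scattering angle, in [-1, 1]
    g:  asymmetry factor, -1 < g < 1 (g > 0 favors forward scattering)
    """
    return (1.0 - g * g) / (4.0 * math.pi * (1.0 + g * g - 2.0 * g * mu) ** 1.5)


def integrate_over_sphere(func, n=20000):
    """Midpoint-rule estimate of 2*pi * integral_{-1}^{1} func(mu) dmu.

    The factor 2*pi comes from the azimuthal integral, since func depends
    only on the polar scattering angle through mu.
    """
    h = 2.0 / n
    total = sum(func(-1.0 + (i + 0.5) * h) for i in range(n))
    return 2.0 * math.pi * total * h


if __name__ == "__main__":
    for g in (0.0, 0.5, 0.8):
        norm = integrate_over_sphere(lambda mu: henyey_greenstein(mu, g))
        mean_cos = integrate_over_sphere(lambda mu: mu * henyey_greenstein(mu, g))
        print(f"g={g}: normalization={norm:.4f}, <cos theta>={mean_cos:.4f}")
```

Setting g = 0 recovers isotropic scattering (p = 1/(4π) in every direction), while values of g close to 1 concentrate the scattered light in the forward direction.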
Through three example cases, the authors demonstrate how PHOTOREAC can estimate radiation-independent kinetic constants, compare different kinetic models, and analyze the influence of key operational parameters (such as reactor geometry, catalyst loading, and incident radiation). The authors argue that PHOTOREAC lowers the barrier for non-expert researchers to engage in photoreactor design and scale-up by offering a more accessible alternative to full computational fluid dynamics (CFD) simulations. They also note several limitations, mainly the current restriction to a single photocatalyst (titanium dioxide, TiO2 P25) and the reliance on simplified assumptions, such as a well-mixed system and the absence of mass transport limitations. The authors further outline possible directions for future improvements. In the broader context of environmental photocatalysis, the tool addresses a recognized gap: although process and reactor modeling are vital for reactor design and optimization, many researchers focused on experimental work have limited access to dedicated simulation platforms. PHOTOREAC thus helps close the gap between experimental photoreactor testing and engineering-scale design by offering a flexible, simplified, and user-friendly modeling approach [52]. While PHOTOREAC provides a practical and accessible platform for modeling and optimizing photocatalytic reactors, advancing catalyst performance itself requires a complementary focus on material design and mechanistic understanding. Recent studies have shown that structural engineering and defect modulation, as in Fe2O3-based catalysts, can significantly enhance activity and selectivity, yet challenges related to stability and complex reaction environments persist [53]. Zhang et al.
(2024) [53] present a comprehensive analysis of Fe2O3-based catalysts developed for the catalytic oxidation of toluene, emphasizing how structural engineering and defect modulation can significantly enhance their performance. The review identifies oxygen vacancies as the key active sites that facilitate lattice oxygen mobility and accelerate redox cycling in the Mars–van Krevelen mechanism, which dominates toluene oxidation over iron oxides. The authors discuss how strategies such as morphology control, heteroatom doping, and the construction of Fe2O3-based composites increase the concentration of surface defects and improve electron transfer efficiency. Incorporating secondary metal oxides or supports was shown to enhance oxygen activation and lower the reaction temperature required for complete toluene conversion. Additionally, the paper highlights the potential of waste-derived iron materials as precursors for sustainable catalyst synthesis. Despite these advances, achieving high stability and activity under low-temperature and complex-gas conditions remains challenging. The authors advocate integrating advanced characterization and data-driven modeling approaches to guide the rational design of next-generation Fe2O3-based catalysts for efficient VOC abatement [53]. The study by Mine et al. (2021) [54] presents an updated dataset comprising 4759 experimental data points on the oxidative coupling of methane (OCM), compiled from the literature up to 2019. Using ML techniques such as Extra Trees Regressor (ETR), eXtreme Gradient Boosting (XGBoost), and Random Forest Regression, the authors analyzed the dataset to identify key features influencing C2 hydrocarbon yields. The ML models successfully predicted catalyst compositions that were not previously represented in the dataset, demonstrating the potential of ML for extrapolative catalyst discovery.
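One generic way to test the kind of extrapolation described above is to hold out every catalyst that contains a given element and evaluate a model trained only on the remaining catalysts. The sketch below generates such leave-one-element-out splits; it is a common validation pattern rather than the protocol used by Mine et al., and the function name and toy compositions are hypothetical.

```python
def leave_element_out_splits(catalysts):
    """Yield (element, train_idx, test_idx) for each element in the dataset.

    catalysts: list of dicts mapping element symbol -> amount.
    The test fold holds every catalyst containing `element`; the training
    fold holds none, so a model scored on the test fold must extrapolate
    to an element it has never seen during training.
    """
    elements = sorted({el for cat in catalysts for el, amt in cat.items() if amt > 0})
    for element in elements:
        test_idx = [i for i, cat in enumerate(catalysts) if cat.get(element, 0) > 0]
        train_idx = [i for i, cat in enumerate(catalysts) if cat.get(element, 0) <= 0]
        if train_idx:  # skip elements present in every sample (no training data left)
            yield element, train_idx, test_idx


# Hypothetical toy compositions, for illustration only.
data = [
    {"La": 1.0, "Sr": 0.1},
    {"Mn": 1.0, "Na": 0.5},
    {"La": 1.0, "Na": 0.2},
    {"Mg": 1.0},
]
for element, train_idx, test_idx in leave_element_out_splits(data):
    print(element, train_idx, test_idx)
```

Scores averaged over these folds estimate performance on unseen elements, which is usually lower than the score from a random split and is a more honest measure of extrapolative ability.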
The study also highlighted that elemental features were more significant than direct catalyst compositions in model predictions, offering insights into the design of more effective OCM catalysts [54]. Machine learning has thus expanded the toolkit for catalyst development, enabling predictive modeling, feature identification, and extrapolative discovery in systems such as the oxidative coupling of methane [54] and multi-element catalysts for CO2 conversion. Beyond conventional machine learning, artificial intelligence approaches such as neural networks and large language models (LLMs) are increasingly applied to the design, screening, and optimization of complex catalytic systems, including single-atom catalysts, offering mechanistic insights, high-throughput predictions, and guidance for rational catalyst design. Wang et al. (2023) [67] introduced an extrapolative machine learning framework to accelerate the discovery of multi-element catalysts for the reverse water–gas shift reaction, a key process for converting carbon dioxide into carbon monoxide. By integrating data-driven prediction with iterative experimental validation, the authors evaluated approximately 300 catalyst compositions over 44 learning cycles and identified more than one hundred highly active candidates. A notable achievement of this approach was the discovery of an efficient platinum–rubidium–barium–molybdenum–niobium catalyst supported on titanium dioxide, in which niobium, a previously untested element, played a critical role, demonstrating the model’s capacity to extrapolate beyond known chemical space. The study used a sorted weighted elemental descriptor that represents catalysts through fundamental elemental properties, enabling the model to predict high-performance combinations from limited training data.
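The descriptor idea mentioned above, mapping a composition onto a fixed-length vector built from elemental properties, can be sketched as follows. The exact construction used in [67] is not reproduced here; this is a simplified, hypothetical variant (normalize amounts to fractions, sort elements by fraction, multiply each fraction by one elemental property, zero-pad to a fixed length). The single property used, approximate Pauling electronegativity, is an illustrative choice and not one taken from the study.

```python
# Approximate Pauling electronegativities, used here only as an example property.
ELECTRONEGATIVITY = {"Pt": 2.28, "Rb": 0.82, "Ba": 0.89, "Mo": 2.16, "Nb": 1.6, "Ti": 1.54}


def sorted_weighted_descriptor(composition, prop_table, k=5):
    """Simplified composition descriptor (hypothetical variant).

    1. Normalize the amounts in `composition` to fractions.
    2. Sort elements by fraction, largest first (ties broken alphabetically).
    3. Emit fraction * property for each element, truncated or zero-padded
       to length k so every catalyst maps to a vector of the same size.
    """
    total = sum(composition.values())
    ranked = sorted(composition.items(), key=lambda kv: (-kv[1], kv[0]))
    vec = [(amt / total) * prop_table[el] for el, amt in ranked[:k]]
    return vec + [0.0] * (k - len(vec))


print(sorted_weighted_descriptor({"Pt": 1, "Mo": 3, "Nb": 6}, ELECTRONEGATIVITY, k=4))
```

Because the vector is indexed by rank rather than by element identity, an element absent from the training data can still be represented through its properties, which is what makes this style of descriptor compatible with extrapolation to new elements.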
Feature-importance analysis further revealed that catalytic activity correlates strongly with parameters related to electronic configuration and oxygen affinity. Overall, this work exemplifies how machine learning can move beyond traditional trial-and-error approaches in heterogeneous catalysis, offering a powerful route to the rapid identification and rational design of complex multi-element systems for carbon dioxide utilization [67]. Yu et al. (2025) [68] review the emerging role of artificial intelligence in the design, optimization, and application of single-atom catalysts. SACs are highly promising in electrocatalysis because their atom-level dispersion enhances activity, selectivity, and stability, but their design and optimization are complex. Artificial intelligence, particularly machine learning and neural networks, has emerged as a powerful tool to accelerate SAC development by enabling high-throughput simulations, identifying key performance features, screening structural models, and predicting novel catalyst structures. AI-driven approaches extend the capabilities of density functional theory (DFT) to larger and more complex systems, integrate experimental and computational data to construct predictive models, and simulate reactions under realistic conditions, including temperature, pressure, and solvent effects. By combining data-driven modeling with physical and chemical principles, AI not only interprets existing experimental results but also guides the rational design of high-performance SACs, opening new avenues for innovation in energy conversion and electrocatalysis [68]. Beyond catalyst discovery, AI methods are also transforming our understanding of catalytic reaction mechanisms. By integrating experimental and computational data, machine learning enables accurate modeling of reaction kinetics, identification of key mechanistic features, and optimization of reaction conditions, addressing many limitations of traditional phenomenological approaches.
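As a point of reference for the data-driven kinetic modeling discussed here, the simplest form of kinetic parameter estimation is a regression of measured rate constants on temperature. The sketch below fits Arrhenius parameters by linear least squares on (1/T, ln k); it is a classical baseline rather than an ML model, and the function name, temperatures, and parameter values are synthetic assumptions.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def fit_arrhenius(temps_k, rate_constants):
    """Fit ln k = ln A - Ea/(R T) by ordinary least squares on (1/T, ln k).

    Returns (A, Ea): A in the units of k, Ea in J/mol.
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -Ea/R
    intercept = y_mean - slope * x_mean  # equals ln A
    return math.exp(intercept), -slope * R


# Synthetic rate constants generated from assumed A = 1e10 and Ea = 80 kJ/mol.
temps = [500.0, 550.0, 600.0, 650.0, 700.0]
ks = [1e10 * math.exp(-80000.0 / (R * t)) for t in temps]
A, Ea = fit_arrhenius(temps, ks)
print(f"A = {A:.3e}, Ea = {Ea / 1000:.1f} kJ/mol")
```

This baseline works only when the rate law has a known closed form; the ML approaches discussed in this section become attractive when the kinetics are too complex, or the relevant parameters too hard to measure, for such a fixed expression.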
A deep understanding of catalytic reaction mechanisms is crucial for advancing chemical kinetics, but traditional phenomenological models have limitations, such as convergence to local minima, reliance on difficult-to-measure parameters, and high computational costs, particularly for complex catalyst structures or feedstocks. In recent years, machine learning has emerged as a powerful alternative, enabling data-driven modeling of catalytic systems with applications ranging from material and condition screening to mechanism classification and reaction rate analysis. ML approaches facilitate accurate kinetic parameter estimation, extraction of complex patterns from experimental data, and integration with molecular dynamics and optimization methods, overcoming many constraints of conventional techniques. While deep learning often requires large, high-dimensional datasets, simpler ML models can provide valuable insights from smaller, high-quality datasets. Key considerations in applying ML include model interpretability, generalizability, computational efficiency, and data quality. Establishing standardized benchmarks for dataset size and quality remains an important direction for future research. Overall, machine learning offers transformative potential in catalysis by enhancing the design, optimization, and mechanistic understanding of catalytic systems [73]. Taking this a step further, AI is not only a tool for modeling and prediction but can also actively collaborate with human researchers. Co-intelligence (CoI), as described by Ethan Mollick [74] and exemplified by large language models, represents a new paradigm in which AI engages directly with human intelligence to assist in research design, problem-solving, and optimization, complementing both conventional ML and mechanistic modeling.
Co-intelligence arises from the collaboration of multiple individuals sharing diverse knowledge, and in scientific research, it can manifest through experimental–theoretical partnerships or human–robot interactions. In this context, LLMs represent a form of artificial intelligence capable of engaging with human intelligence (HI) to assist in research design and problem-solving. These models leverage natural language processing and generative AI to answer questions, perform optimization tasks, and deliver reasoning, with applications spanning catalysis, data mining, molecular and materials design, chemical space exploration, organic synthesis, property optimization, and education. Despite their potential, LLMs face limitations, including hallucinated outputs, difficulty with specialized scientific language, and dual-use concerns. In a recent study, an LLM was used to codesign a computational catalysis project with machine learning under minimal prompt engineering; it contributed meaningfully to high-level project design, workflow development, and evaluation, although persistent flaws were also evident. This work illustrates the potential of LLMs to enhance research productivity and innovation in complex scientific domains, although further refinement and domain-specific integration are needed [63]. Extending beyond individual projects, AI and ML are increasingly applied across multiple scales in process systems engineering, from molecular-level reactions to full plant and supply chain operations. These multi-scale applications highlight the versatility of AI in integrating physical principles, human–AI collaboration, and generative modeling to optimize complex systems holistically. Srinivasan et al.
(2025) [71] provide a comprehensive review of artificial intelligence and machine learning applications across multiple scales in process systems engineering, ranging from molecular and reaction levels to materials, processes, plants, and supply chains. The authors examine the utility of AI and ML at both the design and operational stages, emphasizing the distinct representational frameworks employed at different scales and the physical principles they capture, including equivariance, additivity, injectivity, connectivity, hierarchy, and heterogeneity. They highlight key AI techniques, including hybrid AI modeling, human–AI collaboration, and generative AI methods, and stress the importance of hyperparameter tuning in hybrid models, particularly with physics-informed regularization. The review also discusses human–AI interactions, distinguishing between human-complements-AI and AI-complements-human systems, and emphasizes model explainability through rule-based explanations, example-based reasoning, simplification, visualization, and feature relevance. Additionally, generative AI approaches, such as generative adversarial networks, graph neural networks, large language models, and transformers, are highlighted for their ability to leverage non-traditional process data, including images, audio, and text, in situations where high-quality labeled data are limited. Overall, the work underscores how AI and ML can enhance process design, optimization, and automation while addressing challenges related to data representation, model interpretability, and multi-scale system complexity [71]. At the material and reaction scale, AI and machine learning offer powerful strategies to accelerate experimental optimization, enabling more efficient discovery and performance tuning of catalysts and functional materials. Li et al.
(2025) [64] present a dynamic machine learning-driven approach to optimize the microwave-assisted synthesis of photocatalysts for enhanced hydrogen peroxide production. Their methodology alternates cycles of machine learning analysis and experimental validation, enabling efficient optimization without large datasets. Applied to quercetin-based photocatalysts, the approach reached optimal performance after only three iterations, resulting in significantly improved hydrogen peroxide production rates. The study demonstrates the effectiveness of few-shot machine learning strategies in guiding catalyst synthesis, highlighting a sustainable and efficient pathway for accelerating the development of high-performance photocatalytic materials [64]. Beyond optimizing synthesis conditions, machine learning can also establish direct, interpretable links between experimental measurements and catalytic performance, enabling predictive evaluation of materials from limited data. In one such study [65], ML was employed to link infrared (IR) spectral signals of adsorbed species with macroscopic catalytic performance, providing a direct, interpretable, and transferable approach to catalyst screening. Using photocatalytic NO oxidation as a model reaction, the ML framework accurately predicted nitrate formation solely from the IR signals of NO adsorption, and its generalizability was further demonstrated with a CaCO3-decorated g-C3N4 catalyst. The model’s predictions agreed with mechanistic understanding, confirming the physical plausibility of the approach, and enabled a quantitative assessment of catalytic activity. Notably, the ML-driven method reduced experimental time approximately 3.5-fold, highlighting its efficiency and its potential to extend traditional spectroscopic techniques to predictive catalyst evaluation.
2.2. AI in Catalyst Design and High-Throughput Platforms
Recent advances in artificial intelligence (AI) are transforming the landscape of catalyst design and synthesis. Machine learning (ML) methods, in particular, are changing traditional approaches by enabling rapid identification of catalytic materials, optimization of synthesis conditions, and automation of experimental workflows [65]. By efficiently processing large datasets, AI can uncover complex structure–property relationships and predict catalyst performance with high accuracy. This capability allows researchers to explore extensive chemical spaces and accelerate the development of novel catalysts for applications ranging from energy conversion to environmental remediation. The integration of AI with high-throughput experimental platforms further enhances this proces
Artificial Intelligence- and Machine Learning-Driven Strategies for Catalyst Design and Sustainable Chemical Processes