Human-Centered AI for Decision Support Systems: Enhancing Usability and Trustworthiness

In recent years, Artificial Intelligence (AI) technologies have become increasingly integrated into Decision Support Systems (DSS) across critical domains such as healthcare, finance, cybersecurity, and risk management [1, 2]. Machine learning and deep learning models are widely used to support tasks including medical diagnosis, loan approval, fraud detection, and predictive risk assessment. These systems can process large volumes of heterogeneous data and identify complex patterns beyond the reach of traditional analytical techniques, improving efficiency, predictive accuracy, and consistency.

Despite these advantages, many AI systems operate as “black boxes,” generating predictions without understandable explanations of how decisions are produced [3, 4]. Although such models often achieve high predictive performance, their lack of transparency creates challenges in domains where accountability, interpretability, and human judgment are essential. Clinicians and financial analysts, for example, may hesitate to rely on AI recommendations when the underlying reasoning remains unclear.

Although previous studies have explored explainability and trust in AI systems, important research gaps remain [4, 7]. Existing work has focused mainly on algorithmic performance or technical explainability methods rather than evaluating how human-centered explainable systems influence expert decision-making behavior in realistic DSS contexts. In particular, there is limited empirical evidence regarding the impact of explainable AI interfaces on trust, usability, decision accuracy, and human-AI collaboration in professional environments. To address these limitations, this study proposes a Human-Centered AI Decision Support System (HCAI-DSS) that integrates explainable AI mechanisms, interactive human oversight, and user-centered design principles.
The framework combines machine-learning prediction models with SHAP-based explanations, transparency features, and user-override functionality to support collaborative and interpretable human-AI decision-making. Unlike conventional black-box DSS approaches, the proposed system enables users not only to receive AI recommendations but also to understand the prediction rationale and maintain meaningful control over final decisions.

The proposed HCAI-DSS was evaluated through a controlled within-subject comparative study involving experts from the healthcare and finance domains. Participants completed decision-making tasks using both the HCAI-DSS and a conventional black-box AI system. The evaluation examined trust, trustworthiness, usability, decision accuracy, and decision time through quantitative statistical analysis complemented by qualitative participant feedback.

This study contributes (1) a fully implemented HCAI-DSS architecture integrating explainable AI and human oversight mechanisms, (2) a comparative experimental evaluation involving healthcare and finance experts, (3) a multidimensional assessment framework covering trust, usability, accuracy, and efficiency, and (4) empirical evidence demonstrating the advantages of human-centered explainable AI over conventional black-box systems.

2.1. Decision Support Systems: Foundations and Modern Extensions

With the rise of machine learning and large-scale data, modern DSS increasingly rely on predictive algorithms for risk scoring, forecasting, anomaly detection, and recommendation. However, this evolution has introduced new challenges: opaque “black-box” models often sacrifice transparency for accuracy, raising concerns in regulated sectors such as healthcare, finance, and public administration. These limitations have motivated renewed interest in explainable, trustworthy, and user-centered DSS designs.

2.2. Human-Centered Artificial Intelligence (HCAI)

Human-Centered AI (HCAI) advocates for AI systems that maintain high levels of human agency, control, and oversight while ensuring transparency, safety, usability, and alignment with human values. Shneiderman [5] introduced a two-dimensional model positioning HCAI as an approach that combines high levels of automation with high levels of human control, countering the misconception that increasing autonomy must diminish human oversight. HCAI research emphasizes transparency, accountability, fairness, and meaningful human oversight as core requirements for trustworthy decision-support systems [5, 9]. Shneiderman [10] argues that responsible AI should augment human capabilities rather than replace human judgment, ensuring that humans remain accountable for final decisions. These principles align with the EU Guidelines for Trustworthy AI (2019), which identify seven key requirements: human agency and oversight, technical robustness and safety, privacy and data governance, transparency, diversity and fairness, societal well-being, and accountability [6].

In the context of DSS, HCAI reframes AI not as a replacement for expert judgment but as a cognitive augmentation tool that enhances users’ decision-making capabilities while preserving their authority and responsibility [11]. Despite increasing recognition of HCAI’s importance, there remains limited empirical work on how the framework can be operationalized through concrete system architectures, algorithms, and workflows in real decision environments.

2.3. Explainable Artificial Intelligence (XAI) and Transparency Techniques

Explainable AI (XAI) aims to make machine learning models more interpretable, either by using inherently transparent models (e.g., decision trees, linear models) or by applying post hoc interpretation techniques to complex models.
Previous explainable AI research established core principles of explanation quality, interpretability, selectiveness, and human-centered transparency in AI systems [3, 4]. The growing demand for explainability has shaped XAI into a research field aimed at making AI decisions understandable to human users [12]. Common explainability approaches include feature-importance analysis, local surrogate models such as LIME, SHAP-based interpretation, counterfactual explanations, rule-extraction techniques, and attention visualization methods [7, 13].

Among these, SHAP has emerged as one of the most widely adopted due to its theoretical grounding in cooperative game theory. SHAP explanations are derived from Shapley values: each feature’s contribution is calculated from its marginal impact on the model prediction, averaged across different feature combinations. Because the resulting values quantify each feature’s contribution to an individual prediction in a consistent way, they are well suited to regulated environments that require verifiable explanations. Unlike simpler feature-importance methods, SHAP provides both local explanations for individual predictions and global explanations describing overall model behavior. This dual capability is particularly valuable in high-stakes DSS environments where users must understand both specific recommendations and broader system logic. Furthermore, TreeSHAP implementations enable computationally efficient explanation generation for ensemble tree-based models such as Random Forests and XGBoost [7].

In many high-stakes DSS applications, explainability has become increasingly important for supporting transparency, accountability, and informed human oversight.
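The Shapley-value idea behind SHAP can be illustrated with a minimal sketch. The code below computes exact Shapley values for a single instance by enumerating every feature coalition, with features outside a coalition replaced by a baseline value. The `toy_model` function, its coefficients, and the baseline are hypothetical and chosen only for illustration; this brute-force enumeration is not the TreeSHAP algorithm and is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Model output when only the coalition's features take their real values.
        masked = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(masked)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(f):
    # Hypothetical risk score with an interaction term (not the study's model).
    age, chol, smoker = f
    return 0.02 * age + 0.01 * chol + 0.5 * smoker + 0.005 * age * smoker


x = [60, 240, 1]          # instance to explain (hypothetical values)
baseline = [50, 200, 0]   # reference instance (hypothetical values)
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additivity property that makes per-feature attributions consistent across cases.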
Regulatory frameworks relevant to healthcare (e.g., the EU AI Act and the FDA Software-as-a-Medical-Device (SaMD) Clinical Evaluation Guidance (2017)) and to finance (e.g., Basel III [14] and GDPR Article 22 [15]) place growing emphasis on AI-driven decisions being interpretable and auditable. However, many existing studies do not sufficiently describe how explanations are generated, validated, or integrated into operational decision workflows.

2.4. Trust, Human-AI Interaction, and Human-in-the-Loop Approaches

Trust is a central determinant of whether human decision-makers adopt AI recommendations. Previous human-AI interaction studies distinguish between calibrated trust, over-trust, and under-trust in AI-assisted decision-making environments. Human-AI collaboration models such as human-in-the-loop (HITL) and human-on-the-loop (HOTL) [16, 17] help maintain calibrated trust by ensuring that users can inspect, challenge, and override AI outputs. Studies in aviation, clinical decision-making, and risk assessment demonstrate that user empowerment, explainability, and feedback mechanisms improve decision accuracy and reduce automation bias. Recent interaction-design studies indicate that trust is influenced not only by predictive accuracy but also by users’ understanding of AI reasoning, confidence levels, error-correction capabilities, and feedback mechanisms.

Although prior work has shown the promise of explainable AI and human-centered design principles, much research has either remained focused on algorithmic explainability or developed only conceptual HCAI frameworks. Relatively few studies have paired a description of an implemented system with an empirical evaluation of HCAI-assisted decisions made by domain experts in realistic decision scenarios. Research is therefore required to bridge the divide between theoretical HCAI principles and their implementation in real-life, high-stakes decision settings.

2.5. Identified Research Gap

The literature reveals several unresolved limitations, including insufficient integration of HCAI principles into fully implemented DSS architectures, limited transparency regarding model training and explainability procedures, lack of empirical evaluation involving domain experts, limited statistical rigor in comparative studies, and weak operationalization of adaptive feedback mechanisms in practical DSS environments. To address these gaps, this study provides a fully specified HCAI-DSS architecture that integrates SHAP explainability, expert feedback mechanisms, reproducible technical implementation details, and controlled comparative experiments with domain experts supported by rigorous statistical evaluation. The accompanying figure presents the proposed HCAI-DSS architecture and its main components, including the ensemble model, explainability layer, human override mechanism, and feedback logging process.

2.6. Datasets and Preprocessing

Healthcare Domain (Heart Disease Dataset): The healthcare experiment used the UCI Heart Disease dataset [18], which contains structured clinical records commonly used for cardiovascular prediction research. The dataset includes demographic, physiological, and diagnostic variables associated with heart disease classification. Standard preprocessing procedures were applied, including missing-value handling, categorical encoding, feature normalization, and stratified train–test splitting.

Finance Domain (Credit Approval Dataset): The financial experiment used the German Credit dataset [19], a widely adopted benchmark for credit risk assessment. The dataset contains demographic and financial attributes used to classify creditworthiness outcomes. Preprocessing included categorical encoding, feature normalization, and stratified train–test splitting to ensure balanced evaluation across classes.
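Stratified train–test splitting can be sketched in a few lines. The example below is a minimal illustration, assuming the 80/20 ratio reported for this study; the function name, the seed, and the synthetic 70/30 label vector are hypothetical, and the paper does not state which routine was actually used (scikit-learn's `train_test_split` with its `stratify` argument is a common choice).

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions approximately preserved."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced label vector: 700 majority-class and 300 minority-class rows.
labels = [0] * 700 + [1] * 300
train_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the minority-class share in the test set equal to its share in the full data, which matters for the imbalanced credit task.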
Both datasets were divided into 80% training and 20% testing subsets while preserving class distributions. All preprocessing and model-training procedures were applied consistently across experimental conditions.

2.7. Model Architecture and Implementation

We implemented predictive models for each domain using ensemble tree-based algorithms. Random Forest and Gradient Boosting were selected due to their strong predictive performance, robustness, and compatibility with SHAP explainability techniques. Random Forest [20] combines multiple decision trees through bootstrap aggregation to improve predictive accuracy and reduce overfitting. Gradient Boosting follows the framework proposed by Friedman [21], in which sequential models are trained to correct the residual errors of previous models, resulting in strong predictive performance for structured data.

Random Forest Classifier: Random Forest models were implemented using the RandomForestClassifier from scikit-learn. Hyperparameter optimization was conducted using grid-search cross-validation. The final configuration used 200 estimators with a maximum tree depth of 12 and bootstrap aggregation enabled.

XGBoost Gradient Boosted Trees: XGBoost classifiers were implemented using the XGBClassifier class from the XGBoost library [22]. Hyperparameter optimization was performed using grid-search cross-validation. The final tuned configuration used 300 boosting estimators with a maximum depth of 8 and learning-rate regularization.

Model Training and Validation: Each model (RF and XGBoost) was trained on the 80% training split of its respective dataset. We used 5-fold cross-validation within the training set to tune hyperparameters, as noted above, optimizing for classification accuracy (or equivalently, minimizing classification error).
The best parameters were then used to retrain on the full training set. Performance on the 20% test set was evaluated to ensure the models attained reasonable accuracy. For heart disease prediction, our models achieved approximately 85% accuracy on the test set, which is in line with or slightly above prior literature using this dataset. For credit risk prediction, the models achieved about 77–80% accuracy on the test set (with higher precision on the majority “good” class, reflecting the class imbalance).

For consistency in the user experiment, one model per domain was selected to drive both the HCAI-DSS and black-box versions, based on validation performance and interpretability considerations. Specifically, we chose the Random Forest model for the heart disease domain (as it had marginally better validation performance and simpler explanations) and the XGBoost model for the credit domain (for its slightly better performance on that dataset). All models were implemented using Python (version 3.11) and machine learning libraries with fixed random seeds and documented parameter configurations.

2.8. Model Predictive Performance Evaluation

In addition to evaluating user-centered outcomes, the predictive performance of the underlying machine learning models was assessed using standard classification metrics. For the healthcare domain, the Random Forest model achieved an accuracy of 85.1%, a precision of 0.84, a recall of 0.86, and an F1-score of 0.85 on the held-out test dataset. The corresponding ROC-AUC value was 0.90, indicating strong discriminative capability in identifying heart disease risk cases.

In the financial risk assessment domain, the XGBoost classifier achieved an overall accuracy of 78.4%, a precision of 0.76, a recall of 0.74, and an F1-score of 0.75. The ROC-AUC score of 0.82 suggests satisfactory predictive reliability despite class imbalance in the credit dataset.
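The relationship between these metrics can be checked with a short sketch. The function below computes accuracy, precision, recall, and F1 from confusion-matrix counts; the counts in `example` are hypothetical and are not the study's confusion matrix. The last two lines recompute F1 from the precision and recall values reported above as a consistency check.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, fp=10, fn=10, tn=40)

# Consistency check against the precision/recall pairs reported in the text.
healthcare_f1 = f1_from(0.84, 0.86)  # rounds to 0.85
finance_f1 = f1_from(0.76, 0.74)     # rounds to 0.75
```

Both recomputed values round to the F1-scores reported for the two domains.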
These results are consistent with previously reported performance on the same benchmark datasets. Because both system conditions used the same underlying models, observed differences in expert decision outcomes can be attributed primarily to the integration of explainability and human-oversight mechanisms rather than to differences in model quality.

2.9. HCAI-DSS Prototype with Explainability and Feedback

We prototyped a Human-Centered AI Decision Support System (HCAI-DSS) that connects interpretable model features, explainable AI outputs, and a user feedback loop. The HCAI-DSS was developed as a web-based application with Streamlit (a free Python library for building interactive data apps). The interface was designed to make interaction between domain experts and the AI model's recommendations intuitive, to show an explanation for each recommendation, and to allow real-time corrections and feedback. It presented domain-specific cases, AI-generated predictions, SHAP-based explanations, and override functionality within an interactive web-based environment.

For each prediction, the HCAI-DSS presented SHAP-based explanations highlighting the most influential features contributing to the model output. Explanations were displayed through visual feature-importance summaries accompanied by concise textual descriptions to support interpretability and expert understanding. The system also provided a global feature-importance overview derived from aggregated SHAP values across the training data, enabling users to understand the overall decision logic of the predictive models.

The HCAI-DSS incorporated a human-feedback mechanism allowing experts to confirm or override AI-generated recommendations. User corrections provided by qualified domain experts were logged for later analysis and potential future model refinement. The override functionality was intended to support expert supervision and collaborative human-AI decision-making rather than unrestricted modification of AI outputs by inexperienced users.
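One way such a feedback log could be structured is sketched below. This is an illustrative assumption, not the study's implementation: the `DecisionRecord` fields, the `summarize` helper, and the four example records are all hypothetical. The sketch records the model prediction next to the expert's final decision, derives whether the expert overrode the model, and summarizes how often overrides occurred and how often they were correct.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """One logged decision (hypothetical schema)."""
    participant_id: str
    case_id: str
    model_prediction: int
    final_decision: int
    ground_truth: int
    viewed_explanation: bool
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def overridden(self) -> bool:
        # An override is any final decision that differs from the model output.
        return self.final_decision != self.model_prediction


def summarize(records):
    """Override rate and share of overrides that matched the ground truth."""
    overrides = [r for r in records if r.overridden]
    correct = [r for r in overrides if r.final_decision == r.ground_truth]
    return {
        "override_rate": len(overrides) / len(records) if records else 0.0,
        "override_correct_rate": len(correct) / len(overrides) if overrides else 0.0,
    }


# Hypothetical records, for illustration only.
records = [
    DecisionRecord("P01", "case-1", 1, 1, 1, True),   # expert confirms the model
    DecisionRecord("P01", "case-2", 0, 1, 1, True),   # override, correct
    DecisionRecord("P01", "case-3", 1, 0, 1, False),  # override, incorrect
    DecisionRecord("P01", "case-4", 0, 0, 0, True),   # expert confirms the model
]
summary = summarize(records)
log_line = json.dumps(asdict(records[1]))  # one JSON line per decision
```

Storing the model output, the final decision, and the ground-truth label together keeps each record self-contained, so override statistics can be recomputed later without access to the model.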
To maintain experimental consistency, predictive models were not retrained during the study. All user interactions and decision outcomes were logged for subsequent analysis.

2.10. Baseline Black-Box System

To provide a controlled baseline comparison, a black-box DSS version was implemented using the same predictive models, datasets, and interface structure as the HCAI-DSS. However, the black-box system did not provide explainability features, SHAP visualizations, confidence reasoning, or user-feedback mechanisms. Participants received only the final prediction output generated by the model, without additional interpretive support or override functionality. This controlled design enabled direct evaluation of the effects of explainability and human-centered interaction mechanisms on user trust, usability, and decision performance.

2.11. User Study Design

A within-subject experimental study was conducted to compare the proposed HCAI-DSS with a conventional black-box AI system. The study evaluated differences in trust, trustworthiness, usability, decision accuracy, and decision time between explainable human-centered AI and opaque AI-assisted decision-making systems.

A total of 30 domain experts participated in the study, including 15 healthcare professionals and 15 finance specialists. The healthcare group consisted of medical professionals with experience in cardiovascular diagnosis, while the finance group included credit analysts and risk-management professionals familiar with loan evaluation tasks. All participants had relevant professional experience in their respective domains and provided informed consent before participation.

Given the within-subject experimental design, statistical power analysis was conducted to ensure sufficient sensitivity for detecting differences between system conditions.
Using G*Power 3.1 [23], assuming a medium expected effect size (d = 0.5), α = 0.05, and desired statistical power of 0.80, the required minimum sample size was estimated at 24 participants. The final sample of 30 domain experts therefore exceeded this threshold and provided adequate statistical sensitivity for detecting moderate effects.

Healthcare participants completed diagnostic tasks based on patient records derived from the UCI Heart Disease dataset, while finance participants completed credit-risk assessment tasks derived from the German Credit dataset. Each participant completed 10 decision-making tasks in total: five using the HCAI-DSS and five using the black-box system, with condition order counterbalanced.

Before the experiment, participants received a brief introduction to the system interfaces and experimental procedure. During each task, participants reviewed case information, examined the AI-generated recommendation, and submitted a final decision. In the HCAI-DSS condition, participants additionally received SHAP-based explanations and could override AI-generated recommendations when necessary. Following each experimental condition, participants completed questionnaires evaluating trust, trustworthiness, and usability. Objective performance measures, including decision accuracy and decision time, were also recorded for all tasks.

2.12. Measures and Evaluation Metrics

We collected both subjective and objective measures to evaluate the two systems. The primary metrics of interest were trust, trustworthiness, usability, decision accuracy, and decision time. All metrics were computed for each participant per system condition, allowing within-subject comparisons.

Trust: Trust was measured using Likert-scale items adapted from established trust-in-automation instruments [16, 17]. Participants rated confidence, reliability, and trust in AI recommendations on a 5-point scale, and mean trust scores were computed for each condition.
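Computing one score per participant per condition can be sketched as follows. The example averages 5-point Likert ratings within each (participant, condition) pair and arranges the result in paired form for within-subject comparison. The function names, condition labels, item names, and ratings are hypothetical and used only for illustration.

```python
from collections import defaultdict
from statistics import mean


def condition_means(responses):
    """Average Likert ratings per (participant, condition).

    `responses` is an iterable of (participant_id, condition, item, rating).
    """
    grouped = defaultdict(list)
    for participant, condition, _item, rating in responses:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range for a 5-point scale: {rating}")
        grouped[(participant, condition)].append(rating)
    return {key: mean(values) for key, values in grouped.items()}


def to_paired(scores, first="HCAI-DSS", second="black-box"):
    """One row per participant: (participant_id, score_first, score_second)."""
    participants = sorted({participant for participant, _ in scores})
    return [(p, scores[(p, first)], scores[(p, second)]) for p in participants]


# Hypothetical ratings for a single participant, for illustration only.
responses = [
    ("P01", "HCAI-DSS", "confidence", 5),
    ("P01", "HCAI-DSS", "reliability", 4),
    ("P01", "HCAI-DSS", "trust", 4),
    ("P01", "black-box", "confidence", 3),
    ("P01", "black-box", "reliability", 3),
    ("P01", "black-box", "trust", 2),
]
paired = to_paired(condition_means(responses))
```

Keeping both condition scores on one row per participant is what allows each expert to serve as their own control in the comparison.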
Trustworthiness: Trustworthiness was evaluated using multidimensional criteria derived from the EU Trustworthy AI Guidelines and HCAI principles, including transparency and understanding, usefulness of explanations, perceived reliability and consistency, perceived safety, and user agency and control. Each item was rated on a 5-point Likert scale and aggregated into an overall Trustworthiness Score. This multidimensional metric complements the trust and usability scores, offering a more complete assessment of the system’s alignment with Trustworthy AI requirements.

Usability: Usability was evaluated using the System Usability Scale (SUS) [24, 25], producing standardized usability scores between 0 and 100 for each system condition.

Decision Accuracy: Decision accuracy was calculated by comparing participant decisions with ground-truth dataset labels. Accuracy percentages were computed separately for each system condition.

Decision Time: Decision time was measured as the duration between case presentation and final participant decision for each task.

Qualitative participant feedback was additionally collected during post-experiment debriefing sessions to contextualize the quantitative findings.

2.13. Statistical Analysis

Statistical analyses were conducted to compare the HCAI-DSS and black-box conditions across trust, trustworthiness, usability, decision accuracy, and decision time metrics. Given the within-subject experimental design, paired-sample t-tests were used to evaluate differences between conditions. Before hypothesis testing, normality of paired differences was assessed using Shapiro–Wilk tests and Q–Q plots, confirming that parametric testing assumptions were satisfied. Statistical significance was evaluated using a two-tailed significance level of α = 0.05. Effect sizes were calculated using Cohen’s d for paired samples, and 95% confidence intervals were computed for all mean differences.
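The core of this paired analysis can be sketched with the standard library alone. The function below computes the mean paired difference, the paired t statistic, and a paired-samples effect size. Two assumptions are worth flagging: the scores are hypothetical, and the effect size shown is the d_z variant (mean difference divided by the standard deviation of the differences), since the text does not state which paired formulation of Cohen's d was used. Obtaining p-values and confidence intervals requires the t distribution, for which SciPy's `scipy.stats.ttest_rel` is one available routine.

```python
from math import sqrt
from statistics import mean, stdev


def paired_comparison(a, b):
    """Paired t statistic and Cohen's d_z for two matched score lists."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    return {
        "n": n,
        "df": n - 1,
        "mean_diff": mean_diff,
        "t": mean_diff / (sd_diff / sqrt(n)),
        "cohens_dz": mean_diff / sd_diff,
    }


# Hypothetical per-participant trust scores, for illustration only.
hcai_scores = [4.2, 4.5, 3.9, 4.8, 4.1, 4.4]
blackbox_scores = [3.1, 3.6, 3.4, 3.9, 3.0, 3.5]
result = paired_comparison(hcai_scores, blackbox_scores)
```

Because t equals d_z multiplied by the square root of n, the effect size and the test statistic carry the same information at a fixed sample size; reporting both separates the magnitude of the difference from its statistical reliability.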
All analyses were conducted using Python libraries, including SciPy and Pingouin. The experimental data were organized in paired format, with each participant evaluated under both system conditions. This statistical approach enabled rigorous comparison of the impact of explainability, transparency, and human-oversight mechanisms on user trust, usability, and decision performance.

3.1. Comparative Evaluation Results

The within-subject analysis comparing the HCAI-DSS with the black-box AI system revealed significant differences in trust, trustworthiness, perceived usability, and decision accuracy. Quantitative results from the statistical analysis are reported in this section and are supported by qualitative insights shared by expert participants. Thirty domain experts (15 healthcare, 15 finance) completed 10 decision tasks each: five with the HCAI-DSS and five with the black-box system.

3.2. User Trust

As shown in the accompanying table, participants reported significantly higher levels of trust in the HCAI-DSS than in the black-box system, indicating that explainability and user-control mechanisms improved confidence in system recommendations.

3.3. Trustworthiness Evaluation Results

The accompanying table presents the multidimensional trustworthiness evaluation. The HCAI-DSS scored significantly higher than the black-box system across all dimensions. The overall trustworthiness score was computed as the participant-level average across all trustworthiness dimensions. Because aggregation across multiple dimensions reduces variability at the participant level, the resulting overall standard deviation is lower than several individual dimension-level standard deviations. Explainability usefulness was evaluated only for the HCAI-DSS condition because the black-box system did not provide explanation features.

3.4. Usability Evaluation Results

The HCAI-DSS demonstrated significantly better usability, despite offering more information and interaction. The accompanying table summarizes the SUS scores obtained for the HCAI-DSS and the black-box system. The HCAI-DSS achieved usability scores within the “excellent” range (>80), whereas the black-box system achieved only moderate usability scores (~70). Participants reported that the explainability features, interactive controls, and override functionality improved clarity and overall interaction quality despite the increased amount of presented information. Several participants noted that the HCAI-DSS felt more intuitive and logically structured, while the black-box system was perceived as less transparent and less informative.

3.5. Decision Accuracy Results

The HCAI-DSS significantly improved decision-making accuracy across both healthcare and finance tasks. The accompanying table summarizes the decision accuracy achieved with the HCAI-DSS and the black-box AI system. The improvement suggests that explainability and override mechanisms supported more effective expert judgment and correction of model errors.

3.6. Decision Time

Participants took slightly more time when using the HCAI-DSS, but the difference was not statistically significant. The decision time comparison results are presented in the accompanying table. The longer interaction times reflect the presence of explainability information and additional user interaction features. However, the increase in reading and evaluation time did not substantially reduce overall efficiency. Several participants reported that the improved clarity and transparency reduced hesitation during decision-making, partially offsetting the additional time required to review explanations.

3.7. Use of Explanations and Feedback

Usage logs indicated strong engagement with the explainability and feedback features of the HCAI-DSS.
Experts consulted SHAP-based explanations in approximately 80% of decision-making tasks, particularly in situations where AI-generated recommendations contradicted professional intuition or when multiple risk factors appeared ambiguous or borderline. Model overrides occurred in approximately 30% of tasks, with 82% of overrides resulting in correct final decisions, suggesting effective collaboration between human expertise and AI recommendations. Qualitative feedback further indicated that transparency improved accountability, user control increased decision confidence, and explainability features helped reduce cognitive burden in uncertain cases.

4.1. Overview

The findings of this study provide strong empirical evidence that integrating Human-Centered AI (HCAI) principles into Decision Support Systems (DSS) can significantly enhance user trust, usability, and decision accuracy without imposing a substantial time burden. Across two high-stakes application domains, healthcare diagnosis and financial risk assessment, experts consistently preferred the HCAI-DSS over a functionally identical black-box AI system. These results have direct implications for the design of transparent, trustworthy AI systems that align with emerging regulations such as the EU AI Act.

4.2. Impact of Transparency on Trust and Decision Quality

Importantly, trust translated into better decision-making performance. The 10.2% increase in accuracy suggests that explanations did not merely give users confidence but also provided actionable insight. When the model erred, experts used explanations to detect mis-weighted features and overruled the model appropriately. This demonstrates the value of human-AI complementarity, where each compensates for the other’s weaknesses [16]. These results support prior studies showing that explainability improves human override behavior, reduces automation bias, and helps experts detect algorithmic errors [16, 17].
Our findings extend this literature by confirming these effects in a realistic, domain-specific DSS rather than in a simulated or artificially simplified task.

4.3. Trustworthiness Beyond Trust

While trust is a key outcome, our expanded evaluation shows that trustworthiness is a multidimensional construct encompassing transparency, reliability, safety, explainability quality, and user agency. The HCAI-DSS significantly outperformed the black-box system across all dimensions. These findings reinforce that trustworthiness arises not only from accurate predictions but also from the user’s perception of clarity, control, and procedural reliability, which are core principles emphasized in the EU Trustworthy AI Guidelines [6].

4.4. Usability Benefits Despite Added Complexity

Surprisingly, the HCAI-DSS scored significantly higher on the System Usability Scale (SUS) than the black-box system. This contradicts a common assumption that adding explanations increases cognitive load and decreases usability. Two factors may explain this outcome. First, explanations helped experts interpret results using familiar domain concepts, making the interface feel more intuitive. Second, override functionality increased users’ sense of control and reduced uncertainty associated with opaque AI systems.

4.5. Human-in-the-Loop Feedback Enhances System Reliability

Although model retraining was not part of the experiment, the feedback loop generated a rich dataset of expert corrections. The high correctness rate of overrides (82%) demonstrates that experts contribute valuable domain knowledge that could be used to improve the system over time. This finding reflects a central principle of HCAI: AI systems should adapt to human expertise and decision processes rather than forcing users to adapt to opaque algorithmic behavior. Our framework operationalizes this principle through explicit override mechanisms, structured feedback logging, and support for future model refinement using expert corrections.
This approach aligns with continuous learning, post-deployment monitoring, and system-improvement requirements emphasized in the EU AI Act for high-risk AI systems.

4.6. Efficiency Trade-Offs: No Significant Decrease in Timeliness

As expected, decision times were slightly higher in the HCAI-DSS condition; however, this difference was not statistically significant. Participants reported that explainability features helped accelerate evaluation in more complex or ambiguous cases, whereas the lack of transparency in the black-box system often increased hesitation and uncertainty during decision-making. These findings suggest that transparency and explainability can be integrated into DSS environments without substantially reducing operational efficiency, an important consideration for real-world deployment in time-sensitive domains.

4.7. Implications for the Design of Future DSS

The findings provide several important design recommendations for future AI-assisted DSS platforms. First, explainability mechanisms should be integrated directly into decision workflows rather than treated as optional add-on components. Experts consistently relied on explanations, particularly in situations where AI-generated recommendations contradicted their professional intuition. Second, explicit override mechanisms are essential for maintaining human authority, accountability, and regulatory compliance in high-risk decision environments. Third, combining both local and global explanation techniques appears beneficial, as local SHAP explanations supported individual case evaluation while global feature-importance summaries helped users understand broader model behavior. Finally, structured feedback mechanisms can support long-term alignment between AI outputs and domain expertise by enabling future model refinement based on expert corrections.
Collectively, these design principles align closely with transparency, accountability, auditability, and human-oversight requirements emphasized in the EU AI Act.

4.8. Generalizability Across Domains

Similar performance improvements observed across both healthcare and financial decision-making tasks suggest that HCAI principles can generalize effectively across different high-risk domains. The findings indicate that transparency and explainability provide consistent benefits in expert-supported decision environments, even when predictive model accuracy is already relatively strong. These results strengthen the external validity of the proposed HCAI-DSS framework and support its broader applicability to other high-stakes decision-support contexts.

4.9. Limitations

Several limitations should be acknowledged. First, although the sample size of 30 domain experts was statistically sufficient for detecting moderate-to-large effects within the paired experimental design, future studies involving larger and multi-institutional participant groups would improve external validity and enable subgroup analysis across specialties and experience levels. Second, the proposed system did not incorporate real-time adaptive learning, as predictive models were not retrained dynamically during the experiment. Future research should therefore investigate active-learning and online-adaptation mechanisms. Third, while the selected tasks were representative of healthcare and financial decision-support scenarios, they cannot fully capture the complexity of real-world operational workflows. Additionally, this study relied exclusively on SHAP-based explainability, whereas alternative methods such as counterfactual explanations, rule extraction, and LIME may provide complementary interpretability benefits.
Finally, the experiment was conducted in a controlled setting, and real-world deployment may introduce additional organizational, legal, and workflow-related constraints not examined in the present study.

5.1. Conclusions

This study demonstrated that integrating Human-Centered Artificial Intelligence (HCAI) principles into Decision Support Systems (DSS) significantly improves trust and perceived trustworthiness across two high-stakes domains: healthcare diagnosis and financial risk assessment. By combining interpretable ensemble models, SHAP-based explanations, explicit user-override mechanisms, and structured feedback logging, the proposed HCAI-DSS offers a transparent and user-aligned alternative to traditional black-box decision support tools. Quantitative results show that experts trusted the HCAI-DSS significantly more (d = 1.23), rated it as substantially more usable (SUS +15.4 points), and achieved higher decision accuracy (+10.2 percentage points) compared to the black-box system. Importantly, these benefits were achieved without a significant loss in efficiency, indicating that transparency and human control can be integrated without harming workflow speed. Qualitative feedback further emphasized experts’ preference for systems that reveal model reasoning and respect human agency. Collectively, these findings provide empirical evidence that HCAI principles improve AI-mediated decision-making by increasing both system performance and user acceptance. The findings also provide engineering design implications for organizations building deployable, trustworthy AI systems under emerging regulatory frameworks such as the EU AI Act, which requires transparency, human oversight, and accountability in high-risk AI applications.

5.2. Practical Implications

The study provides practical implications for designers, developers, and decision-makers involved in the development of AI-assisted decision-support systems.
The findings suggest that transparency should be integrated as a core architectural component rather than treated as an optional feature. Similarly, human oversight mechanisms, including override, audit, and feedback functionalities, are essential for responsible and trustworthy AI deployment. The results also demonstrate the value of expert-feedback data for supporting future model refinement and continuous learning processes. Furthermore, the consistent improvements observed across healthcare and financial domains indicate that HCAI-DSS approaches can generalize effectively across different high-risk decision-making environments. Collectively, these findings guide the development of AI systems that are not only technically accurate but also aligned with human-centered, ethical, and regulatory expectations.

5.3. Contributions and Novelty

This study contributes to the growing body of Human-Centered AI research by presenting a technically implemented and empirically evaluated HCAI-DSS framework for high-stakes decision-making environments. The proposed system integrates explainable AI mechanisms, user-override functionality, and feedback logging within a unified DSS architecture. Experimental evaluation across healthcare and financial domains demonstrated that human-centered explainability features significantly improve trust, usability, and decision accuracy compared to conventional black-box systems. The findings further provide practical guidance for designing trustworthy AI systems aligned with emerging regulatory requirements, emphasizing transparency, accountability, and meaningful human oversight.

5.4. Future Work

Future research should investigate real-time adaptive learning mechanisms in which expert feedback is integrated into continuous or periodic model refinement processes. Studying these dynamic interactions may provide a deeper understanding of trust calibration and long-term human-AI collaboration.
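
As an illustration of how logged expert overrides could feed such a refinement process, the following Python sketch computes an override correctness rate (the quantity reported as 82% in Section 4.5) and selects corrections that could serve as candidate training examples. The record schema, the field names, and the availability of a ground-truth label for each case are assumptions made for illustration; the study does not describe the actual log format, and the example records are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OverrideRecord:
    """One logged decision. All field names are hypothetical."""
    case_id: str
    ai_label: int      # label recommended by the model
    expert_label: int  # final label chosen by the expert
    true_label: int    # ground truth, assumed to be known afterwards


def override_correctness(log: List[OverrideRecord]) -> Optional[float]:
    """Share of overrides (expert != AI) in which the expert was right.

    Returns None when the log contains no overrides.
    """
    overrides = [r for r in log if r.expert_label != r.ai_label]
    if not overrides:
        return None
    correct = sum(r.expert_label == r.true_label for r in overrides)
    return correct / len(overrides)


def refinement_examples(log: List[OverrideRecord]) -> List[Tuple[str, int]]:
    """(case_id, label) pairs for overrides that matched the ground truth."""
    return [
        (r.case_id, r.expert_label)
        for r in log
        if r.expert_label != r.ai_label and r.expert_label == r.true_label
    ]


# Invented example log: four overrides, three of them correct.
example_log = [
    OverrideRecord("c1", ai_label=0, expert_label=1, true_label=1),
    OverrideRecord("c2", ai_label=1, expert_label=1, true_label=1),
    OverrideRecord("c3", ai_label=1, expert_label=0, true_label=0),
    OverrideRecord("c4", ai_label=0, expert_label=1, true_label=0),
    OverrideRecord("c5", ai_label=1, expert_label=0, true_label=0),
]

print(override_correctness(example_log))
print(refinement_examples(example_log))
```

Filtering on agreement with the ground truth is only one possible policy. In deployment the ground truth is often delayed or unavailable, in which case corrections would need a separate review step before being used for retraining.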
Additional explainability approaches, including counterfactual explanations, rule extraction, and natural-language rationales, could also be incorporated alongside SHAP-based explanations to better address diverse user preferences and cognitive styles. Large-scale field deployments in operational environments such as hospitals and financial institutions would further support evaluation of organizational constraints, long-term adoption, and real-world decision impact. In addition, the proposed framework could be extended to other high-risk domains, including supply-chain planning, fraud detection, industrial operations, and public-sector decision support. Future longitudinal studies should also examine how users interact with and rely on explainable AI systems over extended periods of use to better understand trust dynamics, explainability fatigue, behavioral adaptation, and evolving human-AI reliance patterns.

Figure 1. Proposed HCAI-DSS architecture integrating explainability, user feedback, and human oversight mechanisms.

Trust Comparison (Likert Scale 1–5).

| Metric | HCAI-DSS (M ± SD) | Black-Box (M ± SD) | Mean Difference | t (29) | p-Value | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| Trust Score | 4.23 ± 0.52 | 2.98 ± 0.61 | +1.25 | 8.47 | <0.001 | 1.23 (large) | [0.93, 1.57] |

Multidimensional Trustworthiness Evaluation.

| Dimension | HCAI-DSS (M ± SD) | Black-Box (M ± SD) | Mean Diff | p-Value |
|---|---|---|---|---|
| Transparency/Understanding | 4.41 ± 0.51 | 2.65 ± 0.58 | +1.76 | <0.001 |
| Explainability Usefulness | 4.52 ± 0.49 | — | — | — |
| Perceived Reliability | 4.20 ± 0.56 | 3.05 ± 0.60 | +1.15 | <0.001 |
| Perceived Safety & Risk | 4.18 ± 0.63 | 3.02 ± 0.72 | +1.16 | <0.001 |
| User Agency/Control | 4.60 ± 0.44 | 2.20 ± 0.66 | +2.40 | <0.001 |
| Overall Trustworthiness Score | 4.38 ± 0.40 | 2.73 ± 0.52 | +1.65 | <0.001 |

Usability (SUS Scores).

| Metric | HCAI-DSS | Black-Box | Mean Difference | t (29) | p-Value | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| SUS Score (0–100) | 84.6 ± 7.9 | 69.2 ± 8.7 | +15.4 | 6.12 | <0.001 | 1.01 | [10.2, 20.8] |

Decision Accuracy.

| Metric | HCAI-DSS | Black-Box | Mean Difference | t (29) | p-Value | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| Accuracy (%) | 86.7% ± 9.4 | 76.5% ± 10.8 | +10.2% | 4.28 | <0.001 | 0.78 | [5.1%, 15.7%] |

Comparison of Decision Time Between HCAI-DSS and Black-Box AI.

| Metric | HCAI-DSS | Black-Box | Mean Difference | t (29) | p-Value |
|---|---|---|---|---|---|
| Avg. Time per Task (seconds) | 201.3 ± 34.5 | 182.7 ± 31.2 | +18.6 s | 1.82 | 0.078 |
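
The comparisons above come from a within-subject design in which each expert used both systems, so the t (29) values correspond to paired tests on 30 participants. The following Python sketch shows how a paired t statistic and a paired effect size can be computed from per-participant scores. It is a minimal illustration under stated assumptions: the scores are invented rather than taken from the study, and the effect size shown is d_z (mean difference divided by the standard deviation of the differences), which is one of several Cohen's d variants; the study does not state which variant it reports, so this sketch is not expected to reproduce the tabulated values.

```python
import math
import statistics
from typing import Dict, Sequence


def paired_summary(a: Sequence[float], b: Sequence[float]) -> Dict[str, float]:
    """Paired t statistic and effect size d_z for two matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the differences (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff  # paired effect size; equals t / sqrt(n)
    return {"n": n, "df": n - 1, "mean_diff": mean_diff, "t": t_stat, "d_z": d_z}


# Invented trust ratings for six hypothetical participants (1-5 Likert scale).
hcai_scores = [4.5, 4.0, 4.2, 4.8, 3.9, 4.4]
blackbox_scores = [3.0, 3.2, 2.8, 3.5, 2.9, 3.1]

result = paired_summary(hcai_scores, blackbox_scores)
print(result)
```

The p-value would then be read from a t distribution with n - 1 degrees of freedom, for example with a statistics library.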

www.mdpi.com
