Study, year (country) | Input features | Training/validation strategy | Performance | Comparative algorithms and scoring metrics | Key findings | Limitations |
---|---|---|---|---|---|---|
Akbulut et al. [2], 2023 (Turkey) | Neutrophil, WLR, NLR, CRP, WNR, PNR, PDW, and MCV | The persistence method was repeated 50 times with different seeds for model robustness. The CatBoost model predicted AA and perforated AA, with hyperparameters optimized by grid search using tenfold cross-validation and 5 replicates | CatBoost model performance for classification: Sensitivity 84.2%, Specificity 93.2%, AUC 0.947, Accuracy 88.2%, F1-score 88.7%; CatBoost model: Accuracy 92%, F1-score 91.1%, Sensitivity 94.1%, Specificity 90.5%, AUC 0.969 | NR | 1. First study to combine ML and XAI for AA and perforated AA estimation 2. Identified biochemical blood parameters that can predict AA and perforated AA | 1. The study is retrospective and lacks comprehensive clinical data 2. Radiological data are missing for approximately 11% of the patient sample 3. Conducted at a single institution |
Phan-Mai et al. [23], 2023 (Vietnam) | Demographic characteristics, blood tests, and ultrasound. Blood tests consisted of total WBC, granulocyte count, lymphocyte count, and CRP | Imbalanced data were addressed using SMOTE. Optimal parameters were selected using k-fold validation. The data of 1,950 patients were split randomly into 70% for training and 30% for testing | GB model (imbalanced unadjusted data): Accuracy: 81%, AUC: 0.753; GB model (imbalanced adjusted data): Accuracy: 82%, AUC: 0.890; KNN model (imbalanced unadjusted data): Accuracy: 77.6%, AUC: 0.672; KNN model (imbalanced adjusted data): Accuracy: 74.1%, AUC: 0.831; DT model (imbalanced unadjusted data): Accuracy: 70.3%, AUC: 0.601; DT model (imbalanced adjusted data): Accuracy: 73.8%, AUC: 0.738; ANN model (imbalanced unadjusted data): Accuracy: 80.5%, AUC: 0.734; ANN model (imbalanced adjusted data): Accuracy: 74.2%, AUC: 0.810; LR model (imbalanced unadjusted data): Accuracy: 80.3%, AUC: 0.714; LR model (imbalanced adjusted data): Accuracy: 72.9%, AUC: 0.789; SVM model (imbalanced unadjusted data): Accuracy: 75.2%, AUC: 0.711; SVM model (imbalanced adjusted data): Accuracy: 65.5%, AUC: 0.730 | NR | 1. High validity of ML models in classifying CA 2. GB model most valid 3. Models useful as screening tools | 1. Small sample size 2. Single-hospital data 3. Low rate of complicated cases 4. Insufficient qualitative data 5. Not for definitive diagnosis |
Li et al. [24], 2023 (China) | Age, stage of pregnancy, symptom duration, vital signs, physical examination findings, laboratory test results, and imaging findings (US) | NR | LR-based score (cutoff = 16): Sensitivity: 64%, Specificity: 84%, Accuracy: 75%, PPV: 73%, NPV: 77%, AUC: 0.80 (95% CI = 0.75–0.84); DT model: AUC: 0.78 | NR | 1. Higher premature birth and abortion rates in pregnant patients with CA 2. Treatment delay increases these rates 3. Models using LR and DT effectively distinguish CA from UCA 4. Models combine clinical and laboratory tests 5. Appendix diameter had an AUC of 0.68 in 116 cases | 1. Single-center study 2. No external validation 3. Limited patient number 4. Appendix diameter not included |
Lin et al. [25], 2023 (Taiwan) | CRP level, NLR, CT findings (fat-stranding sign, appendicolith, and ascites) | Data preprocessing involved standardizing the independent variables of AA patients to a scale of 0 to 1. Patients were then randomly divided into training and testing datasets at a 70:30 ratio. A single hidden layer with three neurons was chosen using a predefined value to avoid overfitting, as it was sufficient for the dataset | ANN model (MLP): AUC: 0.950, Sensitivity: 85.7%, Specificity: 91.7%, LR+: 10.36, LR-: 0.16 | NR | 1. A three-layer MLP with three hidden neurons performed well 2. Practical application would require an integrated system for immediate predictions after a CT scan | 1. Single-center study 2. Broad definition of complicated appendicitis 3. Potential variation in definitions across studies |
Eickhoff et al. [26], 2022 (Germany) | Age, gender, height, weight, and BMI; clinical-anamnestic data such as the ASA score and comorbidities; and perioperative data (time interval from admission to appendectomy, operative time, hemoglobin, CRP, WBC, platelets, INR, open surgery, laparoscopic surgery, conversion, extended surgical procedures during appendectomy, drains) as predictor variables | The dataset was split into 10 equal parts. 90% was used for training and 10% for validation. This process was repeated for all sections of the data, rotating the test sample. This was done 50 times for stable performance assessment | RF model: Need for ICU (Accuracy: 77.2%, Sensitivity: 77.9%, Specificity: 76.9%); Longer stay > 24 h in ICU (Accuracy: 87.5%, Sensitivity: 88.4%, Specificity: 87.4%); Complications measured by Clavien-Dindo > 3 in new cases (Accuracy: 68.2%, Sensitivity: 61.6%, Specificity: 69.5%); Re-operation after initial appendectomy (Accuracy: 74.2%, Sensitivity: 47.5%, Specificity: 77.2%); Occurrence of surgical site infection (Accuracy: 66.4%, Sensitivity: 66.2%, Specificity: 66.4%); Need for oral antibiotic therapy after discharge (Accuracy: 78.8%, Sensitivity: 76.4%, Specificity: 79.1%); More than 7 days of hospital stay (Accuracy: 76.2%, Sensitivity: 74.3%, Specificity: 77.9%); More than 15 days of hospital stay (Accuracy: 83.6%, Sensitivity: 60%, Specificity: 85.1%) | NR | 1. Developed ML model for post-op outcomes in perforated appendicitis 2. The model predicts the need for intensive care 3. Suggests early transfer to higher-level care facilities | 1. Single-center, retrospective study 2. Small sample size |
Xia et al. [27], 2022 (China) | Gender, age, temperature, heart rate, WBC, lymphocytes, neutrophils, monocytes, eosinophils, hemoglobin, erythrocytes, platelets, urea nitrogen, blood sugar, creatinine, bilirubin, CRP | Used tenfold cross-validation for overall classification evaluation, and fivefold cross-validation for parameter optimization. Assessed using 12 benchmark functions | OBLGOA-SVM model: Accuracy: 83.6%, MCC: 67.3%, Sensitivity: 81.7%, Specificity: 85.3% | GOA-SVM model: Accuracy: 81%, MCC: 64%, Sensitivity: 78%, Specificity: 84%; GS-SVM model: Accuracy: 79%, MCC: 59%, Sensitivity: 72%, Specificity: 86%; RF model: Accuracy: 82%, MCC: 65%, Sensitivity: 82%, Specificity: 82%; ELM model: Accuracy: 77%, MCC: 55%, Sensitivity: 72%, Specificity: 81%; KELM model: Accuracy: 78%, MCC: 57%, Sensitivity: 71%, Specificity: 84%; BPNN model: Accuracy: 76%, MCC: 52%, Sensitivity: 75%, Specificity: 76% | 1. Proposed OBLGOA-SVM framework for CA vs. UCA 2. Improved GOA for SVM parameters 3. Method outperformed rivals in evaluations 4. CRP, heart rate, temperature, and neutrophils predict CA | 1. No radiological findings (ultrasound, CT scans) 2. Insufficient cases from a single center 3. Uncontrolled, retrospective study |
Kang et al. [28], 2021 (China) | Age, gender, clinical signs and symptoms score, abdominal pain score, vomiting score, abdominal pain time, abdominal pain type, abdominal tenderness pain range, and the highest temperature. Laboratory records: blood routine, coagulation function, blood biochemistry, WBC, NE, CD3+ T, CD4+ T, CD8+ T, CD19+ T, CD16+56, NK, total T cell counts, helper T cell counts, inhibitors T, B cell counts, NK cell counts, CD4+/CD8+ ratio, CRP, PCT, and blood NLR | LR models were created separately for SA/PA and PA/GPA groups using selected features from the training dataset. Clinical features were added to establish combined LR models. Models were then validated using testing sets | LR model: Acute SA vs. acute PA (based on T cell subsets alone): training set (AUC: 0.904, Accuracy: 87.5%, Sensitivity: 75%, Specificity: 100%), testing set (AUC: 0.910, Accuracy: 87.5%, Sensitivity: 75%, Specificity: 100%); Acute SA vs. acute PA (based on T cell subsets and clinical signs and symptoms): training set (AUC: 0.921, Accuracy: 91%, Sensitivity: 81.9%, Specificity: 100%), testing set (AUC: 0.926, Accuracy: 90.6%, Sensitivity: 81.2%, Specificity: 100%); Acute PA vs. acute GPA (based on T cell subsets alone): training set (AUC: 0.834, Accuracy: 82.6%, Sensitivity: 81.9%, Specificity: 83.3%), testing set (AUC: 0.821, Accuracy: 80.6%, Sensitivity: 90.3%, Specificity: 71%); Acute PA vs. acute GPA (based on T cell subsets and clinical signs and symptoms): training set (AUC: 0.867, Accuracy: 80.6%, Sensitivity: 73.6%, Specificity: 87.5%), testing set (AUC: 0.854, Accuracy: 77.4%, Sensitivity: 90.3%, Specificity: 64.5%) | NR | 1. Established a quick diagnosis model using peripheral blood biomarkers for AA pathology | 1. Limited cases 2. Single-center data source 3. The study could not fully prove biomarkers’ predictive value due to sample size and false positives |
Corinne Bunn et al. [29], 2021 (USA) | Demographic, comorbid conditions, preoperative laboratory results, days, and procedure-related information | The dataset was split into 80% training and 20% hidden testing. Missing data were imputed using multivariable imputation for complete analysis | Postoperative sepsis prediction: LR model: AUC: 0.69, Sensitivity: 62%, Specificity: 65%; SVM model: AUC: 0.51; RFDT model: AUC: 0.69, Sensitivity: 67%, Specificity: 60%; XGB model: AUC: 0.70, Sensitivity: 64%, Specificity: 66%; Ensemble model (LR, RFDT, and XGB): AUC: 0.70, Sensitivity: 64%, Specificity: 60%. Postoperative sepsis prediction as a risk factor for 30-day mortality: LR model: AUC: 0.92, Sensitivity: 82%, Specificity: 87%; SVM model: AUC: 0.5; RFDT model: AUC: 0.96, Sensitivity: 93%, Specificity: 84%; XGB model: AUC: 0.93, Sensitivity: 89%, Specificity: 85%; Ensemble model (LR, RFDT, and XGB): AUC: 0.95, Sensitivity: 89%, Specificity: 89% | NR | 1. ML methods predict postoperative sepsis after appendectomy with moderate accuracy 2. Risk factors for postoperative sepsis: recent CHF exacerbation, acute renal failure, preoperative transfusion | 1. High false positive rates in clinical implementation 2. The study focuses on non-septic cases, isolating early-stage disease 3. Missing intraoperative findings data 4. ML is used on a national database, not EHR data 5. ML does not outperform LR due to dataset quality |