- Research article
- Open Access
Developing and validating of Ramathibodi Appendicitis Score (RAMA-AS) for diagnosis of appendicitis in suspected appendicitis patients
World Journal of Emergency Surgery volume 12, Article number: 49 (2017)
Diagnosis of appendicitis is still clinically challenging where resources are limited. The purpose of this study was to develop and externally validate Ramathibodi Appendicitis Score (RAMA-AS) in aiding diagnosis of appendicitis.
A two-phase cross-sectional study (i.e., derivation and validation) was conducted at Ramathibodi Hospital (for derivation) and at Thammasat University Hospital and Chaiyaphum Hospital (for validation). Patients with abdominal pain and suspected of having appendicitis were enrolled. Multiple logistic regression was applied to develop a parsimonious model. Calibration and discrimination performances were assessed. In addition, our RAMA-AS was compared with Alvarado’s score performances using ROC curve analysis.
The RAMA-AS consisted of three domains with seven predictors: symptoms (i.e., progression of pain, aggravation of pain, and migration of pain), signs (i.e., fever and rebound tenderness), and laboratory tests (i.e., white blood cell count (WBC) and neutrophil). The model fitted the data well and discriminated better than the Alvarado score, with C-statistics of 0.842 (95% CI 0.804, 0.881) versus 0.760 (0.710, 0.810). Internal validation by bootstrap yielded a Somers' D of 0.686 (0.608, 0.763) and a C-statistic of 0.848 (0.846, 0.849). The C-statistics of the two external validations were 0.853 (0.791, 0.915) and 0.813 (0.736, 0.892), with fair calibration.
RAMA-AS should be a useful tool for aiding diagnosis of appendicitis with good calibration and discrimination performances.
Appendicitis is one of the most common causes of acute abdominal pain, with an incidence of 110/100,000. Although many attempts have been made to improve diagnostic accuracy, diagnostic errors remain common, with rates of negative appendectomy of 15 to 26% [2, 3] and of perforated appendicitis of 10 to 30%.
The critical evaluation of appendicitis should balance early operation, which minimizes complicated appendicitis (i.e., perforation, gangrene, and abscess), against a conservative approach that reduces unnecessary operations. Several scores have been developed for screening of appendicitis, e.g., Alvarado, modified Alvarado, Fenyo, and Eskelinen. A systematic review of previous appendicitis scores was conducted to explore the methods used for their development and validation and their performance. Surprisingly, about two-thirds of those studies developed scores based on univariate analysis, and none had evaluated their impact on health outcomes in clinical practice. Given these methodological weaknesses in previous score development, we conducted this study, which aimed to develop and externally validate the Ramathibodi Appendicitis Score (RAMA-AS).
The design was a cross-sectional study consisting of derivation and validation phases. Derivation data were collected at Ramathibodi Hospital (RH), whereas validation data were collected at Thammasat University Hospital (TH) and Chaiyaphum Hospital (CH) from January 2013 to May 2015. RH and TH are university hospitals attached to schools of medicine, whereas CH is a provincial hospital.
The study was conducted and reported according to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) and STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) statements. Consecutive patients presenting with abdominal pain and suspected appendicitis were included if they met the following criteria: aged 15–60 years; right-sided abdominal pain within 7 days; at least one of the following symptoms (i.e., right lower abdominal pain, migration of abdominal pain, anorexia, nausea, vomiting) and signs (i.e., raised body temperature, right lower quadrant tenderness, guarding, rebound tenderness, and decreased bowel sound); and willing to participate and give consent. Patients were excluded if they could not give a history of illness or had myocardial infarction, terminal illness, an abdominal mass, or a tumor or malignancy of the appendix.
Outcome and predictors
The outcome of interest was acute appendicitis, confirmed by histopathological diagnosis in operated patients. For patients managed conservatively, a telephone call was made 6 weeks after the visit to confirm the final diagnosis.
Based on our literature review, a total of 8–10 variables were potentially to be included in the final risk prediction score. A simulation study indicated that at least 10 to 30 events per variable yielded less bias in coefficient estimation in logistic regression, which is a commonly recommended rule of thumb. Using a rule of thumb of at least 20 appendicitis patients per variable, 200 appendicitis patients were required for 10 variables. The prevalence of appendicitis in our setting was 62% in our pilot study. As a result, 355 patients were needed. Allowing for 20% missing data, at least 388 patients were finally required. In addition, 100 subjects (i.e., about 30% of derivation subjects) were enrolled from each of the external sites for external validation.
Multiple imputation was applied to predict missing variables using a simulation-based approach that assumed data were missing at random [14, 15]. A linear truncated regression was applied by regressing missing data on complete data with 20 imputations, as recommended. The performance of imputation can be assessed using the relative variance increase (RVI) and the fraction of missing information (FMI). The RVI is the average relative increase in the variances of the estimates because of missing variables (i.e., the mean of the variance of all coefficients from missing data); the closer this value is to 0, the less the missing data affect the estimates. The FMI is the largest fraction of missing information of coefficient estimates due to missing data. The number of imputations can be roughly estimated by a rule of thumb, i.e., FMI × 100. For instance, if FMI = 0.15, the number of imputations = 0.15 × 100, i.e., at least 15 imputations are required.
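The rule of thumb above can be written as a short calculation. The following Python sketch is illustrative only and is not the Stata procedure used in the study; the function name `imputations_needed` and the floor of one imputation for very small FMI values are assumptions made for this example.

```python
import math


def imputations_needed(fmi: float) -> int:
    """Rule-of-thumb number of imputations: FMI x 100, rounded up.

    The minimum of 1 is an assumption for this sketch, so that a
    near-zero FMI still returns a usable count.
    """
    if not 0.0 <= fmi <= 1.0:
        raise ValueError("FMI must lie between 0 and 1")
    # round() removes floating-point noise before taking the ceiling
    return max(1, math.ceil(round(fmi * 100, 6)))


# Worked example from the text: FMI = 0.15 requires at least 15 imputations
print(imputations_needed(0.15))
```

Under this rule, any FMI of 0.20 or less is covered by the 20 imputations used in the study.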
A simple logistic regression analysis was used to screen variables that might be associated with appendicitis. Individual variables from 4 domains (i.e., demographic data, clinical symptoms, clinical signs, and laboratory tests) were fitted in a logit model, and a likelihood ratio (LR) test was used to select variables. Variables with p values < 0.20 were simultaneously considered in a multivariate logit model. Only significant variables were kept in a parsimonious model. Goodness of fit, i.e., whether the expected (E, or predicted) and observed (O) values were close, was assessed using the Hosmer-Lemeshow chi-square test. In addition, a calibration coefficient (O/E) and its 95% confidence interval (CI) were estimated. The coefficients of the final parsimonious model were used to create the RAMA-AS. The receiver operating characteristic (ROC) curve, which plots sensitivity versus 1 − specificity, was used to calibrate the score cutoff. Diagnostic parameters (i.e., sensitivity, specificity, and positive (LR+) and negative likelihood ratios) were estimated for each distinct value of the score. The area under the ROC curve, called the C-statistic, was estimated; a value close to one reflects better discrimination of appendicitis from non-appendicitis.
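As an illustration of the diagnostic parameters named above, the sketch below computes sensitivity, specificity, and the positive and negative likelihood ratios from a 2×2 table at a single cutoff. The counts are hypothetical and are not study data; the function name is an assumption for this example, and the sketch assumes specificity is neither 0 nor 1 so that both ratios are defined.

```python
def diagnostic_parameters(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Diagnostic parameters of a score cutoff from a 2x2 table.

    tp/fn: appendicitis patients at or above / below the cutoff
    fp/tn: non-appendicitis patients at or above / below the cutoff
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # LR+ = sensitivity / (1 - specificity)
        "lr_pos": sensitivity / (1 - specificity),
        # LR- = (1 - sensitivity) / specificity
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts, for illustration only
params = diagnostic_parameters(tp=80, fp=20, fn=20, tn=80)
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```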
A bootstrap technique with 450 replications was applied for internal validation of the RAMA-AS. For each bootstrap sample, the RAMA-AS score was calculated and fitted in the logit model. For calibration, the correlation between the observed and expected values of appendicitis was assessed using the Somers' D coefficient for all bootstrap data (called Dboot) and for the derivation data (called Dorg). Calibration of the model was then assessed as the difference between Dorg and the mean Dboot; a value closer to zero reflects less bias and thus better calibration. Likewise, the original C-statistic was compared with the average C-statistic from the bootstraps to assess discrimination performance.
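The bootstrap comparison of an original statistic with its bootstrap average can be sketched as follows. This is a simplified illustration, not the study's Stata analysis: it resamples patients and recomputes the C-statistic of a fixed score, without refitting the logit model in each replicate. The data, the function names, and the random seed are hypothetical.

```python
import random


def c_statistic(scores, outcomes):
    """Probability that a random case outscores a random non-case (ties = 1/2)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def bootstrap_bias(scores, outcomes, replications=450, seed=0):
    """Return (original C, mean bootstrap C, bias = original - mean bootstrap)."""
    rng = random.Random(seed)
    n = len(scores)
    original = c_statistic(scores, outcomes)
    boot = []
    while len(boot) < replications:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # a replicate needs both outcome classes
            boot.append(c_statistic([scores[i] for i in idx], ys))
    mean_boot = sum(boot) / len(boot)
    return original, mean_boot, original - mean_boot


# Hypothetical scores and outcomes (1 = appendicitis), for illustration only
scores = [-2.1, -1.0, -0.5, 0.2, 0.6, 0.9, 1.3, 1.8, 2.4, 3.0]
outcomes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
original, mean_boot, bias = bootstrap_bias(scores, outcomes)
print(f"original C = {original:.3f}, bootstrap mean C = {mean_boot:.3f}, bias = {bias:.3f}")
```

The same pattern applies to Somers' D by swapping the statistic that is computed in each replicate.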
Data from the two external hospitals were used to validate the performance of RAMA-AS. Calibration performance was explored as described above. In addition, model re-calibrations were performed by recalibrating the intercept (called M1) and the overall coefficient (called M2) [20, 21] as follows (see Additional file 1: Table S1). The M1 was constructed by fitting RAMA-AS on appendicitis. The estimated intercept was then used for re-calibration by adding it to the original intercept. The estimated coefficient from the M1 was then used to calibrate the coefficients by multiplying it with the overall coefficients (M2). Four model revisions were additionally performed from the M2 [10, 21–23] (see Additional file 1: Table S1). The M3 was constructed by fitting M2 plus significant predictors by LR test. The M4 was similar to M3 but added significant predictors by stepwise selection. The M5 re-estimated all coefficients of predictors. Finally, the M6 re-selected only significant predictors among all predictors.
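The effect of the M1 and M2 updates on a predicted probability can be sketched with the usual logistic recalibration form, logit(p) = a + b × LP, where LP is the original linear predictor (the RAMA-AS score). This form is assumed here for illustration and may differ in detail from the models in Additional file 1: Table S1; the update values below are hypothetical, not the estimates for TH or CH.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrated_probability(
    linear_predictor: float,
    intercept_update: float = 0.0,
    slope_update: float = 1.0,
) -> float:
    """Predicted probability after logistic recalibration.

    M0 (original model): intercept_update = 0, slope_update = 1
    M1 (intercept only): intercept_update estimated, slope_update = 1
    M2 (intercept and overall slope): both estimated
    """
    return expit(intercept_update + slope_update * linear_predictor)


score = 1.2  # hypothetical RAMA-AS value on the logit scale
m0 = recalibrated_probability(score)
m1 = recalibrated_probability(score, intercept_update=-0.5)  # hypothetical update
m2 = recalibrated_probability(score, intercept_update=-0.5, slope_update=0.8)
print(f"M0 = {m0:.3f}, M1 = {m1:.3f}, M2 = {m2:.3f}")
```

A negative intercept update lowers every predicted risk, as would be needed in a setting with lower prevalence, whereas a slope below one pulls extreme predictions toward the middle.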
Finally, the Alvarado score was compared with the RAMA-AS using ROC curve analysis.
All analyses were performed using STATA version 14 (Stata Corp, College Station, Texas, USA) under mi estimate commands. A p value of less than 0.05 was taken as a threshold for statistical significance.
A total of 396 patients with suspected acute appendicitis were enrolled from RH. Among them, 132 patients (33.3%) were male, and mean age and BMI were 36.3 ± 14.6 years and 22.8 ± 4.5 kg/m², respectively. A total of 245/396 (61.8%; 95% CI 56.9%, 66.7%) patients had appendicitis, with a negative appendectomy rate of 4%.
Two variables (i.e., WBC > 10,000 cell/mm3 and neutrophil > 75%) contained missing data of 43 (10.9%) and 40 (10.1%), respectively and imputed data were filled in for both variables. Performances of imputation were assessed, and the FMI was < 0.0001 for both variables, indicating 20 imputations were sufficient to fill in missing data, see Additional file 2: Table S2. The diagnostic plot was constructed by comparing missing versus observed values, suggesting no difference between the two values, see Additional file 2: Figure S1.
Univariate analysis suggested that 16 of the 20 candidate predictors might be associated with appendicitis, see Table 1. These included eight symptoms (i.e., first location of pain, migration of pain, onset, progression of pain, right lower quadrant pain at presentation, nausea or vomiting, aggravation of pain by cough or movement, and fever), five signs (i.e., bowel sound, body temperature, tenderness at right lower quadrant of abdomen, rebound tenderness, and guarding), and two laboratory tests (i.e., WBC > 10,000 cell/mm3 and neutrophil > 75%).
These variables were simultaneously included in the logit model, and only seven variables remained in the final model: three symptoms (i.e., migration of pain, progression of pain, and aggravation of pain by cough or movement), two signs (i.e., body temperature ≥ 37.8 °C and rebound tenderness), and two laboratory tests (i.e., WBC > 10,000 cell/mm3 and neutrophil > 75%). Odds ratios (OR) and 95% CIs are reported in Table 2. The predictive equation was
The estimated C-statistic was 0.842 (95% CI 0.804, 0.881) (see Additional file 3: Figure S2), indicating that the model discriminated appendicitis from non-appendicitis well. The Hosmer-Lemeshow goodness-of-fit test indicated that the model fitted the data well (chi-square = 5.64, df = 8, p value = 0.687), with an O/E ratio of 0.95 (95% CI 0.83, 1.08).
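The O/E ratio reported above compares the number of observed appendicitis cases with the number expected from the model. A minimal Python sketch of the point estimate follows, assuming the expected count is the sum of predicted probabilities; the confidence interval is not computed, and the data are hypothetical.

```python
def observed_expected_ratio(outcomes, predicted_probabilities):
    """O/E ratio: observed events over the sum of predicted probabilities."""
    observed = sum(outcomes)
    expected = sum(predicted_probabilities)
    return observed / expected


# Hypothetical outcomes (1 = appendicitis) and model predictions
outcomes = [1, 0, 1, 1]
predicted = [0.8, 0.3, 0.6, 0.7]
oe_ratio = observed_expected_ratio(outcomes, predicted)
print(f"O/E = {oe_ratio:.2f}")
```

A ratio above one means the model predicts fewer cases than were observed, and a ratio below one means it predicts more.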
The scoring scheme was constructed using the 7 estimated coefficients; the score ranged from −3.37 to 3.99 with a median of 0.86, see Table 2. The score cutoffs were calibrated and the score was stratified into four categories, i.e., very low (score < −0.64), low (score −0.64 to 0.84), moderate (score 0.85 to 1.74), and high risk (score > 1.74) groups, see Table 3. The estimated LR+ for the latter three groups were 1.98 (95% CI 1.65, 2.37), 5.25 (95% CI 3.39, 8.13), and 8.36 (95% CI 3.96, 18.00) when compared with the lowest risk group. The post-test probabilities were 76.0, 89.0, and 93.0% for the low, moderate, and high risk groups, respectively (see Fagan plot in Fig. 1).
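The step from LR+ to post-test probability follows Bayes' rule on the odds scale: post-test odds = pre-test odds × LR+. The sketch below takes the derivation prevalence of 61.8% as the pre-test probability, which is an assumption of this example; the results come close to, but do not exactly match, the values reported above, which are presumably read from the Fagan nomogram.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PREVALENCE = 0.618  # derivation-cohort prevalence, assumed as pre-test probability
LR_POSITIVE = {"low": 1.98, "moderate": 5.25, "high": 8.36}  # reported LR+ values

post_test = {
    group: post_test_probability(PREVALENCE, lr) for group, lr in LR_POSITIVE.items()
}
for group, probability in post_test.items():
    print(f"{group}: {100 * probability:.1f}%")
```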
The 450 bootstraps yielded estimated Dorg and Dboot coefficients of 0.686 and 0.695 (95% CI 0.692, 0.698) for the derivation and bootstrap models, respectively. The bias was only −0.009 (95% CI −0.011, −0.007), suggesting good calibration. The bootstrap C-statistic was 0.848 (95% CI 0.846, 0.849), with a bias of −0.005 (95% CI −0.006, −0.004).
A total of 330 patients with suspected acute appendicitis (152 and 178 from TH and CH, respectively) were used to externally validate the RAMA-AS. Their characteristics were described in Table 4.
Thammasat University Hospital
Compared with RH, the prevalence of appendicitis was much lower in TH (48.7 vs 61.8%), the mean age was quite similar (35.6 vs 36.3 years), and the male percentage was much lower (26.4 vs 35.8%), see Table 4. Among the seven predictors, rebound tenderness (42.8 vs 48.5%), progression of pain (64.5 vs 84.8%), and aggravation of pain (51.4 vs 72.5%) were slightly to much less frequent, whereas migration of pain (48.0 vs 44.7%), body temperature (19.7 vs 18.7%), WBC > 10,000 cell/mm3 (82.2 vs 79.6%), and neutrophil > 75% (75.7 vs 66.2%) were slightly to much more frequent. These variables were also described by appendicitis group, indicating higher prevalence in appendicitis for all symptoms and signs, but not for laboratory tests, see Additional file 1: Table S3.
The estimated RAMA-AS, which ranged from −3.4 to 4.0, seemed to work well in TH, with an estimated O/E ratio of 1.005 (95% CI 0.784, 1.225; Hosmer-Lemeshow = 8.219, df = 4, p = 0.084). However, the calibration plot showed that the predicted risk deviated from the reference line (see Additional file 4: Figure S3-A), i.e., risk was under-estimated for lower scores and over-estimated for higher scores. The intercept and overall coefficients were then calibrated (see Additional file 1: Table S4), and calibration plots were constructed (see Additional file 4: Figure S3-B-C), which suggested no improvement in calibration.
Revision M3 models by LR test indicated that migration of pain, progression of pain, body temperature, WBC, and neutrophil were significant predictors, see Additional file 1: Table S4. Comparing coefficients of M3 versus coefficients of the original RH model in Table 2, coefficients of body temperature, WBC, and neutrophil were changed from positive to negative coefficients, whereas coefficients of the rest of the predictors increased. Only migration of pain, progression of pain, and rebound tenderness were significant by stepwise selection for M4. Of these, progression of pain and rebound tenderness were much lower but migration of pain was higher than in RH, see Table 2 and Additional file 1: Table S4.
Calibration coefficients of these models were estimated, which resulted in the O/E ratio for revision M3 model and M4 of 0.940 (95% CI 0.729, 1.150; Hosmer-Lemeshow = 2.683, df = 4, p = 0.612) and 1.006 (95% CI 0.743, 1.269; Hosmer-Lemeshow = 5.00, df = 4, p = 0.287), respectively, which were much improved when compared to the M0. Calibration plots also showed better fits with the reference lines when compared to the M0, see Additional file 4: Figure S3 A, D-E. The M5 which entered all seven predictors or stepwise selection in M6 yielded similar results as M4, in which only three predictors (i.e., migration of pain, progression of pain, and rebound tenderness) were significant. The O/E ratios were 0.870 (0.578, 1.612) and 0.947 (95% CI 0.684, 1.209) and calibration plots showed better fit than M0, see Additional file 4: Figure S3 F-G.
C-statistics were estimated for all models, see Additional file 1: Table S5. These suggested that the M0 could well discriminate appendicitis from non-appendicitis with the C-statistics of 0.853 (95% CI 0.790, 0.915), and they were little improved for M3, M4, and M6, but not for M5, see Additional file 1: Table S5.
Comparing with RH (see Table 4), prevalence of appendicitis in CH was much higher (76.9 vs 61.8%), and mean age (42.9 vs 36.3 years) and male percentage were higher (39.9 vs 35.8%). Migration of pain (70.2 vs 44.7%), body temperature (37.6% vs 18.7%), and rebound tenderness (71.3 vs 48.5%) were more present, but aggravation of pain was much lower (58.4 vs 72.5%), whereas progression of pain (82.6 vs 84.8%), WBC > 10,000 cell/mm3 (76.9 vs 79.6%) and neutrophil (63.5 vs 66.2%) were little lower than RH. Distribution of these predictors between appendicitis groups were described, and all except neutrophil were more prevalent in appendicitis than non-appendicitis groups, in Additional file 1: Table S3.
The median RAMA-AS was 1.6 (range −3.4, 4.0), with an O/E ratio of 0.996 (95% CI 0.695, 1.333; Hosmer-Lemeshow = 6.640, df = 4, p = 0.156), see Additional file 1: Table S5. Calibration models were constructed (see Additional file 1: Table S4) and plotted (see Additional file 5: Figure S4 A-G). These suggested that the M0 still deviated from the reference line, particularly for low and high scores. M1 and M2 did not improve calibration when compared with the original M0. Among the revision models M3 to M6, calibration improved for M3, M4, and M6; M6 was the best, with an O/E ratio of 1.021 (95% CI 0.905, 1.186), whereas the calibration plot of M5 showed quite poor performance.
The M0’s discrimination performance was good, although it was lower than the original model (C-statistic = 0.813; 0.736, 0.892). The C-statistics for M3 to M6 were a bit higher than M0, see Additional file 1: Table S5.
Comparison of RAMA-AS and previous score
The Alvarado score was calculated and ranged from 2 to 10 (mean = 7.04). Its C-statistic was 0.752 (95% CI 0.710, 0.800), which was significantly lower than that of RAMA-AS (p value < 0.001, see Fig. 2).
We developed and internally and externally validated a RAMA-AS, for classifying very low, low, moderate, and high risk of having appendicitis. Predictive domains including three symptoms, two signs, and two laboratory tests were included. Internal validation showed the RAMA-AS performed well for both calibration and discrimination. The external validation showed fair calibrations and good discrimination with the O/E ratios of 1.01 (0.78, 1.23) and 0.996 (0.659, 1.333), with the C-statistics of 0.853 (95% CI 0.791, 0.915) and 0.817 (95% CI 0.736, 0.892), respectively.
Although most predictors (clinical signs, symptoms, and laboratory tests) used in the RAMA-AS were similar to those in the Alvarado score, which is the score most commonly used in prospective studies [6, 24–29], our score performed better. This might be due to differences in the weighting or scoring of each predictor, the distribution of predictors, and the prevalence of appendicitis itself. Our score was derived through proper model construction, following the TRIPOD recommendations, and let the data suggest proper weighting. Our finding was consistent with that for the appendicitis inflammatory response (AIR) score, developed in 2008, which externally performed better than the Alvarado score. This score did not consider WBC and neutrophil, but instead included leukocyte and CRP in the model [30, 31]; CRP may not be a routine laboratory test in some developing countries. Thus, it is not easily applied in settings where resources are limited. Our RAMA-AS and these scores could rule out appendicitis well, but not rule it in, as per the WSES Jerusalem guidelines, so a high risk score may need confirmation by CT scan.
Calibration performance of RAMA-AS was fair in both external data sets. This could be explained as follows. First, the prevalence of appendicitis differed considerably between the derivation (RH) and validation (TH and CH) data, i.e., 61.8 vs 48.7 vs 76.9%, respectively. Therefore, the original model over-estimated the risk of appendicitis in TH but under-estimated the risk in CH. We then re-calibrated the intercept in the M1 models by subtracting the estimated intercept from, or adding it to, the original intercept (i.e., baseline risk) for TH and CH, respectively. These models were still not well calibrated; we thus went further and recalibrated the overall coefficient (M2), but this did not improve calibration much. Second, differences in the distributions of predictors between appendicitis groups across data sources may also play a role. For instance, all symptoms and signs were more frequent in appendicitis than in non-appendicitis groups in both external hospitals, but WBC and neutrophil were not. The model revisions showed much improvement, and either M4 or M6 could be used for both TH and CH. Only two symptoms and one sign contributed to prediction in both hospitals; therefore, a predictive score containing only three symptoms (migration of pain, progression of pain, aggravation of pain) and one sign (rebound tenderness) without laboratory tests is proposed. Its calibration and discrimination performance was very similar to that of M6 (data not shown). Although the RAMA-AS did not perform as well in the external data as in the derivation data, it could still discriminate appendicitis from non-appendicitis well in both the provincial setting (CH) and the school of medicine setting (TH).
Using the RAMA-AS in practice
Our RAMA-AS should be applied in general hospitals where resources are limited. Data on the seven variables can be collected from the interview, physical examination, and CBC test. Applying the RAMA-AS is easy: the data are entered into the equation. The probability of appendicitis is then estimated for each risk stratum using the Fagan nomogram. In addition, the score can be straightforwardly classified as very low (score < −0.64), low (score −0.64 to 0.84), moderate (score 0.85 to 1.74), or high risk (score > 1.74) of having appendicitis. In the ROC analysis, these cut-off thresholds were objectively selected based on LR+ (i.e., sensitivity/(1 − specificity)), which is less biased than subjective selection. Although our score could discriminate appendicitis from non-appendicitis well according to the C-statistics, clinical findings should also be incorporated in further decision making. Imaging investigation may be needed for moderate to high scores.
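The risk classification described above can be sketched as a simple lookup on the score. The function name is hypothetical, and the handling of scores that fall between two published bounds (for example, between 0.84 and 0.85) is an assumption of this sketch; the score itself must first be computed from the predictive equation, which is not reproduced here.

```python
def rama_as_risk_group(score: float) -> str:
    """Map a RAMA-AS value to its published risk category.

    Assumption: scores between 0.84 and 0.85 are treated as low risk,
    since the published ranges leave that interval unassigned.
    """
    if score < -0.64:
        return "very low"
    if score < 0.85:
        return "low"
    if score <= 1.74:
        return "moderate"
    return "high"


# Hypothetical score values, for illustration only
for value in (-1.20, 0.30, 1.00, 2.50):
    print(f"score {value:+.2f}: {rama_as_risk_group(value)} risk")
```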
Counting the number of positive signs, symptoms, and laboratory results can also be applied. For instance, a patient is at low risk of appendicitis with any of the following: positive for all items of only one domain (signs, symptoms, or laboratory tests); 1 positive item in each of the 3 domains; 2 positive items among the 3 domains (i.e., 1 symptom and 1 sign, 1 symptom and 1 laboratory test, or 1 sign and 1 laboratory test); 3 symptoms with 1 laboratory test and no sign; or 3 symptoms plus 1 sign and no laboratory test. The post-test probability would be 76.0%, so out-patient observation is recommended. Moderate risk requires three symptoms plus the sign of body temperature ≥ 37.8 °C, or three symptoms plus two laboratory tests without any sign. The post-test probability is from 85.0 to 93.0% for moderate risk, so other investigations such as ultrasound or CT scan may be needed for these patients.
The high risk group requires all symptoms and signs, or all symptoms plus one sign and laboratory test, all symptoms plus two signs plus any of laboratory test, or three symptoms plus two laboratory tests plus any of the signs. The post-test probability is about 93.0% and thus surgical treatment should be performed for high risk patients.
Our study has some strengths. We followed the recommendations for developing risk prediction scores by Altman et al. and TRIPOD. We developed the score and validated it both internally and externally using prospectively collected data. Missing data were imputed, even though they occurred in only a few variables, which should yield better performance of the risk prediction model than complete-case analysis. The RAMA-AS showed good calibration and discrimination in the derivation setting, although one external setting had lower discrimination performance.
However, some limitations could not be avoided. The study was conducted at tertiary hospitals where the appendicitis prevalence was high. The RAMA-AS should be further validated in different populations and settings. To improve generalizability, validation using big electronic health data or an individual patient data meta-analysis should be conducted. The clinical impact of the RAMA-AS should also be further assessed, for instance by applying the score in routine clinical practice, which would show whether it can still rule appendicitis in and out well in suspected patients. These suspected patients may be only observed, treated with operation, or given non-operative treatment such as antibiotics. A previous cohort study showed long-term success and safety of antibiotics in suspected appendicitis. However, this evidence came from an observational study, which is prone to selection bias. A randomized controlled trial with appropriate methods should be conducted to test whether non-operative treatment is non-inferior to operation.
Appendicitis is one of the most important causes of acute abdominal pain. Several scoring systems have been developed for screening of appendicitis. Surprisingly, about two-thirds of studies developed prediction scores based on univariate analysis without applying statistical modeling. We have developed and internally and externally validated a clinical prediction score, called RAMA-AS, to classify the risk of having appendicitis. The RAMA-AS showed good internal but fair external calibration, and it discriminated well in both internal and external validations. The RAMA-AS performed better than the Alvarado score (i.e., C-statistics 0.840 vs 0.710) and can suggest whether patients can be observed as out-patients, need further investigation, or should be admitted for appendectomy.
Expected relative variance increase
FMI: Fraction of missing information
RAMA-AS: Ramathibodi Appendicitis Score
ROC: Receiver operating characteristic
RVI: Relative variance increase
STROBE: STrengthening the Reporting of OBservational studies in Epidemiology
TH: Thammasat University Hospital
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis
Tepel J, Sommerfeld A, Klomp HJ, Kapischke M, Eggert A, Kremer B. Prospective evaluation of diagnostic modalities in suspected acute appendicitis. Langenbeck's Arch Surg. 2004;389(3):219–24.
Addiss DG, Shaffer N, Fowler BS, Tauxe RV. The epidemiology of appendicitis and appendectomy in the United States. Am J Epidemiol. 1990;132(5):910–25.
Horntrich J, Schneider W. Appendicitis from an epidemiological viewpoint. Zentralbl Chir. 1990;115(23):1521–9.
Temple CL, Huchcroft SA, Temple WJ. The natural history of appendicitis in adults. A prospective study. Ann Surg. 1995;221(3):278–81.
Alvarado A. A practical score for the early diagnosis of acute appendicitis. Ann Emerg Med. 1986;15(5):557–64.
Fenyo G, Lindberg G, Blind P, Enochsson L, Oberg A. Diagnostic decision support in suspected acute appendicitis: validation of a simplified scoring system. Eur J Surg. 1997;163(11):831–8.
Eskelinen M, Ikonen J, Lipponen P. The value of history-taking, physical examination, and computer assistance in the diagnosis of acute appendicitis in patients more than 50 years old. Scand J Gastroenterol. 1995;30(4):349–55.
Wilasrusmee C, Anothaisintawee T, Poprom N, McEvoy M, Attia J, Thakkinstian A. Diagnostic scores for appendicitis: a systematic review of scores’ performance. Br J Med Med Res. 2014;4(2):11–20.
Steyerberg EW, Moons KG, van der Windt DA, Hayden JA, Perel P, Schroter S, Riley RD, Hemingway H, Altman DG. Prognosis research strategy (PROGRESS) 3: prognostic model research. PLoS Med. 2013;10(2):e1001381.
Moons KG, Altman DG, Reitsma JB, Collins GS. New guideline for the reporting of studies developing, validating, or updating a multivariable clinical prediction model: the TRIPOD statement. Adv Anat Pathol. 2015;22(5):303–5.
Vandenbroucke JP, von Elm E, Altman DG, Gotzsche PC, Mulrow CD, Pocock SJ, Poole C, Schlesselman JJ, Egger M, Initiative S. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. Int J Surg. 2014;12(12):1500–24.
Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49(12):1373–9.
Royston P, Moons KG, Altman DG, Vergouwe Y. Prognosis and prognostic research: developing a prognostic model. BMJ. 2009;338:b604.
Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med. 1991;10(4):585–98.
White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30(4):377–99.
van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999;18(6):681–94.
Hosmer DW, Lemeshow S. Assessing the fit of the model. In: Applied Logistic Regression. second edn. New York: Wiley; 2005. p. 143–202.
Steyerberg EW, Bleeker SE, Moll HA, Grobbee DE, Moons KG. Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J Clin Epidemiol. 2003;56(5):441–7.
Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361–87.
Janssen KJ, Vergouwe Y, Kalkman CJ, Grobbee DE, Moons KG. A simple method to adjust clinical prediction models to local circumstances. Can J Anaesth. 2009;56(3):194–201.
Toll DB, Janssen KJ, Vergouwe Y, Moons KG. Validation, updating and impact of clinical prediction rules: a review. J Clin Epidemiol. 2008;61(11):1085–94.
Kappen TH, Vergouwe Y, van Klei WA, van Wolfswinkel L, Kalkman CJ, Moons KG. Adaptation of clinical prediction models for application in local settings. Med Decis Mak. 2012;32(3):E1–10.
Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD. Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Stat Med. 2004;23(16):2567–86.
Lamparelli MJ, Hoque HM, Pogson CJ, Ball AB. A prospective evaluation of the combined use of the modified Alvarado score with selective laparoscopy in adult females in the management of suspected appendicitis. Ann R Coll Surg Engl. 2000;82(3):192–5.
Tzanakis NE, Efstathiou SP, Danulidis K, Rallis GE, Tsioulos DI, Chatzivasiliou A, Peros G, Nikiteas NI. A new approach to accurate diagnosis of acute appendicitis. World J Surg. 2005;29(9):1151–6; discussion 1157.
Kurane SB, Sangolli MS, Gogate AS. A one year prospective study to compare and evaluate diagnostic accuracy of modified Alvarado score and ultrasonography in acute appendicitis, in adults. Indian J Surg. 2008;70(3):125–9.
Chong CF, Thien A, Mackie AJ, Tin AS, Tripathi S, Ahmad MA, Tan LT, Ang SH, Telisinghe PU. Comparison of RIPASA and Alvarado scores for the diagnosis of acute appendicitis. Singap Med J. 2011;52(5):340–5.
de Castro SM, Unlu C, Steller EP, van Wagensveld BA, Vrouenraets BC. Evaluation of the appendicitis inflammatory response score for patients with acute appendicitis. World J Surg. 2012;36(7):1540–5.
Watters JM. The appendicitis inflammatory response score: a tool for the diagnosis of appendicitis that outperforms the Alvarado score. World J Surg. 2008;32(8):1850.
Di Saverio S, Birindelli A, Kelly MD, Catena F, Weber DG, Sartelli M, Sugrue M, De Moya M, Gomes CA, Bhangu A, et al. WSES Jerusalem guidelines for diagnosis and treatment of acute appendicitis. World J Emerg Surg. 2016;11:34.
Bhangu A, Soreide K, Di Saverio S, Assarsson JH, Drake FT. Acute appendicitis: modern understanding of pathogenesis, diagnosis, and management. Lancet. 2015;386(10000):1278–87.
Soreide K, Korner H, Soreide JA. Diagnostic accuracy and receiver-operating characteristics curve analysis in surgical research and decision making. Ann Surg. 2011;253(1):27–34.
Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med. 2000;19(4):453–73.
Held U, Kessels A, Garcia Aymerich J, Basagana X, Ter Riet G, Moons KG, Puhan MA. Methods for handling missing variables in risk prediction models. Am J Epidemiol. 2016;184(7):545–51.
Riley RD, Ensor J, Snell KI, Debray TP, Altman DG, Moons KG, Collins GS. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140.
Di Saverio S, Sibilio A, Giorgini E, Biscardi A, Villani S, Coccolini F, Smerieri N, Pisano M, Ansaloni L, Sartelli M, et al. The NOTA study (non operative treatment for acute appendicitis): prospective study on the efficacy and safety of antibiotics (amoxicillin and clavulanic acid) for treating patients with right lower quadrant abdominal pain and long-term follow-up of conservatively treated suspected appendicitis. Ann Surg. 2014;260(1):109–17.
Di Saverio S, Sartelli M, Catena F, Birindelli A, Tugnoli G. Renewed interest in acute appendicitis: are antibiotics non-inferior to surgery or possibly clinically superior? What is long-term follow-up and natural evolution of appendicitis treated conservatively with "antibiotics first"? Surg Infect. 2016;17(3):376–7.
Availability of data and materials
The data are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of the Faculty of Medicine, Ramathibodi Hospital, Mahidol University (approval ID 10-55-27).
Consent for publication
All authors declared that they had no competing interests.
Table S1. Re-calibration and revision of models for external validations. Table S2. Report number of missing data. Table S3. Distributions of predictors by appendicitis groups and developed/validated data. Table S4. Estimation of intercept and coefficients for external validations using different update models. Table S5. Estimations of calibration coefficients and C-statistics for external validations using different re-calibration and revision methods. (DOCX 57 kb)
Diagnosis plot between missing and observed values: A) WBC, B) Neutrophil. (PDF 157 kb)
Receiver operating characteristic (ROC) curves of RAMA-AS for diagnosis of appendicitis. (PDF 153 kb)
Calibration plots for external validations at Thammasat University Hospital using different update methods. (ZIP 298 kb)
Calibration plots for external validations at Chaiyaphum Hospital using different update methods. (ZIP 298 kb)