Which appendicitis scoring system is most suitable for pregnant patients? A comparison of nine different systems

Background Acute appendicitis is the most common non-gynecological emergency during pregnancy. The diagnosis of appendicitis during pregnancy is challenging due to changes in both physiological and laboratory variables. Guidelines suggest patients with suspected acute appendicitis should be stratified based on clinical scoring systems, to optimize the use of diagnostic imaging and prevent unnecessary surgery. Surgeons require additional information beyond that provided by imaging studies before deciding upon exploratory laparoscopy in patients with a high suspicion of appendicitis. Various scoring methods have been evaluated for the diagnosis of acute appendicitis. However, there is no consensus on a method to use during pregnancy, and a detailed comparison of existing scoring methods for this purpose has not yet been conducted. The purpose of this study was to evaluate the efficacy of the most popular scoring systems applied to diagnose acute appendicitis during pregnancy. Methods This single-center retrospective study included 79 pregnant patients who were admitted to the emergency department with abdominal pain between May 2014 and May 2019. The patients were diagnosed with acute appendicitis and underwent an appendectomy. As a control group, the study also included 79 non-pregnant patients who underwent appendectomy within the last 1.5 years. To ensure that the groups were similar, women in the case group were stratified according to age, and the proportions of women in the strata were determined. The women in the control group were similarly stratified. Women were randomly selected from the strata to prevent bias. Both laboratory and examination findings required for each scoring method were obtained and assessed separately for each patient. Negative appendectomy rates were evaluated according to pathology results. Categorical variables were compared using the chi-square test. A p value < 0.05 was considered to indicate significance. Receiver operator characteristic curve analysis was used to identify the best threshold value and to assess the performance of the test scores in terms of diagnosing appendicitis. Results Among all scoring systems, the Tzanakis score was most efficacious at predicting appendicitis in non-pregnant women. The positive predictive value (PPV) of the Tzanakis score was 90.6%, whereas the negative predictive value (NPV) was 46.7%. The RIPASA score performed the best among the scoring systems in pregnant women. It was associated with a PPV of 94.40%, NPV of 44%, and sensitivity and specificity of 78.46% and 78.57%, respectively. Conclusion Although the RIPASA score can be used to efficaciously diagnose acute appendicitis in pregnant women, a specific scoring system is needed for diagnosis during the gestation period.


(Continued from previous page)
Conclusion: Although the RIPASA score can be used to efficaciously diagnose acute appendicitis in pregnant women, a specific scoring system is needed for diagnosis during the gestation period.
Keywords: Appendicitis, Pregnancy, Score, Predictive value Background Acute appendicitis is the most common cause of nonobstetric emergency surgery in pregnant women. Appendicitis occurs in one out of 1500 pregnant women [1]. Furthermore, negative appendectomy rates in females of reproductive age are reported to be up to 26% [2]. The differential diagnosis of acute abdominal pain during pregnancy is more complicated than in typical patients. In addition to symptoms such as nausea, vomiting, and abdominal pain, which are common during pregnancy, an increased white blood cell (WBC) count and limited radiological methods complicate the diagnosis of acute appendicitis [3][4][5]. Unfortunately, negative appendectomy rates remain relatively high regardless of the testing conducted [6][7][8][9][10].
The main goals in the diagnosis of appendicitis are to reduce negative appendectomy rates, avoid perforation, and protect the patient from unnecessary surgical intervention. A meta-analysis reported that it is essential to make an exact diagnosis and avoid any delay therein [11]. However, another article reported that a delay in the diagnosis of uncomplicated appendicitis of up to 24 h is safe [12]. That article stated that non-operative management through antibiotic treatment is safe in cases of uncomplicated appendicitis. Unfortunately, the rates of abortion and preterm labor are significantly higher in women with appendicitis [13,14]. Timely diagnosis is critical, as delays may lead to both maternal (1-4%) and fetal (1.5-35%) mortality due to perforation of the appendix [15]. To this end, various scoring methods have been developed based on imaging studies, clinical findings, and laboratory results [16][17][18][19][20][21][22][23][24]. The present study aimed to evaluate the extent to which these scoring methods are suitable for the diagnosis of appendicitis in pregnancy.

Methods
This study included 79 pregnant patients who were admitted to the Sakarya University Faculty of Medicine (Serdivan, Turkey) with abdominal pain between May 2014 and May 2019. The patients were diagnosed with acute appendicitis and underwent appendectomy. The study also included a control group of 79 non-pregnant patients who underwent appendectomy within the last 1.5 years. To ensure similar groups, women in the case group were stratified according to age, and the proportions of women in the strata were determined. The women in the control group were similarly stratified.
Women were randomly selected from the strata to avoid possible bias. Patients under the age of 20 or older than 45 were excluded from the control group, as well as those with chronic co-morbidities (e.g., hypertension, diabetes mellitus, chronic renal failure, or chronic pulmonary disease). All pregnant patients were examined by the obstetrician both before and after surgery.
Nine appendicitis clinical score methods were used for evaluating the patients. These clinical scoring systems (CSS) included the Alvarado, Eskelinen, Ohmann, AIR, RIPASA, Tzanakis, Lintula, Fenyo-Lindberg, and Karaman systems. Laboratory data such as WBC count, neutrophil count, and C-reactive protein level were collected, as well as examination findings and symptoms. In an Excel file, formulas were prepared separately for each CSS, and the data obtained from each patient were entered into the file so that CSS scores were calculated automatically. A visual analog scale pain score was calculated for all patients hospitalized with a diagnosis of acute appendicitis, who were classified into mild, moderate, and severe pain subgroups according to the literature [25]. This helped with calculating the score for the Lintula system, in which the pain system was graded at three different levels. Negative appendectomy rates were evaluated according to pathology results. The ethics committee of our university approved the study.

Statistical analysis
Descriptive analyses were performed to provide information on the general characteristics of the study population. The Kolmogorov-Smirnov test was used to evaluate whether the distributions of numeric variables were normal. Accordingly, either independent sample t tests or Mann-Whitney U tests were used to compare the numeric variables between pregnant and non-pregnant patients. The numeric variables are presented as the mean ± standard deviation or medians with interquartile ranges in square brackets. Categorical variables were compared using chi-square tests. Categorical variables are presented as counts and percentages. A p value < 0.05 was considered to indicate significance. Receiver operating characteristic curve analysis was used to identify the optimal threshold values and assess the diagnostic performance of test scores for appendicitis. Analyses were performed using SPSS 22.0 statistical software (IBM Corp., Armonk, NY, USA).

Results
The median ages in the pregnant and non-pregnant groups were 28 and 26 years, respectively. The mean WBC counts in the pregnant and non-pregnant groups were 14.07 ± 4.5 and 13.43 ± 4.5, and the median Creactive protein levels were 16.2 [55.03] and 7.64 [41.69] mg/dl, respectively. The left shift in neutrophils was significantly higher in the pregnant group than in the nonpregnant group; 47 (59.5%) vs. 23 (29.1%), respectively (p < 0.001). The median total bilirubin level was also significantly higher in the non-pregnant group than in the pregnant group; 0.59 [0.75] vs. 0.47 [0.33] mg/dl, respectively (p < 0.001; Table 1).
Based on the pathology results, 65 pregnant patients (82.3%) had appendicitis and 14 (17.7%) did not. In the non-pregnant group, 66 patients (83.5%) had appendicitis and 13 (16.5%) did not. There was a significant difference between the groups in terms of severity of pain; most individuals in the non-pregnant group (54; 68.4%) reported moderate pain, whereas most in the pregnant group (57; 72.2%) reported a high degree of pain (p < 0.001). Findings in both groups were similar in terms of nausea, vomiting, and anorexia. Pregnant patients typically visited the hospital less than 24 h from the onset of symptoms. Based on direct examination of pregnant patients, abdominal guarding and rebound tenderness were significantly more common than in non-pregnant patients (Table 2).
We compared the efficacy of scoring systems for both groups. The Tzanakis system was the most efficacious among the scoring systems used in non-pregnant women. The positive predictive value (PPV) of the Tzanakis system was 90.6%, whereas the negative predictive value (NPV) was 46.7%. The sensitivity associated with the Tzanakis score was 87.8% and the specificity was 53.8%. Based on the area under the curve (AUC) analysis of predictive power, the Tzanakis score had the highest power in the non-pregnant group, followed by the AIR and Alvarado scores (Table 3) (Fig. 1).
RIPASA performed the best among the scoring systems used in pregnant patients. The PPV of this scoring method was 94.40%, its NPV was 44%, and its sensitivity and specificity were 78.46% and 78.57%, respectively. The AIR and Tzanakis systems were the second-and third-most efficacious, respectively. The PPV of the AIR score was 92.9%, its sensitivity was 80%, and its specificity was 71.4%. The PPV of the Tzanakis score was 97.1%, its sensitivity was 52.3%, and its specificity was 92.8% (Table 4) (Fig. 2).

Discussion
Appendicitis is generally diagnosed based on clinical and laboratory findings, including the results of imaging analysis. However, the presence of numerous gynecological pathologies in female patients makes it challenging to diagnose acute appendicitis, particularly in pregnant patients [26]. For example, symptoms such as nausea and vomiting are common during both pregnancy and appendicitis, and laboratory findings also tend to be similar.
Radiological examination has high diagnostic value for acute appendicitis [27]. However, the main disadvantages of computed tomography are its teratogenic effect and high cost [28,29]. The presence of positive abdominal ultrasonography (USG) findings in pregnant women with suspected appendicitis is sufficient to confirm the condition. In cases where appendicitis cannot be diagnosed using USG, magnetic resonance imaging provides high diagnostic accuracy in pregnant patients [30][31][32]. When using new scoring systems that combine clinical and imaging features, 95% of patients with uncomplicated appendicitis can be diagnosed correctly [33].
Delays in diagnosis and treatment of appendicitis may result in more complicated illness and increased rates of preterm labor, perinatal morbidity, mortality, and fetal loss [6][7][8][9][10]. The use of scoring systems helps to support imaging methods [34,35]. CSS may be used in acute appendicitis to facilitate early diagnosis of the disease and prevent morbidity, and the Alvarado scoring system is one of the most commonly used systems for this situation [27]. Although the Alvarado system may be used in pregnant patients, its use has been extensively validated mainly in non-pregnant patients [36].  CSS aim to diagnose appendicitis by assessing signs, symptoms, and laboratory results. Systems use different variables. For example, the Tzanakis system includes USG results as a factor, the Lintula and Fenyo-Lindberg systems include gender, and the RIPASA and Ohmann systems include urinary symptoms. In our study, we assessed nine different scoring systems that are commonly used worldwide to diagnose appendicitis and compared their performance in pregnant and non-pregnant patients.
The presence of nausea, vomiting, and physiological leukocytosis during pregnancy makes it challenging to diagnose appendicitis, as does the fact that the position of the appendix changes during the gestation period [26]. In addition to our traditional knowledge, while rare publications are stating that the position of the appendix does not vary during pregnancy, publications with high volume are inevitably required to clarify this circumstance [37,38]. This may explain why the negative The data are shown in number and percentage format *Phi or Cramer V coefficient is given as the effect size measure appendectomy rate is approximately 35% in pregnant patients [39]. Currently, there is no scoring system that specifically evaluates appendicitis during pregnancy. Therefore, we evaluated the efficacy of existing scoring systems in pregnant patients. Based on AUC analysis, the scoring systems with the highest predictive power for non-pregnant women were the Tzanakis, AIR, and Alvarado systems, in that order. The Lintula and Fenyo-Lindberg scoring systems produced the lowest AUC values among the nine scoring systems. These results suggest that including USG findings (e.g., as in the Tzanakis system) is valuable for the diagnosis of appendicitis in non-pregnant patients. For pregnant women, the RIPASA score had the highest predictive value, followed by the AIR and Tzanakis scores. Scoring systems that were heavily based on signs and detailed laboratory findings had greater predictive power in the pregnant group. Although we determined that the RIPASA system performed best in pregnant patients, its sensitivity and specificity were below 80%. Based on our findings, we intend to conduct further CSS research focused on the gestation period.
The potential limitations of our study included its retrospective design, the small number of cases, and the retrospective control group, which may not have been representative of the main group.
In summary, systems that include variables such as changes in the neutrophil count, negative urinary findings, Rovsing's sign, rebound, severe abdominal defense, absence of bowel sounds, short symptom duration, and  severe abdominal pain perform well in pregnant patients ( Table 2). Systems that score gender and pain displacement are less efficacious for pregnant patients. Notably, with the progression of pregnancy, the gravid uterus may affect pain migration.

Conclusion
Among the CSS evaluated, the RIPASA system was found to be the most suitable for pregnant patients, and this system may also help guide the use of imaging methods for pregnant patients in the clinic. Scoring systems tailored for acute appendicitis during the gestation period will improve treatment outcomes for both the mother and fetus.