World Journal of Emergency Surgery (Open Access)
Regional Variability in Use of a Novel Assessment of Thoracolumbar Spine Fractures: United States versus International Surgeons

Background: Considerable variability exists in clinical approaches to thoracolumbar fractures. Controversy in evaluation and nomenclature contributes to this confusion, with significant differences found between physicians, between different specialties, and in different geographic regions. A new classification system for thoracolumbar injuries, the Thoracolumbar Injury Severity Score (TLISS), was recently described by Vaccaro. No assessment of regional differences has been described. We report regional variability in use of the TLISS system between United States and non-US surgeons.

management was substantial in both US and non-US surgeons. The TLISS incorporates generally accepted features of spinal injury assessment into a simple patient evaluation tool. The management recommendation of the treatment algorithm component of the TLISS shows good inter-rater and substantial intra-rater reliability in both non-US and US based spine surgeons. The TLISS may improve communication between health providers and may contribute to more efficient management of thoracolumbar injuries.

Background
Controversy persists with regard to treatment of thoracolumbar injuries. The diagnosis and definition of clinically significant spinal instability remains unclear and poses a source of frequent disagreement in the literature. Some authors note good clinical outcomes with non-operative treatment of these injuries; prospective studies demonstrate that many thoracolumbar fractures may be successfully treated non-operatively, with no benefit gained from adding surgical stabilization [1][2][3]. Some patients, however, ultimately fail conservative treatment, developing symptomatic late deformity or instability. Modern operative techniques allow for restoration of normal spinal alignment, correction of instability, and decompression of neural elements. Determining prospectively which patients are prone to developing instability and hence might benefit from surgical treatment remains contentious [4].
Adding to confusion over patient selection, no consensus exists as to choice of treatment in thoracolumbar injuries. The therapeutic approach to these patients is hampered by lack of accepted nomenclature and of a useful and clinically valid classification system for these injuries. While numerous classification systems have been devised, each poses problems in implementation [5]. Many systems are overly complex, limiting their utility. Others omit important portions of standard clinical decision making. Most classification schema fail to suggest treatment options [6].
A recently described treatment algorithm may aid in treatment of these patients. The Thoracolumbar Injury Severity Score (TLISS) assesses injuries based on three criteria: the mechanism of injury based upon radiographic assessment, patient neurological status, and the integrity of the posterior ligamentous complex (PLC) [4]. TLISS was developed by a group of 40 spine experts from 15 trauma centers in the United States, Canada, Australia, Germany, Mexico, France, Sweden, India, and the Netherlands. Relevant literature on thoracolumbar trauma, classification, and treatment was reviewed. A classification scheme and treatment algorithm were described (Figures 1 and 2) [7][8][9].
The TLISS algorithm has shown high initial treatment decision validity, with greater than 92% of surveyed surgeons agreeing with the algorithm's treatment recommendation (operative versus nonoperative) [10]. Initial testing showed poor reliability for the injury mechanism sub-score of the measure. Evaluation of early TLISS reliability assessments as well as surveys of the Spine Trauma Study Group (STSG) led to modification of the protocol, with greater concentration upon injury morphology and de-emphasis of injury mechanism. A separate system, the Thoracolumbar Injury Classification and Severity Score (TLICS), has recently been put forward [10]. Good validity in different specialties and at different training levels has been described, with similarly adequate reliability [2,11]. Validation across geographic boundaries has not been assessed.
Considerable geographic variation is consistently reported in surgical treatment of spinal disorders. A universal rating scale and treatment algorithm for thoracolumbar injuries must overcome these regional variabilities and demonstrate acceptable reliability and validity regardless of locale. We compare the inter- and intra-rater reliability of the TLISS within and between US and non-US surgeons. We demonstrate moderate to substantial reliability in the use of the scale and high treatment validity, as assessed by surgeon agreement with the algorithm's management recommendation. The TLISS is a promising tool in the evaluation of spine trauma patients.

Methods
Forty-eight trauma spine surgeons, including both neurosurgeons and orthopedic surgeons, participated in a comprehensive review and analysis of historical and contemporary thoracolumbar injury classification schemes, eventually leading to development of the TLISS clinical tool [8][9][10].
To validate the management recommendations of the TLISS algorithm, a booklet of 56 thoracolumbar traumatic injury case studies was prepared. The case vignettes detailed the patient's age, description of the traumatic injury, and neurological exam. Imaging studies, including plain radiographs, CT, and MR (sagittal T2-weighted images), were included. The cases were distributed to surgeons for classification and grading using the TLISS algorithm. The final severity score was used to determine the recommendation for nonoperative or operative treatment according to the treatment algorithm described in Table 1. Twenty-eight surgeons completed the vignettes. The results were analyzed to determine inter-rater reliability and percent agreement with the final treatment recommendations.
Three months later, the numerical order of the cases was scrambled and pamphlets were redistributed. Twenty-two of the original 48 surgeons who participated in the development of the classification system completed both surveys. Their results were analyzed to determine intra-rater reliability. To assess possible regional differences in the validity and reliability of this system, physicians were grouped into US and non-US cohorts. The US cohort included surgeons from a variety of trauma centers. The international group included surgeons from Canada, Australia, Germany, Mexico, France, Sweden, India, and the Netherlands. The data were then analyzed using SPSS® and Analyze IT® software to determine percent agreement, unweighted Cohen's kappa, kappa with linear weighting, and Spearman's rank order correlation. Cohen's kappa was defined as the observed agreement (Pa) minus the chance agreement (Pc), divided by the maximum possible agreement that is not related to chance (1 - Pc): kappa = (Pa - Pc)/(1 - Pc). Kappa values may range from -1.0 (complete disagreement) through 0 (chance agreement) to 1.0 (perfect agreement) [23]. A guideline for interpreting Cohen's kappa values is summarized in Table 2. For significance tests, all unweighted coefficients were converted into Fisher's z-scores, and the difference in z-scores was divided by the standard error. The α level was set at 0.05 (∆Z/SE ≥ 1.96).
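The kappa definition and the significance test described above can be illustrated with a short Python sketch. This is not the authors' SPSS or Analyze IT procedure; it is a minimal illustration for two raters. The function names, the linear weighting scheme (1 - |i - j|/(k - 1)), and the standard error used for the Fisher z comparison (the usual formula for two independent coefficients, sqrt(1/(n1 - 3) + 1/(n2 - 3))) are assumptions, since the text does not specify them.

```python
import math
from collections import Counter


def cohen_kappa(rater_a, rater_b, linear_weights=False):
    """Cohen's kappa, kappa = (Pa - Pc) / (1 - Pc), for two raters.

    With linear_weights=True, near-misses on an ordinal scale earn
    partial credit (assumed weight: 1 - |i - j| / (k - 1)).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")

    categories = sorted(set(rater_a) | set(rater_b))
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    def weight(i, j):
        if i == j:
            return 1.0
        if linear_weights and k > 1:
            return 1.0 - abs(i - j) / (k - 1)
        return 0.0

    # Marginal counts for each rater, by category index.
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Observed (Pa) and chance-expected (Pc) agreement.
    pa = sum(weight(index[a], index[b]) for a, b in zip(rater_a, rater_b)) / n
    pc = sum(
        weight(i, j) * marg_a[i] * marg_b[j]
        for i in range(k)
        for j in range(k)
    ) / (n * n)

    if pc == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (pa - pc) / (1.0 - pc)


def fisher_z_statistic(coef_1, n_1, coef_2, n_2):
    """|difference in Fisher z-scores| / SE for two coefficients.

    Assumption: SE = sqrt(1/(n_1 - 3) + 1/(n_2 - 3)), the standard form
    for two independent samples; the paper does not state its SE formula.
    """
    z_1, z_2 = math.atanh(coef_1), math.atanh(coef_2)
    se = math.sqrt(1.0 / (n_1 - 3) + 1.0 / (n_2 - 3))
    return abs(z_1 - z_2) / se


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    a = ["op", "op", "non", "non"]
    b = ["op", "non", "non", "non"]
    print(cohen_kappa(a, b))
```

In the hypothetical example, the raters agree on 3 of 4 cases (Pa = 0.75) and the marginal totals give Pc = 0.5, so kappa = (0.75 - 0.5)/(1 - 0.5) = 0.5. A statistic from `fisher_z_statistic` of 1.96 or more would correspond to the 0.05 threshold stated in the text.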

Results
Inter- and intra-rater agreement between the cohorts is reviewed in Tables 3 and 4. General results of kappa scoring between the groups have been previously reviewed [10,12,13,14]. Non-US spine surgeons had greater inter-rater reliability on mechanism sub-score (p < 0.05 as assessed by % agreement), while US surgeons had greater inter-rater reliability on neurological status (p < 0.05 as assessed by % agreement and Spearman's r) and PLC integrity (p < 0.05 as assessed by Spearman's r). Intergroup (between US and international) reliability was similar to within-group inter-rater reliability in all parts of the TLISS scoring, indicating that the two groups agreed about as often as individual members within each cohort agreed amongst themselves.
Figure 1. Illustrative case of TLISS use. The patient is an 18-year-old male who presented after a motor vehicle accident. Representative sagittal (A), coronal (B) and axial (C) computed tomography images were obtained. A compression fracture with angular deformity at T5 combined with a significant rotational injury is evident. Only the highest scoring injury, the translational/rotational score, is used for morphology (3 points). CT imaging suggests posterior ligamentous disruption due to severity of rotational deformity at the fracture site, and a palpable step between spinous processes on physical exam confirmed PLC injury (3 points). The patient was neurologically intact (0 points). The comprehensive score of 6 suggests operative therapy. An intact patient with disrupted PLC favors a posterior approach in the treatment algorithm [6]. The patient was treated with a multilevel posterior stabilization and fusion.
Absolute inter-rater agreement among non-US surgeons on the final TLISS score was greater (p < 0.05), but total TLISS scores better correlated among US surgeons (p < 0.05) (Table 3). With regard to the algorithm's final recommendation for treatment (operative vs. non-operative), inter-rater agreement within the US physician group and non-US group was 75.7% (Cohen's kappa 0.561) and 72.3% (Cohen's kappa 0.506), respectively (Table 3). Inter-rater agreement on management between the groups was 74.2% (Cohen's kappa 0.536). Between the two groups, greatest agreement was found in assessment of neurological status (96.3% agreement, Cohen's kappa 0.936) (Table 3).
Intra-rater agreement was higher among US surgeons on mechanism, whereas intra-rater agreement was higher among non-US surgeons for neurological status, PLC integrity, and total TLISS score. These differences reached statistical significance as assessed by % agreement for mechanism and neurological status. Differences in intra-rater correlation reached statistical significance on neurological status, PLC, and total TLISS.
Figure 2. Second illustrative case of TLISS use.

Discussion
Thoracolumbar classification schema
Initial attempts at thoracolumbar fracture classification were made by Bohler in 1930, who classified fractures into five injury types based on anatomic appearance and mechanism [12]. The modern era of fracture classification benefited greatly from availability of CT scanning; advances in imaging led Denis to develop a three-column model of spinal stability, modifying the two-column approach of Holdsworth and Louis [7,15]. Magerl et al. put forward the AO classification, using a mechanistic approach to divide fractures into a total of 53 potential patterns based upon 3 injury categories and 3 tiers of subcategorization [16]. A separate load-sharing classification of spinal injury has also been described [17].
The most commonly used systems are Denis' three-column model of spinal stability and the AO classification. Both have significant problems. The AO system has poor inter- and intra-observer agreement [6,9]. Use of 53 different fracture patterns is unwieldy and appears counterintuitive. This makes routine clinical use of the scale impractical. The Denis system may oversimplify complex fractures, and may not accurately assess need for operative intervention [12].

TLISS clinical algorithm
The TLISS clinical algorithm assesses thoracolumbar injuries based upon three accepted clinical decision making criteria: 1. Mechanism of injury as determined by imaging studies, 2. Integrity of the PLC, and 3. Patient neurological status. These criteria were thought to be independent predictors of patient clinical outcome. Subgroups for scoring were developed within each component. Points are assigned in the treatment algorithm cumulatively for each criterion. Final recommendation for treatment is based upon final injury score (1 and 2).

Mechanism of injury
The mechanism of injury describes fracture pattern based upon three general descriptions, similar to the AO thoracolumbar injury classification: 1. compression, 2. translation/rotation, and 3. distraction (Table 5). Angulation at the fracture site for compression injuries indicates greater instability, and separately may add 1 point to final TLICS score. Complex fractures may combine more than one of the three basic morphologic elements. In these cases, only the highest category is scored.

Integrity of the PLC
The PLC is composed of the ligamentum flavum, the facet joint capsules, and the interspinous and supraspinous ligaments. The PLC is quantified in the TLICS as intact, indeterminate, or disrupted (Table 5). Imaging via MRI, CT, and plain films, together with physical exam (detecting a palpable gap between spinous processes), is used to evaluate the PLC.

Neurological status
Presence or absence of neurological deficit is an independent indicator of the severity of thoracolumbar injury. More severe injuries merit higher scores, with incomplete spinal cord and cauda equina injuries scoring highest in the algorithm (Table 5).
The injury score is obtained via summation of individual elements. A cumulative score of 3 or less suggests a nonoperative injury, while a score of 5 or greater suggests surgical intervention may be necessary (Table 1). Scores of 4 are indeterminate, and may be treated surgically or conservatively [4,6,12]. Illustrative cases are reviewed in Figures 1 and 2.
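The summation and thresholds described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a clinical tool: the function name and return format are hypothetical, and the three sub-scores are passed in as already-assigned points because the point values of Table 5 are not reproduced in the text.

```python
def tliss_recommendation(mechanism_points, plc_points, neuro_points):
    """Sum the three TLISS sub-scores and map the total to a recommendation.

    Thresholds follow the text: total <= 3 nonoperative, total == 4
    indeterminate, total >= 5 operative. Sub-score values themselves must
    come from the published scoring table (not reproduced here).
    """
    points = (mechanism_points, plc_points, neuro_points)
    if any(not isinstance(p, int) or p < 0 for p in points):
        raise ValueError("sub-scores must be non-negative integers")

    total = sum(points)
    if total <= 3:
        recommendation = "nonoperative"
    elif total == 4:
        recommendation = "indeterminate"
    else:
        recommendation = "operative"
    return total, recommendation


if __name__ == "__main__":
    # Figure 1 case: translational/rotational injury (3 points),
    # disrupted PLC (3 points), neurologically intact (0 points).
    print(tliss_recommendation(3, 3, 0))
```

For the Figure 1 case the sub-scores sum to 3 + 3 + 0 = 6, which falls in the operative range, matching the recommendation given in the figure legend.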

Reliability and validity of the TLISS
The TLISS scale has been evaluated for both inter- and intra-rater reliability. Acceptable reliability was found and surgeons agreed with the algorithm's treatment recommendation in greater than 90% of cases [10]. These findings indicate the scale produces internally reliable ratings of injury severity and treatment recommendations that are valid with respect to the rating surgeons' clinical approaches [10]. Substantial reliability has been previously demonstrated within a variety of specialties and training levels, including spine fellows, attending spine surgeons, neurologists, and physiatry physicians [13].
Greater than 90% of surgeons in each specialty were found to agree with the TLISS management recommendations [14], and the same trend of outstanding construct validity is reported here when comparing US and non-US surgeons.
In contrast to other classification schemes, the TLISS has demonstrated acceptable intra- and inter-rater reliability and appears usable across specialty boundaries. This manuscript is the first to assess geographic differences in approach to thoracolumbar injuries using the TLISS assessment tool.

Geography and spine surgery
As noted by Seidenwurm, "medicine is evidently a local phenomenon" [18]. Geographic influences on choice of surgical and medical therapies are significant. Geographic location consistently predicts yearly rates of spine surgery [19]. Authors have noted that rates of back surgery in the United States are 40% higher than other countries, and five-fold higher than comparable rates in England and Scotland. Rates of surgery are noted to increase linearly with supply of orthopedic and neurosurgical spine surgeons [8].
Other authors have noted significant regional differences in availability and utilization of medical imaging, and correlated these findings with rates of elective spinal surgery [18,20]. Similar geographic variation occurs in coronary artery bypass graft procedures, general orthopedic procedures, and medical treatment of acute myocardial infarction [11]. Parallel findings in systems without financial incentive for clinical productivity would seem to indicate an intrinsic regional variability in health care use [21,22]. Geographic variation also is found in development and adoption of new technology [23].
Geographic differences persist in evaluation and management of traumatic injuries. In a multi-center review of traumatic spine injuries, no consensus was found as to optimal surgical timing [24]. Treatment approach, including imaging, seemed to vary by research site [24]. For the TLISS to be a useful paradigm for assessment and management of thoracolumbar injury, it must bridge these significant geographic differences in approach to spinal pathology. The system must demonstrate adequate international reliability.

Domestic versus international reliability and validity of the TLISS
We compared reliability and validity of the TLISS in US and non-US surgeons, assessing both inter- and intra-rater reliability. Results are reviewed in Tables 3 and 4. Inter-rater agreement on management within the US physician group and non-US group was 75.7% and 72.3%, respectively. Overall agreement on management between the groups was 74.2%, indicating adequate validity of the measure. The TLISS provides reliable and valid initial treatment recommendations, irrespective of rater geography.
These and similar results have led to a recent modification of the TLISS system [6]. Inferring the mechanism of injury from initial imaging modalities may be difficult. In fact, this is the least reliable sub-score among both US and non-US surgeons. Hence, injury morphology has been substituted for injury mechanism. Injury morphology is based simply upon the appearance of the fracture or dislocation on imaging studies (plain film, CT, or MRI). The STSG has also endeavored to increase the reliability of the PLC sub-score. To this end, a series of studies has been undertaken to clearly define the principal indicators of PLC disruption on MRI. These definitions will be included in a revised classification system, termed the Thoracolumbar Injury Classification and Severity Score (TLICS). Only the most severe (highest-scoring) injury morphology category is included in the scoring. Compression morphology garners 1 point, and an additional point is assigned for burst morphology. Three points are assigned for a translational/rotational morphology and 4 points for a distraction morphology. The descriptor "distraction" is applied only if there is objective imaging evidence of distraction. Scores of the morphology subgroups are not additive if multiple morphologies are present. Studies are underway to further develop and validate this classification system.
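The non-additive morphology rule described above can be made concrete with a brief Python sketch. It is a hedged illustration, not the published TLICS implementation: the dictionary keys and function name are hypothetical, burst is read as 2 points (compression plus the additional point), and returning 0 when no morphology is listed is an assumption.

```python
# Morphology points as described in the text; labels are hypothetical keys.
MORPHOLOGY_POINTS = {
    "compression": 1,
    "burst": 2,  # compression (1) plus one additional point
    "translational/rotational": 3,
    "distraction": 4,
}


def morphology_score(observed_morphologies):
    """Return the single highest morphology score; scores are not summed."""
    scores = []
    for name in observed_morphologies:
        if name not in MORPHOLOGY_POINTS:
            raise ValueError(f"unknown morphology: {name!r}")
        scores.append(MORPHOLOGY_POINTS[name])
    # Assumption: no listed morphology scores 0.
    return max(scores, default=0)


if __name__ == "__main__":
    # Compression combined with a rotational injury: only the higher
    # category counts, so the morphology score is 3, not 1 + 3.
    print(morphology_score(["compression", "translational/rotational"]))
```

Taking the maximum rather than the sum mirrors the statement that morphology subgroup scores are not additive when several morphologies are present.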

Conclusion
Controversy persists in management of thoracolumbar injuries. The TLISS clinical algorithm offers assessment of injury stability and aids in making treatment decisions. The scale has shown adequate reliability between and within different specialties. We show reliability and validity of the TLISS scale across geographic boundaries, comparing US and non-US surgeons. Differences between these broad geographic groups were subtle, with inter-rater reliability between groups similar to reliability within groups. This suggests that the TLISS may help unify clinical decision making in thoracolumbar trauma.
Thoracolumbar injury severity score (TLISS) illustrating the three major categories of mechanism of injury, neurological involvement, and posterior ligamentous complex, with associated grading points. Taken from Harrop J, Vaccaro AR, et al [8].