Iowa Orthop J. 2003; 23: 51–56.
PMCID: PMC1888399
Interobserver Error in Interpretation of the Radiographs for Degeneration of the Lumbar Spine
INTRODUCTION
Degenerative disc disease of the lumbar spine appears on plain radiographs as morphological changes: decreased disc height, osteophytes, Schmorl's nodes, vertebral end-plate sclerosis, and the vacuum sign. These are usually late changes of a degenerate disc. The early changes of disc dehydration are not seen on a plain radiograph.
An MRI scan is the most accurate imaging modality for demonstrating gross intervertebral disc morphology1,2. Dehydration and change in the proteoglycan content associated with early degeneration manifest as loss of signal intensity on MRI3,4. Decreased disc signal intensity on MRI is correlated with histological and macroscopic degenerative changes3,4,5. It is difficult to separate pathologic degenerative processes from age-related changes1,6. Sether et al5 showed that T2-weighted signal intensities do not decrease significantly with age if the disc is not pathologically degenerate. Their results suggested that age influences signal intensity less than the pathologic degenerative process does. Thus, changes in the disc signal intensity may correlate with the degree of degenerative change.
A radiologist's interpretation of a lumbar spine radiograph may differ from that of the clinician because the radiologist does not see the patient. Interobserver studies have quantified the level of disagreement between radiologists on several variables7,8.
The aim of our study was to detect the disagreement between a General Orthopaedic Surgeon, a Spine Surgeon, a Spine Nurse Practitioner and Radiologists in the diagnosis of degenerative disc disease on plain radiographs of the lumbar spine. Unlike other studies7,8 we considered only one pathology, i.e. disc degeneration, to quantify the interobserver error. Also MRI scanning of the lumbar spine was used as a gold standard against which the plain radiograph interpretation was measured and disagreement quantified.
METHODS
Twenty-three consecutive patients with degenerative lumbar disc disease who had an MRI scan were selected for this study. There were 14 men and 9 women with an average age of 43.4 years (range: 24-58). Plain AP and lateral radiographs of the lumbar spine were examined independently by a Consultant Orthopaedic Spine Surgeon (A), a Consultant Orthopaedic Surgeon (B), a Consultant Spine Radiologist (C), a Consultant Radiologist (D), and a Spine Nurse Practitioner (E). All the participants were blinded to the patient identification parameters.
They looked for the degenerative signs of osteophytes, subchondral vertebral end-plate sclerosis, Schmorl's nodes, loss of disc space height and vacuum phenomena (Figure 1). 116 segments were studied in the 23 radiographs, as one patient had lumbarization of the S1 vertebra. Visual appraisal is an accurate way to record the disc height9. Disc degeneration at each segment was classified as none, mild, moderate or severe. The following were the criteria for disc degeneration on plain radiographs2,10–12:
- Mild—Minimal loss of disc height; early osteophyte formation.
- Moderate—Loss of disc height > 25%, but < 50%; osteophyte formation; mild end-plate sclerosis.
- Severe—>50% loss of disc height; significant osteophytosis; obvious end-plate sclerosis; vacuum sign.
- None—None of the above changes.
The same 116 lumbar disc levels were examined on the MRI scan by an independent Consultant Radiologist. The visual classification system used for the disc signal intensity was a bright (high-signal) appearance for normal, a gray (intermediate-signal) appearance for early degenerative change, and a dark (low-signal) appearance for well-established degenerative change2 (Figure 2).
Statistical Tests
We used the Landis and Koch13 interpretation of kappa: <0.00 = poor agreement, 0.00-0.20 = slight agreement, 0.21-0.40 = fair agreement, 0.41-0.60 = moderate agreement, 0.61-0.80 = substantial agreement, and 0.81-1.00 = almost perfect agreement. Weighted kappa was used rather than simple kappa, so that disagreements were weighted by their degree. Systematic differences in the ratings, i.e. one observer repeatedly reporting a greater or lesser number of abnormal findings than the other observer, were examined with Bowker's test14.
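The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the paper states only that disagreements were weighted by degree, so the linear weighting used here is an assumption, and the names `weighted_kappa` and `landis_koch` are hypothetical.

```python
from typing import Sequence

# Ordered grades used for the plain radiographs in this study.
GRADES = ["none", "mild", "moderate", "severe"]


def weighted_kappa(rater_a: Sequence[str], rater_b: Sequence[str],
                   categories: Sequence[str] = GRADES) -> float:
    """Cohen's weighted kappa with linear disagreement weights (an assumption)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (grade by A, grade by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted disagreement, observed and expected by chance.
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for none vs severe
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j]
    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagree_obs / disagree_exp


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement label."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

With two lists of grades for the same segments, `landis_koch(weighted_kappa(a, b))` gives the agreement label. Quadratic weights would penalise a none versus severe disagreement more heavily and could give a different value.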
RESULTS
The interobserver variation for all five independent observers is summarised in Tables 1 and 2. These tables show the pairwise interobserver agreement. The agreement ranged from slight (almost no agreement) to almost perfect.
Table 1. Pairwise interobserver agreement between the 5 independent observers (doctor vs doctor) (weighted kappa in parentheses)
Table 2. Pairwise interobserver agreement between the independent observers and the MRI scan (a gold standard) (doctor vs MRI) (weighted kappa in parentheses)
The five independent observers are
- A: Consultant Orthopaedic Spine Surgeon
- B: Consultant General Orthopaedic Surgeon
- C: Consultant Spine Radiologist
- D: Consultant General Radiologist
- E: Spine Nurse Practitioner
The interobserver agreement was assessed at each segment from L1-S1 for all 23 patients. There were significant differences at each segment level when the radiograph readings were compared with the MRI scan. On plain radiographs, 318 observations were classed as none, 162 as mild, 79 as moderate, and 29 as severe. Overall interobserver agreement was calculated on 114 segments (2 segments were omitted because they had fewer than 5 observer evaluations). The overall weighted kappa coefficient comparing the plain radiograph readings (four doctors and one nurse practitioner) with the MRI scanning report at each segment level was 0.245 (95% CI: 0.193 to 0.298). This indicates fair agreement between the independent observers and the MRI scanning report.
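As a rough check on the reported interval, and assuming it is a symmetric normal-approximation interval (the paper does not state how it was computed), the implied standard error and midpoint can be recovered from the two bounds:

```latex
\[
\mathrm{SE}(\hat{\kappa}_w) \approx \frac{0.298 - 0.193}{2 \times 1.96}
  = \frac{0.105}{3.92} \approx 0.027,
\qquad
\frac{0.193 + 0.298}{2} = 0.2455 \approx 0.245 .
\]
```

The midpoint matches the reported coefficient, which is consistent with that assumption. On the Landis and Koch scale the point estimate and the upper bound lie in the fair band (0.21-0.40), while the lower bound of 0.193 lies in the slight band (0.00-0.20).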
DISCUSSION
Lumbar spondylosis and degenerative disc disease were assessed on plain radiographs of the lumbar spine by two Orthopaedic Surgeons, a Spine Nurse Practitioner, and two Radiologists. Many previous studies have shown poor agreement between observers in the interpretation of plain radiographs of the lumbar spine and the sacroiliac joints7,15–18. The interpretation of radiographs depends on the observers, the patients and the disease. Observations can vary because of heterogeneity in the population, the diagnostic strategy and preferences of the observers, and the importance, presentation and frequency of the abnormality8. This has implications for diagnosis and treatment.
We found variation in agreement between the observers in the interpretation of the plain radiographs at each segment level (L1-S1) for the diagnosis of degenerative disc disease. There was significant variation between the Consultant General Orthopaedic Surgeon (B) and both the Consultant Spine Radiologist (C) and the Spine Nurse Practitioner (E) (Table 1). There was substantial agreement between the Consultant Spine Surgeon (A), the Consultant Spine Radiologist (C), and the Spine Nurse Practitioner (E). The same group of observers (A, C, and E) also had moderate agreement with the MRI scan, which was used as a gold standard. The Consultant General Orthopaedic Surgeon (B) and the Consultant General Radiologist (D) had significant variation from the MRI scan. This suggests that observers who specialize in the low back problem of degenerative disc disease are more likely to diagnose this condition than general orthopaedic surgeons and general radiologists.
Although the spine specialists had moderate agreement with the MRI scan in diagnosing lumbar disc disease, this falls far short of the MRI scan result. In addition, the overall weighted kappa coefficient comparing the plain radiograph readings of the five observers with the MRI scan was 0.245 (mean). Thus there is only fair agreement between the observers and the MRI scan in diagnosing lumbar disc disease.
As noted by Espeland et al8, when the observers disagreed, one observer often diagnosed significantly more or less abnormality than the other (Table 3). The Consultant General Orthopaedic Surgeon (B) and the Consultant General Radiologist (D) were less sensitive in diagnosing disc degeneration than the other observers. These two doctors and the Consultant Spine Surgeon (A) were more likely to give a false-negative report (less sensitive) when compared with the MRI scan (Table 4). The Consultant Spine Radiologist (C) and the Spine Nurse Practitioner (E) were more likely to give a false-positive report (less specific) when compared with the MRI scan in the diagnosis of lumbar disc degeneration (Table 4). The systematic differences between the observers took a consistent direction at each lumbar segment, indicating that the observers had different thresholds for an abnormal rating for disc degeneration. Such a diagnostic threshold for ambiguous objects may depend on the observer's 'response bias', i.e. the tendency to prefer one response category over another, independent of the characteristics of the object19. Thus response bias, and different thresholds for actually reporting the 'minor' or clinically insignificant findings that were observed, may have contributed to the systematic differences found in our study. There was a significant systematic difference between all the observers and the MRI scan, indicating that the scan is more accurate in diagnosing disc disease in the lumbar spine. It is difficult to identify specific reasons for the different diagnostic thresholds between the observers, although eliminating these differences could improve the diagnostic capability of the doctors8,20.
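The systematic differences discussed here were examined with Bowker's test of symmetry. A minimal sketch of the test statistic is given below, assuming a square table of counts in which entry [i][j] is the number of segments graded i by one observer and j by the other. The function name `bowker_statistic` is hypothetical, and skipping pairs of cells that are both empty is one common convention rather than something the paper specifies.

```python
from typing import Sequence, Tuple


def bowker_statistic(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for Bowker's test."""
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    stat = 0.0
    df = 0
    # Compare each off-diagonal cell with its mirror image across the diagonal.
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total == 0:
                continue  # a pair with no observations carries no information
            stat += (table[i][j] - table[j][i]) ** 2 / total
            df += 1
    return stat, df
```

A large statistic relative to a chi-square distribution with `df` degrees of freedom (the upper tail can be obtained, for example, from `scipy.stats.chi2.sf`) indicates that one observer tends to grade systematically higher or lower than the other. For a 2 x 2 table the statistic reduces to McNemar's test without continuity correction.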
In conclusion, we found wide variation in the diagnosis of lumbar disc disease between the Orthopaedic Surgeons and the Radiologists at our institution. There was also a systematic difference in interpretation between all observers. These variations and differences were particularly significant when the radiograph readings were compared with the MRI scan. The General Orthopaedic Surgeon, the General Radiologist, and the Spine Surgeon diagnosed fewer degenerate discs than the MRI scan. The Spine Radiologist and the Spine Nurse Practitioner diagnosed more degenerate discs. Given this degree of inaccuracy, it is risky to comment on degenerate disc disease on the basis of a plain radiograph alone. Therefore only an MRI scan should be used to comment on disc disease of the low back, as plain radiographs are unreliable.
ACKNOWLEDGMENT
The authors are indebted to Mr DS Barrett, Dr M Sampson, Dr SJ Birch, and Miss L Tarplett for their accurate assistance and endurance in helping with this study.
References
1. Modic MT, Masaryk TJ, Ross JS, Carter JR. Imaging of degenerative disk disease. Radiology. 1988;168:177–186.
2. Marchiori DM, McLean I, Firth R, Tatum R. A Comparison of Radiographic Signs of Degeneration to Corresponding MRI Signal Intensities in the Lumbar Spine. Journal of Manipulative and Physiological Therapeutics. 1994;17:238–245.
3. Modic MT, Pavlicek W, Weinstein MA, Boumphrey F, Ngo F, Hardy R, Duchesneau PM. Magnetic resonance imaging of intervertebral disc disease. Radiology. 1984;152:103–111.
4. Schneiderman G, Flannigan B, Dingston S, Thomas J, Dillin WH, Watkins RG. Magnetic resonance imaging in the diagnosis of disc degeneration: correlation with discography. Spine. 1987;12:276–281.
5. Sether LA, Yu S, Haughton VM, Fischer ME. Intervertebral disk: normal age related changes in MR signal intensity. Radiology. 1990;177:385–388.
6. Paajanen H, Erkintalo M, Kuusela T, Dahlstrom S, Kormano M. Magnetic resonance study of disc degeneration in young low back pain patients. Spine. 1989;14:982–985.
7. Deyo RA, McNiesh LM, Cone RO. Observer variability in the interpretation of lumbar spine radiographs. Arthritis and Rheumatism. 1985;28:1066–1070.
8. Espeland A, Korsbrekke K, Albrektsen G, Larsen JL. Observer variation in plain radiography of the lumbosacral spine. The British Journal of Radiology. 1998;71:366–375.
9. Frymoyer JW, Newberg A, Pope MH, Wilder DG, Clements J, MacPherson B. Spine radiographs in patients with low back pain: an epidemiological study in men. J Bone Joint Surg. 1984;66-A:1048–1055.
10. Resnick D. Degenerative diseases of the vertebral column. Radiology. 1985;156:3–14.
11. Quinnell RC, Stockdale HR. The significance of osteophytes on lumbar vertebral bodies in relation to discographic findings. Clin Radiol. 1982;33:197–203.
12. Kellgren JH, Lawrence JS. Osteoarthrosis and disk degeneration in an urban population. Ann Rheum Dis. 1958;17:388–397.
13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
14. Bowker AH. Bowker's test for symmetry. Journal of the American Statistical Association. 43:572–574.
15. Andersson GBJ, Schultz A, Nathan A, Irstam L. Roentgenographic measurement of lumbar intervertebral disc height. Spine. 1981;6:154–158.
16. Bellamy N, Hewhook L, Rooney PJ, et al. Perception-a problem in the grading of sacroiliac joint radiographs. Scand J Rheumatol. 1984;13:113–120.
17. Coste J, Paolaggi JB, Spira A. Reliability of interpretation of plain radiographs in benign, mechanical low-back pain. Spine. 1991;16:426–428.
18. Frymoyer JW, Phillips RB, Newberg AH, MacPherson BV. A comparative analysis of the interpretation of lumbar spine radiographs by chiropractors and medical doctors. Spine. 1986;11:1020–1023.
19. Ker M. Issues in the use of kappa. Invest Radiol. 1991;26:78–83.
20. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. Br Med J. 1992;304:1491–1494.