Evidence-Based Outcomes on Diagnostic Accuracy of Quantitative Ultrasound for Assessment of Pediatric Osteoporosis - a Systematic Review

AUTHORS: Kuan Chung Wang1; Kuan Chieh Wang1; Afsaneh Amirabadi1; Edward Cheung1; Elizabeth Uleryk2; Rahim Moineddin3; Andrea S. Doria1

INSTITUTIONS:
1. Department of Diagnostic Imaging, The Hospital for Sick Children, Toronto, ON, Canada.
2. Library Services, The Hospital for Sick Children, Toronto, ON, Canada.
3. Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada.

CORRESPONDING AUTHOR: Andrea S. Doria, The Hospital for Sick Children, Department of Diagnostic Imaging, University of Toronto, Toronto, ON, Canada. Phone: 416-813-6079, Fax: 416-813-7591, email: andrea.doria@sickkids.ca

List of Appendices

APPENDIX 1 Definitions of standard terminology on bone characteristics and clinimetric properties of quantitative ultrasound (QUS) evaluated in this systematic review
APPENDIX 2 Search strategy for identification of studies that fulfilled the inclusion criteria of this systematic review
APPENDIX 3 Detailed criteria for the Standard for Reporting of Diagnostic Accuracy (STARD) assessment [33]
APPENDIX 4 Detailed criteria for Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool assessment [34]
APPENDIX 5 Categorization of study design according to the U.S. Preventive Services Task Force [39]
APPENDIX 6 Levels of recommendation of results according to the guidelines of the U.S. Preventive Services Task Force [39]
APPENDIX 7 Scanning method employed by the included studies
APPENDIX 8 Assessment of Standard for Reporting of Diagnostic Accuracy (STARD) items of included studies [33]
APPENDIX 9 Detailed Assessment of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool items of included studies [34]
Appendix Fig. 1 Graphical display of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) assessment of methodological quality of selected studies.
(a) risk of bias (b) applicability concern
APPENDIX 10 Summary assessment of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool items of included studies [34]
Appendix Reference

APPENDIX 1 Definitions of standard terminology on bone characteristics and clinimetric properties of quantitative ultrasound (QUS) evaluated in this systematic review

“Bone Strength” refers conceptually to the overall mechanical competence of the bone, or its ability to withstand failure, such that bones fracture when applied loads exceed their strength [49]. It is the integration of two main features: bone density and “bone quality” [50]. Architectural parameters such as polar moment of inertia and section modulus have been devised to allow calculation of the strength of a structure from the amount and distribution of its raw material [51].

“Bone Quality” is a combination of factors, including bone architecture, turnover, damage accumulation and mineralization, that play an important role in the assessment of fracture risk and are jointly referred to as “bone quality” [52].

“Reliability” is the ability of a diagnostic tool to produce similar results under similar conditions when operated by different people and instruments at different times and places [53]. In this review, two types of reliability were evaluated for the data acquisition aspect: 1) the intra-operator coefficient of variation, for data acquired by the same operator; and 2) the inter-operator coefficient of variation, for data acquired by more than two operators.

“Construct Validity” is the extent to which an index test (the diagnostic tool under investigation) is related to measures other than the gold/reference standard of a physical phenomenon [53]. Studies that looked for correlations between QUS results and the results of other measures, for instance biochemical markers, were also evaluated for construct validity. Because opinions differ on whether DXA is the gold/reference standard for detecting osteoporosis in children, studies that looked for correlations between DXA results and QUS results but did not consider DXA the gold/reference standard were also evaluated and analyzed for the construct validity of QUS.

“Criterion Validity” is the extent to which a diagnostic tool under investigation is related to the gold/reference standard for measuring a physical phenomenon [53]. In this context, the index test's abilities to detect osteoporosis and to predict skeletal fractures were investigated by comparison with the gold/reference standard. Despite the lack of consensus on DXA being the gold/reference standard for diagnosing osteoporosis in pediatric patients, in this review studies that considered DXA the reference standard and tested the diagnostic accuracy or the predictive value of QUS were evaluated for the criterion validity of QUS, owing to the research team's belief in the proven abilities and merits of DXA.

“Responsiveness” is the extent to which the results of a diagnostic tool correspond to changes over time in the conditions of the physical phenomenon [53]. In this review, three types of studies were considered for the evaluation of the responsiveness of QUS:
1) Studies that compared QUS results before and after any given intervention intended to improve the patient's bone status.
2) Studies that investigated longitudinally the correlation between changes in QUS results and changes in the results of other measures [48].
3) Studies that qualitatively investigated the longitudinal changes in QUS measurements; these were included because of the lack of consensus in the literature on what constitutes responsiveness and how to quantify it [48].
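The intra- and inter-operator coefficients of variation named under “Reliability” can be illustrated with a minimal Python sketch. The function name, the use of the sample standard deviation, and the speed of sound readings below are assumptions made for illustration only; the included studies may have computed the coefficient of variation differently (for example, as a root-mean-square value across subjects).

```python
from statistics import mean, stdev


def coefficient_of_variation(measurements):
    """Return the coefficient of variation in percent: 100 * sample SD / mean."""
    if len(measurements) < 2:
        raise ValueError("need at least two repeated measurements")
    m = mean(measurements)
    if m == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return 100.0 * stdev(measurements) / m


# Hypothetical repeated SOS readings (m/s) on one subject, not data from any study.
same_operator = [3600.0, 3610.0, 3590.0]       # one operator, three repeats
different_operators = [3600.0, 3640.0, 3560.0]  # one reading from each of three operators

intra_cv = coefficient_of_variation(same_operator)
inter_cv = coefficient_of_variation(different_operators)
print(f"intra-operator CV: {intra_cv:.2f}%")
print(f"inter-operator CV: {inter_cv:.2f}%")
```

A lower percentage indicates more reproducible acquisition; comparing the two values shows how much of the variability is attributable to changing the operator.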
APPENDIX 2 Search strategy for identification of studies that fulfilled the inclusion criteria of this systematic review

The searches were run using the OvidSP search platform in the following databases: MEDLINE, EMBASE, and EBM Reviews – Cochrane Central Register of Controlled Trials (CCTR), to include articles indexed as of May 27, 2013. The search strategy retrieved a total of 262 references. All references were saved in an EndNote library, which was used to identify the 47 duplicates. The remaining 215 unique references were reviewed against the inclusion criteria. The following tables record the search strategies and terms used in each of the databases. Search results were limited to evidence-based study design methodologies and to the pediatric age group (children 0-18 years).

MEDLINE
The search strategy for OvidSP MEDLINE (1946 to May 27, 2013) retrieved 73 references, of which 73 were unique and not duplicated in our other searches. I used a combination of MeSH and free text terms for:

Set 1 (Quantitative ultrasound terms; results: 237263)
History: ("quantitative ultrason*" or (quantitative adj2 (sonograph* or ultrason* or ultrasoun*)) or qus or "quantitative sonograph*").mp. or exp ultrasonography/
Set 2 (Bone density terms; results: 491973)
History: Bone Density/ or exp "Bone and Bones"/ or (bone adj2 densit:).mp.
Set 3 (Bone diseases terms; results: 65776)
History: bone diseases/ or bone diseases, metabolic/ or exp osteoporosis/ or osteopenia:.mp.
Set 4 (Base Clinical Set; results: 899)
History: 1 and 2 and 3
Set 5 (Evidence-based study design terms; results: 2827508)
History: ("clinical trial, all" or clinical trial).pt. or clinical trials as topic/ or clinical trial, phase i.pt. or clinical trials, phase i as topic/ or clinical trial, phase ii.pt. or clinical trials, phase ii as topic/ or clinical trial, phase iii.pt. or clinical trials, phase iii as topic/ or clinical trial, phase iv.pt. or clinical trials, phase iv as topic/ or controlled clinical trial.pt. or controlled clinical trials as topic/ or meta-analysis.pt. or meta-analysis as topic/ or multicenter study.pt. or multicenter studies as topic/ or randomized controlled trial.pt. or randomized controlled trials as topic/ or evaluation studies.pt. or evaluation studies as topic/ or validation studies.pt. or validation studies as topic/ or "sensitivity and specificity"/ or predictive value of tests/ or roc curve/ or diagnostic errors/ or false negative reactions/ or false positive reactions/ or observer variation/ or likelihood functions/ or (likelihood or likelihood ratio:).tw. or predictive value of tests/ or cohort studies/ or longitudinal studies/ or follow-up studies/ or prospective studies/ or case-control studies/ or retrospective studies/ or cross-sectional studies/
Set 6 (Evidence-based Filter results; results: 514)
History: 4 and 5
Set 7 (FINAL Results; results: 73)
History: limit 6 to "all child (0 to 18 years)"

EMBASE
The search strategy for OvidSP Embase Classic+Embase <1947 to 2013 Week 21> retrieved 184 references, of which 142 were unique and not duplicated in our other searches. I used a combination of EMBASE and free text terms for:

Set 1 (Quantitative ultrasound terms; results: 564441)
History: ultrasound/ or exp echography/ or ("quantitative ultrason*" or (quantitative adj2 (sonograph* or ultrason* or ultrasoun*)) or qus or "quantitative sonograph*").mp.
Set 2 (Bone density terms; results: 678182)
History: exp bone/ or bone density/ or (bone adj densit:).mp.
Set 3 (Bone diseases terms; results: 117496)
History: bone disease/ or metabolic bone disease/ or exp osteoporosis/ or osteopenia:.mp.
Set 4 (Base Clinical Set; results: 2165)
History: 1 and 2 and 3
Set 5 (Evidence-based study design terms; results: 6749428)
History: ct.fs. or phase 1 clinical trial/ or phase 2 clinical trial/ or phase 3 clinical trial/ or phase 4 clinical trial/ or controlled clinical trial/ or multicenter study/ or meta analysis/ or randomized controlled trial/ or clinical trial/ or crossover procedure/ or double blind procedure/ or single-blind procedure/ or triple blind procedure/ or (random* or (doubl* adj2 dummy) or ((singl* or doubl* or tripl* or trebl*) adj25 (mask* or blind*)) or rct or rcts or (control adj25 trial*) or multicent* or placebo* or metaanalys* or (meta adj5 analys*) or sham or effectiveness or efficacy or compar*).ti,ab. or (cochrane or medline or cinahl or embase or CCTR or scopus or "web of science" or lilacs).ti,ab. or "sensitivity and specificity"/ or diagnostic error/ or false negative result/ or false positive result/ or ((diagnostic adj5 error*) or (false adj5 negative*) or (false adj5 positive*)).mp. or "prediction and forecasting"/ or prediction/ or observer variation/ or receiver operating characteristic/ or ("roc curve*" or (roc adj2 curve*)).mp. or reproducibility/ or reliability/ or cronbach alpha coefficient/ or internal consistency/ or interrater reliability/ or intrarater reliability/ or item total correlation/ or kuder richardson coefficient/ or split half correlation/ or test retest reliability/ or laboratory diagnosis/ or abnormal laboratory result/ or likelihood functions/ or (likelihood or (likelihood adj2 ratio*)).mp. or ((evaluation or validation) adj2 (study or studies)).ti,ab. or validation study/ or cohort analysis/ or longitudinal study/ or prospective study/ or case control study/ or hospital based case control study/ or population based case control study/ or retrospective study/
Set 6 (Evidence-based Filter results; results: 1181)
History: 4 and 5
Set 7 (Age group; results: 133)
History: limit 6 to (infant <to one year> or child <unspecified age> or preschool child <1 to 6 years> or school child <7 to 12 years> or adolescent <13 to 17 years>)
Set 8 (Textword search terms; results: 3354291)
History: (infan* or neonat* or child* or adolescen* or teen* or girl* or boy* or youth* or tot or tots or toddler* or paediatric* or pediatric*).mp.
Set 9 (FINAL Results; results: 184)
History: 7 or (6 and 8)

EBM Reviews - Cochrane Central Register of Controlled Trials
The search strategy for OvidSP Cochrane Central Register of Controlled Trials <April 2013> retrieved 5 references, of which 0 were unique and not duplicated in our other searches. Because this database consists exclusively of RCTs, no study design terms were used. I used a combination of primarily MeSH and free text terms for:

Set 1 (Quantitative ultrasound terms; results: 6127)
History: ("quantitative ultrason*" or (quantitative adj2 (sonograph* or ultrason* or ultrasoun*)) or qus or "quantitative sonograph*").mp. or exp ultrasonography/ or ultrasound/ or exp Ultrasonography/ (5983)
Set 2 (Bone density terms; results: 11154)
History: Bone Density/ or exp "Bone and Bones"/ or exp "Bone and Bones"/ or (bone adj2 densit:).mp.
Set 3 (Bone diseases terms; results: 2848)
History: bone diseases/ or bone diseases, metabolic/ or bone disease/ or metabolic bone disease/ or exp osteoporosis/ or osteopenia:.mp.
Set 4 (Base Clinical Set; results: 42)
History: 1 and 2 and 3
Set 5 (Age group textword terms; results: 130945)
History: (infan* or neonat* or child* or adolescen* or teen* or girl* or boy* or youth* or paediatric* or pediatric*).mp.
Set 6 (Final Results; results: 5)
History: 4 and 5

APPENDIX 3 Detailed criteria for the Standard for Reporting of Diagnostic Accuracy (STARD) assessment [33]

Item 1. Description: Identify the article as a study of diagnostic accuracy
Score 1: One of the following key words is present: sensitivity and specificity, diagnostic accuracy, reliability, responsiveness, validity, comparison, correlation
Score 0.5: --
Score 0: No key word found
N/A: --

Item 2. Description: State the research questions or study aims (in abstract or introduction)
Score 1: The research question helps the reader to predict the method used (including the tools and statistics that will be used); "compare" is good enough
Score 0.5: --
Score 0: The research question misleads the reader or leaves unanswered questions
N/A: --

Item 3. Description: Inclusion/exclusion criteria + location
Score 1: Must have at least one inclusion/exclusion criterion AND location of study
Score 0.5: One of inclusion/exclusion criteria OR location of study
Score 0: Missing inclusion/exclusion criteria AND location
N/A: --

Item 4. Description: Describe participant recruitment
Score 1: Show whether patients underwent the index test, the reference standard, or neither before they were considered eligible (implicit indication is fine)
Score 0.5: --
Score 0: When readers cannot answer the above question
N/A: --

Item 5. Description: Sampling (in method)
Score 1: Homogeneity (constitution of patient spectrum) AND sampling technique
Score 0.5: Only homogeneity OR only sampling technique
Score 0: Neither
N/A: --

Item 6. Description: Describe data collection
Score 1: If the readers understand whether it is a prospective or retrospective study (implicit indication is fine)
Score 0.5: --
Score 0: When readers cannot understand whether it is a prospective or retrospective study
N/A: --

Item 7. Description: What is the reference standard and what is the rationale
Score 1: DXA, OR clear indication of which patients are considered to have the target condition (osteoporosis)
Score 0.5: --
Score 0: When there is a reference standard, but no rationale for why it was chosen
N/A: When the research question does not include criterion validity (e.g. a comparison study that does not indicate DXA as the reference standard)

Item 8. Description: Describe the tests and reference standard (just QUS and DXA)
Score 1: Details that enable the reader to reproduce all tests, except for information about operators
Score 0.5: --
Score 0: When there are not enough details for readers to reproduce the scans
N/A: --

Item 9. Description: Describe the definition of and rationale for the units, cutoffs and/or categories of the index tests and the reference standard
Score 1: Parameter, unit, cutoff, and their rationales when necessary
Score 0.5: --
Score 0: --
N/A: If N/A for item 7

Item 10. Description: Number and training of operators for both the index test and reference standard
Score 1: Both number and training for all scans
Score 0.5: Only one of number or training
Score 0: Missing both
N/A: --

Item 11. Description: If the operators are blinded
Score 1: Indicate whether the operators are blinded
Score 0.5: --
Score 0: No indication of whether the operators are blinded
N/A: --

Item 12. Description: Describe the statistics used
Score 1: Exactly what statistical tests are used
Score 0.5: --
Score 0: No explanation of statistical methods
N/A: --

Item 13. Description: Method of calculating reliability
Score 1: If we can reproduce the design (name of statistics, and how data was collected for this purpose). Note: vague names like “precision” are not acceptable
Score 0.5: Missing one of the name of the statistics used, or the data collection
Score 0: Missing both
N/A: --

Item 14. Description: Beginning and ending dates of recruitment
Score 1: Both dates
Score 0.5: --
Score 0: Missing either date
N/A: --

Item 15. Description: Demographic characteristics
Score 1: Table of (height, weight, age, gender)
Score 0.5: (age, gender)
Score 0: Not enough details to score 0.5 or 1
N/A: --

Item 16. Description: Describe why some participants failed to receive the test
Score 1: Explain why some patients did not get a scan, or explicit statement that all recruited patients were successfully scanned
Score 0.5: Give the number of patients who failed, but not the rationale, OR when there is no difference in the number of patients recruited and scanned but there is no explicit statement
Score 0: Not enough details to score 0.5 or 1
N/A: --

Item 17. Description: Time interval between all tests
Score 1: Time interval between all tests is reported
Score 0.5: --
Score 0: Time interval between all tests is not reported
N/A: When there is only one test

Item 18. Description: Report distribution of severity of disease in those with the target condition (in results) (spectrum bias)
Score 1: Spectrum of (primary) disease severity OR subtypes of the disease. Subgrouping is not necessary
Score 0.5: --
Score 0: Not enough details to score 1
N/A: --

Item 19. Description: Cross tabulation of index test results by the results of the reference standard
Score 1: Enough to reproduce the statistics for diagnostic accuracy
Score 0.5: --
Score 0: Not enough to reproduce the statistics for diagnostic accuracy
N/A: Item 7 = N/A

Item 20. NOT APPLICABLE because it is established that QUS has no adverse effect
Score 1: --
Score 0.5: --
Score 0: --
N/A: --

Item 21. Description: Uncertainty of result
Score 1: Report the uncertainty of results pertaining to the main objective (CI)
Score 0.5: --
Score 0: No uncertainty reported
N/A: --

Item 22. NOT APPLICABLE because QUS only gives a number, so there will not be any indeterminate results
Score 1: --
Score 0.5: --
Score 0: --
N/A: --

Item 23. Description: Report difference of results in different subgroups of participants
Score 1: Report difference of result (pertaining to the main objective) in different subgroups
Score 0.5: --
Score 0: Either results of different subgroups were not reported, or there was no subgrouping of participants
N/A: --

Item 24. Description: Result of reliability
Score 1: The result of reliability is reported (specify CV or ICC)
Score 0.5: --
Score 0: There was no reporting of the result of reliability
N/A: --

Item 25. Description: Discuss clinical applications
Score 1: 1. Report the limitation of the study; 2. Interpretation of result in clinical setting
Score 0.5: Reported one of the two
Score 0: Neither was reported
N/A: --

Modifications have been made to the STARD tool to best comply with the objectives of this review: 1) items (1, 2, 12, 23) that are strictly applicable to criterion validity were modified to include other clinimetric properties evaluated in this review; 2) items 12 and 17 were modified to be applicable to studies that did not have a reference standard. Items 20 and 22 were not included in the analysis because QUS is proven to have no adverse effect and, since the output consists of continuous variables, indeterminate results are unexpected for most studies.

APPENDIX 4 Detailed criteria for Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool assessment [34]

QUADAS - Risk of Bias Aspect

Domain: Patient Selection
Rules of the overall assessment of the domain: Low risk: all items need to be 1. High risk: all items need to be 0/0.5. Unclear: otherwise.

Item 1. Description: Describe methods of patient selection
Signaling question 1: Was a consecutive or random sample of patients enrolled?
Score 1 = yes (low risk of bias): Randomized or consecutive enrollment
Score 0.5 = unclear: If the sampling technique is not mentioned
Score 0 = no (high risk of bias): Accrue some patients and others not from a pre-selected group of patients
Score N/A: --

Item 2. Description: Describe methods of patient selection
Signaling question 2: Was a case–control design avoided?
Score 1 = yes (low risk of bias): Non-case-control study OR if the authors compare the test measurements with a reference (standard, or reference values)
Score 0.5 = unclear: --
Score 0 = no (high risk of bias): Case-control study OR if they compare the test measurements with values from a control group
Score N/A: --

Item 3. Description: Describe methods of patient selection
Signaling question 3: Did the study avoid inappropriate exclusions?
Score 1 = yes (low risk of bias): Appropriate exclusion
Score 0.5 = unclear: Exclusion not indicated
Score 0 = no (high risk of bias): Inappropriate exclusion OR inclusion (e.g. only included white patients)
Score N/A: --

Domain: Reference Standard
Rules of the overall assessment of the domain: Low risk: all items need to be 1. High risk: all items need to be 0/0.5. Unclear: otherwise. N/A: if both items are N/A.

Item 6. Description: Describe the reference standard and how it was conducted and interpreted
Signaling question 1: Is the reference standard likely to correctly classify the target condition?
Score 1 = yes (low risk of bias): Yes
Score 0.5 = unclear: --
Score 0 = no (high risk of bias): No
Score N/A: No reference standard

Item 7. Description: Describe the reference standard and how it was conducted and interpreted
Signaling question 2: Were the reference standard results interpreted without knowledge of the results of the index test?
Score 1 = yes (low risk of bias): 1. Reference standard results interpreted prior to index test OR; 2. Blinded to results of index test
Score 0.5 = unclear: Not indicated
Score 0 = no (high risk of bias): 1. Reference standard results interpreted after the index test OR; 2. Not blinded to results of index test
Score N/A: No reference standard

Domain: Flow and Timing
Rules of the overall assessment of the domain: Low risk: all items need to be 1. High risk: all items need to be 0/0.5. Unclear: otherwise.

Item 8. Description: Describe the interval and any interventions between index tests and the reference standard
Signaling question 1: Was there an appropriate interval between the index test and reference standard?
Score 1 = yes (low risk of bias): The delay between the tests (index tests/reference standard) is unlikely to cause the target condition to change status
Score 0.5 = unclear: Does not state the length of delay between QUS and DXA
Score 0 = no (high risk of bias): The delay between the 2 tests is likely to cause the target condition to change status
Score N/A: --

Item 9. Description: Describe any patients who did not receive the index tests or reference standard
Signaling question 2: Did all patients receive the same reference standard?
Score 1 = yes (low risk of bias): All patients, or a random selection of patients, who received the index test went on to receive verification of their disease status using the same reference standard
Score 0.5 = unclear: If this information is not reported by the study
Score 0 = no (high risk of bias): 1. If some of the patients who received the index test did not receive verification and the selection of patients to receive the reference standard was not random OR; 2. Patients did not receive verification of their condition using the same reference standard
Score N/A: No reference standard

Item 10. Description: Describe any patients who did not receive the index tests or reference standard
Signaling question 3: Were all patients included in the analysis?
Score 1 = yes (low risk of bias): 1. All participants recruited into the study should be included OR; 2. Give the rationale for withdrawal, and withdrawals are unlikely to affect the results
Score 0.5 = unclear: Not specified whether there were withdrawals
Score 0 = no (high risk of bias): 1. NOT all participants recruited into the study are included OR; 2. No rationale for withdrawal OR; 3. Inappropriate reasons for withdrawals
Score N/A: --

QUADAS - Applicability Concern Aspect

Item 11. Domain: Patient Selection. Description: Describe methods of patient selection
Signaling question: Do the included patients of the study match the targeted patients of the review study?
Score 1 = yes (low risk of applicability concerns): Yes
Score 0.5 = unclear: --
Score 0 = no (high risk of applicability concerns): The patients included in the study do not match the targeted patients of the review study
Score N/A: --

Item 12. Domain: Index Test. Description: Describe the index test and how it was conducted and interpreted
Signaling question: Does the index test method/conduct match the research question of the review study?
Score 1 = yes (low risk of applicability concerns): Yes
Score 0.5 = unclear: --
Score 0 = no (high risk of applicability concerns): No
Score N/A: --

Item 13. Domain: Reference Standard. Description: Describe the reference standard and how it was conducted and interpreted
Signaling question: Does the condition defined by the reference standard match the target condition of the review study?
Score 1 = yes (low risk of applicability concerns): Yes
Score 0.5 = unclear: --
Score 0 = no (high risk of applicability concerns): No
Score N/A: No reference standard

Step-wise Assessment

For QUADAS-2, studies were rated as having 'adequate' methodological quality if both the 'risk of bias' and the 'applicability concern' aspects were rated as 'low risk', or if one of the two was rated as 'low risk' and the other was rated as 'moderate risk'. All other studies were rated as 'inadequate' studies.

Since the majority of studies did not investigate the criterion validity of QUS, the "reference standard" domain of the 'risk of bias' aspect was not applicable for most studies. Furthermore, the "patient selection" domain in the 'risk of bias' aspect played a minor role in the assessment process because it is applicable to randomized controlled trials, which were not the main focus of this review.

As a result, studies that used a reference standard were rated as having 'high' 'risk of bias' if two of the three or all three domains ("flow and timing", "index test", and "reference standard") were rated as 'high risk' or 'unclear risk'. Studies were rated as 'low' 'risk of bias' if all three domains were rated as 'low risk'. All other studies were rated as 'moderate risk'. The same criteria were used for assessing the 'applicability concern' aspect with regard to the "patient selection", "index", and "reference" domains.

Studies that did not use a reference standard were rated as having 'high' 'risk of bias' if both the "flow and timing" and "index test" domains were rated as 'high risk' or 'unclear risk'. Studies were rated as 'low' 'risk of bias' if both domains were rated as 'low risk'. All other studies were rated as 'moderate risk'. The same criteria were used for assessing the 'applicability concern' aspect with regard to the "patient selection" and "index" domains.
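The step-wise assessment rules can be sketched in Python as follows. This is an illustrative reading of the rules, not the review team's actual procedure: the function names and the example ratings are hypothetical, and the sketch assumes that the two-domain and three-domain rules can both be expressed by counting the domains not rated 'low' (none gives 'low', exactly one gives 'moderate', two or more give 'high').

```python
ALLOWED = {"low", "high", "unclear"}


def rate_aspect(domain_ratings):
    """Rate one aspect ('risk of bias' or 'applicability concern').

    domain_ratings maps each applicable domain name to 'low', 'high' or
    'unclear'. Two domains are expected for studies without a reference
    standard and three for studies with one.
    """
    if len(domain_ratings) not in (2, 3):
        raise ValueError("expected two or three applicable domains")
    if not set(domain_ratings.values()) <= ALLOWED:
        raise ValueError("ratings must be 'low', 'high' or 'unclear'")
    # Count the domains rated 'high risk' or 'unclear risk'.
    not_low = sum(1 for rating in domain_ratings.values() if rating != "low")
    if not_low == 0:
        return "low"
    if not_low == 1:
        return "moderate"
    return "high"


def overall_quality(risk_of_bias, applicability_concern):
    """'adequate' if both aspects are 'low', or one 'low' and one 'moderate'."""
    pair = {risk_of_bias, applicability_concern}
    if "low" in pair and pair <= {"low", "moderate"}:
        return "adequate"
    return "inadequate"


# Hypothetical study that used a reference standard.
bias = rate_aspect({"flow and timing": "low",
                    "index test": "unclear",
                    "reference standard": "low"})
concern = rate_aspect({"patient selection": "low",
                       "index": "low",
                       "reference": "low"})
print(bias, concern, overall_quality(bias, concern))
```

For a study without a reference standard, the same functions would be called with only the two applicable domains for each aspect.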
APPENDIX 5 Categorization of study design according to the U.S. Preventive Services Task Force [39]

I: Level I: randomized controlled trial
II: Level II-1: controlled trials without randomization
III: Level II-2: cohort or case-control study
IV: Level II-3: time series study
V: Level III: expert opinion

APPENDIX 6 Levels of recommendation of results according to the guidelines of the U.S. Preventive Services Task Force [39]

A: strong recommendation for clinicians to routinely provide the service to eligible patients
B: fair recommendation for clinicians to routinely provide the service to eligible patients
C: no recommendation for or against routine provision of the service due to conflicting evidence
D: recommendation against the provision of the service
E: insufficient evidence to make a recommendation

APPENDIX 7 Scanning method employed by the primary studies

Each entry lists: first author’s last name / publication year; QUS parameter; machine/probe; scanning method (patient position; anatomic area scanned); scanning method (probe placement).

Ahuja 2006 [3]
QUS parameter: BUA
Machine/Probe: QUS-2 (Quidel, San Diego, CA)
Patient position; anatomic area scanned: Not described; acoustic edges at the back and bottom of the calcaneus
Probe placement: Not described

Roggero 2007 [27]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; the site of measurement on the tibia was determined by measuring the distance from knee to heel. The infant’s leg was marked at half of this distance
Probe placement: The probe was aligned along and parallel to the bone and moved in an arc over the circumference of the site of measurement until a reliable estimate of the SOS was obtained

Altuncu 2007 [40]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; the measurement site was defined as the midpoint between the apex of the calcaneus and the proximal patellar apex
Probe placement: Not described

Azcona 2003 [5]
QUS parameter: SOS
Machine/Probe: DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; distal metaphysis of the proximal phalanxes of the last four fingers of the nondominant hand
Probe placement: Not described

Baroncelli 2003 [14]
QUS parameter: SOS
Machine/Probe: DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; speed of sound through the distal end of the first phalangeal diaphysis of the last four fingers of the hand
Probe placement: Condyles at the distal diaphysis

Cepollaro 2001 [30]
QUS parameter: SOS
Machine/Probe: At the heel by Lunar Achilles Plus ultrasonometer (GE Lunar, Madison, WI, USA) and at phalanxes by DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; heel, phalanxes by instructions provided by the manufacturer
Probe placement: Not described

Falcini 2000 [4]
QUS parameter: BUA
Machine/Probe: Two 12.5 mm diameter, 1 MHz transducers mounted in hand-held calipers linked to the pediatric contact US bone analyzer (McCue Ultrasonics Limited, Compton, Winchester, UK)
Patient position; anatomic area scanned: Not described; measured at calcaneal level
Probe placement: Not described

Fewtrell 2008 [6]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Flexed foot and the dorsal aspect of the flexed knee; the point on the tibia midway between the plantar aspect of the flexed foot and the dorsal aspect of the flexed knee
Probe placement: The probe was aligned along and parallel to the bone and moved around in a small portion of an arc over the tibia until a measurement of the SOS was obtained

Fielding 2003 [29]
QUS parameter: BUA, SOS
Machine/Probe: Lunar Achilles Plus ultrasonometer (GE Lunar, Madison, WI, USA)
Patient position; anatomic area scanned: Not described; the left heel was studied unless subjects reported a history of left foot trauma (two subjects)
Probe placement: Not described

Gianni 2007 [41]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; radius and tibia
Probe placement: Not described

Gonnelli 2008 [10]
QUS parameter: AD-SOS, BTT
Machine/Probe: DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; the distal end of the first phalangeal diaphysis in the proximity of the condyles of the last four fingers of the hand
Probe placement: Lateral and medial surfaces of each finger

Hartman 2004 [18]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Patient seated and the knee flexed 90 degrees; the distal radius examination site corresponded to the point halfway between the edge of the olecranon and the tip of the distal phalanx of the outstretched third digit of the left hand. Midtibia SOS measurements were performed with the patient seated and the knee flexed 90 degrees. The site of examination was the point halfway between the edge of the heel and the proximal edge of the knee
Probe placement: A specialized pediatric transducer was placed on the marked site of measurement and rotated without lifting the transducer from the skin

Lequin 2002 [9]
QUS parameter: SOS
Machine/Probe: SoundScan Compact (Myriad Ultrasound Systems Ltd, Rehovot, Israel; software version 1e)
Patient position; anatomic area scanned: Not described; right tibia at the mid-tibial point. The mid-tibial point was defined as the midpoint of the line between the apex of the medial malleolus and the distal patellar apex
Probe placement: Not described

Levine 2002 [2]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Patient seated and the knee flexed 90 degrees; midtibia and the distal third of the radius as follows: the point of placement for the radius reading was defined as the point halfway between the edge of the olecranon and the tip of the distal phalanx of the third digit of the left hand. The site was then marked. The point of measurement was the point halfway between the edge of the heel and the proximal edge of the knee
Probe placement: A specialized pediatric transducer was placed on the marked site of measurement and rotated without lifting the transducer from the skin

Litmanovitz 2003 [25]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; the measurement site was defined as the midpoint between the apex of the medial malleolus and the distal patellar apex
Probe placement: Probe was moved across the mid-tibial plane, searching for the site with maximal reading

Litmanovitz 2007 [21]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; the measurement site was defined as the midpoint between the apex of the medial malleolus and the distal patellar apex
Probe placement: Probe was moved across the mid-tibial plane, searching for the site with maximal reading

McDevitt 2007 [26]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; tibia
Probe placement: Not described

Mussa 2010 [8]
QUS parameter: AD-SOS, BTT
Machine/Probe: DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; distal metaphyses of the proximal phalanges of fingers II-V of the patient's hand
Probe placement: Not described

Njeh 2000 [1]
QUS parameter: SOS
Machine/Probe: Myriad SoundScan 2000
Patient position; anatomic area scanned: Not described; this site is defined as the midpoint between the distal apex of the medial malleolus and the distal aspect of the patella
Probe placement: Probe measures SOS along a defined and fixed longitudinal distance of the cortical layer of the midtibia, parallel to its long axis. The measurement is performed by placing the probe at the midtibial plane parallel to the longitudinal axis of the bone

Oswiecimska 2007 [31]
QUS parameter: AD-SOS
Machine/Probe: DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; proximal phalanges of fingers II–V of the right hand
Probe placement: Not described

Tomlinson 2006 [23]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Flexed foot and the dorsal aspect of the flexed knee; measurement on the tibia was determined by identifying the midpoint between the plantar aspect of the flexed foot and the dorsal aspect of the flexed knee (mid shaft of the tibia)
Probe placement: Probe was aligned along and parallel to the bone and moved in a semi arc over the circumference of the site of measurement until a reliable estimate of the SOS was measured

Christoforidis 2011 [16]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Not described; distal third of the radius and midshaft tibia
Probe placement: Not described

Pietkiewicz 2010 [42]
QUS parameter: SOS, BUA, Stiffness
Machine/Probe: Lunar Achilles Plus ultrasonometer (GE Lunar, Madison, WI, USA)
Patient position; anatomic area scanned: Not described
Probe placement: Not described

Mora 2012 [43]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Flexed foot and the dorsal aspect of the flexed knee; point on the tibia located midway between the plantar aspect of the
Probe placement: Two probes supplied by the manufacturer were used as recommended by the manufacturer: one designed to measure SOS in the neonatal mid-tibia location (CS probe with a contact surface of approximately 2.5×1 cm) and one designed to measure SOS in the infant and child midtibia location (CM probe with a contact surface of approximately 4.2×1.3 cm)

Lam 2011 [24]
QUS parameter: SOS, BUA, Stiffness
Machine/Probe: Heel QUS machine (Paris, Norland Medical System, Fort Atkinson, Wisconsin)
Patient position; anatomic area scanned: Not described
Probe placement: Not described

Mussa 2010 [44]
QUS parameter: BTT, AD-SOS
Machine/Probe: DBM sonic-bone profiler 1200 (IGEA, Carpi, Modena, Italy)
Patient position; anatomic area scanned: Not described; distal metaphyses of the proximal phalanges of fingers II to V of participants’ dominant hand
Probe placement: The caliper was placed on the distal metaphysis of the phalanx and scanned with rotational movement around the axis of the finger

Pereira-da-Silva 2011 [7]
QUS parameter: SOS
Machine/Probe: Sunlight Omnisense 7000 PTM scanner (Sunlight Medical, Tel Aviv, Israel)
Patient position; anatomic area scanned: Flexed foot and the dorsal aspect of the flexed knee; the midpoint between the plantar aspect of the flexed foot and the dorsal aspect of the flexed knee (midshaft of the tibia)
Probe placement: Aligned along and parallel to the bone and moved in a semiarc over the circumference of the site of measurement until a reliable estimate of the SOS was measured

Sani 2011 [11]
QUS parameter: SOS, BUA
Machine/Probe: UBIS 5000 bone sonometer (Diagnostic Medical Systems, Montpellier, France)
Patient position; anatomic area scanned: Not described; automatic
Probe placement: Automatic

Abbreviations: AD-SOS, amplitude dependent speed of sound; SOS, speed of sound; BUA, broadband ultrasound attenuation; BTT, bone transmission time.
APPENDIX 8 Assessment of Standard for Reporting of Diagnostic Accuracy (STARD) items of primary studies [33]

| First author's last name / publication year | Total | % |
|---|---|---|
| Ahuja 2006 [3] | 14 | 60.1 |
| Roggero 2007 [27] | 15.5 | 81.6 |
| Altuncu 2007 [40] | 13 | 65 |
| Azcona 2003 [5] | 15 | 65.2 |
| Baroncelli 2003 [14] | 12 | 60 |
| Cepollaro 2001 [30] | 13 | 65 |
| Falcini 2000 [4] | 15.5 | 77.5 |
| Fewtrell 2008 [6] | 13.5 | 67.5 |
| Fielding 2003 [29] | 17 | 73.9 |
| Gianni 2007 [41] | 10.5 | 52.5 |
| Gonnelli 2008 [10] | 16 | 80 |
| Hartman 2004 [18] | 14.5 | 72.5 |
| Lequin 2002 [9] | 9 | 45 |
| Levine 2002 [2] | 10 | 50 |
| Litmanovitz 2003 [25] | 12 | 60 |
| Litmanovitz 2007 [21] | 12 | 60 |
| McDevitt 2007 [26] | 14.5 | 72.5 |
| Mussa 2010 [8] | 17 | 77.3 |
| Njeh 2000 [1] | 12 | 60 |
| Oswiecimska 2007 [31] | 14.5 | 72.5 |
| Tomlinson 2006 [23] | 14.5 | 76.3 |
| Christoforidis 2011 [16] | 13 | 65 |
| Pietkiewicz 2010 [42] | 8 | 40 |
| Mora 2012 [43] | 12.5 | 62.5 |
| Lam 2011 [24] | 15 | 75 |
| Mussa 2010 [44] | 15.5 | 81.6 |
| Pereira-da-Silva 2011 [7] | 16 | 84.2 |
| Sani 2011 [11] | 12.5 | 62.5 |

Eleven (39%) studies failed to report the geographic setting (i.e., where the study was conducted, inclusion/exclusion criteria) (item 3). Overall, the recruitment sampling technique was poorly reported. Only 6/28 (21%) studies provided information about both the sampling technique and the patient spectrum (item 5). Only 1 (4%) study stated the type of data collection (i.e., prospective/retrospective) (item 6). Among the studies that provided a reference standard (n=4, 14%), none reported a threshold for defining osteoporosis or a cross-tabulation of index test results by the results of the reference standard (item 9). No study reported the level of training of operators, and only 15 (54%) reported the number of operators (item 10). Only 4 (14%) studies reported blinding of operators (item 11), 1 of which also stated blinding of the clinician who interpreted the results. Sixteen out of 28 (57%) studies failed to report the beginning and ending dates of recruitment (item 16), and 18/28 (64%) failed to report the time interval between tests (item 17).
The presence or absence of uncertainty of results (item 21) was very poorly reported, appearing in only 5/28 (18%) studies. Lastly, 13/28 (46%) studies failed to discuss limitations of the study design. The remaining items were adequately addressed.

APPENDIX 9 Detailed Assessment of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool items of primary studies [34]

| First author's last name / publication year | Risk of bias: patient selection | Risk of bias: index test | Risk of bias: reference standard | Risk of bias: flow and timing | Applicability concern: patient selection | Applicability concern: index test | Applicability concern: reference standard |
|---|---|---|---|---|---|---|---|
| Ahuja 2006 [3] | UNCLEAR | UNCLEAR | UNCLEAR | UNCLEAR | LOW RISK | LOW RISK | LOW RISK |
| Roggero 2007 [27] | UNCLEAR | LOW RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Altuncu 2007 [40] | HIGH RISK | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Azcona 2003 [5] | UNCLEAR | UNCLEAR | UNCLEAR | UNCLEAR | LOW RISK | LOW RISK | LOW RISK |
| Baroncelli 2003 [14] | UNCLEAR | LOW RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Cepollaro 2001 [30] | UNCLEAR | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Falcini 2000 [4] | UNCLEAR | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Fewtrell 2008 [6] | UNCLEAR | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Fielding 2003 [29] | UNCLEAR | UNCLEAR | UNCLEAR | LOW RISK | LOW RISK | LOW RISK | LOW RISK |
| Gianni 2007 [41] | UNCLEAR | HIGH RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Gonnelli 2008 [10] | UNCLEAR | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Hartman 2004 [18] | UNCLEAR | HIGH RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Lequin 2002 [9] | HIGH RISK | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Levine 2002 [2] | UNCLEAR | LOW RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Litmanovitz 2003 [25] | UNCLEAR | LOW RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Litmanovitz 2007 [21] | UNCLEAR | LOW RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| McDevitt 2007 [26] | UNCLEAR | LOW RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Mussa 2010 [8] | UNCLEAR | UNCLEAR | UNCLEAR | UNCLEAR | LOW RISK | LOW RISK | LOW RISK |
| Njeh 2000 [1] | UNCLEAR | HIGH RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Oswiecimska 2007 [31] | UNCLEAR | HIGH RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Tomlinson 2006 [23] | UNCLEAR | LOW RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Christoforidis 2011 [16] | UNCLEAR | LOW RISK | NOT APPLICABLE | UNCLEAR | LOW RISK | LOW RISK | NOT APPLICABLE |
| Pietkiewicz 2010 [42] | UNCLEAR | LOW RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Mora 2012 [43] | UNCLEAR | LOW RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Lam 2011 [24] | UNCLEAR | HIGH RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Mussa 2010 [44] | UNCLEAR | HIGH RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Pereira-da-Silva 2011 [7] | UNCLEAR | LOW RISK | NOT APPLICABLE | LOW RISK | LOW RISK | LOW RISK | NOT APPLICABLE |
| Sani 2011 [11] | UNCLEAR | LOW RISK | NOT APPLICABLE | HIGH RISK | LOW RISK | LOW RISK | NOT APPLICABLE |

Appendix Fig. 1 Graphical display of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) assessment of methodological quality of primary studies. (a) risk of bias (b) applicability concern

Regarding the 'risk of bias' aspect: Within the "patient selection" domain, the research designs of 26/28 (93%) studies demonstrated 'unclear risk' and the other 2/28 (7%) studies were rated as 'high risk'.
Within the "index test" domain, more than half of the studies were rated as 'low risk' (18/28; 64%) and only 6/28 (21%) as 'high risk'. The "reference standard" domain was only applicable to 4 studies, all of which demonstrated 'unclear risk'. Within the "flow and timing" domain, 8/28 (29%) studies demonstrated 'low risk', 7/28 (25%) demonstrated 'high risk' and the largest group (13/28; 46%) demonstrated 'unclear risk'. Overall, the research designs of 17/28 (61%) studies demonstrated 'high risk of bias', 5/28 (18%) 'moderate risk of bias', and 6/28 (21%) 'low risk of bias' (Appendix 10).

Regarding the 'applicability concern' aspect of this tool, the "reference standard" domain was applicable to only 4 studies that investigated criterion validity. The research design of all 28 studies had 'low risk' of applicability concern (Appendix 10).

APPENDIX 10 Summary assessment of Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool items of primary studies [34]

| Author | Risk of Bias | Applicability Concern | Overall Quality |
|---|---|---|---|
| Ahuja 2006 [3] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Roggero 2007 [27] | LOW RISK | LOW RISK | ADEQUATE QUALITY |
| Altuncu 2007 [40] | MODERATE | LOW RISK | ADEQUATE QUALITY |
| Azcona 2003 [5] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Baroncelli 2003 [14] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Cepollaro 2001 [30] | MODERATE | LOW RISK | ADEQUATE QUALITY |
| Falcini 2000 [4] | MODERATE | LOW RISK | ADEQUATE QUALITY |
| Fewtrell 2008 [6] | LOW RISK | LOW RISK | ADEQUATE QUALITY |
| Fielding 2003 [29] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Gianni 2007 [41] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Gonnelli 2008 [10] | MODERATE | LOW RISK | ADEQUATE QUALITY |
| Hartman 2004 [18] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Lequin 2002 [9] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Levine 2002 [2] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Litmanovitz 2003 [25] | LOW RISK | LOW RISK | ADEQUATE QUALITY |
| Litmanovitz 2007 [21] | LOW RISK | LOW RISK | ADEQUATE QUALITY |
| McDevitt 2007 [26] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Mussa 2010 [8] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Njeh 2000 [1] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Oswiecimska 2007 [31] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Tomlinson 2006 [23] | LOW RISK | LOW RISK | ADEQUATE QUALITY |
| Christoforidis 2011 [16] | MODERATE | LOW RISK | ADEQUATE QUALITY |
| Pietkiewicz 2010 [42] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Mora 2012 [43] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Lam 2011 [24] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Mussa 2010 [44] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
| Pereira-da-Silva 2011 [7] | LOW RISK | LOW RISK | ADEQUATE QUALITY |
| Sani 2011 [11] | HIGH RISK | LOW RISK | INADEQUATE QUALITY |
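
The overall risk-of-bias proportions quoted for Appendix 10 can be checked by tallying the ratings listed there. The following is a minimal sketch in Python, not part of the review's methods: the dictionary name `overall_risk_of_bias` and the rounding to whole percentages are assumptions made for illustration, and the ratings are transcribed from the Appendix 10 summary.

```python
from collections import Counter

# Overall QUADAS-2 risk-of-bias rating per study, transcribed from Appendix 10.
# The variable name and data layout are illustrative assumptions.
overall_risk_of_bias = {
    "Ahuja 2006 [3]": "HIGH RISK",
    "Roggero 2007 [27]": "LOW RISK",
    "Altuncu 2007 [40]": "MODERATE",
    "Azcona 2003 [5]": "HIGH RISK",
    "Baroncelli 2003 [14]": "HIGH RISK",
    "Cepollaro 2001 [30]": "MODERATE",
    "Falcini 2000 [4]": "MODERATE",
    "Fewtrell 2008 [6]": "LOW RISK",
    "Fielding 2003 [29]": "HIGH RISK",
    "Gianni 2007 [41]": "HIGH RISK",
    "Gonnelli 2008 [10]": "MODERATE",
    "Hartman 2004 [18]": "HIGH RISK",
    "Lequin 2002 [9]": "HIGH RISK",
    "Levine 2002 [2]": "HIGH RISK",
    "Litmanovitz 2003 [25]": "LOW RISK",
    "Litmanovitz 2007 [21]": "LOW RISK",
    "McDevitt 2007 [26]": "HIGH RISK",
    "Mussa 2010 [8]": "HIGH RISK",
    "Njeh 2000 [1]": "HIGH RISK",
    "Oswiecimska 2007 [31]": "HIGH RISK",
    "Tomlinson 2006 [23]": "LOW RISK",
    "Christoforidis 2011 [16]": "MODERATE",
    "Pietkiewicz 2010 [42]": "HIGH RISK",
    "Mora 2012 [43]": "HIGH RISK",
    "Lam 2011 [24]": "HIGH RISK",
    "Mussa 2010 [44]": "HIGH RISK",
    "Pereira-da-Silva 2011 [7]": "LOW RISK",
    "Sani 2011 [11]": "HIGH RISK",
}

# Count how many studies fall into each overall rating.
counts = Counter(overall_risk_of_bias.values())
n_studies = len(overall_risk_of_bias)

# Express each count as a whole-number percentage of all included studies.
percentages = {
    rating: round(100 * count / n_studies) for rating, count in counts.items()
}

for rating in ("HIGH RISK", "MODERATE", "LOW RISK"):
    print(f"{rating}: {counts[rating]}/{n_studies} ({percentages[rating]}%)")
```

Under these assumptions the tally gives 17/28 (61%) high, 5/28 (18%) moderate and 6/28 (21%) low risk of bias, matching the proportions stated in the text.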