Southville International School and Colleges (SISC)
Graduate School of Psychology
Advanced Psychological Assessment
Dr. Ruel A. Cajili

Article Critique
By: Bezaleel Arenas
23 March 2023

The Validity and Reliability of Online Testing for the Assessment of Spatial Ability
Authors: Jeffrey Buckley, Niall Seery and Dr. Donal Canty
Department of Design and Manufacturing Technology, University of Limerick
Source / Reference: https://www.researchgate.net/publication/309376982_The_validity_and_reliability_of_online_testing_for_the_assessment_of_spatial_ability

SUMMARY

The constant innovation and ever-expanding scope of modern technology have driven digital transformation across many facets of society and across different industries, including the education sector. Replacing traditional paper-based assessments with online assessments is being explored as a natural progression as technology becomes increasingly integrated into modern society. Buckley, Seery and Canty (2016), in The Validity and Reliability of Online Testing for the Assessment of Spatial Ability, acknowledge the role of spatial skills testing in gaining significant insight into an individual's cognitive capacities associated with technical vocations such as graphics, engineering and design, while also highlighting the impact that modern technology and digital transformation have on how these spatial skills, traditionally operationalized in a paper-and-pencil format, are tested and evaluated. The use of online ICT (information and communications technology) infrastructures for assessment affords opportunities such as enhanced collaborative peer assessment, a more diverse range of potential test items, and customizable feedback mechanisms and tools (Buckley et al., 2016).

The researchers investigate the validity and reliability of online testing for the assessment of spatial skills, with the aim of initiating the development of an online test centre for spatial skills in conjunction with the Spatial Ability in Education (SPACE) project, a national project examining spatial skills across all stages of the Irish educational system, which is ultimately envisioned to have the capacity to capture a person's spatial profile (Buckley & Seery, 2016). A study cohort of 162 first-year post-primary pupils piloted the test centre, which consisted of digital versions of three spatial ability tests: the Purdue Spatial Visualization Test: Visualization of Rotations (PSVT:R) (Guay, 1977), the Mental Cutting Test (MCT) (CEEB, 1939), and the Space Relations subtest of the Differential Aptitude Test (DAT:SR) (Bennett, Seashore & Wesman, 1973). Performance scores of the study cohort were compared with those of a national sample from a similar demographic who took the manual version of the same tests.

While the researchers advocate the many advantages of computer-based online assessment, they also recognize and identify a number of limitations to the approach. Transitioning from or between paper-based and online assessment presents a variety of considerations and limitations, particularly when the intent of the assessment architecture is to elicit levels of cognitive factors independent of semantic knowledge (Buckley et al., 2016). Individual differences such as computer anxiety and computer experience were purported to influence performance on computer-based assessment.
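As a purely illustrative aside, and not part of the original paper, the modality comparison summarized above is essentially a check that two independent groups of scores have similar means and variances. A minimal sketch of such a check, using invented score vectors rather than the study's actual data, might look like this:

```python
# Hypothetical illustration only: the score vectors below are invented and are
# not the study's data. They stand in for a paper-based national sample and an
# online study cohort taking the same spatial test (e.g., the PSVT:R).
import numpy as np
from scipy import stats

paper_scores = np.array([24, 28, 31, 19, 27, 33, 22, 30, 26, 29])
online_scores = np.array([25, 27, 32, 21, 26, 34, 23, 28, 27, 31])

# Welch's t-test: do mean scores differ between modalities?
t_stat, t_p = stats.ttest_ind(paper_scores, online_scores, equal_var=False)

# Levene's test: do score variances differ between modalities?
w_stat, w_p = stats.levene(paper_scores, online_scores)

print(f"Means:     t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Variances: W = {w_stat:.3f}, p = {w_p:.3f}")
```

A non-significant result on both checks would be consistent with the expectation, noted below, that mean scores and variances should be comparable across modalities (Wilson, Genco, & Yager, 1985).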
On the question of computer anxiety and experience, McDonald (2002) argues that while such individual differences can have a negative effect, they are not constant, suggesting potential mitigation as computers become more integrated into society. This is in line with the authors' resolute recommendation that "educational systems align with societal progression" as computers transition from being novel to becoming a standard convention (Buckley et al., 2016). With this in consideration, when comparing performance between paper-based and computer-based assessments, it has been advocated that the mean scores and variances of the tests should be identical across modalities (Wilson, Genco, & Yager, 1985). This is especially relevant as differences have been found in specific spatial abilities such as cognitive speed (speeded rotation) and cognitive power (spatial relations and visualization) when assessment is administered online. To investigate these existing findings and measure whether there is any significant difference in mean scores and variances, the researchers also extracted 10 specific items from the full tests, created a digital version for the study cohort, and compared the scores with those of the national population who took the corresponding items in the full paper-and-pencil test. Results indicate "no statistically significant difference between modalities and suggest the applicability of the expedited tests for larger cohorts while the full test appear more suitable for individual results".

CRITIQUE

I find that the research's premise and the researchers' ultimate objective in advocating for computer-based testing of spatial skills are grounded in practical, well-meaning intentions. However, the initiative is too wide-ranging, and the concluded result of "no statistically significant difference between modalities and suggest the applicability of the expedited tests for larger cohorts while the full test appear more suitable for individual results" is not only unreliable but also misleading. The conclusion was chiefly derived from the researchers' statistical analysis of their data, which is incomplete and potentially wrong; this renders the data unreliable and therefore invalid, since reliability and validity are, fundamentally, characteristics of the data rather than of the instrument being evaluated, in this case online testing.

For one, the sample size of 162 in the study cohort is not representative of the roughly 300,000 first-year post-primary students in Ireland in 2016 (CSO, 2018), the year the research paper was published; a worked sample-size estimate follows this paragraph. Under the concept of content validity, where a domain is being measured, it is important that all sub-domains are represented; this is also complementary to domain sampling theory when measuring reliability. The researchers' ultimate goal is to provide evidence for the use of computer-based spatial skills testing across all schools in Ireland and to create a spatial profile of the first-year post-primary students who piloted the initiative. The domain would be all students across the country, and the sub-domain would be the study cohort, both of which are vastly underrepresented. Another area of concern is the 10 specific items extracted from the full tests and converted to a digital version for the study cohort: no details are given about these items, how they were selected, or why.
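On the sample-size point raised above, a rough illustrative calculation, which is not one performed in the original study, can be made using a common rule of thumb such as Yamane's formula, assuming the figure of roughly 300,000 pupils cited above and a conventional 5% margin of error:

\[
n \;=\; \frac{N}{1 + N e^{2}} \;=\; \frac{300{,}000}{1 + 300{,}000 \times (0.05)^{2}} \;\approx\; 399
\]

Under these assumptions, a simple random sample of roughly 400 pupils would already be needed for population-level estimates alone, and profiling sub-groups across the educational system would require considerably more; the 162-pupil cohort falls short of even the former.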
Returning to the 10 extracted items: the absence of any rationale for their selection could be a case of construct underrepresentation, that is, a failure to capture important components of the construct. Because the researchers are measuring the reliability and validity of online testing for the assessment of spatial skills, it is imperative that the items of the full paper-based test be the same as those of the online version, either to run a test-retest measure of reliability and find consistency of data across both modalities, or to use parallel (alternate) forms and find consistency of data across two versions of the same test. The study ran a test-retest analysis citing the number of participants who made multiple attempts on the three spatial reasoning tests, 6 on the PSVT:R, 3 on the MCT and 21 on the DAT:SR, and calculated the mean number of attempts. This is not only a gross underrepresentation of the study cohort but also suggests mere face validity, as it gives off the appearance of measuring reliability while being largely irrelevant to the purpose of the study.

Moreover, t-tests between the full and 10-item assessments showed no significant differences between the full PSVT:R (M = 31.29, SD = 17.32) and the 10-item PSVT:R (M = 32.93, SD = 20.60), t(115) = -1.587, p = .117, or between the full MCT (M = 23.35, SD = 8.97) and the 10-item MCT (M = 26.00, SD = 15.23), t(54) = -1.914, p = .061. High and statistically significant correlations were also found between the full PSVT:R and the 10 extracted items (r = .840, p < .001) and between the full MCT and the 10 extracted items. These results, which represent the paper-based and computer-based scores, are not reliable given the lack of data, or the potentially wrong data, discussed above, and given the lack of content validity. Lastly, the Cronbach's alpha values found for the spatial skills assessment scores, full PSVT:R (α = .802), 10-item PSVT:R (α = .554), full MCT (α = .147), 10-item MCT (α = .236) and DAT:SR (α = .452), with the full PSVT:R as the only sufficiently reliable measure with this cohort (Kline, 2000), suggest dubious reliability of the tests. The researchers attribute the low reliability to the "pertinent spatial skills being at a malleable stage" at the age of the study cohort, which is irrelevant to the thing being measured, namely the reliability and validity of online testing for the assessment of spatial ability. Hence, with all the data considered, this study is unreliable, invalid and inconclusive.

RECOMMENDATIONS

• Define the domain and sub-domain – With the study's ultimate aim of generating a spatial profile across all stages of the Irish educational system, it is a basic requirement to sample the larger corresponding demographic.
• Create an accurate and representative sample – The aim of the initial phase of the study is to generate a national spatial profile of pupils entering post-primary education, a population of roughly 300,000 according to Ireland's Central Statistics Office (CSO, 2018) around the time the paper was published. The researchers must include all of these pupils or, at minimum, a far better representation of them in the sample.
• Test-retest – This must be administered to a more representative sample; the more retests that yield the same results, the better the evidence of reliability.
• Parallel / alternate forms – Data must be consistent, and items must be the same, across both versions of the tests.
• Review influencing factors – It is noted that individual differences such as computer anxiety and computer experience can have a negative effect on performance in computer-based assessment, even though it is also suggested that these factors are not constant and can decrease as students become more accustomed to computers. Shermis and Lombard (1998) found that restrictions on computer-based tests, such as not being able to review earlier answers, can induce anxiety; hence, the paper-based convention of allowing answers to be reviewed while still within the time limit should be emulated. However, computer anxiety and experience likely fall along a spectrum and could therefore be a source of error that inadvertently jeopardizes the integrity of the study. A survey of the whole domain should perhaps be administered to gauge how comfortable students are with computers and with taking an assessment that has traditionally been administered with paper and pencil, not only because that has long been the mode of delivery but also because of the very nature of the tests themselves.

REFERENCES

Cohen, R. J., Swerdlik, M. E., & Phillips, S. M. (2022). Psychological testing and assessment: An introduction to tests and measurement. McGraw Hill LLC.

CSO. (2018, November 14). Census 2016 reports – CSO – Central Statistics Office. https://www.cso.ie/en/census/census2016reports/