Southville International School and Colleges (SISC)
Graduate School of Psychology
Advanced Psychological Assessment
Dr. Ruel A. Cajili
Article Critique:
By: Bezaleel Arenas
23 March 2023
The Validity and Reliability of Online Testing for the Assessment of Spatial Ability
Author/s: Jeffrey Buckley, Niall Seery and Dr. Donal Canty
Department of Design and Manufacturing Technology
University of Limerick
Source / Reference:
https://www.researchgate.net/publication/309376982_The_validity_and_reliability_of_online_testing_for_the_assessment_of_spatial_ability
SUMMARY
The constant innovation and ever-expanding scope of modern technology have prompted digital transformation across many facets of society and across different industries, including the education sector. The prospect of replacing traditional paper-based assessments with online-based assessments is being explored as a natural progression as technology becomes increasingly integrated into modern society.
Buckley, Seery and Canty (2016), in The Validity and Reliability of Online Testing for the Assessment of Spatial Ability, acknowledged the role that testing spatial skills plays in gaining significant insight into an individual's cognitive capacities associated with technical vocations such as graphics, engineering and design, while also highlighting the impact that modern technology and digital transformation have on testing and evaluating these spatial skills, which have traditionally been operationalized in a paper-and-pencil format. The use of online ICT (Information and Communications Technology) infrastructures for assessment affords opportunities such as enhanced collaborative peer assessment, a more diverse range of potential test items, and customizable feedback mechanisms and tools (Buckley et al., 2016). The researchers investigated the validity and reliability of online testing in the assessment of spatial skills with the aim of initiating the development of an online test centre for spatial skills in conjunction with the Spatial Ability in Education (SPACE) project, a national project examining spatial skills across all stages of the Irish educational system, ultimately envisioned to have the capacity to capture a person's spatial profile (Buckley & Seery, 2016). A study cohort of 162 first-year post-primary pupils piloted the test centre, which consisted of digital versions of three spatial ability tests: the Purdue Spatial Visualization Test: Visualization of Rotations (PSVT:R) (Guay, 1977), the Mental Cutting Test (MCT) (CEEB, 1939), and the Space Relations subtest of the Differential Aptitude Test (DAT:SR) (Bennett, Seashore & Wesman, 1973). Performance scores of the study cohort were compared with a national sample from a similar demographic who took the manual versions of the same tests. While the researchers advocate for the many advantages of computer-based online assessment, they also recognized and identified a number of limitations to the approach.
Transitioning from or between paper-based and online-assisted assessment presents a variety of considerations and limitations, particularly when the intent of the assessment architecture is to elicit levels of cognitive factors independent of semantic knowledge (Buckley et al., 2016). Individual differences such as computer anxiety and computer experience were purported to be factors influencing performance on computer-based assessment. McDonald (2002) argues that while these individual differences can have a negative effect, such factors are not constant, suggesting potential mitigation as computer integration into society increases.
This is in line with the authors' resolute recommendation that "educational systems align with societal progression" as computers transition from being novel to becoming a standard convention (Buckley et al., 2016). With this in consideration, when comparing performance between paper-based and computer-based assessments, researchers have advocated that mean scores and variances should be equivalent across modalities (Wilson, Genco, & Yager, 1985). This is especially relevant because differences have been found in specific spatial abilities, such as cognitive speed (speeded rotation) and cognitive power (spatial relations and visualization), when assessments are administered online. To investigate these existing findings and to measure whether there is any significant difference in mean scores and variances, the researchers also extracted 10 specific items from the full tests, created a digital version for the study cohort, and compared the scores with those of the national population who took the corresponding items in the full paper-and-pencil test.
Results indicate “no statistically significant difference between modalities and suggest the
applicability of the expedited tests for larger cohorts while the full test appear more
suitable for individual results”.
CRITIQUE
I find that the research’s premise and the researchers’ ultimate objective in advocating for computer-based testing of spatial skills are grounded in practical, well-meaning intentions; however, the initiative is too wide-ranging, and the concluding result of “no statistically significant difference between modalities and suggest the applicability of the expedited tests for larger cohorts while the full test appear more suitable for individual results” is not only unreliable but also misleading. The conclusion was chiefly derived from the researchers’ statistical analysis of their data, which is incomplete and potentially wrong, rendering both the missing data and the data themselves unreliable and therefore invalid. After all, reliability and validity are fundamentally characteristics of data, not characteristics of the instrument being measured; in this case, that data concerns the reliability and validity of online testing. For one, the sample size of 162 in the study cohort is not representative of the roughly 300,000 first-year post-primary students in Ireland in 2016 (CSO, 2018), the same year the research paper was published. Under the concept of content validity, where a domain is being measured, it is important that all sub-domains are represented; this is also complementary to domain sampling theory when measuring reliability. The researchers’ ultimate goal is to provide evidence for the use of computer-based spatial skills testing across all schools in Ireland and to create a spatial profile of the first-year post-primary students (the study cohort) who pilot the initiative. The domain is all students across the country, and the sub-domain is the study cohort, both of which are vastly underrepresented.
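To make the representativeness concern concrete, the sketch below shows how a minimum sample size could be estimated using Cochran's formula with a finite population correction. This is an illustration, not a calculation from the article: the 95% confidence level, 5% margin of error, and single-proportion framing are all assumptions, and the population figure of roughly 300,000 is the CSO estimate cited above.

```python
# Illustrative sketch, not from the article: minimum sample size for a finite
# population via Cochran's formula plus a finite population correction.
# Confidence level (95%) and margin of error (5%) are assumed for illustration.
import math

def required_sample_size(population: int, margin_of_error: float = 0.05,
                         z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's n0, adjusted downward for a finite population."""
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)  # infinite-population estimate
    n = n0 / (1 + (n0 - 1) / population)                  # finite population correction
    return math.ceil(n)

# Roughly 300,000 first-year post-primary students, as cited above (CSO, 2018).
print(required_sample_size(300_000))  # ~384, already well above the 162 pupils tested
```

Even under these simple assumptions for estimating a single proportion, and before any stratification by school type or region, the indicated minimum sample exceeds twice the 162 pupils actually tested.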
Another area of concern is the 10 specific items extracted from the full tests and converted to a digital version for the study cohort: no details were given about these items, how they were selected, or why. This could be a case of construct underrepresentation, or a failure to capture important components of the construct. The researchers are measuring the reliability and validity of online testing for the assessment of spatial skills, and it is imperative that the items of the full paper-based test be the same as those of the online version in order to run a test-retest measure of reliability and find consistency of data across both modalities, or to use parallel or alternative forms to find consistency of data across two versions of the same test. The study ran a test-retest analysis citing the number of participants who made multiple attempts on the three spatial reasoning tests (6 participants on the PSVT:R, 3 on the MCT and 21 on the DAT:SR) and calculating the mean number of attempts, which is not only grossly unrepresentative of the study cohort but also suggests mere face validity, as it only gives the appearance of validity while being, in fact, irrelevant to the purpose of the study.
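As an illustration of the kind of consistency check being called for here, the sketch below estimates parallel-forms (or test-retest) reliability as the correlation between two score sets from the same pupils. The score values are hypothetical and do not come from the study.

```python
# Illustrative sketch, not the authors' analysis: test-retest or parallel-forms
# reliability summarised as the correlation between two score sets from the
# same examinees. All score values below are hypothetical.
import numpy as np
from scipy.stats import pearsonr

paper_scores = np.array([22, 18, 27, 30, 15, 24, 19, 26])   # hypothetical paper-based scores
online_scores = np.array([23, 17, 28, 29, 16, 25, 18, 27])  # hypothetical online scores, same pupils

r, p_value = pearsonr(paper_scores, online_scores)
print(f"reliability estimate r = {r:.3f} (p = {p_value:.4f})")
# With only 6, 3 and 21 retakers per test, an estimate like this would be far
# too unstable to generalise to the whole cohort.
```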
Moreover, the t-tests conducted between the full and 10-item assessments showed no significant differences: for the full PSVT:R (M = 31.29, SD = 17.32) versus the 10-item PSVT:R (M = 32.93, SD = 20.60), t(115) = -1.587, p = .117; and for the full MCT (M = 23.35, SD = 8.97) versus the 10-item MCT (M = 26.00, SD = 15.23), t(54) = -1.914, p = .061. High and statistically significant correlations were also found between the full PSVT:R and the 10 extracted items (r = .840, p < .001) and between the full MCT and the 10 extracted items. These results, which represent the paper-based and computer-based scores, are not reliable given the missing and potentially wrong data discussed above and the lack of content validity.
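For readers unfamiliar with how such a comparison is typically run, the sketch below shows one plausible analysis: a paired comparison of each examinee's full-test score with their score on the extracted items, alongside the sample variances. The data are simulated and the paired design is an assumption; the article does not fully spell out the procedure.

```python
# Illustrative sketch with simulated data: a paired comparison between each
# examinee's full-test score and their score on the 10 extracted items.
# The paired design is an assumption, not a detail confirmed by the article.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
full_test = rng.normal(loc=31.3, scale=17.3, size=116)            # simulated full-test scores
ten_item = full_test + rng.normal(loc=1.6, scale=11.0, size=116)  # simulated 10-item scores

t_stat, p_value = ttest_rel(full_test, ten_item)
print(f"t({len(full_test) - 1}) = {t_stat:.3f}, p = {p_value:.3f}")
print(f"variances: full = {full_test.var(ddof=1):.1f}, 10-item = {ten_item.var(ddof=1):.1f}")
# A non-significant difference in means does not, by itself, establish that the
# variances are also equivalent across forms (cf. Wilson, Genco, & Yager, 1985).
```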
Lastly, the Cronbach's alpha coefficients found for the spatial skills assessments, the full PSVT:R (α = .802), the 10-item PSVT:R (α = .554), the MCT (α = .147), the 10-item MCT (α = .236) and the DAT:SR (α = .452), with the full PSVT:R as the only sufficiently reliable measure with this cohort (Kline, 2000), suggest dubious reliability of the tests. The researchers attribute this low reliability to the "pertinent spatial skills being at a malleable stage" at the age of the study cohort, which is irrelevant to the thing being measured, namely the reliability and validity of online testing for the assessment of spatial ability. Hence, with all the data considered, this study is unreliable, invalid and inconclusive.
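For context on what those alpha values mean, the sketch below computes Cronbach's alpha from an examinees-by-items score matrix. The response data are hypothetical, generated at random, and are not the study's data.

```python
# Illustrative sketch, not the study's code: Cronbach's alpha from an
# examinees-by-items score matrix. The responses below are random and hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = examinees, columns = items (e.g. scored 0/1)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(1)
responses = (rng.random((162, 10)) < 0.6).astype(int)  # hypothetical 162 pupils x 10 items
print(f"alpha = {cronbach_alpha(responses):.3f}")
# Unrelated random responses like these yield an alpha near zero; by comparison,
# values of .147 or .236 indicate very weak internal consistency for a cohort.
```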
RECOMMENDATIONS
• Define the domain and sub-domains – Given the study's ultimate aim of generating a spatial profile across all stages of the Irish educational system, it is elementary that data be gathered from the correspondingly larger demographic.
• Create an accurate and representative sample – The aim of the initial phase of the study is to generate a national spatial profile of pupils entering post-primary education, a population of roughly 300,000 according to Ireland's Central Statistics Office (CSO) in 2016, when the paper was published. Researchers must include all of these pupils or a far more representative sample of them.
• Test-retest – Must be administered to a more representative sample. Also, the more retests that yield the same results, the better the reliability.
• Parallel / alternative forms – Items must be the same across both versions of the tests so that the consistency of the data between them can be established.
• Review influencing factors – It is noted that individual differences such as computer anxiety and computer experience can have a negative effect on performance on computer-based assessments, though it was also suggested that these effects are not constant and can decrease as students become more accustomed to computers. Shermis and Lombard (1998) found that restrictions in computer-based tests, such as not being able to go back and review answers, can induce anxiety; hence, an interface that emulates the paper-based ability to review answers within the time limit should be provided. However, computer anxiety and experience likely fall along a spectrum and might therefore be a source of error that inadvertently jeopardizes the integrity of the study. A survey of the whole domain should perhaps be administered to gauge how comfortable students are taking, on a computer, an assessment that has traditionally been administered with paper and pencil, not only because that has long been the mode of administration but also because of the very nature of the tests themselves.
REFERENCES:
Cohen, R. J., Swerdlik, M.E., & Phillips, S.M. (2022). Psychological Testing and
Assessment: An Introduction to Tests and Measurement. McGraw Hill LLC.
CSO. (2018, November 14). Census 2016 Reports – CSO- Central Statistics Office.
https://www.cso.ie/en/census/census2016reports/