Experimental Evaluation of Microsoft Kinect's Accuracy and Capture Rate for Stroke Rehabilitation Applications
David Webster, Department of Computer Science, San Francisco State University (dcello@mail.sfsu.edu)
Ozkan Celik, Department of Mechanical Engineering, Colorado School of Mines (ocelik@mines.edu)
Abstract
To meet the challenges of ubiquitous computing for stroke rehabilitation, researchers have been trying to break away from traditional therapist-based modes of assessment. In this paper, the suitability of the Kinect to this end is experimentally evaluated. A set
of thirteen gross movements, derived from common clinical stroke impairment level assessments (Wolf Motor Function Test, Action Research Arm Test, and Fugl-Meyer Assessment), was utilized to explore the Normalized Root Mean Squared Error (NRMSE) in position for data captured by the Kinect as compared to a research-grade OptiTrack motion capture system. The specific joints of interest were the shoulder, elbow, and wrist. An estimation of the latency and capture rate of the Kinect and their effects on data quality was also conducted. The NRMSE in position varied between 0.53 cm and 1.74 cm per data point on average among all axes and joints, when initial calibration was conducted via the OptiTrack system. The mean capture period was measured as 33.3 ms with a 3.86 ms standard deviation, and the latency was observed to be on the order of two capture periods (66.6 ms on average). Our results summarize the capabilities as well as limitations of the Kinect in gross movement-based
impairment assessment, in game-based rehabilitation paradigms, as
well as in full-body motion capture applications in general.
Keywords: Kinect accuracy, stroke rehabilitation, serious games,
exercise games.
1 Introduction
There has been increasing interest in the use of commercially available game controllers, especially the Microsoft Kinect, as an interface for home-based rehabilitation protocols involving game-like
movement exercise tasks for stroke survivors. The benefits of such
interfaces are clear: making therapy financially accessible to a large
population of patients, enabling objective evaluation and remote
tracking of patient progress, and increasing patient motivation to
complete repetitive movement tasks integral to motor function recovery. An experimental evaluation of the spatial accuracy, latency, and capture rate of the motion capture data obtained from the Kinect in comparison with a research-grade motion capture device is a critical validation step for these applications. In this paper, we report the results of an experimental evaluation of the Kinect as a motion capture interface for gross movements relevant to activities of daily living, stroke rehabilitation, and full-body motion capture applications.
As the general population ages significantly in the upcoming years [2], the need for efficient and improved stroke rehabilitation methods will grow. The Kinect is arguably the forerunner among commercially available hardware showing potential to enhance stroke rehabilitation while retaining a cost affordable enough for large-scale deployment [10].
Motor function rehabilitation exercises after stroke largely revolve around strengthening muscles and retraining sensorimotor function via repetitive movements. Engaging interfaces improve outcomes of stroke rehabilitation by enabling longer periods of compliance with otherwise dreary or demanding regimens [11]. Therapist-assisted daily practice is the ideal exercise route; however, it is often logistically infeasible or cost-prohibitive. Utilization of the Kinect in rehabilitation may be one way to overcome this impracticality. Virtualized therapists offering guided interactive rehabilitation could make pseudo-therapist-assisted home-based rehabilitation a reality [6]. The Kinect has the potential to reduce the current barriers to rehabilitation through innovative applications of markerless motion capture, but in order to develop robust serious games, a solid foundation of functionality needs to be established.
Serious games may hold great potential in two respects: increasing patient motivation and accurate completion of rehabilitation exercises, and enhancing record keeping and future medical diagnostic paradigms. An important initial step into both lines of research, however, is an experimental evaluation of the stroke-impairment-related diagnostic potential of the motion capture data obtained from the Kinect in comparison with a research-grade motion capture device. One initial attempt at achieving such a goal can be seen in a study by Obdrzalek et al., which examined a subset of movements specifically targeted at coaching the elderly [8] and reported rather large errors in Kinect readings, on the order of 10 cm. Another study by Chang et al. [1], focused on spinal cord injury patients, examined the accuracy of the Kinect using an arbitrary set of movements developed for an in-development rehabilitation game. While a statistical analysis of the captured data was not offered, a visual representation of data trends for both systems showed that, for hand and elbow readings, competitive movement tracking performance is possible, whereas shoulder readings were widely inconsistent due to differing methods of motion capture and joint estimation between the OptiTrack and the Kinect. Even so, the results of this study left the authors with the impression that the Kinect has strong potential as a clinical and home-based rehabilitation tool.
The accuracy of the Kinect has also been examined in the realm of full-body postural control utilizing tests of balance and reach [3]. Once again, the outcome was positive, with results validating the ability of the Kinect to accurately assess postural control kinematic strategies based on three postural control tests: a forward reach, a lateral reach, and a one-leg standing balance test examining distance reached and trunk flexion angle (sagittal and coronal). The balance test focused on spatio-temporal changes in the sternum, pelvis, knee, and ankle as well as the angle of lateral and anterior trunk flexion. The Kinect readings showed very similar inter-trial reliability and excellent concurrent validity; however, proportional biases occasionally occurred in pelvis and sternum readings. That study offers very detailed joint-by-joint quantitative results and, based on these findings, proposes that the Kinect is able to successfully assess kinematic strategies of postural control. Regarding joint-by-joint comparison, Fernandez-Baena et al. [5] also conducted a detailed
comparison of the Kinect with a Vicon system including: 1) knee flexion and extension; 2) hip flexion and extension on the sagittal plane; 3) hip adduction and abduction on the coronal plane with the knee extended; 4) shoulder flexion and extension on the sagittal plane with the elbow extended; 5) shoulder adduction and abduction on the coronal plane with the elbow extended; and 6) shoulder horizontal adduction and abduction on the transverse plane with the elbow extended. Mean error and mean error relative to range of motion were calculated, resulting in a maximum error of 13° across the knee, hip, and shoulder joints. A large part of this error was attributed to the mechanical complexity of the shoulder joint, with the knee and hip faring significantly better at a maximum error of 9.92°. This level of precision also led to the conclusion that Kinect accuracy is sufficient for most current clinical rehabilitation treatments.
Focusing more precisely on the accuracy of Kinect-gathered data pertaining to upper extremity rehabilitation, Loconsole et al. [7] made use of the Kinect and an L-Exos upper extremity exoskeleton to monitor movable objects and a participant, enabling several different scenarios to be tested: 1) light variation: very intensive, medium, and low illumination, with no substantial differences noted; 2) occlusions: two objects moved to occlude each other, with no adverse effect noted and correct recognition of items after the occlusion; 3) object roto-translation: rotation and movement of two tracked objects, which the authors note the Kinect tracked robustly, although no quantitative data was presented; and 4) accuracy: while some variation was noted depending on the range of the object along the Z and X axes, all tests resulted in readings within a negligible 2 cm, which was concluded to be well within the limits of 11 specific rehabilitation needs. In the same vein, Pedro et al. [9] attempted to simplify the method of verifying the Kinect's accuracy for rehabilitation purposes by using a mechanical arm-like device for quantification and by extracting points of interest rather than performing full-body kinematic analysis. In that study, the Kinect was noted to have good repeatability in both the X and Y axes, whereas repeatability worsened as the distance to the point of interest (Z axis) grew. Data gathered during the study show that the average standard deviation increases quadratically with distance; however, even with this limitation, Pedro et al. note that, based on the application requirements of rehabilitation, the operational range of the Kinect, while retaining a sufficient level of accuracy for rehabilitation, can be considered to be from 0.5 m to 2 m.
Clinical use of the Kinect for stroke impairment assessment, however, involves a distinctly different movement set with various required occlusion points not yet examined in the previously mentioned studies. The purpose of this study is to assess the validity of data gathered from the Kinect during specific stroke-assessment-based movements as well as the data's usefulness in quantifying a stroke patient's level of motor recovery. The Methods section outlines the participant population; study protocol; study environment; and data collection, processing, analysis, and record comparison methodology. The Results and Discussion section discusses the spatial error introduced by the inherent data offset between the two motion capture systems, the effect of removing this offset, and the latency and capture rate characterization of the Kinect.
2 Methods
The following section contains information regarding participant meta-data, the experimental environment and protocol, system-specific hardware, coordinate frame transformation, data preprocessing, and a description of the method utilized to match Kinect-gathered records with their OptiTrack counterparts.
2.1 Participants
A total of ten participants completed the experiment in the Biomechatronics Research Laboratory at San Francisco State University. All participants gave informed consent under a protocol approved by the San Francisco State University Institutional Review Board. Participants' ages ranged from 18 to 27, heights from 5'5" to 6'3", and weights from 110 to 245 lbs. No participant suffered from any movement, nervous system, or neurological disorder affecting their dominant limb.

Figure 1: A photograph of a subject with attached markers. Marker clusters are used to triangulate the participant's humeral head, lateral epicondyle, and ulna head. Views of both the real-world experimental markers and their corresponding representation in the OptiTrack Tracking Tools software environment are shown.
2.2 Experiment Protocol
Subjects were asked to wear a common motion capture vest which enabled three custom-made motion capture marker clusters to be positioned in a manner conducive to the triangulation of the humeral head, lateral epicondyle, and ulna head (Figure 1). Participants completed several basic movements/tasks derived from a subset of the Wolf Motor Function Test, Action Research Arm Test, and Fugl-Meyer Assessment, which are standardized and commonly used tests for evaluation of motor function impairment level in stroke patients [4]. The movements utilized in this study were selected due to their gross nature (required for reliable Kinect tracking), their specific diagnostic potential, and their potential for integration into future schemas of automated impairment evaluation. The following thirteen movements were used:
Wolf Motor Function Test (WMFT):
1. Forearm to table (side) - shoulder abduction.
2. Forearm to box on table (side) - shoulder abduction.
3. Extend elbow on table (side) - elbow extension.
4. Hand to table (forward) - shoulder flexion.
5. Hand to box on table (forward) - shoulder flexion.
Action Research Arm Test (ARAT):
6. Hand behind head.
7. Hand on top of head.
8. Hand to mouth.
Fugl-Meyer Assessment (FMA):
9. Flexor synergy - shoulder abduction (0-90°).
10. Shoulder outward rotation.
11. Hand to lumbar spine.
12. Shoulder flexion (0-90°).
13. Shoulder flexion (90-180°).
The subjects were allowed to complete all movements at a self-determined pace utilizing their dominant limbs, and each movement was recorded three times to improve data robustness.
2.3 Experiment Environment
All infra-red (IR) emitting devices and reflective surfaces within
the capture volume were removed or covered prior to OptiTrack
calibration. The Kinect was then placed inside the OptiTrack capture volume and its IR projector was activated. The extraneous IR emissions from the Kinect were masked prior to the OptiTrack calibration procedures. The OptiTrack system was then calibrated with a three-marker wand, resulting in the maximum possible accuracy level reported by the OptiTrack calibration procedure (sub-mm precision). The ground plane was then set, and immediately afterwards a rigid body trackable was created at the origin of the OptiTrack coordinate frame. This trackable was physically attached to the Kinect as close to the origin of the Kinect coordinate frame as possible, directly above the RGB camera. This trackable's coordinates and orientation were used to obtain both the position vector between the OptiTrack and Kinect coordinate frames and the orientation difference between the two coordinate frames. This information was used to create a homogeneous transformation matrix relating the Kinect and OptiTrack coordinate frames to each other.
2.4 Data Collection
Motion capture data was acquired with an OptiTrack motion capture system and a Microsoft Kinect. The Skeletal Viewer application from the Developer Toolkit (1.7.0) was substantially modified and used as the sole engine for both Kinect and OptiTrack raw data capture. The computer used for both systems ran a 64-bit Windows 7 operating system on an Intel Core i3-2120 CPU at 3.30 GHz with 8 GB of DDR2 RAM.
The OptiTrack, an isometric, passive marker-based optical motion capture system, consisted of eight V100:R2 cameras used for marker position tracking; two OptiTrack OptiHub hardware modules handling communication, synchronization, and control of data flow between the cameras and the computer; and the OptiTrack Tracking Tools 2.5.0 data processing software and API, enabling real-time capture feedback and creation of rigid body trackables. The marker clusters used in this experiment were created using three sets of markers 7/16" in diameter positioned to triangulate the center point of the targeted joints. All three Cartesian coordinates of the shoulder (humeral head), elbow (lateral epicondyle), and wrist (ulna head) were recorded in mm at a sampling rate of 100 Hz. Both the OptiTrack system and the Microsoft Kinect sensor were connected to the computer via USB 2.0 cables.
2.4.1 Kinect Skeletal Viewer
The Skeletal Viewer program's skeletalization algorithm, which is part of the Kinect for Windows SDK, was used as the tool for Kinect-based joint position data collection, graphical user interface development for the experimenter, and development of an OptiTrack motion capture engine. To enable these three functionalities, four main modifications to the Skeletal Viewer were completed: a) an engine thread was spawned and initialized to real-time priority in order to interface with the OptiTrack API; b) the priority of the main skeletalization thread of the Skeletal Viewer application was set to real-time; c) the Skeletal Viewer GUI was equipped with a trigger to insert synchronization pulses simultaneously into both data sets; and d) Cartesian coordinate frame data for both systems was extracted and written to disk with an associated time stamp.
2.5 Coordinate Frame Transformation
A homogeneous transformation matrix was used to align the coordinate frames of the Kinect and the OptiTrack. The homogeneous transformation matrix $H^0_1$, with coordinate frame 0 being that of the OptiTrack and coordinate frame 1 being that of the Kinect, was constructed using three rotation matrices and a translation matrix:
$$R_{x,\theta} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad R_{y,\psi} = \begin{bmatrix} \cos\psi & 0 & \sin\psi & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\psi & 0 & \cos\psi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$R_{z,\phi} = \begin{bmatrix} \cos\phi & -\sin\phi & 0 & 0 \\ \sin\phi & \cos\phi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad T_{x,y,z} = \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$H^0_1 = T_{x,y,z}\, R_{z,\phi}\, R_{y,\psi}\, R_{x,\theta}$$
This transformation can be conceptualized as the rotation of the axes of the Kinect coordinate frame into alignment with the axes of the OptiTrack coordinate frame. To describe where, and at what orientation, a point captured in the Kinect frame ($P^1$) lies with respect to the OptiTrack frame ($P^0$), the point position vector is multiplied by $H^0_1$:

$$P^0 = H^0_1\, P^1$$
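As a rough illustration (not the authors' code: the variable names, example values, and the Z-Y-X Euler-angle convention are our assumptions), the following MATLAB sketch builds such a transform from a measured translation and orientation and maps a Kinect-frame point into the OptiTrack frame.

    % Minimal sketch (assumed names/values) of constructing H01 from the
    % trackable's measured orientation (Euler angles) and position, and of
    % applying it to a point captured in the Kinect coordinate frame.
    theta = deg2rad(1.2);            % rotation about X (illustrative value)
    psi   = deg2rad(-0.8);           % rotation about Y (illustrative value)
    phi   = deg2rad(0.5);            % rotation about Z (illustrative value)
    t     = [0.42; 0.95; 1.30];      % Kinect origin in the OptiTrack frame (m)

    Rx = [1 0 0; 0 cos(theta) -sin(theta); 0 sin(theta) cos(theta)];
    Ry = [cos(psi) 0 sin(psi); 0 1 0; -sin(psi) 0 cos(psi)];
    Rz = [cos(phi) -sin(phi) 0; sin(phi) cos(phi) 0; 0 0 1];

    H01 = [Rz*Ry*Rx, t; 0 0 0 1];    % H01 = T * Rz * Ry * Rx in homogeneous form

    p1 = [0.10; 1.05; 1.80; 1];      % homogeneous point in the Kinect frame (m)
    p0 = H01 * p1;                   % the same point in the OptiTrack frame
    disp(p0(1:3));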
In order to test the numerical correctness of this homogeneous transformation matrix, an artificial movement was constructed in which two perpendicular boxes aligned along the X and Z axes of the OptiTrack coordinate system were drawn in the air using the wrist joint marker. Figure 2 contains the resulting movement trajectories gathered simultaneously by the Kinect and OptiTrack, confirming accurate and correct matching of the coordinate frames.

Figure 2: An artificial movement test for validating the Kinect-to-OptiTrack coordinate frame transformation, completed by drawing two perpendicular boxes along the OptiTrack X and Z axes in the air with the wrist joint tracker.
Table 1: Summary of the average normalized root mean squared error (NRMSE) between Kinect and OptiTrack capture trajectory data for the 1st movement for all ten participants. Without removal of the inherent offset between the two systems, the error introduced is non-uniform and significant.
                 Shoulder                    Elbow                       Wrist
1st Movement     x (m)   y (m)   z (m)      x (m)   y (m)   z (m)      x (m)   y (m)   z (m)
Participant 1:   0.2776  0.1488  0.2692     0.1928  0.2163  0.2524     0.2072  0.2064  0.2769
Participant 2:   0.1433  0.0419  0.0318     0.0893  0.1314  0.0210     0.1107  0.1308  0.0699
Participant 3:   0.2689  0.1689  0.2599     0.1754  0.2290  0.2584     0.1989  0.2115  0.2851
Participant 4:   0.1782  0.0997  0.2577     0.1292  0.1667  0.2491     0.1439  0.1605  0.2848
Participant 5:   0.2669  0.1558  0.2684     0.2090  0.2112  0.2552     0.2206  0.2061  0.2455
Participant 6:   0.2299  0.1155  0.1911     0.1884  0.1686  0.1707     0.1901  0.1542  0.2131
Participant 7:   0.2501  0.0957  0.1891     0.1871  0.1573  0.1756     0.1905  0.1503  0.1800
Participant 8:   0.2702  0.1536  0.2639     0.2245  0.1956  0.2446     0.2256  0.1934  0.2327
Participant 9:   0.3042  0.1424  0.3386     0.2194  0.1970  0.3232     0.2478  0.1766  0.3261
Participant 10:  0.2109  0.1216  0.2189     0.1533  0.1767  0.2029     0.1636  0.1650  0.2009
Average:         0.2400  0.1244  0.2289     0.1769  0.1850  0.2153     0.1899  0.1755  0.2315

2.6 Data Preprocessing
Raw data for both systems were captured in ASCII format text files (.csv). Marker coordinates were recorded in meters, and synchronization pulses marked the beginning and end of each unique movement. These files were imported into MATLAB and the Kinect data was transformed into alignment with the OptiTrack data through the use of the homogeneous transformation matrix $H^0_1$. Individual movements were extracted from both systems' raw data, with any extraneous data unrelated to a defined movement being discarded. A gap-filling method using cubic splines was applied in any case where a small portion of data within a trial, for one or more joint positions, was occluded. Out of the 780 trials, 15 were excluded from data analysis due to excessive occlusion. Of the remaining 765 trials, 63 required interpolation (gap-filling) of data; however, interpolated records constituted only 0.2% of total records. Both data sets were filtered using a zero phase-shift, second-order low-pass Butterworth filter with a cut-off frequency of 10 Hz.
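As a minimal sketch of these preprocessing steps, assuming one joint coordinate stored as a column vector x (in meters, with NaN marking occluded samples) and its time stamps t (in seconds); the variable names are ours, not taken from the authors' code:

    % Cubic-spline gap filling for occluded samples, followed by zero
    % phase-shift low-pass Butterworth filtering (10 Hz cut-off).
    fs = 30;                                   % approximate Kinect capture rate (Hz)
    good = ~isnan(x);
    x_filled = x;
    x_filled(~good) = spline(t(good), x(good), t(~good));   % fill short gaps

    [b, a] = butter(2, 10/(fs/2));             % 2nd-order low-pass filter design
    x_filt = filtfilt(b, a, x_filled);         % zero phase-shift filtering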
2.7 Position Error Calculation Methodology
In order to compare the trajectories captured by the two motion capture systems running at different capture (sampling) rates, custom MATLAB code was developed to, in essence, down-sample the 100 Hz OptiTrack data into a 30 Hz data set. This was accomplished by iterating through the Kinect data, parsing the time stamp, and matching the result to the closest possible OptiTrack record. Due to the limited control over CPU scheduling in a non-real-time Windows 7 operating system environment, records between the systems could be mismatched by a few milliseconds even when they should have had identical time stamps. In order to ensure that each OptiTrack data point would indeed have a time-matched record in the Kinect data, the OptiTrack data was written to disk at a rate of 1000 Hz, even though data values only refreshed at approximately 100 Hz. This was done so that even if CPU cycles were required by operating-system-critical processes, resulting in temporary down-scheduling of the OptiTrack thread, the data would still be written to disk at a rate of at least 100 Hz. This methodology allowed for a successful comparison with each unique Kinect record, while the extra recorded duplicate OptiTrack data was ignored.
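A minimal sketch of this record-matching step, assuming the time stamps and positions are available as vectors t_kinect and t_opti and an Mx3 array p_opti (these names are ours):

    % Match each Kinect record to the OptiTrack record closest in time.
    % t_kinect: Nx1 Kinect time stamps (s); t_opti: Mx1 OptiTrack time stamps (s)
    % logged at 1000 Hz (values refreshing at ~100 Hz); p_opti: Mx3 positions (m).
    idx = zeros(size(t_kinect));
    for k = 1:numel(t_kinect)
        [~, idx(k)] = min(abs(t_opti - t_kinect(k)));   % nearest OptiTrack sample
    end
    p_opti_matched = p_opti(idx, :);   % OptiTrack trajectory resampled at ~30 Hz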
The Normalized Root Mean Squared Error (NRMSE) between
the trajectories captured by Kinect and OptiTrack is used as the
measure to quantify the Kinect's spatial accuracy. Essentially, it is the L2-norm of the error between the trajectories, normalized by the number of data points in the trajectory, yielding a value in meters that can be interpreted as the average error per data point.
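Written out per joint and per axis, this description corresponds to the following expression (our reading of the text; the authors do not give an explicit formula), where $p_i$ denotes one coordinate of a joint at matched sample $i$ and $N$ is the number of data points in the trajectory:

$$\mathrm{NRMSE} = \frac{1}{N}\,\lVert e \rVert_2 = \frac{1}{N}\sqrt{\sum_{i=1}^{N}\bigl(p_i^{\mathrm{Kinect}} - p_i^{\mathrm{OptiTrack}}\bigr)^2}$$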
There was an unavoidable base offset between the two datasets
due to differing methods of joint position capture. The Kinect system utilizes image processing algorithms based on the silhouette
of the participant for the X and Y coordinates and relies on an IR
depth sensor for the Z coordinate. The skeletalization algorithm of
the Kinect is, at its core, only an estimation of joint position due to its markerless capture approach, whereas the OptiTrack triangulates the center point of the marker clusters placed at the joint locations. While the Kinect-based joint position trajectories were not observed to contain significant distortions, the offset was affected once a participant was repositioned for a different movement or a new participant began. NRMSE was calculated both with and without the inherent offset between the two systems removed. Including both results in the study makes it possible to observe the average spatial error values that should be expected when the Kinect is the only available motion capture system (corresponding to errors without offset removal) or when an initial calibration step can be completed using the Kinect and a more accurate motion capture system (corresponding to error values after offset removal).
3 Results and Discussion

3.1 Spatial Accuracy of Kinect without Offset Removal
Table 1 demonstrates the large variance of the offsets among participants for a representative movement. This drastic variance leads to the conclusion that a meaningful comparison cannot be garnered from the two data sets with the offset in place.
3.2 Spatial Accuracy of Kinect After Offset Removal
Once the offset was removed by aligning the average points of each record set, the data gathered from both systems become comparable and lead to a useful overarching NRMSE calculation (Table 2). Representative data, with the offset removed, for all joint positions during a single movement (Figure 3) demonstrate that, in general, readings from both systems are in close agreement. The Kinect utilizes different hardware and techniques for calculating the X and Y axes than for the Z axis. Due to this differing technology, the data gathered on the Z axis contain
more frequent and significant deviations from the baseline, as compared to the data gathered for the other axes; however, these deviations do not persist for long and ultimately lead to a comparable NRMSE. With a maximum NRMSE of 0.0275 m observed across all movements, even including movements with obvious occlusions for the Kinect (e.g., movements 6 and 11), we conclude that the Kinect accuracy level is mostly sufficient for the targeted clinical and in-home use.

Table 2: Summary of the average NRMSE results for all movements across all 10 participants. The reported values were calculated by first averaging the NRMSE of each participant's three recorded trials for each unique movement, and then averaging across the participants. After an initial calibration (removal of offset) using the OptiTrack, the error in measurements is significantly smaller and more uniform.

             Shoulder                    Elbow                       Wrist
Movement     x (m)   y (m)   z (m)      x (m)   y (m)   z (m)      x (m)   y (m)   z (m)
1st          0.0094  0.0058  0.0043     0.0113  0.0139  0.0095     0.0097  0.0114  0.0214
2nd          0.0060  0.0056  0.0037     0.0128  0.0089  0.0066     0.0095  0.0079  0.0146
3rd          0.0044  0.0054  0.0055     0.0107  0.0118  0.0092     0.0177  0.0188  0.0152
4th          0.0043  0.0066  0.0044     0.0095  0.0077  0.0088     0.0133  0.0100  0.0084
5th          0.0179  0.0080  0.0065     0.0117  0.0168  0.0101     0.0169  0.0224  0.0199
6th          0.0166  0.0078  0.0054     0.0125  0.0179  0.0084     0.0238  0.0223  0.0182
7th          0.0073  0.0079  0.0076     0.0052  0.0114  0.0098     0.0115  0.0226  0.0195
8th          0.0090  0.0054  0.0039     0.0104  0.0133  0.0082     0.0166  0.0152  0.0102
9th          0.0039  0.0020  0.0030     0.0126  0.0075  0.0087     0.0109  0.0069  0.0162
10th         0.0059  0.0059  0.0063     0.0129  0.0125  0.0180     0.0263  0.0194  0.0275
11th         0.0053  0.0066  0.0063     0.0121  0.0159  0.0092     0.0099  0.0169  0.0179
12th         0.0162  0.0157  0.0072     0.0114  0.0190  0.0129     0.0112  0.0220  0.0206
13th         0.0098  0.0060  0.0046     0.0141  0.0148  0.0086     0.0129  0.0170  0.0161
Average:     0.0089  0.0068  0.0053     0.0113  0.0132  0.0098     0.0146  0.0164  0.0174
It is important to note that the NRMSE between the Kinect and OptiTrack systems for any movement without offset removal is strongly affected by several variables, including the angle and distance from the Kinect sensor; the girth and length of the participant's torso and limb; and a particular movement's relationship to the Z plane. Hence, if an initial calibration step is to be used for offset removal, the experiment conditions need to remain nearly identical to ensure accurate readings from the Kinect.
3.3 Latency and Capture Rate Characterization
To characterize the latency introduced by the Kinect sensor, as well as its capture rate distribution, the following two tests were completed.
3.3.1 Latency Evaluation
An artificial movement, not directly related to stroke assessment, was used to estimate the latency introduced by the Kinect. The participant was instructed to raise their arm 90° to the side via shoulder abduction and then immediately lower it. The latency between the systems was then calculated based on the point of maximum wrist height (the peak of the wrist's arc on the Y axis) in both data sets. The difference between the Kinect and OptiTrack peak points was within approximately two capture periods of the Kinect (approximately a 70 ms difference); however, it should be noted that this result contains inaccuracies due to the differences in system refresh rates. The 30 Hz capture rate of the Kinect and the 100 Hz capture rate of the OptiTrack make it likely that the recorded peak points of the trajectory do not exactly overlap in time. For example, if the absolute peak point of the movement happens to fall exactly between two Kinect sampling instants, then the Kinect's recorded peak would be slightly lower and slightly earlier than the actual peak point.
However, because even with these limitations the results of the latency test were within a two-capture-period window, we conclude that the latency introduced by the Kinect is non-negligible but not prohibitive for use in rehabilitation interfaces with real-time feedback components. To further support this conclusion, the capture rate distribution of the Kinect was also examined by analysing a full data recording session (approximately 10 minutes in length). The differences between consecutive record time stamps were collected into a histogram and fit with a Gaussian curve. The results indicated a mean difference between records of 33.3 ms with a standard deviation of 3.86 ms, as summarized in Figure 4.
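The capture-period analysis can be reproduced along these lines (a sketch with assumed variable names; the authors' exact fitting routine is not specified beyond a Gaussian fit, and fitdist requires the Statistics Toolbox):

    % Capture periods from consecutive Kinect time stamps, histogram, and a
    % normal (Gaussian) fit to the resulting distribution.
    % t_kinect: time stamps (s) from a ~10 minute recording session.
    periods_ms = diff(t_kinect) * 1000;          % capture periods in milliseconds
    histogram(periods_ms, 15:1:55);              % histogram over the 15-55 ms range
    pd = fitdist(periods_ms(:), 'Normal');       % Gaussian fit to the periods
    fprintf('mean = %.1f ms, std = %.2f ms\n', pd.mu, pd.sigma);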
3.4 Participants
The participants in this study were healthy, whereas the target group of users suffers from neurological injury. Future studies and systems should therefore address both a larger sample of participants and participants who suffer from stroke-related neurological injury.
4 Conclusion
In this paper we evaluated the accuracy and latency of the Kinect by using the OptiTrack, a research-grade motion capture system, to establish the baseline. A specific set of movements derived from commonly used stroke impairment level tests was examined. The comparison resulted in an acceptable level of accuracy (maximum average NRMSE of 1.74 cm) and a latency within approximately two capture periods (66.6 ms). The results of this study support the Kinect as a sufficiently accurate and responsive sensor for gross movement-based impairment assessment and rehabilitation progress tracking in both clinical and in-home settings. Various parameters (angle and distance from the Kinect; girth and length of the body parts; movements along the Z plane) have the potential to significantly affect the accuracy of Kinect-based readings; hence, in a stand-alone experiment, conditions must remain nearly identical to the calibration setting to ensure accurate data capture.
5 Acknowledgments
This work was supported in part by a Center for Computing for Life
Sciences (CCLS) research grant at San Francisco State University.
Figure 3: Representative data, in all three axes, of both Kinect and OptiTrack readings of movement 2 (shoulder abduction; Figure 1) for the shoulder, elbow, and wrist joints. The method the Kinect uses to calculate the X and Y axes is based on different technology than that used for the Z axis; as a result, the trends gathered on the Z axis contain notably more frequent deviations from the baseline. However, these deviations fluctuate rapidly and ultimately lead to a comparable NRMSE.
Figure 4: Histogram of Kinect capture period measurements in milliseconds. The distribution of the capture periods is approximated by a Gaussian with a 33.3 ms mean and a 3.86 ms standard deviation.
References
[1] C.-Y. Chang, B. Lange, M. Zhang, S. Koenig, P. Requejo, N. Somboon, A. A. Sawchuk, and A. A. Rizzo. Towards pervasive physical rehabilitation using Microsoft Kinect. In Proc. 6th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), pages 159–162. IEEE, 2012.
[2] K. Christensen, G. Doblhammer, R. Rau, and J. W. Vaupel. Ageing populations: the challenges ahead. The Lancet, 374(9696):1196–1208, 2009.
[3] R. A. Clark, Y.-H. Pua, K. Fortin, C. Ritchie, K. E. Webster, L. Denehy, and A. L. Bryant. Validity of the Microsoft Kinect for assessment of postural control. Gait & Posture, 36(3):372–377, 2012.
[4] E. Croarkin, J. Danoff, and C. Barnes. Evidence-based rating of upper-extremity motor function tests used for people following a stroke. Physical Therapy, 84(1):62–74, 2004.
[5] A. Fernandez-Baena, A. Susin, and X. Lligadas. Biomechanical validation of upper-body and lower-body joint movements of Kinect motion capture data for rehabilitation treatments. In Proc. 4th International Conference on Intelligent Networking and Collaborative Systems (INCoS), pages 656–661. IEEE, 2012.
[6] M. John, S. Klose, G. Kock, M. Jendreck, R. Feichtinger, B. Hennig, N. Reithinger, J. Kiselev, M. Gövercin, E. Steinhagen-Thiessen, et al. SmartSenior's interactive trainer: development of an interactive system for a home-based fall-prevention training for elderly people. In Ambient Assisted Living, pages 305–316. Springer, 2012.
[7] C. Loconsole, F. Banno, A. Frisoli, and M. Bergamasco. A new Kinect-based guidance mode for upper limb robot-aided neurorehabilitation. In Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1037–1042. IEEE, 2012.
[8] S. Obdrzalek, G. Kurillo, F. Ofli, R. Bajcsy, E. Seto, H. Jimison, and M. Pavel. Accuracy and robustness of Kinect pose estimation in the context of coaching of elderly population. In Proc. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1188–1193. IEEE, 2012.
[9] L. M. Pedro and G. A. de Paula Caurin. Kinect evaluation for human body movement analysis. In Proc. 4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob), pages 1856–1861. IEEE, 2012.
[10] K. Tanaka, J. Parker, G. Baradoy, D. Sheehan, J. R. Holash, and L. Katz. A comparison of exergaming interfaces for use in rehabilitation programs and research. Loading... The Journal of the Canadian Game Studies Association, 6(9):69–81, 2012.
[11] J. Wiemeyer and A. Kliem. Serious games in prevention and rehabilitation: a new panacea for elderly people? European Review of Aging and Physical Activity, 9(1):41–50, 2012.