Experimental Evaluation of Microsoft Kinect's Accuracy and Capture Rate for Stroke Rehabilitation Applications

David Webster (Department of Computer Science, San Francisco State University; e-mail: dcello@mail.sfsu.edu)
Ozkan Celik (Department of Mechanical Engineering, Colorado School of Mines; e-mail: ocelik@mines.edu)

IEEE Haptics Symposium 2014, 23-26 February, Houston, TX, USA

ABSTRACT

To meet the challenges of ubiquitous computing for stroke rehabilitation, researchers have been trying to break away from traditional therapist-based modes of assessment. In this paper, the suitability of the Kinect to this end is experimentally evaluated. A set of thirteen gross movements, derived from common clinical stroke impairment level assessments (Wolf Motor Function Test, Action Research Arm Test, and Fugl-Meyer Assessment), was used to explore the Normalized Root Mean Squared Error (NRMSE) in position for data captured by the Kinect as compared to a research-grade OptiTrack motion capture system. The specific joints of interest were the shoulder, elbow and wrist. A latency and capture rate estimation of the Kinect and its effects on data quality was also conducted. The NRMSE in position varied between 0.53 cm and 1.74 cm per data point on average across all axes and joints, when an initial calibration was conducted via the OptiTrack system. The mean capture period was measured as 33.3 ms with a 3.86 ms standard deviation, and the latency was observed to be on the order of two capture periods (66.6 ms on average). Our results summarize the capabilities as well as the limitations of the Kinect in gross movement-based impairment assessment, in game-based rehabilitation paradigms, and in full-body motion capture applications in general.

Keywords: Kinect accuracy, stroke rehabilitation, serious games, exercise games.

1 INTRODUCTION

There has been increasing interest in the use of commercially available game controllers, especially the Microsoft Kinect, as an interface for home-based rehabilitation protocols involving game-like movement exercise tasks for stroke survivors. The benefits of such interfaces are clear: making therapy financially accessible to a large population of patients, enabling objective evaluation and remote tracking of patient progress, and increasing patient motivation to complete the repetitive movement tasks integral to motor function recovery. An experimental evaluation of the spatial accuracy, latency and capture rate of the motion capture data obtained from the Kinect in comparison with a research-grade motion capture device is a critical validation step for these applications. In this paper, we report results of an experimental evaluation of the Kinect as a motion capture interface for gross movements relevant to activities of daily living, stroke rehabilitation and full-body motion capture applications.

As the average age of the general population rises significantly in the upcoming years [2], the need for efficient and improved stroke rehabilitation methods will grow. The Kinect is arguably the forerunner among commercially available hardware showing potential to enhance stroke rehabilitation while retaining the affordable cost required for large-scale deployment [10]. Motor function rehabilitation exercises after stroke largely revolve around strengthening muscles and retraining sensorimotor function via repetitive movements.
Engaging interfaces improve outcomes of stroke rehabilitation by enabling longer periods of compliance with otherwise dreary or demanding rehabilitation regimens [11]. Ideally, therapist-assisted daily practice is the best exercise route; however, it is often logistically infeasible or cost-prohibitive. Utilization of the Kinect in rehabilitation may be one way to overcome this impracticality. Virtualized therapists offering guided interactive rehabilitation could make pseudo-therapist-assisted home-based rehabilitation a reality [6]. The Kinect holds the potential to reduce the current barriers to rehabilitation through innovative applications of markerless motion capture, but in order to develop robust serious games, a solid foundation of functionality needs to be established. Serious games may hold great potential for two things: increasing patient motivation and accurate completion of rehabilitation exercises, and enhancing record keeping and future medical diagnostic paradigms. An important initial step for both lines of research, however, is an experimental evaluation of the stroke impairment related diagnostic potential of the motion capture data obtained from the Kinect in comparison with a research-grade motion capture device.

One initial attempt at achieving such a goal can be seen in a study by Obdrzalek et al. [8], which examined a subset of movements specifically targeted at coaching the elderly and concluded with a rather large error in Kinect readings of 10 cm. Another study by Chang et al. [1], focused on spinal cord injury patients, examined the accuracy of the Kinect through an arbitrary set of movements developed for an in-development rehabilitation game. While a statistical analysis of the captured data was not offered, a visual representation of data trends for both systems showed that, for hand and elbow readings, competitive movement tracking performance is possible, whereas shoulder readings were widely inconsistent due to differing methods of motion capture and joint estimation between the OptiTrack and the Kinect. Even so, the results of this study left the authors with the impression that the Kinect has strong potential as a clinical and home-based rehabilitation tool.

The accuracy of the Kinect has also been examined in the realm of full-body postural control using tests of balance and reach [3]. Once again, the outcome was positive, with results validating the ability of the Kinect to accurately assess postural control kinematic strategies based on three postural control tests: a forward reach, a lateral reach, and a one-leg standing balance test examining distance reached and trunk flexion angle (sagittal and coronal). The balance test focused on spatio-temporal changes in the sternum, pelvis, knee and ankle as well as the angle of lateral and anterior trunk flexion. The Kinect readings had very similar inter-trial reliability and excellent concurrent validity; however, proportional biases occasionally occurred in pelvis and sternum readings. This study offers very detailed joint-by-joint quantitative results and, based on these findings, proposes that the Kinect is able to successfully assess kinematic strategies of postural control. In regard to joint-by-joint comparison, Fernandez-Baena et al.
[5] also conducted a detailed comparison of the Kinect with a Vicon system including: 1) knee flexion and extension; 2) hip flexion and extension on the sagittal plane; 3) hip adduction and abduction on the coronal plane with knee extended; 4) shoulder flexion and extension on the sagittal plane with elbow extended; 5) shoulder adduction and abduction on the coronal plane with elbow extended; and 6) shoulder horizontal adduction and abduction on the transverse plane with elbow extended. Mean error and mean error relative to range of motion were calculated, resulting in a maximum error of 13° across the knee, hip and shoulder joints. A large part of this error was attributed to the mechanical complexity of the shoulder joint, with the knee and hip faring significantly better at a maximum error of 9.92°. This level of precision led to the conclusion that Kinect accuracy is sufficient for most of the current clinical rehabilitation treatments.

Focused more precisely on the accuracy of Kinect-gathered data pertaining to upper extremity rehabilitation, Loconsole et al. [7] used the Kinect and an L-Exos upper extremity exoskeleton to monitor movable objects and a participant, enabling several scenarios to be tested: 1) light variation: very intensive, medium, and low illumination; no substantial differences were noted; 2) occlusions: two objects moved to occlude each other; no adverse effect was noted, and items were correctly recognized after the occlusion; 3) object roto-translation: rotation and movement of two tracked objects; the authors note that the Kinect robustly tracked the objects, although no quantitative data were presented; and 4) accuracy: while some variation was noted depending on the range of the object along the Z and X axes, all tests resulted in readings within a negligible 2 cm, which was concluded to be well within the limits of the targeted rehabilitation needs. In the same vein, Pedro et al. [9] simplified the verification of the Kinect's accuracy for rehabilitation purposes by also using a mechanical arm-like device for quantification, extracting points of interest rather than performing full-body kinematic analysis. In that study, the Kinect was noted to have good repeatability in both the X and Y axes, whereas repeatability worsened as the distance to the point of interest (Z axis) grew. Data gathered during the study show that the average of the standard deviation increases quadratically with distance; however, even with this limitation, Pedro et al. note that, based on the application requirements of rehabilitation, the operational range of the Kinect, while retaining a sufficient level of accuracy for rehabilitation, can be considered to be from 0.5 m to 2 m.

Clinical use of the Kinect for stroke impairment assessment, however, involves a distinct movement set with various required occlusion points not yet examined in the previously mentioned studies. The purpose of this study is to assess the validity of data gathered from the Kinect during specific stroke assessment-based movements as well as the data's usefulness in quantifying a stroke patient's level of motor recovery. The Methods section outlines the participant population; study protocol; study environment; and data collection, processing, analysis and record comparison methodology.
The Results and Discussion section discusses the spatial error introduced by the inherent data offset between the two motion capture systems, the effect of removing that offset, and the latency and capture rate characterization of the Kinect.

2 METHODS

The following section contains information regarding participant meta-data, experimental environment and protocol, system-specific hardware, coordinate frame transformation, data preprocessing, and a description of the method used to match Kinect-gathered records with their OptiTrack counterparts.

2.1 Participants

A total of ten participants completed the experiment in the Biomechatronics Research Laboratory at San Francisco State University. All participants gave informed consent under a protocol approved by the San Francisco State University Institutional Review Board. Participants' ages ranged from 18 to 27; heights ranged from 5'5" to 6'3"; and weights ranged from 110 to 245 lbs. No participant suffered from any movement, nervous system, or neurological disorder affecting their dominant limb.

Figure 1: A photograph of a subject with attached markers. Marker clusters are used to triangulate the participant's humeral head, lateral epicondyle, and ulna head. Views of both the real-world experimental markers and their corresponding representation in the OptiTrack Tracking Tools software environment can be seen.

2.2 Experiment Protocol

Subjects were asked to wear a common motion capture vest which enabled three custom-made motion capture marker clusters to be positioned in a manner conducive to the triangulation of the humeral head, lateral epicondyle, and ulna head (Figure 1). Participants completed several basic movements/tasks derived from a subset of the Wolf Motor Function Test, Action Research Arm Test, and Fugl-Meyer Assessment, which are standardized and commonly used tests for evaluation of motor function impairment level in stroke patients [4]. The movements used in this study were selected due to their gross nature (a requirement for reliable Kinect tracking), their specific diagnostic potential, and their potential for integration into future schemes of automated impairment evaluation. The following thirteen movements were used:

Wolf Motor Function Test (WMFT):
1. Forearm to table (side) - shoulder abduction.
2. Forearm to box on table (side) - shoulder abduction.
3. Extend elbow on table (side) - elbow extension.
4. Hand to table (forward) - shoulder flexion.
5. Hand to box on table (forward) - shoulder flexion.

Action Research Arm Test (ARAT):
6. Hand behind head.
7. Hand on top of head.
8. Hand to mouth.

Fugl-Meyer Assessment (FMA):
9. Flexor synergy - shoulder abduction (0-90°).
10. Shoulder outward rotation.
11. Hand to lumbar spine.
12. Shoulder flexion (0-90°).
13. Shoulder flexion (90-180°).

The subjects were allowed to complete all movements at a self-determined pace using their dominant limbs, and each movement was recorded three times to improve data robustness.

2.3 Experiment Environment

All infra-red (IR) emitting devices and reflective surfaces within the capture volume were removed or covered prior to OptiTrack calibration. The Kinect was then placed inside the OptiTrack capture volume and its IR projector was activated. The extraneous IR emissions from the Kinect were masked prior to the OptiTrack calibration procedures. The OptiTrack system was then calibrated with a three-marker wand, resulting in the maximum possible accuracy level of the OptiTrack calibration (sub-millimeter precision).
The ground plane was then set, and immediately afterwards a rigid body trackable was created at the origin of the OptiTrack coordinate frame. This trackable was physically attached to the Kinect as close to the origin of the Kinect coordinate frame as possible, directly above the RGB camera. The trackable's coordinates and orientation were used to obtain both the position vector between the OptiTrack and Kinect coordinate frames and the orientation difference between the two frames. This information was used to create a homogeneous transformation matrix relating the Kinect and OptiTrack coordinate frames to each other.

2.4 Data Collection

Motion capture data were acquired with an OptiTrack motion capture system and a Microsoft Kinect. The Skeletal Viewer application, from the Kinect for Windows Developer Toolkit (1.7.0), was substantially modified and used as the sole engine for both Kinect and OptiTrack raw data capture. The computer used for both systems was running a 64-bit Windows 7 operating system on an Intel Core i3-2120 CPU at 3.30 GHz with 8 GB of DDR2 RAM. The OptiTrack, a passive marker-based optical motion capture system, consisted of eight V100:R2 cameras used for marker position tracking, two OptiTrack OptiHub hardware modules handling communication, synchronization, and control of data flow between the cameras and the computer, and the OptiTrack Tracking Tools 2.5.0 data processing software API, enabling real-time capture feedback and creation of rigid body trackables. The marker clusters used in this experiment were created using three sets of markers 7/16" in diameter positioned to triangulate the center point of the targeted joints. All three Cartesian coordinates of the shoulder (humeral head), elbow (lateral epicondyle), and wrist (ulna head) were recorded in mm at a sampling rate of 100 Hz. Both the OptiTrack system and the Microsoft Kinect sensor were connected to the computer via USB 2.0 cables.

2.4.1 Kinect Skeletal Viewer

The Skeletal Viewer program's skeletalization algorithm, which is part of the Kinect for Windows SDK, was used as the tool for Kinect-based joint position data collection, for the experimenter's graphical user interface, and as the basis for the OptiTrack motion capture engine. To enable these three functionalities, four main modifications to the Skeletal Viewer were completed: a) an engine thread was spawned and initialized to real-time priority in order to interface with the OptiTrack API; b) the priority of the main skeletalization thread of the Skeletal Viewer application was set to real-time; c) the Skeletal Viewer GUI was equipped with a trigger to insert synchronization pulses simultaneously into both data sets; and d) Cartesian coordinate frame data for both systems were extracted and written to disk with an associated time stamp.

2.5 Coordinate Frame Transformation

A homogeneous transformation matrix was used to align the coordinate frames of the Kinect and the OptiTrack.
$H_1^0$, the homogeneous transformation matrix with coordinate frame 0 being that of the OptiTrack and coordinate frame 1 being that of the Kinect, was constructed from three rotation matrices and a translation matrix:

\[
R_{x,\theta} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
R_{y,\psi} = \begin{bmatrix} \cos\psi & 0 & \sin\psi & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\psi & 0 & \cos\psi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]
\[
R_{z,\phi} = \begin{bmatrix} \cos\phi & -\sin\phi & 0 & 0 \\ \sin\phi & \cos\phi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
T_{x,y,z} = \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{bmatrix},
\]
\[
H_1^0 = T_{x,y,z}\, R_{z,\phi}\, R_{y,\psi}\, R_{x,\theta}.
\]

This transformation can be conceptualized as the rotation of the axes of the Kinect coordinate frame into alignment with the axes of the OptiTrack coordinate frame. To describe where, and at what orientation, a point captured in the Kinect frame ($P^1$) would be with respect to the OptiTrack frame ($P^0$), the point position vector is multiplied by $H_1^0$:

\[
P^0 = H_1^0\, P^1.
\]

In order to test the numerical correctness of this homogeneous transformation matrix, an artificial movement was constructed in which two perpendicular boxes aligned with the X and Z axes of the OptiTrack coordinate system were drawn in the air using the wrist joint marker. Figure 2 contains the resulting movement trajectories gathered simultaneously by the Kinect and the OptiTrack, confirming accurate and correct matching of the coordinate frames.

Figure 2: An artificial movement test for validating the Kinect-to-OptiTrack coordinate frame transformation, completed by drawing two perpendicular boxes along the OptiTrack X and Z axes in the air with the wrist joint tracker.

Table 1: Summary of the average normalized root mean squared error (NRMSE) between Kinect and OptiTrack capture trajectory data for the 1st movement of all ten participants. Without removal of the inherent offset between the two systems, the error introduced is non-uniform and significant.

              Shoulder                   Elbow                      Wrist
Participant   x (m)   y (m)   z (m)      x (m)   y (m)   z (m)      x (m)   y (m)   z (m)
1             0.2776  0.1488  0.2692     0.1928  0.2163  0.2524     0.2072  0.2064  0.2769
2             0.1433  0.0419  0.0318     0.0893  0.1314  0.0210     0.1107  0.1308  0.0699
3             0.2689  0.1689  0.2599     0.1754  0.2290  0.2584     0.1989  0.2115  0.2851
4             0.1782  0.0997  0.2577     0.1292  0.1667  0.2491     0.1439  0.1605  0.2848
5             0.2669  0.1558  0.2684     0.2090  0.2112  0.2552     0.2206  0.2061  0.2455
6             0.2299  0.1155  0.1911     0.1884  0.1686  0.1707     0.1901  0.1542  0.2131
7             0.2501  0.0957  0.1891     0.1871  0.1573  0.1756     0.1905  0.1503  0.1800
8             0.2702  0.1536  0.2639     0.2245  0.1956  0.2446     0.2256  0.1934  0.2327
9             0.3042  0.1424  0.3386     0.2194  0.1970  0.3232     0.2478  0.1766  0.3261
10            0.2109  0.1216  0.2189     0.1533  0.1767  0.2029     0.1636  0.1650  0.2009
Average       0.2400  0.1244  0.2289     0.1769  0.1850  0.2153     0.1899  0.1755  0.2315

2.6 Data Preprocessing

Raw data for both systems were captured in ASCII-format text files (.csv). Marker coordinates were recorded in meters, and synchronization pulses marked the beginning and end of each unique movement. These files were imported into MATLAB and the Kinect data were transformed into alignment with the OptiTrack data using the homogeneous transformation matrix $H_1^0$. Individual movements were extracted from both systems' raw data, with any extraneous data unrelated to a defined movement discarded. A gap-filling method using cubic splines was applied in cases where a small portion of data within a trial, for one or more joint positions, was occluded (a sketch of these preprocessing steps is given below). Out of the 780 trials, 15 were excluded from data analysis due to excessive occlusion. Of the remaining 765 trials, 63 required interpolation (gap-filling) of data; however, interpolated records constituted only 0.2% of total records.
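As an illustration of the preprocessing described above, the following MATLAB sketch shows one possible implementation of the frame transformation and the cubic-spline gap filling. This is not the authors' original code; the variable names (H, t, p_kin) are hypothetical, and occluded samples are assumed to be marked as NaN.

```matlab
% Illustrative sketch of the preprocessing steps (not the authors' code).
% H     : 4x4 homogeneous transform from the Kinect frame to the OptiTrack frame
% t     : N-by-1 vector of time stamps (s)
% p_kin : N-by-3 Kinect joint positions (m) expressed in the Kinect frame

% 1) Express the Kinect trajectory in the OptiTrack coordinate frame (P0 = H * P1).
P1 = [p_kin, ones(size(p_kin, 1), 1)]';   % 4-by-N homogeneous coordinates
P0 = H * P1;                              % transform into the OptiTrack frame
p_kin0 = P0(1:3, :)';                     % back to N-by-3 Cartesian form

% 2) Cubic-spline gap filling of short occlusions (assumed marked as NaN).
for c = 1:3
    gap = isnan(p_kin0(:, c));
    p_kin0(gap, c) = interp1(t(~gap), p_kin0(~gap, c), t(gap), 'spline');
end
```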
Both data sets were then filtered using a zero-phase-shift, second-order low-pass Butterworth filter with a cut-off frequency of 10 Hz.

2.7 Position Error Calculation Methodology

In order to compare the trajectories captured by the two motion capture systems running at different capture (sampling) rates, custom MATLAB code was developed to, in essence, down-sample the 100 Hz OptiTrack data into a 30 Hz data set. This was accomplished by iterating through the Kinect data, parsing the time stamp, and matching the result to the closest possible OptiTrack record. Due to the limited control over CPU scheduling in the non-real-time Windows 7 operating system environment, records from the two systems could be mismatched by a few milliseconds even when they should have had identical time stamps. To ensure that each OptiTrack data point would indeed have a time-matched record in the Kinect data, the OptiTrack data were written to disk at a rate of 1000 Hz, even though data values only refreshed at approximately 100 Hz. This was done so that even if CPU cycles were required for operating-system-critical processes, resulting in temporary down-scheduling of the OptiTrack thread, the data would still be written to disk at a minimum of 100 Hz. This methodology allowed a successful comparison with each unique Kinect record, while the extra duplicate OptiTrack data were ignored.

The Normalized Root Mean Squared Error (NRMSE) between the trajectories captured by the Kinect and the OptiTrack is used as the measure to quantify the Kinect's spatial accuracy. It is essentially the L2-norm of the error between the trajectories, normalized by the number of data points in the trajectory, yielding a value in meters that can be interpreted as the average error per data point. There was an unavoidable base offset between the two data sets due to differing methods of joint position capture. The Kinect relies on image processing algorithms based on the silhouette of the participant for the X and Y coordinates and on an IR depth sensor for the Z coordinate; its skeletalization algorithm is, at its core, only an estimation of joint position because of the markerless capture approach, whereas the OptiTrack triangulates the center point of the marker clusters placed at the joint locations. While the Kinect-based joint position trajectories were not observed to contain significant distortions, the offset changed whenever a participant was repositioned for a different movement or a new participant began. NRMSE was therefore calculated both without and with the inherent offset between the two systems removed (a sketch of the matching and error computation is given below). Including both results in the study makes it possible to observe the average spatial error values that should be expected when the Kinect is the only available motion capture system (errors without offset removal) or when an initial calibration step can be completed using the Kinect and a more accurate motion capture system (error values after offset removal).
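For illustration, the following MATLAB sketch shows one way the record matching, offset removal, and NRMSE computation described above could be implemented. This is not the authors' original code; the variable names are hypothetical, and the NRMSE line reflects our reading of the definition above (L2-norm of the error divided by the number of samples).

```matlab
% Illustrative sketch of record matching and NRMSE computation (not the authors' code).
% t_kin, p_kin : Kinect time stamps (s) and N-by-3 joint trajectory (m)
% t_opt, p_opt : OptiTrack time stamps (s) and M-by-3 joint trajectory (m)

% 1) Match each Kinect sample to the OptiTrack record closest in time.
idx = zeros(size(t_kin));
for k = 1:numel(t_kin)
    [~, idx(k)] = min(abs(t_opt - t_kin(k)));
end
p_opt_m = p_opt(idx, :);                    % ~30 Hz OptiTrack subset

% 2) Optional calibration: remove the constant offset by aligning the mean
%    (average) point of each trajectory.
offset = mean(p_kin, 1) - mean(p_opt_m, 1);
p_kin_a = p_kin - offset;                   % implicit expansion over rows

% 3) NRMSE per axis: L2-norm of the error divided by the number of samples,
%    giving an average error per data point in meters.
nrmse = zeros(1, 3);
for c = 1:3
    e = p_kin_a(:, c) - p_opt_m(:, c);
    nrmse(c) = norm(e) / numel(e);
end
```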
3 RESULTS AND DISCUSSION

3.1 Spatial Accuracy of Kinect without Offset Removal

Table 1 demonstrates the large variance of offsets among participants for a representative movement. This drastic variance leads to the conclusion that a meaningful comparison cannot be obtained from the two data sets with the offset in place.

3.2 Spatial Accuracy of Kinect After Offset Removal

Once the offset was removed by aligning the average points of each record set, the data gathered from both systems become comparable and lead to a useful overarching NRMSE calculation (Table 2). Representative data, with offset removed, for all joint positions during a single movement (Figure 3) demonstrate that, in general, readings from both systems are in close agreement. The Kinect uses different hardware and techniques for calculating the X and Y axes than for the Z axis. Due to this differing technology for deriving Z-axis information, the data gathered on the Z axis contain more frequent and significant deviations from the baseline than the other axes; however, these deviations do not persist for long and ultimately lead to a comparable NRMSE. With a maximum NRMSE observed throughout all movements of 0.0275 m, even including movements with obvious occlusion for the Kinect (e.g., 6 and 11), we conclude that the Kinect accuracy level is mostly sufficient for the targeted clinical and in-home use.

Table 2: Summary of the average NRMSE results for all movements across all 10 participants. The reported values were calculated by first averaging the NRMSE of each participant's three recorded trials for each unique movement, and then averaging across the participants. After an initial calibration (removal of offset) by using the OptiTrack, the error in measurements is significantly smaller and more uniform.

              Shoulder                   Elbow                      Wrist
Movement      x (m)   y (m)   z (m)      x (m)   y (m)   z (m)      x (m)   y (m)   z (m)
1st           0.0094  0.0058  0.0043     0.0113  0.0139  0.0095     0.0097  0.0114  0.0214
2nd           0.0060  0.0056  0.0037     0.0128  0.0089  0.0066     0.0095  0.0079  0.0146
3rd           0.0044  0.0054  0.0055     0.0107  0.0118  0.0092     0.0177  0.0188  0.0152
4th           0.0043  0.0066  0.0044     0.0095  0.0077  0.0088     0.0133  0.0100  0.0084
5th           0.0179  0.0080  0.0065     0.0117  0.0168  0.0101     0.0169  0.0224  0.0199
6th           0.0166  0.0078  0.0054     0.0125  0.0179  0.0084     0.0238  0.0223  0.0182
7th           0.0073  0.0079  0.0076     0.0052  0.0114  0.0098     0.0115  0.0226  0.0195
8th           0.0090  0.0054  0.0039     0.0104  0.0133  0.0082     0.0166  0.0152  0.0102
9th           0.0039  0.0020  0.0030     0.0126  0.0075  0.0087     0.0109  0.0069  0.0162
10th          0.0059  0.0059  0.0063     0.0129  0.0125  0.0180     0.0263  0.0194  0.0275
11th          0.0053  0.0066  0.0063     0.0121  0.0159  0.0092     0.0099  0.0169  0.0179
12th          0.0162  0.0157  0.0072     0.0114  0.0190  0.0129     0.0112  0.0220  0.0206
13th          0.0098  0.0060  0.0046     0.0141  0.0148  0.0086     0.0129  0.0170  0.0161
Average       0.0089  0.0068  0.0053     0.0113  0.0132  0.0098     0.0146  0.0164  0.0174

It is important to note that the NRMSE between the Kinect and OptiTrack systems for any movement without offset removal is strongly affected by several variables, including the angle and distance from the Kinect sensor; the girth and length of the participant's torso and limb; and a particular movement's relationship to the Z plane. Hence, if an initial calibration step is to be used for offset removal, the experiment conditions need to remain fairly identical to ensure accurate readings from the Kinect.

3.3 Latency and Capture Rate Characterization

To evaluate the latency introduced by the Kinect sensor, as well as its capture rate distribution, the following two tests were completed.

3.3.1 Latency Evaluation

An artificial, and not directly stroke-related, movement was used to calculate the latency introduced by the Kinect. The participant was instructed to extend their arm out to the side to 90° via shoulder abduction and then immediately lower it. The latency between the systems was then calculated from the peak point of the wrist's arc on the Y axis in both data sets (a sketch of this peak-matching estimate is given below).
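A minimal MATLAB sketch of this peak-matching latency estimate is shown below; it is illustrative rather than the authors' code, and the variable names (t_kin, y_kin, t_opt, y_opt) are hypothetical.

```matlab
% Illustrative peak-matching latency estimate (not the authors' code).
% t_kin, y_kin : Kinect time stamps (s) and wrist Y-axis trajectory (m)
% t_opt, y_opt : OptiTrack time stamps (s) and wrist Y-axis trajectory (m)
[~, iK] = max(y_kin);                  % Kinect sample at the top of the arc
[~, iO] = max(y_opt);                  % OptiTrack sample at the top of the arc
latency_est = t_kin(iK) - t_opt(iO);   % positive value: Kinect lags the OptiTrack
fprintf('Estimated Kinect latency: %.1f ms\n', 1e3 * latency_est);
```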
The difference between the Kinect and the OptiTrack peak points was within approximately two Kinect capture periods (a difference of roughly 70 ms); however, it should be noted that this result contains inaccuracies due to the differences in system refresh rates. The 30 Hz capture rate of the Kinect and the 100 Hz capture rate of the OptiTrack system can easily cause the peak point readings of the two trajectories not to overlap exactly in time. For example, if the absolute peak point of the movement happens to fall exactly between two Kinect sampling instants, the Kinect's peak point would be slightly lower and slightly earlier than the actual peak point. However, because the results of the latency test, even with these limitations, were within a two-capture-period time frame of the Kinect, we conclude that the latency introduced by the Kinect is non-negligible, but not prohibitive for use in rehabilitation interfaces with real-time feedback components.

To further examine this conclusion, the Kinect capture rate distribution was also evaluated. A full data recording session (approximately 10 minutes in length) was analyzed. The difference in time stamps between consecutive records was transformed into a histogram and fit with a Gaussian curve. The results indicated a mean difference between records of 33.3 ms with a standard deviation of 3.86 ms, as summarized in Figure 4.
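The capture period statistics above can be reproduced with a short MATLAB sketch of the kind shown below; it is illustrative rather than the authors' code, t_kin is a hypothetical vector of Kinect record time stamps, and the 1 ms bin edges and Gaussian overlay are our assumptions.

```matlab
% Illustrative capture-period analysis (not the authors' code).
% t_kin : vector of Kinect record time stamps (s) for a full recording session.
periods_ms = 1e3 * diff(t_kin);        % time between consecutive records (ms)
mu = mean(periods_ms);                 % compare with the 33.3 ms reported above
sigma = std(periods_ms);               % compare with the 3.86 ms reported above
histogram(periods_ms, 15:1:55);        % 1 ms bins, comparable to Figure 4
hold on;
x = linspace(15, 55, 400);
% Gaussian fit scaled to counts (1 ms bins); normpdf is from the Statistics Toolbox.
plot(x, numel(periods_ms) * normpdf(x, mu, sigma), 'LineWidth', 1.5);
xlabel('Capture period (ms)');
ylabel('Number of data points');
```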
Figure 3: Representative data, in all three axes, of both Kinect and OptiTrack readings of movement 2 - shoulder abduction (Figure 1). The Kinect calculates the X and Y coordinates with different technological methods than the Z coordinate; due to this, the trends gathered on the Z axis contain notably more frequent deviations from the baseline. However, these deviations fluctuate rapidly and ultimately lead to a comparable NRMSE.

Figure 4: Histogram of Kinect capture period measurements in milliseconds. The distribution of the capture periods is approximated by a Gaussian with a 33.3 ms mean value and a 3.86 ms standard deviation.

3.4 Participants

The participants of this study were healthy, whereas the target group of users suffers from neurological injury; future studies and systems should therefore address both a larger sample of participants and participants who suffer from stroke-related neurological injury.

4 CONCLUSION

In this paper we evaluated the accuracy and latency of the Kinect by using the OptiTrack, a research-grade motion capture system, to establish the baseline. A specific set of movements derived from commonly used stroke impairment level tests was examined. This comparison resulted in an acceptable level of accuracy (maximum average NRMSE of 1.74 cm) and a level of latency within approximately two capture periods (66.6 ms). The results of this study support the Kinect as a sufficiently accurate and responsive sensor for gross movement-based impairment assessment and rehabilitation progress tracking in both clinical and in-home settings. Various parameters (angle and distance from the Kinect; girth and length of the body parts; movements along the Z plane) have the potential to significantly affect the accuracy of Kinect-based readings; hence, in a stand-alone experiment, conditions must remain fairly identical to the calibration setting to ensure accurate data capture.

5 ACKNOWLEDGMENTS

This work was supported in part by a Center for Computing for Life Sciences (CCLS) research grant at San Francisco State University.

REFERENCES

[1] C.-Y. Chang, B. Lange, M. Zhang, S. Koenig, P. Requejo, N. Somboon, A. A. Sawchuk, and A. A. Rizzo. Towards pervasive physical rehabilitation using Microsoft Kinect. In Proc. 6th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth), pages 159–162. IEEE, 2012.
[2] K. Christensen, G. Doblhammer, R. Rau, and J. W. Vaupel. Ageing populations: the challenges ahead. The Lancet, 374(9696):1196–1208, 2009.
[3] R. A. Clark, Y.-H. Pua, K. Fortin, C. Ritchie, K. E. Webster, L. Denehy, and A. L. Bryant. Validity of the Microsoft Kinect for assessment of postural control. Gait & Posture, 36(3):372–377, 2012.
[4] E. Croarkin, J. Danoff, and C. Barnes. Evidence-based rating of upper-extremity motor function tests used for people following a stroke. Physical Therapy, 84(1):62–74, 2004.
[5] A. Fernandez-Baena, A. Susin, and X. Lligadas. Biomechanical validation of upper-body and lower-body joint movements of Kinect motion capture data for rehabilitation treatments. In Proc. 4th International Conference on Intelligent Networking and Collaborative Systems (INCoS), pages 656–661. IEEE, 2012.
[6] M. John, S. Klose, G. Kock, M. Jendreck, R. Feichtinger, B. Hennig, N. Reithinger, J. Kiselev, M. Gövercin, E. Steinhagen-Thiessen, et al. Smartseniors interactive trainer: development of an interactive system for a home-based fall-prevention training for elderly people. In Ambient Assisted Living, pages 305–316. Springer, 2012.
[7] C. Loconsole, F. Banno, A. Frisoli, and M. Bergamasco. A new Kinect-based guidance mode for upper limb robot-aided neurorehabilitation. In Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1037–1042. IEEE, 2012.
[8] S. Obdrzalek, G. Kurillo, F. Ofli, R. Bajcsy, E. Seto, H. Jimison, and M. Pavel. Accuracy and robustness of Kinect pose estimation in the context of coaching of elderly population. In Proc. Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1188–1193. IEEE, 2012.
[9] L. M. Pedro and G. A. de Paula Caurin. Kinect evaluation for human body movement analysis. In Proc.
4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob), pages 1856–1861. IEEE, 2012. [10] K. Tanaka, J. Parker, G. Baradoy, D. Sheehan, J. R. Holash, and L. Katz. A comparison of exergaming interfaces for use in rehabilitation programs and research. Loading...,The Journal of the Canadian Game Studies Association, 6(9):69–81, 2012. [11] J. Wiemeyer and A. Kliem. Serious games in prevention and rehabilitationa new panacea for elderly people? European Review of Aging and Physical Activity, 9(1):41–50, 2012.