CSCI473/573 Human-Centered Robotics
Project 2: Skeleton-Based Representations for Human Behavior Understanding

Assigned: October 21. Multiple due dates (all materials must be submitted via Blackboard). Code due: Monday, November 2, 23:59:59. Report due: Wednesday, November 4, 23:59:59. Deadlines will not be extended.

In this project, you will implement several skeleton-based representations and apply Support Vector Machines (SVMs) to classify human behaviors using an activity dataset collected from a Kinect sensor.

I. SUPPORT VECTOR MACHINES

Let's look at SVMs before we get into the details of implementing skeleton-based representations. One objective of this project is to understand how to apply SVMs as a tool in real-world applications. In this project, we will use LIBSVM as our classifier (also referred to as a learner), which is an excellent open-source implementation of SVMs and related methods. This library provides software support for SVMs, with source code available in C++ and interfaces available for MATLAB, Octave, Python, Perl, and many other programming languages. Here's the link to obtain the LIBSVM library: http://www.csie.ntu.edu.tw/~cjlin/libsvm.

A. Installing LIBSVM

Follow the instructions on the LIBSVM website for downloading the software. The up-to-date release is Version 3.20. Note that the README file within the package provides a lot of helpful information for installing and using the LIBSVM package. In addition, make sure you read through and have a good understanding of the following Practical Guide by Hsu, Chang, and Lin, especially the examples in the guide: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

B. Getting Familiar with LIBSVM

To understand LIBSVM, you are highly encouraged to work through the examples in Appendix A of the Practical Guide.
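As a concrete starting point, the svmguide1 walkthrough in the Practical Guide boils down to a short command sequence like the one below (the -c and -g values are placeholders to be replaced by your own grid-search results, not recommended settings):

```shell
# Scale training data to [-1, 1] and save the scaling parameters,
# then apply the SAME parameters to the testing data
svm-scale -s range1 svmguide1 > svmguide1.scale
svm-scale -r range1 svmguide1.t > svmguide1.t.scale

# Train a C-SVM with the (default) RBF kernel, then predict;
# svm-predict reports the accuracy on the labeled testing file
svm-train -c 2 -g 2 svmguide1.scale svmguide1.model
svm-predict svmguide1.t.scale svmguide1.model svmguide1.t.predict
```

Scaling the testing data with the parameters saved from the training data (rather than re-scaling independently) is the detail the guide emphasizes most.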
The example datasets (e.g., svmguide1 and others) are available online from the LIBSVM site, already formatted into the form expected by LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. If you have any questions, please contact the instructor, Prof. Hao Zhang, at hzhang@mines.edu. This write-up is prepared using LaTeX.

Fig. 2: Skeleton joint names and indices from the Kinect SDK ((a) names; (b) indices).

C. Data Format of LIBSVM

The format of training and testing data files is:

<label> <index1>:<value1> <index2>:<value2> ...

Each line contains one instance and must end with a '\n' character. For classification (as in this project), <label> is an integer indicating the class label (multi-class is supported). Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or error. Please look at the examples in the Practical Guide, such as svmguide1, to make sure you understand the format; a wrongly formatted data file will cause LIBSVM to fail. Please note that your skeleton-based representations must be converted to this data format for the LIBSVM software to process.

II. DATASET

The MSR Daily Activity 3D dataset (http://research.microsoft.com/en-us/um/people/zliu/actionrecorsrc/) will be used in this project; it is one of the most widely used benchmark datasets in human behavior understanding tasks. This dataset contains 16 human activities, as illustrated in Fig. 1, performed by 10 subjects. Each subject performs each activity twice, once in a standing position and once in a sitting position.

Fig. 1: The MSR Daily Activity 3D dataset contains sixteen activities: (1) drink, (2) eat, (3) read book, (4) call cellphone, (5) write on a paper, (6) use laptop, (7) use vacuum cleaner, (8) cheer up, (9) sit still, (10) toss paper, (11) play game, (12) lie down on sofa, (13) walk, (14) play guitar, (15) stand up, (16) sit down.
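A sketch of a helper that emits one instance in this format (the function name and formatting choices are my own, not part of the assignment):

```python
def to_libsvm_line(label, features):
    """Format one instance as '<label> 1:<v1> 2:<v2> ...' with
    indices in ascending order, as LIBSVM requires."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"

print(to_libsvm_line(1, [0.326, -0.101, 2.111]))
# → 1 1:0.326 2:-0.101 3:2.111
```

Writing one such line per data instance, followed by '\n', yields a file LIBSVM can read directly.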
Although this dataset contains color-depth and skeleton data, we will only use the skeletal information to construct skeleton-based human representations in this project. The skeleton in each frame contains 20 joints, as illustrated in Fig. 2, which also shows the correspondence between joint names and joint indices. For example, Joint #1 is HipCenter, Joint #18 is KneeRight, etc.

In this project, pre-formatted skeleton data will be used, which can be downloaded from the project website: http://inside.mines.edu/~hzhang/Courses/CSCI473-573/Projects/Project-2/dataset.tar.gz. The dataset in the project is a subset of the original dataset, containing only six (6) activity categories:

• CheerUp (a08)
• TossPaper (a10)
• LieOnSofa (a12)
• Walking (a13)
• StandUp (a15)
• SitDown (a16)

It is noteworthy that the provided dataset is not formatted in the LIBSVM form, but it is much easier to process than the data format used in the original dataset. In particular, the dataset contains two folders: Train and Test. Data instances in the directory Train are used for training (and validation). Data instances in Test are used for evaluating performance. Each instance has a filename like a12_s08_e02_skeleton_proj.txt, meaning the data instance belongs to activity category 12 (i.e., a12, "lie down on sofa" as in Fig. 1), performed by human subject 8 (s08) in his/her second trial (e02). Again, the dataset contains 6 activity categories, 10 human subjects, and 2 trials per subject. Instances from subjects 1–6 are used for training (in the directory Train), and instances from subjects 7–10 are used for testing (in Test).

When you open a data instance file, say a12_s08_e02_skeleton_proj.txt, you will see something like:

1 1 0.326 -0.101 2.111
1 2 0.325 -0.058 2.152
1 3 0.319 0.194 2.166
...

Each row contains five values, representing: 1) frame id, 2) joint id, 3) joint position x, 4) joint position y, 5) joint position z.
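A minimal sketch of a loader for these five-column rows, grouping joints by frame (the function name and the nested-dict layout are my own choices):

```python
def parse_skeleton_file(lines):
    """Group rows of 'frame_id joint_id x y z' into
    {frame_id: {joint_id: (x, y, z)}}."""
    frames = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed rows
        frame_id, joint_id = int(parts[0]), int(parts[1])
        frames.setdefault(frame_id, {})[joint_id] = tuple(
            float(p) for p in parts[2:5])
    return frames

# Usage: with open(path) as fh: frames = parse_skeleton_file(fh)
```

With this layout, frames[t][j] gives the 3D position of joint j in frame t, which is the access pattern the representations below need.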
These joint positions are in real-world 3D coordinates, in meters. Each frame contributes 20 rows, one per joint. The skeleton data from all frames of an instance are included in the same data file (e.g., a12_s08_e02_skeleton_proj.txt).

III. FEATURES TO IMPLEMENT (CSCI 473)

Students in CSCI473 must complete two tasks: 1) implement the human representation based on relative angles and distances of the star skeleton; 2) explore other similar representations.

A. Relative Angles and Distances of Star Skeleton

Students in CSCI473 are required to implement the human representation based on the Relative Angles and Distances (RAD) of the star skeleton, as described by Algorithm 1. The objective is to implement the RAD representation and use it to convert all data instances in the folder Train into a single training file rad, which can be processed by the LIBSVM software. Similarly, all instances in the directory Test need to be converted into a single testing file rad.t for evaluation purposes.

Algorithm 1: RAD representation using star skeletons
Input: Training set Train and testing set Test
Output: rad and rad.t
 1: for each instance in Train and Test do
 2:   for frame t = 1, ..., T do
 3:     Select joints that form a star skeleton (Fig. 3);
 4:     Compute and store distances between body extremities and the body centroid joint (d_1^t, ..., d_5^t);
 5:     Compute and store angles between two adjacent body extremities (θ_1^t, ..., θ_5^t);
 6:   end
 7:   Compute a histogram of N bins for each d_i = {d_i^t}, t = 1, ..., T, i = 1, ..., 5;
 8:   Compute a histogram of M bins for each θ_i = {θ_i^t}, t = 1, ..., T, i = 1, ..., 5;
 9:   Normalize the histograms by dividing by T to compensate for the different numbers of frames across instances;
10:   Concatenate all normalized histograms into a one-dimensional vector of length 5(M + N);
11:   Convert the feature vector to the LIBSVM format (the feature vector of the instance occupies a single row);
12: end
13: Construct a single file (named rad for Train and rad.t for Test) that contains the feature vectors of all instances and follows the LIBSVM format;
14: return rad and rad.t

Fig. 3: Illustration of the human representation based on relative distances and angles of the star skeleton.

After obtaining the formatted training and testing files, i.e., rad and rad.t, you can use LIBSVM to train a C-SVM model with a Radial Basis Function (RBF) kernel. Then you can use the model to make predictions on the testing data in rad.t; the predicted labels will be generated by LIBSVM in a file named rad.t.predict. Please refer to the examples in the Practical Guide. You then need to evaluate the performance of your RAD representation by computing the accuracy and confusion matrix. These results must be included in your report.

B. Similar Representations

Implement at least one new skeleton-based representation by choosing joints other than those selected in the star skeleton. For example, you can change the reference joint, use joints other than the body extremities, or compute distances of all joints while ignoring the orientation information, as in the Histogram of Joint Position Differences (HJPD) representation discussed in Section V-B.

C. Data to Gather

For the experiments discussed in Sections III-A and III-B, you need to provide sufficient details of your implementation and experiments, including (but not limited to):
• Which joints are used in the RAD representation;
• How the histograms are computed, and how many bins are used;
• The "optimal" C and γ values of the C-SVMs for the RAD representation;
• A graph of the grid search (from grid.py) for the experiments;
• Experimental results in terms of accuracy and confusion matrix. The accuracy must be better than 50%;
• An analysis of your experimental results, for example: which joints are more representative and descriptive for human representation? What are the pros and cons of the representations?
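The per-instance loop of Algorithm 1, from per-frame distances and angles to the final normalized feature vector, can be sketched as follows. The bin counts and the fixed histogram ranges here are my own assumed choices (the assignment leaves N, M, and the binning strategy to you):

```python
import math

def hist(values, bins, lo, hi):
    """Fixed-range histogram; out-of-range values are clamped
    into the first/last bin (an assumption, not mandated)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[max(0, min(idx, bins - 1))] += 1
    return counts

def rad_feature(dists, angles, n_bins=10, m_bins=10,
                d_range=(0.0, 2.0), a_range=(0.0, 2.0 * math.pi)):
    """dists/angles: one 5-tuple per frame (d_1^t..d_5^t / θ_1^t..θ_5^t).
    Returns the concatenated histograms, each normalized by the frame
    count T, as a vector of length 5 * (n_bins + m_bins)."""
    T = len(dists)
    feat = []
    for i in range(5):  # steps 7 and 9: distance histograms, normalized
        feat += [c / T for c in hist([d[i] for d in dists], n_bins, *d_range)]
    for i in range(5):  # steps 8 and 9: angle histograms, normalized
        feat += [c / T for c in hist([a[i] for a in angles], m_bins, *a_range)]
    return feat  # step 10: one-dimensional vector of length 5(M + N)
```

Because each frame contributes exactly one count per histogram, dividing by T makes every histogram sum to 1 regardless of sequence length, which is the point of the normalization step.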
Other analyses of the results along these lines are encouraged.

IV. SUBMISSION AND GRADING INSTRUCTIONS (CSCI 473)

A. Project Code (Due: Nov. 2, 23:59:59 to Blackboard)

Your software submission should include the following:
• The code that implements the skeleton-based representations;
• The final data files you obtain from your representation code, i.e., rad and rad.t, and the data files for your other representations, which must contain the feature vectors of all instances and follow the LIBSVM format;
• Sufficient instructions to compile and test your project, especially a README file with detailed instructions;
• All materials tarred into a single file named yourlastname-Project2-Code.tar (or .tar.gz or .zip).

You are required to implement the approaches using either C++ or Python in Ubuntu (since it is a CS course). Make sure to provide clear instructions on how to use the code and run your project.

B. Project Paper (Due: Nov. 4, 23:59:59 to Blackboard)

As part of your project, you are required to prepare a single 1-3 page document (in pdf format, using LaTeX and the 2-column IEEE style, as in this write-up) that presents the results listed in "Data to Gather" in Section III-C. The document you turn in must be a single pdf file. Name your paper yourlastname-Project2-Paper.pdf.

C. Grading

Your grade will be based upon the quality of your project implementation and the documentation of your findings in the write-up:
• 75%: The quality of your final working code implementation and software documentation. You should have a "working" software implementation, which means that your skeleton-based representations are implemented in software, and your code runs without crashing and performs the behavior understanding task.
• 25%: Your paper, prepared in LaTeX using IEEE styling and submitted in pdf format. Figures and graphs should be clear and readable, with axes labeled and captions that describe what each figure or graph illustrates.
The content should include all the experimental results and discussions mentioned previously in this document.

V. FEATURES TO IMPLEMENT (CSCI 573)

Students in CSCI573 must finish three tasks: 1) implement the RAD representation based on the star skeleton; 2) implement the HJPD representation; 3) implement the HOD representation.

A. Relative Angles and Distances of Star Skeleton

Implement the skeleton-based human representation based on Relative Angles and Distances (RAD) of the star skeleton, as discussed in Section III-A.

B. Additional Representations

Students in CSCI573 are required to implement the Histogram of Joint Position Differences (HJPD): given the 3D location (x, y, z) of a joint and a reference joint (x_c, y_c, z_c) in the world coordinate frame, the joint displacement is defined as:

(∆x, ∆y, ∆z) = (x, y, z) − (x_c, y_c, z_c)

The reference joint can be the skeleton centroid or a fixed joint. For each temporal sequence of human skeletons (i.e., each data instance), a histogram is computed for the displacement along each dimension, i.e., ∆x, ∆y, and ∆z. The computed histograms are then concatenated into a single feature vector. The HJPD representation is similar to the RAD representation, except that it uses all joints and ignores the pairwise angles. Refer to Section 3.3 of reference [1] for more details, available online at: http://www.csse.uwa.edu.au/~ajmal/papers/hossein_WACV2014.pdf. C-SVMs with RBF kernels should be used as the learner.

C. Histogram of Oriented Displacements

Implement the skeleton-based representation of Histogram of Oriented Displacements (HOD), as introduced in Section 3 of reference [2], including the Temporal Pyramid. The paper is available online at: http://ijcai.org/papers13/Papers/IJCAI13-203.pdf. C-SVMs with RBF kernels should be used as the learner.

D. Data to Gather

For all human representation implementations and experiments, you need to provide sufficient details to evaluate the performance of your representations and to reproduce the experimental results. Please refer to Section III-C for examples.

VI. SUBMISSION AND GRADING INSTRUCTIONS (CSCI 573)

A. Project Code (Due: Nov. 2, 23:59:59 to Blackboard)

Your software submission should include the following:
• The code that implements the three skeleton-based human representations (RAD, HJPD, and HOD);
• The final data files you obtained from your code, i.e., rad and rad.t, and similar files for the HJPD and HOD experiments, which must contain the feature vectors of all instances and follow the LIBSVM format;
• Sufficient instructions to compile and run your project and to evaluate the performance. In particular, you are required to include a README file that provides detailed instructions;
• All materials tarred into a single file named yourlastname-Project2-Code.tar (or .tar.gz or .zip).

You are required to implement the approaches using either C++ or Python in Ubuntu (since it is a CS course). Make sure to provide clear instructions on how to use the code and run your project.

B. Project Paper (Due: Nov. 4, 23:59:59 to Blackboard)

As part of your project, you are required to prepare a single 4-6 page document (in pdf format, using LaTeX and the 2-column IEEE style, as in this write-up). The document you turn in must be a single pdf file. Name your paper yourlastname-Project2-Paper.pdf. Your paper must include the following:
• An abstract of 200 to 300 words summarizing your findings;
• A brief introduction describing the skeleton-based representation task and your implementations;
• A detailed description of your experiments, with enough information to enable someone to recreate them;
• An explanation of the results, including all the experimental details listed under "Data to Gather", as well as your additional extensions; include figures, graphs, and tables where appropriate.
Your results should make it clear that the software has in fact worked. The accuracy and confusion matrix must be presented and discussed; in particular, the accuracy must be better than 50%;
• A discussion of the significance of the results.

The reader of your paper should be able to understand what you have done and what your software does without looking at the code itself.

C. Grading

Your grade will be based upon the quality of your project implementation and the documentation of your findings in the write-up. CSCI573 students will be graded more strictly on the quality of the implementation and paper presentation. The instructor expects a more thorough investigation of the experimental results and a working implementation of your skeleton-based representations for human behavior recognition. Your analysis should include further extensions selected from the above list, or other ideas you may have. The paper should have the "look and feel" of a technical conference paper, with logical flow, good grammar, sound arguments, illustrative figures, etc. Your grade will be based on:
• 75%: The quality of your final working code implementation and software documentation. You should have a "working" software implementation, which means that the skeleton-based representations are implemented in software, and your code runs without crashing and performs the human behavior recognition task.
• 25%: Your paper, prepared in LaTeX using IEEE styling and submitted in pdf format. Figures and graphs should be clear and readable, with axes labeled and captions that describe what each figure and graph illustrates. The content should include all the experimental results and discussions mentioned previously in this document.

REFERENCES

[1] H. Rahmani, A. Mahmood, A. Mian, and D. Huynh, "Real time action recognition using histograms of depth gradients and random decision forests," in IEEE Winter Conference on Applications of Computer Vision, 2014.
[2] M. Gowayyed, M. Torki, M. Hussein, and M. El-Saban, "Histogram of oriented displacements (HOD): Describing trajectories of human joints for action recognition," in International Joint Conference on Artificial Intelligence, 2013.