CSCI498B/598B Human-Centered Robotics Project 2: Skeleton-Based Representations for Human Behavior Understanding Assigned: September 12 Multiple due dates (all materials must be submitted via Blackboard) Code due: Friday, October 3, 23:59:59 Report due: Friday, October 10, 23:59:59 In this project, you will implement several skeleton-based representations and make use of Support Vector Machines (SVMs) to classify human behaviors using an activity dataset collected from a Kinect sensor. I. S UPPORT V ECTOR M ACHINES Let’s look at SVMs first before we get into details of implementing skeleton-based representations. One objective of this project is to understand how to apply SVMs as a tool in real-world applications. In this project, we will use LIBSVM as our classifier (also referred to as a learner), which is an excellent open source implementation of SVMs developed by Chang and Lin at National Taiwan University. This library provides software support for SVMs, with source code available for C++, and interfaces available to MATLAB, Octave, Python, Perl, and many other programming languages and environments. Here’s the link to the LIBSVM library: http://www.csie.ntu.edu.tw/˜cjlin/ libsvm. A. Installing LIBSVM First, follow the instructions on the LIBSVM website for downloading the software. The current release is Version 3.18. Note that the README file within the package provides a lot of helpful information for installing and using the LIBSVM package. In addition, make sure you read through and have a good understanding of the following Practical Guide by Hsu, Chang, and Lin, especially the examples in the guide: http://www.csie.ntu.edu.tw/˜cjlin/ papers/guide/guide.pdf. B. Getting Familiar with LIBSVM In order to help get up to speed with LIBSVM, I recommend following the examples in Appendix A of the above Practical Guide. The exemplary datasets (e.g., svmguide1 and others) are available online from the LIBSVM directory, already formatted into the form expected by LIBSVM, here: http://www.csie.ntu.edu.tw/˜cjlin/ libsvmtools/datasets/. If you have any questions or comments, please contact the instructor, Dr. Hao Zhang, at hzhang@mines.edu. This write-up is prepared using LATEX. (a) Names (b) Indices Fig. 2: Skeleton joint names and indices from Kinect SDK. C. Data Format of LIBSVM The format of training and testing data file is: <label> <index1>:<value1> <index2>:<value2> ... . . . Each line contains an instance and is ended by a ‘\n’ character. For classification (as in this project), <label> is an integer indicating the class label (multi-class is supported). Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or errors. Please look at some examples, such as svmguide1, to make sure you understand the format. Otherwise, a wrongly formatted data file will fail LIBSVM. Please note that your skeleton-based representations must be converted to this data format for the LIBSVM software to process. II. DATASET The MSR Daily Activity 3D dataset1 will be used in this projet, which is a most widely applied benchmark dataset in human behavior understanding tasks. This dataset contains 16 human activities, as illustrated in Fig. 1, performed by 10 subjects. Each subject performs each activity twice, once in standing position and once in sitting position. Although this 1 http://research.microsoft.com/en-us/um/people/ zliu/actionrecorsrc/ (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) Fig. 1: The MSR Daily Activity 3D dataset contains sixteen activities: (1) drink, (2) eat, (3) read book, (4) call cellphone, (5) write on a paper, (6) use laptop, (7) use vacuum cleaner, (8) cheer up, (9) sit still, (10) toss paper, (11) play game, (12) lie down on sofa, (13) walk, (14) play guitar, (15) stand up, (16) sit down. dataset contains color-depth and skeleton data, we will only explore the skeletal information to construct skeleton-based human representations in this project. The skeleton in each frame contains 20 joints, as illustrated by Fig. 2. The correspondence between joint names and joint indices is also presented in the figure. For example, Joint #1 is HipCenter, Joint #18 is KneeRight, etc. In this project, a pre-formatted skeleton data will be used, which can be downloaded from the project website: http://inside.mines.edu/˜hzhang/Courses/ CSCI498B-598B-Fall14/Projects/Project-2/ dataset.tar.gz. The dataset in the project is a subset of the original dataset, which only contains six (6) activity categories: • • • • • • CheerUp (a08) TossPaper (a10) LineOnSofa (a12) Walking (a13) StandUp (a15) SitDown (a16) Also, it is noteworthy that this dataset is not formatted to the LIBSVM form but much easier to process then the data format used in the original dataset. In particular, the dataset contains two folders: Train and Test. Data instances in the directory Train are used for training (and validation). Data instances in Test are used for testing the performance. Each instance has a filename like: a12 s08 e02 skeleton proj.txt. This filename means the data instance belongs to activity category 12 (i.e., a12, that is “lie down on sofa” as in Fig. 1), from human subject 8 (s08) at his/her second trial (i.e., e02). Again, the dataset contains 6 activity categories, 10 human subjects, and 2 trials each subject. Instances from subjects 1–6 are used for training (in the directory Train), and instances from subjects 7–10 are used for testing (in Test). When you open a data instance file, say a12 s08 e02 skeleton proj.txt, you will see something like: 1 1 0.326 -0.101 2.111 1 2 0.325 -0.058 2.152 1 3 0.319 0.194 2.166 . . . So each row contains five values, representing: 1) frame id, 2) joint id, 3) joint position x, 4) joint position y, 5) joint position z. These joint positions are in real-world 3D coordinates with a unit of meters. Each frame contains 20 rows that contain information of all joints in the frame. The skeleton data from all frames of an instance are included in the same data file (like a12 s08 e02 skeleton proj.txt). III. F EATURES TO I MPLEMENT (CSCI 498B) There are two tasks for students in CSCI498B: implement the representation based on relative angles and distances of star skeleton, and explore other similar representation. A. Relative Angles and Distances of Star Skeleton Students in CSCI 498B will implement the human representation based on Relative Angles and Distances (RAD) of star skeleton, as described by Algorithm 1. The objective is to implement the RAD representation and use it to convert all instances in the folder Train into a single training file rad, which can be processed by the LIBSVM software. Similarly, all instances in the folder Test needs to be converted into a single testing file rad.t for testing purpose. Algorithm 1: RAD representation using star skeletons Input : Training set Train and testing set Test Output : rad and rad.t 1: 2: 3: 4: 5: 6: 7: 8: Fig. 3: Illustration of human representation based on relative distance and angles of star skeleton 9: 10: 11: After obtaining the formatted training and testing files, i.e., rad and rad.t, you can use LIBSVM to train a C-SVM model with radial basis function (RBF) kernels. Then, you can use the model to make predictions of the testing data in rad.t. The predicted labels will be generated by LIBSVM with a name rad.t.predict. Please refer to the examples in the above Practical Guide. Then, you need to evaluate the performance of your RAD representation by computing accuracy and confusion matrix. These results must be included in your report. 12: 13: 14: for each instance in Train and Test do for frame t = 1, ..., T do Select joints that form a star skeleton (Fig. 3); Compute and store distances between body extremities and the body centroid joint (dt1 , ..., dt5 ); Compute and store angles between two adjacent body extremities (θ1t , ..., θ5t ); end Compute a histogram of N bins for each di = {dti }Tt=1 , i = 1, ..., 5; Compute a histogram of M bins for each θ i = {θit }Tt=1 , i = 1, ..., 5; Normalize the histograms by dividing T to compensate for the different number of frames in different instances; Concatenate all normalized histograms into a one-dimensional vector of length 5(M + N ); Convert the feature vector to the LIBSVM format (the feature vector of the instance occupies a single row) end Construct a single file (named rad for Train and rad.t for Test), which needs to contain feature vectors of all instances and follow the LIBSVM format; return rad and rad.t IV. S UBMISSION AND G RADING I NSTRUCTIONS (CSCI 498B) A. Project Code (Due: Oct. 3, 23:59:59 to Blackboard) B. Similar Representations Implement at least a new skeleton-based representation by choosing different joints other than the joints used in the star skeleton. For example, you can change the reference joint; or you can use other joints other than body extremities; or you can compute distances of all joints but ignore the orientation information, as the presentation of Histogram of Joint Position Differences (HJPD) discussed in Section V-A. C. Data to Gather For experiments discussed in Sections III-A and III-B, you need to provide sufficient details of your implementations and experiments, including (but not limited to): • • • • • • Which joints are used in the RAD representation; How the histograms are computed, and how many bins are used; What are the “optimal” C and γ values of C-SVMs for the RAD representation; A graph of the grid search (from grid.py) of the experiments must be included; Experimental results in terms of accuracy and confusion matrix; Analyze your experimental results, such as: what joints are more representative and helpful for human representation? what are pros and cons of the representations? Other types of analyses of your results along these lines are encouraged. Your software submission should include the following: • The code that implements the skeleton-based representations; • The final data files you obtained from your code, i.e., rad and rad.t, and files for other experiments, which must contain feature vectors of all instances and follow the LIBSVM format; • Sufficient instructions, makefiles, etc., needed to compile and test your projects, including README files with instructions You are free to use any programming language you like, but make sure to provide clear instructions of how to use the code and run your project. In addition, tar all materials into a single file named yourlastname-Project2-Code.tar (or tar.gz or zip). B. Project Paper (Due: Oct. 10, 23:59:59 to Blackboard) As part of your project, you must prepare a single 1-3 page document (in pdf format, using LATEX and the 2-column IEEE style as this write-up) that presents the results listed in “Data to Gather” in Section V-D. The documentation you turn in must be in pdf format and in a single file. Name your paper yourlastname-Project2-Paper.pdf. C. Grading Your grade will be based on the quality of your project implementation and the documentation of your findings in the write-up: • • 75%: The quality of your final working code implementation, and software documentation. You should have a “working” software implementation, which means that the skeleton-based representations are implemented in software, your code runs without crashing, and performs the behavior understanding task. 25%: Your paper, prepared in LATEX using IEEE styling, and submitted in pdf format – Figures and graphs should be clear and readable, with axes labeled and captions that describe what each figure or graph illustrates. The content should include all the experimental results and discussions mentioned previously in this document. D. Data to Gather For all representation implementations and experiments, you need to provide sufficient details to evaluate the performance of your representation and to repeat the experimental results. Please refer to Section V-D for examples. VI. S UBMISSION AND G RADING I NSTRUCTIONS (CSCI 598B) A. Project Code (Due: Oct. 3, 23:59:59 to Blackboard) Implement the skeleton-based human representation based on Relative Angles and Distances (RAD) of star skeleton, as discussed in Section III-A. Your software submission should include the following: • The code that implements three skeleton-based human representations (RAD, HJPD, and HOD); • The final data files you obtained from your code, i.e., rad and rad.t, and similar files for HJPD and HOD experiments, which must contain feature vectors of all instances and follow the LIBSVM format; • Sufficient instructions, makefiles, etc., needed to compile and run your projects and to evaluate the performance, including README files with instructions You can use any programming languages. However, C++ is recommended and make sure to provide clear instructions of how to run your project. In addition, tar all materials into a single file named yourlastname-Project2-Code.tar (or tar.gz or zip). B. Similar Representations B. Project Paper (Due: Oct. 10, 23:59:59 to Blackboard) For example, you can implement a similar representation, such as the Histogram of Joint Position Differences (HJPD): Given the location of a joint (x, y, z) and a reference joint (xc , yc , zc ) in the world coordinate, the joint displacement is defined as: As part of your project, you are required to prepare a single 4-6 page document (in pdf format, using LATEXand 2-column IEEE style as this write-up). The documentation you turn in must be in pdf format and in a single file. Name your paper yourlastname-Project2-Paper.pdf. Your paper must include the following: • An abstract of 200 to 300 words summarizing your findings. • A brief introduction describing the skeleton-based representation task and your implementations. • A detailed description of your experiments, with enough information that would enable someone to recreate your experiments. • An explanation of the results, including all the experimental details such as under “Data to Gather”, as well as your additional extensions: use figures, graphs, and tables where appropriate. Your results should make it clear that the system has in fact worked. Accuracy and confusion matrix must be presented and discussed. • A discussion of the significance of the results. The reader of your paper should be able to understand what you have done and what your software does without looking at the code itself. V. F EATURES TO I MPLEMENT (CSCI 598B) Students who are in CSCI498B must finish three tasks: (1) implement the representation based on relative angles and distances of star skeleton, (2) explore other similar human representations, and (3) implement the representation based on histogram of oriented displacements. A. Relative Angles and Distances of Star Skeleton (∆x, ∆y, ∆z) = (x, y, z) − (xc , yc , zc ) The reference joint can be the skeleton centroid or a fixed joint. For each temporal sequence of human skeletons (in a data instance), a histogram is computed for the displacement along each dimension (e.g., ∆x, ∆y or ∆z). Then, the computed histograms are concatenated into a single vector as a feature. This HJPD representation is similar to the RAD representation, except that it uses all joints and ignores the pairwise angles. Refer to Section 3.3 of the reference [1] for more details, which is available online at: http://www.csse.uwa.edu.au/˜ajmal/ papers/hossein_WACV2014.pdf C-SVMs with RBF kernels should be used as the learner. C. Histogram of Oriented Displacements Implement the skeleton-based representation of Histogram of Oriented Displacements (HOD), as introduced in Section 3 of the paper [2], including Temporal Pyramid. The paper is available online at: http://ijcai.org/papers13/Papers/ IJCAI13-203.pdf C-SVMs with RBF kernels should be used as the learner. C. Grading Your grade will be based on the quality of your project implementation and the documentation of your findings in the write-up. CSCI598B students will be graded more strictly on the quality of the research and paper presentation. I expect a more thorough analysis of your experimental results, and a working implementation of the skeleton-based representation and behavior recognition system. Your analysis should include further extensions selected from the above list, or other ideas you may have. The paper should have the “look and feel” of a technical conference paper, with logical flow, good grammar, sound arguments, illustrative figures, etc. Your grade will be based on: • 75%: The quality of your final working code implementation, and software documentation. You should have a “working” software implementation, which means that the skeleton-based representations are implemented in software, your code runs without crashing, and performs the human behavior recognition task. AT X using IEEE styling, • 25%: Your paper, prepared in L E and submitted in pdf format – Figures and graphs should be clear and readable, with axes labeled and captions that describe what each figure and graph illustrates. The content should include all the experimental results and discussions mentioned previously in this document. R EFERENCES [1] H. Rahmani, A. Mahmood, A. Mian, and D. Huynh, “Real time action recognition using histograms of depth gradients and random decision forests,” in IEEE Winter Conference on Applications of Computer Vision, 2014. [2] M. Gowayyed, M. Torki, M. Hussein, and M. El-Saban, “Histogram of oriented displacements (HOD): Describing trajectories of human joints for action recognition,” in International Joint Conference on Artificial Intelligence, 2013.