CSCI498B/598B Human-Centered Robotics Project 2: Skeleton-Based Representations for Human Behavior Understanding

Assigned: September 12
Multiple due dates (all materials must be submitted via Blackboard)
Code due: Friday, October 3, 23:59:59
Report due: Friday, October 10, 23:59:59
In this project, you will implement several skeleton-based
representations and make use of Support Vector Machines
(SVMs) to classify human behaviors using an activity dataset
collected from a Kinect sensor.
I. SUPPORT VECTOR MACHINES
Let’s look at SVMs first before we get into details of
implementing skeleton-based representations. One objective
of this project is to understand how to apply SVMs as a tool
in real-world applications.
In this project, we will use LIBSVM as our classifier (also
referred to as a learner), which is an excellent open source
implementation of SVMs developed by Chang and Lin at
National Taiwan University. This library provides software
support for SVMs, with source code available for C++, and
interfaces available to MATLAB, Octave, Python, Perl, and
many other programming languages and environments.
Here’s the link to the LIBSVM library:
http://www.csie.ntu.edu.tw/~cjlin/libsvm.
A. Installing LIBSVM
First, follow the instructions on the LIBSVM website for
downloading the software. The current release is Version
3.18. Note that the README file within the package provides a lot of helpful information for installing and using the
LIBSVM package. In addition, make sure you read through
and have a good understanding of the following Practical
Guide by Hsu, Chang, and Lin, especially the examples in
the guide:
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.
B. Getting Familiar with LIBSVM
To help you get up to speed with LIBSVM, I recommend following the examples in Appendix A of the above Practical Guide. The example datasets (e.g., svmguide1 and others) are available online from the LIBSVM site, already formatted into the form expected by LIBSVM, here:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
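For instance, the Appendix A walkthrough in the Practical Guide boils down to commands like the following (run from the LIBSVM directory; the filenames follow the svmguide1 example, and the exact paths to the binaries may differ on your system):

```shell
# Scale the training data to [-1, 1] and save the scaling parameters
./svm-scale -l -1 -u 1 -s range1 svmguide1 > svmguide1.scale
# Apply the SAME scaling parameters to the testing data
./svm-scale -r range1 svmguide1.t > svmguide1.t.scale
# Train a C-SVM with the default RBF kernel (produces svmguide1.scale.model)
./svm-train svmguide1.scale
# Predict on the testing set; LIBSVM prints the accuracy
./svm-predict svmguide1.t.scale svmguide1.scale.model svmguide1.t.predict
# Optional: grid search over C and gamma (grid.py lives in tools/)
python tools/grid.py svmguide1.scale
```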
If you have any questions or comments, please contact the instructor, Dr.
Hao Zhang, at hzhang@mines.edu.
This write-up is prepared using LaTeX.
Fig. 2: Skeleton joint names (a) and indices (b) from the Kinect SDK.
C. Data Format of LIBSVM
The format of a training or testing data file is:
<label> <index1>:<value1> <index2>:<value2> ...
...
Each line contains one instance and is terminated by a ‘\n’ character. For classification (as in this project), <label> is an integer indicating the class label (multi-class classification is supported). Indices must be in ASCENDING order. Labels in the testing file are only used to calculate accuracy or error. Please look at some examples, such as svmguide1, to make sure you understand the format; an incorrectly formatted data file will cause LIBSVM to fail.
Please note that your skeleton-based representations must
be converted to this data format for the LIBSVM software
to process.
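As an illustration of this conversion, a small helper (my own sketch, not part of LIBSVM) that renders one labeled feature vector as a LIBSVM-format line might look like:

```python
def to_libsvm_line(label, features):
    """Render one instance as a LIBSVM line: '<label> 1:v1 2:v2 ...'.
    Indices are 1-based and emitted in ascending order, as required.
    (LIBSVM's sparse format also allows omitting zero-valued features.)"""
    pairs = ["%d:%g" % (i, v) for i, v in enumerate(features, start=1)]
    return " ".join([str(label)] + pairs)

# Each instance occupies exactly one line of the training/testing file.
```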
II. DATASET
The MSR Daily Activity 3D dataset¹ will be used in this project; it is one of the most widely used benchmark datasets in
human behavior understanding tasks. This dataset contains
16 human activities, as illustrated in Fig. 1, performed by 10
subjects. Each subject performs each activity twice, once in a standing position and once in a sitting position. Although this dataset contains color-depth and skeleton data, we will only explore the skeletal information to construct skeleton-based human representations in this project.

¹http://research.microsoft.com/en-us/um/people/zliu/actionrecorsrc/

Fig. 1: The MSR Daily Activity 3D dataset contains sixteen activities: (1) drink, (2) eat, (3) read book, (4) call cellphone, (5) write on a paper, (6) use laptop, (7) use vacuum cleaner, (8) cheer up, (9) sit still, (10) toss paper, (11) play game, (12) lie down on sofa, (13) walk, (14) play guitar, (15) stand up, (16) sit down.
The skeleton in each frame contains 20 joints, as illustrated
by Fig. 2. The correspondence between joint names and joint
indices is also presented in the figure. For example, Joint #1
is HipCenter, Joint #18 is KneeRight, etc.
In this project, pre-formatted skeleton data will be used,
which can be downloaded from the project website:
http://inside.mines.edu/~hzhang/Courses/CSCI498B-598B-Fall14/Projects/Project-2/dataset.tar.gz.
The dataset used in this project is a subset of the original dataset; it contains only six (6) activity categories:
• CheerUp (a08)
• TossPaper (a10)
• LieOnSofa (a12)
• Walking (a13)
• StandUp (a15)
• SitDown (a16)
Also, it is noteworthy that this dataset is not formatted in the LIBSVM form, but it is much easier to process than the data format used in the original dataset.
In particular, the dataset contains two folders: Train and
Test. Data instances in the directory Train are used for
training (and validation). Data instances in Test are used
for testing the performance. Each instance has a filename like a12_s08_e02_skeleton_proj.txt. This filename means the data instance belongs to activity category 12 (i.e., a12, that is, “lie down on sofa” as in Fig. 1), performed by human subject 8 (s08) in his/her second trial (e02). Again, the dataset contains 6 activity categories, 10 human subjects, and 2 trials per subject. Instances from subjects 1–6 are used for training (in the directory Train), and instances from subjects 7–10 are used for testing (in Test).
When you open a data instance file, say a12_s08_e02_skeleton_proj.txt, you will see something like:
1 1 0.326 -0.101 2.111
1 2 0.325 -0.058 2.152
1 3 0.319 0.194 2.166
...
So each row contains five values, representing:
1) frame id,
2) joint id,
3) joint position x,
4) joint position y,
5) joint position z.
These joint positions are real-world 3D coordinates in meters. Each frame contributes 20 rows, one per joint. The skeleton data from all frames of an instance are stored in the same data file (e.g., a12_s08_e02_skeleton_proj.txt).
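A minimal parser for this layout might look like the following (a sketch; the function names are my own, and per Fig. 2 a complete frame should yield 20 joints):

```python
from collections import defaultdict

def parse_skeleton_lines(lines):
    """Parse skeleton rows into {frame_id: {joint_id: (x, y, z)}}.
    Each row holds: frame id, joint id, x, y, z (meters)."""
    frames = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        frame, joint = int(fields[0]), int(fields[1])
        x, y, z = map(float, fields[2:])
        frames[frame][joint] = (x, y, z)
    return dict(frames)

def read_skeleton_file(path):
    """Read one instance file (all frames of one activity trial)."""
    with open(path) as f:
        return parse_skeleton_lines(f)
```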
III. FEATURES TO IMPLEMENT (CSCI 498B)
There are two tasks for students in CSCI 498B: implement the representation based on relative angles and distances of star skeleton, and explore another similar representation.
A. Relative Angles and Distances of Star Skeleton
Students in CSCI 498B will implement the human representation based on Relative Angles and Distances (RAD) of
star skeleton, as described by Algorithm 1. The objective is
to implement the RAD representation and use it to convert all
instances in the folder Train into a single training file rad,
which can be processed by the LIBSVM software. Similarly, all instances in the folder Test need to be converted into a single testing file rad.t for testing purposes.
Algorithm 1: RAD representation using star skeletons
Input: Training set Train and testing set Test
Output: rad and rad.t
1: for each instance in Train and Test do
2:   for frame t = 1, ..., T do
3:     Select joints that form a star skeleton (Fig. 3);
4:     Compute and store the distances between the body extremities and the body centroid joint (d_1^t, ..., d_5^t);
5:     Compute and store the angles between adjacent body extremities (θ_1^t, ..., θ_5^t);
6:   end
7:   Compute a histogram of N bins for each d_i = {d_i^t}, t = 1, ..., T, i = 1, ..., 5;
8:   Compute a histogram of M bins for each θ_i = {θ_i^t}, t = 1, ..., T, i = 1, ..., 5;
9:   Normalize the histograms by dividing by T to compensate for the different number of frames in different instances;
10:  Concatenate all normalized histograms into a one-dimensional vector of length 5(M + N);
11:  Convert the feature vector to the LIBSVM format (the feature vector of each instance occupies a single row);
12: end
13: Construct a single file (named rad for Train and rad.t for Test) that contains the feature vectors of all instances and follows the LIBSVM format;
14: return rad and rad.t

Fig. 3: Illustration of the human representation based on relative distances and angles of star skeleton.

After obtaining the formatted training and testing files, i.e., rad and rad.t, you can use LIBSVM to train a C-SVM model with a radial basis function (RBF) kernel. Then, you can use the model to make predictions on the testing data in rad.t. The predicted labels will be generated by LIBSVM in a file named rad.t.predict. Please refer to the examples in the above Practical Guide.
Then, you need to evaluate the performance of your RAD representation by computing the accuracy and confusion matrix. These results must be included in your report.
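The per-frame computations of Algorithm 1 (steps 3–5) and the histogram steps can be sketched as follows. The star-skeleton joint indices (1 = HipCenter as centroid; 4, 8, 12, 16, 20 as head, hands, and feet) are my reading of the Fig. 2 indexing, and the extremity adjacency order is a choice — verify both against the figure:

```python
import math

CENTROID = 1                      # HipCenter (assumed from Fig. 2 indexing)
EXTREMITIES = [4, 8, 12, 16, 20]  # Head, HandLeft, HandRight, FootLeft, FootRight (assumed)

def rad_frame(joints):
    """Distances and angles for one frame (Algorithm 1, steps 3-5).
    joints maps joint id -> (x, y, z); returns (d_1..d_5, theta_1..theta_5)."""
    c = joints[CENTROID]
    vecs = [tuple(joints[j][k] - c[k] for k in range(3)) for j in EXTREMITIES]
    dists = [math.sqrt(sum(v * v for v in vec)) for vec in vecs]
    angles = []
    for a, b in zip(vecs, vecs[1:] + vecs[:1]):  # adjacent extremities, wrapping around
        cosang = sum(x * y for x, y in zip(a, b)) / (
            math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
        angles.append(math.acos(max(-1.0, min(1.0, cosang))))  # clamp for safety
    return dists, angles

def norm_hist(values, bins, lo, hi):
    """Histogram over [lo, hi), normalized by T = len(values) (steps 7-9)."""
    h = [0.0] * bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        h[max(i, 0)] += 1.0
    return [x / len(values) for x in h] if values else h
```

Concatenating the norm_hist outputs for the five distance series (N bins each) and the five angle series (M bins each) yields the 5(M + N)-dimensional feature vector of step 10; the histogram ranges (lo, hi) must be fixed across all instances.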
B. Similar Representations
Implement at least one new skeleton-based representation by choosing joints other than those used in the star skeleton. For example, you can change the reference joint; you can use joints other than the body extremities; or you can compute the distances of all joints while ignoring the orientation information, as in the Histogram of Joint Position Differences (HJPD) representation discussed in Section V-B.
C. Data to Gather
For the experiments discussed in Sections III-A and III-B, you need to provide sufficient details of your implementations and experiments, including (but not limited to):
• Which joints are used in the RAD representation;
• How the histograms are computed, and how many bins are used;
• The “optimal” C and γ values of the C-SVMs for the RAD representation;
• A graph of the grid search (from grid.py) for the experiments;
• Experimental results in terms of accuracy and confusion matrix;
• An analysis of your experimental results, for example: which joints are more representative and helpful for human representation? What are the pros and cons of the representations? Other types of analyses along these lines are encouraged.
IV. SUBMISSION AND GRADING INSTRUCTIONS (CSCI 498B)
A. Project Code (Due: Oct. 3, 23:59:59 to Blackboard)
Your software submission should include the following:
• The code that implements the skeleton-based representations;
• The final data files obtained from your code, i.e., rad and rad.t, along with the files for your other experiments, which must contain the feature vectors of all instances and follow the LIBSVM format;
• Sufficient instructions, makefiles, etc., needed to compile and test your project, including README files with instructions.
You are free to use any programming language you like, but make sure to provide clear instructions on how to use the code and run your project. In addition, tar all materials into a single file named yourlastname-Project2-Code.tar (or .tar.gz or .zip).
B. Project Paper (Due: Oct. 10, 23:59:59 to Blackboard)
As part of your project, you must prepare a single 1-3 page document (in pdf format, using LaTeX and the 2-column IEEE style, as in this write-up) that presents the results listed under “Data to Gather” in Section III-C. The document you turn in must be a single pdf file. Name your paper yourlastname-Project2-Paper.pdf.
C. Grading
Your grade will be based on the quality of your project implementation and the documentation of your findings in the write-up:
• 75%: The quality of your final working code implementation and software documentation. You should have a “working” software implementation, which means that the skeleton-based representations are implemented in software, and your code runs without crashing and performs the behavior understanding task.
• 25%: Your paper, prepared in LaTeX using the IEEE style and submitted in pdf format. Figures and graphs should be clear and readable, with labeled axes and captions that describe what each figure or graph illustrates. The content should include all the experimental results and discussions mentioned previously in this document.
V. FEATURES TO IMPLEMENT (CSCI 598B)
Students in CSCI 598B must finish three tasks: (1) implement the representation based on relative angles and distances of star skeleton, (2) explore other similar human representations, and (3) implement the representation based on the histogram of oriented displacements.
A. Relative Angles and Distances of Star Skeleton
Implement the skeleton-based human representation based on Relative Angles and Distances (RAD) of star skeleton, as discussed in Section III-A.
B. Similar Representations
For example, you can implement a similar representation, such as the Histogram of Joint Position Differences (HJPD): Given the location of a joint (x, y, z) and a reference joint (xc, yc, zc) in the world coordinate frame, the joint displacement is defined as:

(∆x, ∆y, ∆z) = (x, y, z) − (xc, yc, zc)

The reference joint can be the skeleton centroid or a fixed joint. For each temporal sequence of human skeletons (i.e., a data instance), a histogram is computed for the displacement along each dimension (i.e., ∆x, ∆y, and ∆z). The computed histograms are then concatenated into a single feature vector.
The HJPD representation is similar to the RAD representation, except that it uses all joints and ignores the pairwise angles. Refer to Section 3.3 of the reference [1] for more details, which is available online at:
http://www.csse.uwa.edu.au/~ajmal/papers/hossein_WACV2014.pdf
C-SVMs with RBF kernels should be used as the learner.
C. Histogram of Oriented Displacements
Implement the skeleton-based representation of Histogram of Oriented Displacements (HOD), as introduced in Section 3 of the paper [2], including the Temporal Pyramid. The paper is available online at:
http://ijcai.org/papers13/Papers/IJCAI13-203.pdf
C-SVMs with RBF kernels should be used as the learner.
D. Data to Gather
For all representation implementations and experiments, you need to provide sufficient details to evaluate the performance of your representations and to reproduce the experimental results. Please refer to Section III-C for examples.
VI. SUBMISSION AND GRADING INSTRUCTIONS (CSCI 598B)
A. Project Code (Due: Oct. 3, 23:59:59 to Blackboard)
Your software submission should include the following:
• The code that implements the three skeleton-based human representations (RAD, HJPD, and HOD);
• The final data files obtained from your code, i.e., rad and rad.t, and similar files for the HJPD and HOD experiments, which must contain the feature vectors of all instances and follow the LIBSVM format;
• Sufficient instructions, makefiles, etc., needed to compile and run your project and to evaluate its performance, including README files with instructions.
You can use any programming language; however, C++ is recommended. Make sure to provide clear instructions on how to run your project. In addition, tar all materials into a single file named yourlastname-Project2-Code.tar (or .tar.gz or .zip).
B. Project Paper (Due: Oct. 10, 23:59:59 to Blackboard)
As part of your project, you are required to prepare a single 4-6 page document (in pdf format, using LaTeX and the 2-column IEEE style, as in this write-up). The document you turn in must be a single pdf file. Name your paper yourlastname-Project2-Paper.pdf. Your paper must include the following:
• An abstract of 200 to 300 words summarizing your findings.
• A brief introduction describing the skeleton-based representation task and your implementations.
• A detailed description of your experiments, with enough information to enable someone to recreate them.
• An explanation of the results, including all the experimental details listed under “Data to Gather”, as well as your additional extensions; use figures, graphs, and tables where appropriate. Your results should make it clear that the system has in fact worked. The accuracy and confusion matrix must be presented and discussed.
• A discussion of the significance of the results.
The reader of your paper should be able to understand what you have done and what your software does without looking at the code itself.
C. Grading
Your grade will be based on the quality of your project implementation and the documentation of your findings in the write-up. CSCI 598B students will be graded more strictly on the quality of the research and paper presentation. I expect a more thorough analysis of your experimental results and a working implementation of the skeleton-based representation and behavior recognition system. Your analysis should include further extensions selected from the above list, or other ideas you may have. The paper should have the “look and feel” of a technical conference paper, with logical flow, good grammar, sound arguments, illustrative figures, etc.
Your grade will be based on:
• 75%: The quality of your final working code implementation and software documentation. You should have a “working” software implementation, which means that the skeleton-based representations are implemented in software, and your code runs without crashing and performs the human behavior recognition task.
• 25%: Your paper, prepared in LaTeX using the IEEE style and submitted in pdf format. Figures and graphs should be clear and readable, with labeled axes and captions that describe what each figure and graph illustrates. The content should include all the experimental results and discussions mentioned previously in this document.
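As a closing illustration, the HJPD displacement histograms described in Section V-B can be sketched as below. The reference joint (HipCenter, index 1 per Fig. 2), the histogram range, the bin count, and the choice to pool all joints into one histogram per axis are illustrative assumptions; check them against Section 3.3 of [1]:

```python
def hjpd_histograms(frames, ref_joint=1, bins=8, lo=-1.5, hi=1.5):
    """Histogram of Joint Position Differences (sketch).
    frames: {frame_id: {joint_id: (x, y, z)}}.
    Collects per-axis displacements of every joint from the reference
    joint, then histograms each axis and normalizes by the sample count."""
    disp = {0: [], 1: [], 2: []}  # displacement samples along x, y, z
    for joints in frames.values():
        ref = joints[ref_joint]
        for j, pos in joints.items():
            if j == ref_joint:
                continue
            for axis in range(3):
                disp[axis].append(pos[axis] - ref[axis])
    feature = []
    for axis in range(3):
        h = [0.0] * bins
        n = len(disp[axis])
        for v in disp[axis]:
            i = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            h[max(i, 0)] += 1.0
        feature += [x / n for x in h] if n else h
    return feature  # length 3 * bins, ready for LIBSVM formatting
```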
REFERENCES
[1] H. Rahmani, A. Mahmood, A. Mian, and D. Huynh, “Real time action
recognition using histograms of depth gradients and random decision
forests,” in IEEE Winter Conference on Applications of Computer
Vision, 2014.
[2] M. Gowayyed, M. Torki, M. Hussein, and M. El-Saban, “Histogram of
oriented displacements (HOD): Describing trajectories of human joints
for action recognition,” in International Joint Conference on Artificial
Intelligence, 2013.