CSCI473/573 Human-Centered Robotics Project 2: Skeleton-Based Representations for Human Behavior Understanding

advertisement
CSCI473/573 Human-Centered Robotics Project 2:
Skeleton-Based Representations for Human Behavior Understanding
Assigned: October 21
Multiple due dates (all materials must be submitted via Blackboard)
Code due: Monday, November 2, 23:59:59
Report due: Wednesday, November 4, 23:59:59
Deadlines will not be extended
In this project, you will implement several skeleton-based
representations and apply Support Vector Machines (SVMs)
to classify human behaviors using an activity dataset collected from a Kinect sensor.
I. S UPPORT V ECTOR M ACHINES
Let’s look at SVMs before we get into details of implementing skeleton-based representations. One objective of this
project is to understand how to apply SVMs as a tool in realworld applications.
In this project, we will use LIBSVM as our classifier (also
referred to as a learner), which is an excellent open source
implementation of SVMs and related methods. This library
provides software support for SVMs, with source code available for C++, and interfaces available to MATLAB, Octave,
Python, Perl, and many other programming languages.
Here’s the link to obtain the LIBSVM library:
http://www.csie.ntu.edu.tw/˜cjlin/
libsvm.
A. Installing LIBSVM
Follow the instructions on the LIBSVM website for downloading the software. The up-to-date release is Version 3.20.
Note that the README file within the package provides a lot
of helpful information for installing and using the LIBSVM
package. In addition, make sure you read through and have
a good understanding of the following Practical Guide by
Hsu, Chang, and Lin, especially the examples in the guide:
http://www.csie.ntu.edu.tw/˜cjlin/
papers/guide/guide.pdf.
B. Getting Familiar with LIBSVM
To understand LIBSVM, you are highly recommended to
work on and play with the examples in Appendix A within
the Practical Guide. The exemplary datasets (e.g., svmguide1
and others) are available online from the LIBSVM directory,
already formatted into the form expected by LIBSVM, here:
http://www.csie.ntu.edu.tw/˜cjlin/
libsvmtools/datasets/.
If you have any questions, please contact the instructor, Prof. Hao Zhang,
at hzhang@mines.edu.
This write-up is prepared using LATEX.
(a) Names
(b) Indices
Fig. 2: Skeleton joint names and indices from Kinect SDK.
C. Data Format of LIBSVM
The format of training and testing data files is:
<label> <index1>:<value1> <index2>:<value2> ...
.
.
.
Each line contains an instance and must be ended by a ‘\n’
character. For classification (as in this project), <label> is an
integer indicating the class label (multi-class is supported).
Indices must be in ASCENDING order. Labels in the testing
file are only used to calculate accuracy or errors. Please look
at the examples in the Practical Guide, such as svmguide1, to
make sure you understand the format. Otherwise, a wrongly
formatted data file will fail LIBSVM.
Please note that your skeleton-based representations must
be converted to this data format for the LIBSVM software
to process.
II. DATASET
The MSR Daily Activity 3D dataset1 will be used in this
projet, which is a most widely applied benchmark dataset in
human behavior understanding tasks. This dataset contains
16 human activities, as illustrated in Fig. 1, performed by 10
subjects. Each subject performs each activity twice, once in
1 http://research.microsoft.com/en-us/um/people/
zliu/actionrecorsrc/
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
Fig. 1: The MSR Daily Activity 3D dataset contains sixteen activities: (1) drink, (2) eat, (3) read book, (4) call cellphone,
(5) write on a paper, (6) use laptop, (7) use vacuum cleaner, (8) cheer up, (9) sit still, (10) toss paper, (11) play game, (12)
lie down on sofa, (13) walk, (14) play guitar, (15) stand up, (16) sit down.
standing position and once in sitting position. Although this
dataset contains color-depth and skeleton data, we will only
explore the skeletal information to construct skeleton-based
human representations in this project.
The skeleton in each frame contains 20 joints, as illustrated
by Fig. 2. The correspondence between joint names and joint
indices is also presented in the figure. For example, Joint #1
is HipCenter, Joint #18 is KneeRight, etc.
In this project, a pre-formatted skeleton data will be used,
which can be downloaded from the project website:
http://inside.mines.edu/˜hzhang/Courses/
CSCI473-573/Projects/Project-2/dataset.
tar.gz.
The dataset in the project is a subset of the original dataset,
which only contains six (6) activity categories:
•
•
•
•
•
•
CheerUp (a08)
TossPaper (a10)
LieOnSofa (a12)
Walking (a13)
StandUp (a15)
SitDown (a16)
It is noteworthy that the provided dataset is not formatted to
the LIBSVM form, but much easier to process then the data
format used in the original dataset.
In particular, the dataset contains two folders: Train and
Test. Data instances in the directory Train are used for
training (and validation). Data instances in Test are used
for evaluating the performance. Each instance has a filename
like: a12 s08 e02 skeleton proj.txt. This filename
means the data instance belongs to activity category 12 (i.e.,
a12, that is “lie down on sofa” as in Fig. 1), from human
subject 8 (s08) at his/her second trial (i.e., e02). Again, the
dataset contains 6 activity categories, 10 human subjects, and
2 trials each subject. Instances from subjects 1–6 are used for
training (in the directory Train), and data instances from
subjects 7–10 are used for testing (in Test).
When you open a data instance file, say a12 s08 e02 skeleton proj.txt, you will see something like:
1
1
0.326
-0.101
2.111
1
2
0.325
-0.058
2.152
1
3
0.319
0.194
2.166
.
.
.
Each row contains five values, representing:
1) frame id,
2) joint id,
3) joint position x,
4) joint position y,
5) joint position z.
These joint positions are in real-world 3D coordinates with
a unit of meters. Each frame contains 20 rows that contain
information of all joints in the frame. The skeleton data from
all frames of an instance are included in the same data file
(like a12 s08 e02 skeleton proj.txt).
III. F EATURES TO I MPLEMENT (CSCI 473)
Students in CSCI473 must complete two tasks:
1) Implement the human representation based on relative
angles and distances of star skeleton.
2) Explore other similar representations.
A. Relative Angles and Distances of Star Skeleton
Students in CSCI 473 are required to implement the human
representation based on the Relative Angles and Distances
(RAD) of star skeleton, as described by Algorithm 1. The
objective is to implement the RAD representation and use it
to convert all data instances in the folder Train into a single
Algorithm 1: RAD representation using star skeletons
Input : Training set Train and testing set Test
Output : rad and rad.t
1:
2:
3:
4:
5:
6:
7:
8:
Fig. 3: Illustration of human representation based on relative
distance and angles of star skeleton
9:
10:
training file rad, which can be processed by the LIBSVM
software. Similarly, all instances in the directory Test needs
to be converted into a single testing file rad.t for evaluation
purpose.
After obtaining the formatted training and testing files, i.e.,
rad and rad.t, you can use LIBSVM to train a C-SVM
model with Radial Basis Function (RBF) kernels. Then, you
can use the model to make predictions of the testing data in
rad.t. The predicted labels will be generated by LIBSVM
with a name rad.t.predict. Please refer to the examples
in the Practical Guide.
Then, you need to evaluate the performance of your RAD
representation by computing accuracy and confusion matrix.
These results must be included in your report.
B. Similar Representations
Implement at least a new skeleton-based representation by
choosing different joints other than the joints selected in the
star skeleton. For example, you can change reference joints,
use other joints other than body extremities, or compute distances of all joints but ignore the orientation information, as
the representation of Histogram of Joint Position Differences
(HJPD) discussed in Section V-A.
C. Data to Gather
For experiments discussed in Sections III-A and III-B, you
need to provide sufficient details of your implementation and
experiments, including (but not limited to):
• Which joints are used in the RAD representation;
• How the histograms are computed, and how many bins
are used;
• What are the “optimal” C and γ values of C-SVMs for
the RAD representation;
• A graph of the grid search (from grid.py) of the experiments must be included;
• Experimental results in terms of accuracy and confusion
matrix. The accuracy must be better than 50%;
• Analyze your experimental results, such as: what joints
are more representative and descriptive for human representation? what are the pros and cons of the represen-
11:
12:
13:
14:
for each instance in Train and Test do
for frame t = 1, ..., T do
Select joints that form a star skeleton (Fig. 3);
Compute and store distances between body
extremities and the body centroid joint (dt1 , ..., dt5 );
Compute and store angles between two adjacent body
extremities (θ1t , ..., θ5t );
end
Compute a histogram of N bins for each di = {dti }Tt=1 ,
i = 1, ..., 5;
Compute a histogram of M bins for each θ i = {θit }Tt=1 ,
i = 1, ..., 5;
Normalize the histograms by dividing T to compensate
for the different number of frames in different instances;
Concatenate all normalized histograms into a
one-dimensional vector of length 5(M + N );
Convert the feature vector to the LIBSVM format (the
feature vector of the instance occupies a single row)
end
Construct a single file (named rad for Train and rad.t
for Test), which needs to contain feature vectors of all
instances and follow the LIBSVM format;
return rad and rad.t
tations? Other analyses of the results along these lines
are encouraged.
IV. S UBMISSION AND G RADING I NSTRUCTIONS
(CSCI 473)
A. Project Code (Due: Nov. 2, 23:59:59 to Blackboard)
Your software submission should include the following:
•
•
•
•
The code that implements the skeleton-based representations;
The final data files you obtain from your representation
code, i.e., rad and rad.t, and data files for your other
representations, which must contain feature vectors of
all instances and follow the LIBSVM format;
Sufficient instructions to compile and test your projects,
especially a README file with detailed instructions;
Tar all materials into a single file named yourlastnameProject2-Code.tar (or tar.gz or zip)
You are required to implement the approaches using either
C++ or Python in Ubuntu (since it’s a CS course). Make sure
to provide clear instructions of how to use the code and run
your project.
B. Project Paper (Due: Nov. 4, 23:59:59 to Blackboard)
As part of your project, you are required prepare a single
1-3 page document (in pdf format, using LATEX and the 2column IEEE style as this write-up) that presents the results
listed in “Data to Gather” in Section V-D. The documentation
you turn in must be in pdf format and in a single file. Name
your paper yourlastname-Project2-Paper.pdf.
C. Grading
Your grade will be based upon the quality of your project
implementation and the documentation of your findings in
the write-up:
• 75%: The quality of your final working code implementation, and software documentation. You should have a
“working” software implementation, which means that
your skeleton-based representations are implemented in
software, your code runs without crashing, and performs
the behavior understanding task.
AT X using IEEE styling,
• 25%: Your paper, prepared in L
E
and submitted in pdf format – Figures and graphs should
be clear and readable, with axes labeled and captions
that describe what each figure or graph illustrates. The
content should include all the experimental results and
discussions mentioned previously in this document.
V. F EATURES TO I MPLEMENT (CSCI 573)
Students in CSCI573 must finish three tasks:
1) Implement the RAD representation based on star skeleton,
2) Implement the HJPD representation,
3) Implement the HOD representation.
D. Data to Gather
For all human representation implementations and experiments, you need to provide sufficient details to evaluate the
performance of your representations and to repeat the experimental results. Please refer to Section V-D for examples.
VI. S UBMISSION AND G RADING I NSTRUCTIONS
(CSCI 573)
A. Project Code (Due: Nov. 2, 23:59:59 to Blackboard)
Your software submission should include the following:
•
•
•
•
A. Relative Angles and Distances of Star Skeleton
Implement the skeleton-based human representation based
on Relative Angles and Distances (RAD) of star skeleton, as
discussed in Section III-A.
B. Additional Representations
Students in CSCI573 are required to implement the Histogram of Joint Position Differences (HJPD): Given the 3D
location of a joint (x, y, z) and a reference joint (xc , yc , zc )
in the world coordinate, the joint displacement is defined as:
(∆x, ∆y, ∆z) = (x, y, z) − (xc , yc , zc )
The reference joint can be the skeleton centroid or a fixed
joint. For each temporal sequence of human skeletons (in a
data instance), a histogram is computed for the displacement
along each dimension, including ∆x, ∆y or ∆z. Then, the
computed histograms are concatenated into a single vector
as a feature.
This HJPD representation is similar to the RAD representation, except that it uses all joints and ignores the pairwise
angles. Refer to Section 3.3 of reference [1] for more details,
which is available online at:
http://www.csse.uwa.edu.au/˜ajmal/
papers/hossein_WACV2014.pdf
C-SVMs with RBF kernels should be used as the learner.
C. Histogram of Oriented Displacements
Implement the skeleton-based representation of Histogram
of Oriented Displacements (HOD), as introduced in Section 3
of reference [2], including the Temporal Pyramid. The paper
is available online at:
http://ijcai.org/papers13/Papers/
IJCAI13-203.pdf
C-SVMs with RBF kernels should be used as the learner.
The code that implements three skeleton-based human
representations (RAD, HJPD, and HOD);
The final data files you obtained from your code, i.e.,
rad and rad.t, and similar files for HJPD and HOD
experiments, which must contain feature vectors of all
instances and follow the LIBSVM format;
Sufficient instructions needed to compile and run your
projects and to evaluate the performance. In particular,
you are required to include a README file that provides detailed instructions,
Tar all materials into a single file named yourlastnameProject2-Code.tar (or tar.gz or zip).
You are required to implement the approaches using either
C++ or Python in Ubuntu (since it’s a CS course). Make sure
to provide clear instructions of how to use the code and run
your project.
B. Project Paper (Due: Nov. 4, 23:59:59 to Blackboard)
As part of your project, you are required to prepare a single
4-6 page document (in pdf format, using LATEXand 2-column
IEEE style as this write-up). The documentation you turn in
must be in pdf format and in a single file. Name your paper
yourlastname-Project2-Paper.pdf. Your paper must include
the following:
•
•
•
•
•
An abstract containing 200 to 300 words to summarize
your findings.
A brief introduction describing the skeleton-based representation task and your implementations.
A detailed description of your experiments, with enough
information that would enable someone to recreate your
experiments.
An explanation of the results, including all the experimental details such as under “Data to Gather”, as well
as your additional extensions: include figures, graphs,
and tables where appropriate. Your results should make
it clear that the software has in fact worked. Accuracy
and confusion matrix must be presented and discussed.
In particular, the accuracy must be better than 50%.
A discussion of the significance of the results.
The reader of your paper should be able to understand what
you have done and what your software does without looking
at the code itself.
C. Grading
Your grade will be based upon the quality of your project
implementation and the documentation of your findings in
the write-up. CSCI573 students will be graded more strictly
on the quality of the implementation and paper presentation.
The instructor expects a more thorough investigation of the
experimental results, and a working implementation of your
skeleton-based representations for human behavior recognition. Your analysis should include further extensions selected
from the above list, or other ideas you may have. The paper
should have the “look and feel” of a technical conference
paper, with logical flow, good grammar, sound arguments,
illustrative figures, etc.
Your grade will be based on:
• 75%: The quality of your final working code implementation, and software documentation. You should have a
“working” software implementation, which means that
the skeleton-based representations are implemented in
software, your code runs without crashing, and performs
the human behavior recognition task.
AT X using IEEE styling,
• 25%: Your paper, prepared in L
E
and submitted in pdf format – Figures and graphs should
be clear and readable, with axes labeled and captions
that describe what each figure and graph illustrates. The
content should include all the experimental results and
discussions mentioned previously in this document.
R EFERENCES
[1] H. Rahmani, A. Mahmood, A. Mian, and D. Huynh, “Real time action
recognition using histograms of depth gradients and random decision
forests,” in IEEE Winter Conference on Applications of Computer
Vision, 2014.
[2] M. Gowayyed, M. Torki, M. Hussein, and M. El-Saban, “Histogram of
oriented displacements (HOD): Describing trajectories of human joints
for action recognition,” in International Joint Conference on Artificial
Intelligence, 2013.
Download