Human Motion Prediction
Raymond B. Cyiza
Dept. of Oslo Metropolitan University
ACIT: Robotics and Control
This article reviews the research paper by Philipp Kratzer, Marc Toussaint and Jim
Mainprice. Their work revolves around predicting human movement with deep learning methods
such as recurrent neural networks, so that a robot can plan around the predicted motion. With
their work, they aim to create an environment where robots and human agents can work together
safely and effectively by avoiding unnecessary accidents.
Human movement is very difficult to mimic with a machine, and over the years more
and more scientists have worked on the problem of creating systems that can execute a number
of tasks with the same fluidity and smoothness of movement as humans. This means that a robot
arm in motion would have to continuously adjust its speed and its angle of rotation, just like
a human arm would. To approach this task, Philipp Kratzer, Marc Toussaint and Jim Mainprice
developed a method that predicts human movement with recurrent neural networks while also
taking into account the obstacles around the human. A recurrent neural network is a
data-driven model that receives input data over a period of time and predicts an output based
on all the previous inputs it has been given. For example, suppose a human moves their right
leg up, down and then left, and we want a machine to predict the last movement, which is left
in our case. The model produces an output based on the two inputs we gave it, but with so
little information we cannot expect that output to be correct; it would be close to a random
guess. With more precise information, or simply a lot of data representing the observed
behaviors, we can be more confident that the prediction matches the real movement. In their
work, Kratzer, Toussaint and Mainprice state that their framework can be used to plan robot
trajectories that are optimized for working with human agents.
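To make the idea of "predicting the next movement from all previous inputs" concrete, here is a minimal sketch of a recurrent network's forward pass in plain Python. It is not the authors' model: the weights are random and untrained, the layer sizes are arbitrary, and names such as `make_rnn` and `predict_future` are hypothetical. It only illustrates how a hidden state carries the history forward and how predictions can be fed back in as inputs.

```python
import math
import random

def make_rnn(input_dim, hidden_dim, seed=0):
    """Create small random weights for a toy Elman-style RNN (untrained, assumed sizes)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(hidden_dim, input_dim),   # input -> hidden
        "W_h": mat(hidden_dim, hidden_dim),   # hidden -> hidden (the recurrence)
        "W_out": mat(input_dim, hidden_dim),  # hidden -> predicted next input
    }

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_step(params, x, h):
    """Update the hidden state with one observation and predict the next one."""
    pre = [a + b for a, b in zip(matvec(params["W_in"], x), matvec(params["W_h"], h))]
    h_new = [math.tanh(p) for p in pre]
    return matvec(params["W_out"], h_new), h_new

def predict_future(params, observed, steps):
    """Read the observed sequence, then roll the network forward on its own outputs."""
    if not observed:
        raise ValueError("need at least one observed input")
    h = [0.0] * len(params["W_h"])
    y = None
    for x in observed:            # warm-up: the hidden state summarizes the history
        y, h = rnn_step(params, x, h)
    predictions = []
    for _ in range(steps):        # autoregressive roll-out
        predictions.append(y)
        y, h = rnn_step(params, y, h)
    return predictions

# Movements encoded as 2D directions (an assumed encoding): up, then down.
history = [[0.0, 1.0], [0.0, -1.0]]
future = predict_future(make_rnn(input_dim=2, hidden_dim=4), history, steps=3)
```

With untrained weights the predicted directions are meaningless, which mirrors the point made above: only after training on enough motion data can the output be expected to match the real movement.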
The work introduced in this paper revolves around first observing the human trajectory
and then predicting the kinematic movement, using the previously observed movements as
inputs. This is done with a recurrent neural network (RNN) model trained on human motion
capture data. To begin with, the observed human trajectory is fed as input to the learned
network (RNN), which predicts the human's future states. To describe the human model whose
trajectories are read and predicted, a state vector is defined from the base position, base
rotation and joint angles. A simple example of how this could be done on a stick figure is
shown in figure 1, where theta 1 corresponds to the angle between the body and the figure's
left shoulder. Increasing or decreasing that angle rotates the arm up or down, assuming the
stick figure always stays in 2D. The remaining angles are defined in the same way for every
part that can rotate. The human model in the research paper is represented by a
66-dimensional vector that describes the whole human body. As mentioned, it includes the base
position, base rotation and joint angles, and is defined as
s = (p;r;j).
Figure 1
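The state definition s = (p;r;j) and the stick-figure angle can be sketched in a few lines of Python. The review only states that the full vector has 66 dimensions, so the split into 3 position values, 3 rotation values and 60 joint angles is an assumption made for illustration, as are the names `HumanState` and `arm_endpoint` and the convention that an angle of zero means the arm hangs straight down along the body.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Assumed split of the 66 dimensions; only the total comes from the text.
BASE_POS_DIM = 3
BASE_ROT_DIM = 3
JOINT_DIM = 60
STATE_DIM = BASE_POS_DIM + BASE_ROT_DIM + JOINT_DIM  # 66

@dataclass
class HumanState:
    p: List[float]  # base position
    r: List[float]  # base rotation
    j: List[float]  # joint angles

    def to_vector(self) -> List[float]:
        """Stack the three parts into one flat state vector s = (p; r; j)."""
        return list(self.p) + list(self.r) + list(self.j)

    @classmethod
    def from_vector(cls, s: List[float]) -> "HumanState":
        """Split a flat state vector back into its three parts."""
        if len(s) != STATE_DIM:
            raise ValueError(f"expected {STATE_DIM} values, got {len(s)}")
        a, b = BASE_POS_DIM, BASE_POS_DIM + BASE_ROT_DIM
        return cls(p=list(s[:a]), r=list(s[a:b]), j=list(s[b:]))

def arm_endpoint(shoulder: Tuple[float, float], theta: float, length: float) -> Tuple[float, float]:
    """2D stick figure: hand position for a shoulder angle theta (radians).

    theta is measured from the downward body axis, so theta = 0 leaves the arm
    hanging along the body and theta = pi/2 holds it out horizontally.
    """
    sx, sy = shoulder
    return (sx + length * math.sin(theta), sy - length * math.cos(theta))

state = HumanState(p=[0.0] * 3, r=[0.0] * 3, j=[0.0] * 60)
vector = state.to_vector()  # flat list with STATE_DIM entries
```

Keeping the three parts as named fields while exposing a flat vector is a convenience for the sketch: a network would consume the flat form, while kinematic code would read the named parts.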
To further specify the model, additional constraints are taken into account in the
trajectory optimization. First, the difference between the human prediction and the neural
network prediction should be small. Second, the hand of the predicted human should end
at a specified position p. Third, the human prediction should not collide with
obstacles. Fourth, the human prediction should not collide with the robot
model. Fifth, the robot's trajectory should be smooth. Sixth, the end effector of the
robot should end its trajectory at a specified position p. Lastly, the robot's
trajectory should not collide with the obstacles.
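One common way to combine requirements like these is a weighted sum of penalty terms that an optimizer then minimizes. The sketch below follows the seven items listed above, but it is a simplification and not the paper's formulation: human and robot are reduced to 2D points, every requirement is treated as a soft quadratic penalty, and the function names, the `radius` safety distance and the weights are all hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def collision_penalty(point, center, radius):
    """Zero outside the safety radius, growing quadratically inside it."""
    d = math.sqrt(sq_dist(point, center))
    return max(0.0, radius - d) ** 2

def joint_cost(human, robot, rnn_pred, human_goal, robot_goal, obstacles, weights, radius=0.5):
    """Weighted sum of the seven terms; returns (total, individual terms)."""
    terms = {
        # 1. stay close to the neural network prediction
        "rnn": sum(sq_dist(h, q) for h, q in zip(human, rnn_pred)),
        # 2. human hand ends at its goal
        "human_goal": sq_dist(human[-1], human_goal),
        # 3. human avoids obstacles
        "human_obstacle": sum(collision_penalty(h, o, radius) for h in human for o in obstacles),
        # 4. human and robot keep apart at each time step
        "human_robot": sum(collision_penalty(h, r, radius) for h, r in zip(human, robot)),
        # 5. robot trajectory is smooth (small steps between consecutive points)
        "robot_smooth": sum(sq_dist(robot[t + 1], robot[t]) for t in range(len(robot) - 1)),
        # 6. robot end effector ends at its goal
        "robot_goal": sq_dist(robot[-1], robot_goal),
        # 7. robot avoids obstacles
        "robot_obstacle": sum(collision_penalty(r, o, radius) for r in robot for o in obstacles),
    }
    total = sum(weights[name] * value for name, value in terms.items())
    return total, terms

human = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
robot = [(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)]
weights = {name: 1.0 for name in (
    "rnn", "human_goal", "human_obstacle", "human_robot",
    "robot_smooth", "robot_goal", "robot_obstacle")}
total, terms = joint_cost(human, robot, human, (2.0, 0.0), (2.0, 2.0), [(1.0, 0.0)], weights)
```

In this usage example the obstacle sits directly on the human path, so only the human-obstacle and robot-smoothness terms are non-zero. Returning the individual terms alongside the total makes it easy to see which requirement a candidate trajectory violates.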
Figure 2 below shows the real human in white and the human prediction in
green, both reaching for the object on the table over the interval from 0 s to 2 s. From left
to right, the pictures show the prediction made after 1 s, 1.3 s, 1.6 s and 2 s. Note that
obstacles are not considered in this test, since the RNN model used at this point has no
implementation of obstacles.
Figure 2
In the second test, figure 3, obstacles are taken into account through a dedicated
function. In both tests the human prediction succeeds in reaching the object to be
picked up. The difference is that in the first test the human prediction walks through the
chair, because the obstacles are not registered in the RNN system.
Figure 3
After successfully predicting the human's movement, they can test the robot and the
human prediction in the same environment. The robot should plan its own trajectory while also
reading the predicted human movement, so that no collision between human and robot
occurs. To solve this, they optimize the human and robot trajectories jointly. In the
example, the paths of the human and the robot would lead to a collision if both left their
starting points at the same time and the robot ignored the prediction. With joint
optimization, the robot speeds up slightly before the point where the two trajectories would
meet, and so avoids the collision based purely on the trajectory readings. Figure 4 below
shows the experiment when joint optimization is not used.
Figure 4
Figure 5 below shows the same experiment when joint optimization is used.
Figure 5
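The speed-up behavior described above can be illustrated with a toy timing example. This is not the joint optimization from the paper: human and robot are points moving on straight, crossing paths, and a simple search over a few candidate robot speeds stands in for the optimizer. The paths, the 0.5 safety distance and the function names are all assumptions.

```python
import math

def lerp(a, b, u):
    """Point on the segment a -> b at fraction u, clamped to [0, 1]."""
    u = min(max(u, 0.0), 1.0)
    return (a[0] + (b[0] - a[0]) * u, a[1] + (b[1] - a[1]) * u)

def min_separation(robot_speed, duration=2.0, steps=200):
    """Smallest human-robot distance over the motion for a given robot speed factor."""
    human_start, human_goal = (0.0, -2.0), (0.0, 2.0)   # predicted human path
    robot_start, robot_goal = (-2.0, 0.0), (2.0, 0.0)   # robot path crossing it
    best = math.inf
    for k in range(steps + 1):
        t = duration * k / steps
        human = lerp(human_start, human_goal, t / duration)
        robot = lerp(robot_start, robot_goal, robot_speed * t / duration)
        best = min(best, math.dist(human, robot))
    return best

def choose_speed(candidates, safety=0.5):
    """Pick the slowest candidate speed that keeps the safety distance, if any."""
    for speed in sorted(candidates):
        if min_separation(speed) >= safety:
            return speed
    return None

speed = choose_speed([1.0, 1.25, 1.5, 2.0])
```

At speed factor 1.0 both points reach the crossing at the same moment, so the separation drops to zero. Among these candidates, 1.5 is the slowest speed that lets the robot pass the crossing early enough to keep the assumed safety distance.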
Prior work related to this paper's topic falls into three areas: Human Motion
Prediction in Robotics, Neural Network Human Motion Prediction and Motion Optimization.
Regarding Human Motion Prediction in Robotics, methods like the full-body motion primitives
of Kulić et al. [1] and the movement prediction of Koppula and Saxena [2] are limited when it
comes to large amounts of motion capture data. Inverse Optimal Control (IOC) is also
used for human motion prediction, but it cannot fully capture the complex biomechanical
movements of the human body and so cannot accurately predict the observed movement. More
popular are recurrent neural networks, which are used for short-term human motion
prediction. This approach gives good results on short-term motion, but it works on
human-specific data only, meaning that the surrounding environment, such as obstacles, is not
involved in the prediction. Including it would require a large amount of data, which, as
said, the RNN cannot handle yet. The last approach involves gradient-based optimization
algorithms, which are also popular for optimizing trajectories. Mordatch et al. [3], [4] use
motion optimization techniques to synthesize complex
Ever since Intel introduced its processor chips in 1971, the transistor
count has increased exponentially, especially around the 2000s, which means that the computing
power of computers has increased as well. Another factor that has grown over the years is the
vast amount of data collected from the use of technology. Together, these have allowed methods
like deep-learning-based motion prediction to gain popularity. As shown in the experiments,
such systems can predict future states from the inputs given, and the results vary from good
to bad depending on how much training data the neural network has seen. The same kind of
method can be used for other prediction tasks, such as reading the distorted text in a
CAPTCHA check, for example the ones shown at Facebook login after a wrong password is typed
in. Humans can easily see what the text says, while a machine first needs enough training
data for its prediction system. Humans have outstanding speech and text recognition that
requires far less training than a machine's. Scientists are working hard on training machines
on vast amounts of data to execute a variety of different tasks, and I believe it is only a
question of time before machines start doing really complex tasks better than humans.
The world is moving quickly towards a future where most of our physical work will
be done by robots. For this to happen, we need safety measures strong enough that we can
fully trust automated robots to execute their given tasks with minimal to zero errors.
Motion prediction is one of the tools used to reach this goal and, as explained above,
it can be used to plan a robot trajectory that does not collide with a human trajectory
while working. Once perfected, this technology could also be included in future self-driving
cars, which would be able to predict a human's movement ahead of a potential accident and
stop or steer away. Other applications that can be linked to RNNs include predicting a range
of diseases by analyzing, for example, eyeball redness [6], and denoising, which cleans up an
image based only on the noisy input picture without access to the original image, as shown in
figure 6 below.
Figure 6
For future work, the authors want their method to succeed in real-world scenarios, where
humans and machines interact with each other on different working planes. They also plan to
further improve the trajectory optimization so that even in complex interaction
scenarios, machines and humans will still manage to perform their tasks without any accidents.
Motion prediction can be used in different cases to predict an outcome based only on the
inputs given. In this case, the prediction method used is the recurrent neural network,
which is a concept from deep learning. Deep learning is a type of machine learning
inspired by the brain. It extracts useful information from data in an automated way with as
little human effort as possible, and allows us to find patterns in relatively limited data by
making generalizations. The field is still under development and shows no signs of slowing
down in terms of published work, which is plausible given the amount of data being created by
today's technology and the increase in computing power that follows the transistor
count [5].
Sources other than the paper:
[1] D. Kulić, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura, “Incremental
learning of full body motion primitives and their sequencing
through human motion observation,” International Journal of Robotics
Research, vol. 31, no. 3, pp. 330–345, 2012.
[2] H. S. Koppula and A. Saxena, “Anticipating human activities using
object affordances for reactive robotic response,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 14–29, 2016.
[3] I. Mordatch, E. Todorov, and Z. Popović, “Discovery of complex
behaviors through contact-invariant optimization,” ACM Transactions
on Graphics (TOG), vol. 31, no. 4, pp. 1–8, 2012.
[4] I. Mordatch, J. M. Wang, E. Todorov, and V. Koltun, “Animating
human lower limbs using contact-invariant optimization,” ACM Transactions
on Graphics (TOG), vol. 32, no. 6, pp. 1–8, 2013.