Human Motion Prediction

Raymond B. Cyiza
Dept. of ACIT: Robotics and Control, Oslo Metropolitan University
s326055@oslomet.no

Abstract
This article reviews the research of Philipp Kratzer, Marc Toussaint and Jim Mainprice on predicting human movement for robots using deep learning methods such as recurrent neural networks. Their goal is an environment in which robots and humans can work together safely and effectively, without unnecessary accidents.

Contents
INTRODUCTION
BACKGROUND
PREVIOUS WORK
IMPORTANCE OF DEEP LEARNING
APPLICATIONS
FUTURE WORK
DISCUSSION

INTRODUCTION
Human movement is very difficult to reproduce with a machine. Over the years, a growing number of researchers have worked on systems that can execute tasks with the same fluidity and smoothness of movement as humans. For example, a moving robot arm has to continuously adjust its speed and joint angles, just as a human arm does. Toward this goal, Philipp Kratzer, Marc Toussaint and Jim Mainprice developed a method that predicts human movement with models such as recurrent neural networks while also taking obstacles in the environment into account.

A recurrent neural network (RNN) is a data-driven model that receives a sequence of inputs over time and predicts the next output from all of the inputs it has seen so far. For example, suppose a person moves their right leg up, down and then to the left, and we want a machine to predict the last movement (left). Given only the first two movements, the network will produce some prediction, but with so little information there is no guarantee that it will be “left”; the output is essentially a guess.
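The idea of predicting from all previous inputs can be made concrete with a small Python sketch. It is a hypothetical illustration, not the authors' network: it assumes a plain Elman-style cell with a tanh activation, made-up sizes (2 input dimensions, 8 hidden units), random untrained weights, and an encoding of the leg example in which “up” and “down” are 2D displacements. The hidden vector `h` is the network's memory of the history, and after the observed inputs have been read, each prediction is fed back in as the next input.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.5):
    """Return a rows x cols matrix of uniform random weights in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyRNN:
    """Hypothetical Elman-style cell: h_t = tanh(W_xh x_t + W_hh h_{t-1}), y_t = W_hy h_t."""

    def __init__(self, state_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Untrained random weights: predictions stay arbitrary until the model is trained.
        self.w_xh = random_matrix(hidden_dim, state_dim, rng)
        self.w_hh = random_matrix(hidden_dim, hidden_dim, rng)
        self.w_hy = random_matrix(state_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        """Read one input x, update the hidden memory h, and predict the next input."""
        pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
        h_new = [math.tanh(z) for z in pre]
        return matvec(self.w_hy, h_new), h_new

    def predict(self, observed, horizon):
        """Feed every observed input, then roll out `horizon` predicted inputs."""
        if not observed:
            raise ValueError("need at least one observed input")
        h = [0.0] * self.hidden_dim
        for x in observed:  # after this loop, h summarises the whole observed history
            y, h = self.step(x, h)
        predictions = []
        for _ in range(horizon):  # each prediction is fed back in as the next input
            predictions.append(y)
            y, h = self.step(y, h)
        return predictions


# Leg example: "up" and "down" encoded (as an assumption) as 2D displacements.
rnn = TinyRNN(state_dim=2, hidden_dim=8, seed=0)
up, down = [0.0, 1.0], [0.0, -1.0]
print(rnn.predict([up, down], horizon=1))  # arbitrary guess: the weights are untrained
```

With untrained weights the printed vector is exactly the kind of arbitrary guess described above; only training on many observed sequences would push it toward the “left” displacement (-1, 0).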
With more precise information, or simply with a large amount of data describing the observed behavior, we can be much more confident that the prediction will match the actual movement. Kratzer, Toussaint and Mainprice state that their framework can be used to plan robot trajectories that are optimized for working alongside humans.

BACKGROUND
The work reviewed here first observes a human's trajectory and then predicts the future kinematic movement from the movements observed so far. The prediction is made by a recurrent neural network (RNN) trained on human motion capture data: the observed trajectory is fed into the learned model, which outputs the human's future states.

To represent the human in a form the model can read and predict, each posture is described by a state vector consisting of the base position, the base rotation and the joint angles. Figure 1 shows a simple example for a stick figure: θ1 is the angle at the left shoulder between the body and the arm, so increasing or decreasing θ1 rotates the arm up or down (the stick figure lives in 2D). The remaining angles are defined in the same way for every joint that can rotate. In the paper, the whole human body is represented by a 66-dimensional vector s = (p; r; j), where p is the base position, r the base rotation and j the joint angles.

Figure 1: Joint angles on a 2D stick figure.

To refine the prediction, the authors formulate a trajectory optimization with several objectives and constraints. First, the optimized human prediction should differ only slightly from the neural network prediction.
Second, the hand of the predicted human should end at a specified target position. Third, the human prediction should not collide with obstacles. Fourth, the human prediction should not collide with the robot. Fifth, the robot's trajectory should be smooth. Sixth, the robot's end effector should end at a specified target position. Finally, the robot's trajectory should not collide with obstacles.

Figure 2 shows the real human (white) and the human prediction (green), both reaching for an object on a table between 0 s and 2 s. From left to right, the panels show the prediction after 1 s, 1.3 s, 1.6 s and 2 s. Obstacles are not considered in this test, because the RNN model used at this point has no representation of them.

Figure 2: Human (white) and human prediction (green) reaching for an object on a table.

In the second test (Figure 3), obstacles are included through a dedicated collision function. In both tests the human prediction reaches the object; the difference is that without the obstacle function the prediction walks through a chair, because the RNN on its own does not register obstacles.

Figure 3: Human prediction with obstacles taken into account.

After successfully predicting the human's movement, the authors test the robot and the human prediction in the same environment. The robot should plan its own trajectory while reading the predicted human movement, so that no collision between human and robot occurs. To achieve this, they optimize the human prediction and the robot trajectory jointly. In the experiment, human and robot start at the same time on paths that would otherwise cross, and the jointly optimized robot trajectory speeds up slightly so that the robot passes the crossing point before the human arrives.
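The speed-up behavior described above can be illustrated with a deliberately simplified, hypothetical Python sketch. Unlike the paper, it does not optimize the human prediction and the robot trajectory together, and it does not use gradient-based optimization: it holds a made-up straight-line human prediction fixed (standing in for the RNN output) and brute-forces a single parameter, the robot's constant speed. The cost keeps three of the listed terms in simplified form: the robot must reach its goal within the 2 s horizon, must stay at least an assumed 0.3 m away from the predicted human, and should stay close to its nominal speed (a stand-in for the smoothness term). All numbers and function names are assumptions chosen for illustration.

```python
import math

# All values below are illustrative assumptions, not taken from the paper.
DT = 0.05         # time step in seconds
HORIZON = 2.0     # planning horizon in seconds
D_SAFE = 0.3      # minimum allowed human-robot distance in metres
V_NOMINAL = 1.0   # nominal robot speed in m/s
GOAL_X = 1.0      # robot goal on the x-axis; the robot starts at x = -1


def human_prediction(t):
    """Fixed stand-in for the RNN output: the human walks up the y-axis at 1 m/s."""
    return (0.0, -1.0 + t)


def robot_position(v, t):
    """Robot moves along the x-axis at constant speed v and stops at the goal."""
    return (min(-1.0 + v * t, GOAL_X), 0.0)


def min_distance(v):
    """Smallest sampled distance between robot and predicted human over the horizon."""
    steps = int(round(HORIZON / DT))
    best = math.inf
    for k in range(steps + 1):
        hx, hy = human_prediction(k * DT)
        rx, ry = robot_position(v, k * DT)
        best = min(best, math.hypot(hx - rx, hy - ry))
    return best


def cost(v):
    """Cost with hard constraints: infinite if violated, else deviation from nominal speed."""
    if -1.0 + v * HORIZON < GOAL_X:  # goal term: reach the goal within the horizon
        return math.inf
    if min_distance(v) < D_SAFE:  # collision term: keep the safety distance
        return math.inf
    return (v - V_NOMINAL) ** 2  # stand-in for the smoothness term


candidates = [round(0.5 + 0.01 * i, 2) for i in range(251)]  # 0.50 ... 3.00 m/s
v_best = min(candidates, key=cost)
print(f"min distance at nominal speed: {min_distance(V_NOMINAL):.3f} m")
print(f"chosen speed: {v_best:.2f} m/s, min distance: {min_distance(v_best):.3f} m")
```

At the nominal speed both paths meet at the origin after 1 s. Slowing down enough would also keep the distance, but it violates the goal term, so the cheapest feasible choice is a speed above the nominal one, mirroring the robot speeding up in the experiment.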
In the paper, the robot's decision to speed up is based only on the predicted trajectories and is made before the point where the two trajectories would collide. Figure 4 shows the experiment without joint optimization, and Figure 5 shows it with joint optimization.

Figure 4: Robot and human prediction without joint optimization.

Figure 5: Robot and human prediction with joint optimization.

PREVIOUS WORK
Prior work related to this topic falls into three areas: human motion prediction in robotics, human motion prediction with neural networks, and motion optimization. In robotics, methods such as the full-body motion primitives of Kulić et al. [1] and the movement prediction of Koppula and Saxena [2] do not scale well to large amounts of motion capture data. Inverse optimal control (IOC) has also been used to predict human motion, but it cannot fully capture the complex biomechanics of the human body needed to predict observed movements accurately.

Recurrent neural networks are more popular and give good results for short-term human motion prediction. However, they use only data about the human, so the surrounding environment, such as obstacles, is not part of the prediction. Learning the effect of the environment as well would require a very large amount of data, which is not yet practical with current RNN approaches.

Finally, gradient-based optimization algorithms are widely used to optimize trajectories; Mordatch et al., for example, use motion optimization techniques to synthesize complex behaviors [3], [4].

IMPORTANCE OF DEEP LEARNING
Since Intel introduced its first microprocessor chip in 1971, the number of transistors per chip has grown exponentially, notably through the 2000s, and computing power has grown with it.
The amount of data collected through everyday use of technology has also grown enormously. Together, these trends have made deep learning methods such as motion prediction increasingly popular. As the experiment above shows, such systems can predict future states from the given inputs, and whether the predictions are good or bad depends largely on how much training data the network has seen.

The same data-driven approach is used in other prediction tasks. One example is the CAPTCHA check that sites such as Facebook show after a wrong password is typed in: humans easily read the distorted text, whereas a machine must first be trained on enough examples. More generally, humans recognize speech and text with far less training than machines need. Researchers are training machines on ever larger amounts of data to perform a variety of tasks, and I believe it is only a matter of time before machines perform very complex tasks better than humans.

APPLICATIONS
The world is moving steadily towards a future in which robots will take over most of our physical work. For this to happen, safety measures must be good enough that we can fully trust automated robots to execute their tasks with minimal or zero errors. Motion prediction is one of the tools for reaching this goal; as explained in the BACKGROUND section, it can be used to plan a robot trajectory that does not collide with a human's trajectory during work.
Once mature, this technology could also be used in self-driving cars, which could predict a pedestrian's movement and stop or steer away to avoid an accident. Other applications of deep learning include detecting a range of eye diseases from eye scans [6] and image denoising, which removes noise from an image using only the noisy input, without access to the original image (Figure 6).

Figure 6: Image denoising.

FUTURE WORK
The authors want their method to work in real-world scenarios in which humans and robots interact in various work settings. They also plan to improve the trajectory optimization further, so that even in complex interaction scenarios humans and machines can perform their tasks without accidents.

DISCUSSION
Motion prediction can be used in many settings to predict an outcome from the given inputs alone. Here, the prediction method is a recurrent neural network, a deep learning model. Deep learning is a type of machine learning inspired by the brain; it extracts useful features from data automatically, with as little human effort as possible, and finds patterns that generalize to new inputs. The field is still developing and shows no signs of slowing down, as suggested by the growing amount of data produced by today's technology and the steady increase in computing power reflected in transistor counts [5].

References (sources other than the reviewed paper)
[1] D. Kulić, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura, “Incremental learning of full body motion primitives and their sequencing through human motion observation,” International Journal of Robotics Research, vol. 31, no. 3, pp. 330–345, 2012.
[2] H. S. Koppula and A. Saxena, “Anticipating human activities using object affordances for reactive robotic response,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 14–29, 2016.
[3] I. Mordatch, E. Todorov, and Z. Popović, “Discovery of complex behaviors through contact-invariant optimization,” ACM Transactions on Graphics (TOG), vol. 31, no. 4, pp. 1–8, 2012.
[4] I. Mordatch, J. M. Wang, E. Todorov, and V. Koltun, “Animating human lower limbs using contact-invariant optimization,” ACM Transactions on Graphics (TOG), vol. 32, no. 6, pp. 1–8, 2013.
[5] https://en.wikipedia.org/wiki/Transistor_count
[6] https://www.theverge.com/2018/8/13/17670156/deepmind-ai-eye-disease-doctor-moorfields