Dear Editor,

We have revised our paper based on the comments and suggestions made by the reviewers. In particular, we have improved the manuscript throughout with respect to language and presentation style, and we have incorporated additional references as suggested by the reviewers. Further, we have revised the text of our manuscript in light of all the specific comments that were made. The clarity of our presentation should now be much better, thanks to the extremely helpful and valuable comments. Below, we provide our detailed responses to all recommendations and requests made by the reviewers.

Yours truly,
Yuanfeng Zhu

Response to Reviewer 1
Comments to the Author

Nice work. There are some nice ideas presented here. Ultimately, I believe this requires a fair number of improvements to the manuscript in order to be ready for publication:

- The grammar is very difficult to read; a good thorough edit is a must.

The paper has been extensively rewritten to improve grammar and clarity.

- The section numbers seem to have fallen off.

Section numbers have now been added.

- Motion capture is introduced as a difficult problem on page 3, in the introduction, but we don't learn what it is used for until page 19.

The challenge of using motion capture is now addressed in the second paragraph of the introduction.

- Are there other motion capture techniques to consider that are non-intrusive, including image-based ones (such as painting lines directly on the user's fingers and using cameras)? Alternatively, would Kinect-type technology work here?

Motion capture may be feasible for straightforward playing, but it does not deal well with situations such as the finger crossover, where parts of the hand are occluded. One of our goals is to be able to play any arbitrary piece of input music, which would require significant adaptation of any captured movement to fit the new music. In this work, we instead explore a generative approach, using a limited amount of motion capture data to improve realism.

- The related work section is severely lacking. There are many other references that are relevant to this work. To pick one example, the use of a Trellis graph for optimizing a motion path problem is not new, though no references to it were made.

Six related papers have been added to the related work section. Five papers that use a trellis graph for fingering generation are now cited at the end of the first paragraph of the related work section.
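To give a rough idea of the trellis-based fingering approaches cited there, the sketch below runs dynamic programming over a trellis whose columns are the notes and whose states are the candidate fingers (1-5). It is only a schematic illustration: the transition cost is a made-up placeholder, not the cost model used in our paper or in the cited works.

```python
# Schematic trellis-based fingering generation (illustrative placeholder only).
# Each note is a trellis column, each candidate finger (1-5) is a state, and
# dynamic programming (Viterbi) finds the finger sequence with minimal total cost.

def transition_cost(prev_finger, finger, prev_pitch, pitch):
    """Placeholder cost: penalize stretches that do not match the finger span
    and finger orders that run against the melodic direction."""
    span = abs(pitch - prev_pitch)
    against_direction = (finger - prev_finger) * (pitch - prev_pitch) < 0
    return abs(span - 2 * abs(finger - prev_finger)) + (4 if against_direction else 0)

def best_fingering(pitches, fingers=(1, 2, 3, 4, 5)):
    # cost[f]: minimal cost of a fingering that ends with finger f on the current note
    cost = {f: 0.0 for f in fingers}
    back = []  # back-pointers, one dict per note after the first
    for i in range(1, len(pitches)):
        new_cost, pointers = {}, {}
        for f in fingers:
            prev = min(fingers, key=lambda p: cost[p] +
                       transition_cost(p, f, pitches[i - 1], pitches[i]))
            pointers[f] = prev
            new_cost[f] = cost[prev] + transition_cost(prev, f, pitches[i - 1], pitches[i])
        cost, back = new_cost, back + [pointers]
    # trace the optimal finger sequence back through the trellis
    f = min(cost, key=cost.get)
    path = [f]
    for pointers in reversed(back):
        f = pointers[f]
        path.append(f)
    return list(reversed(path))

print(best_fingering([60, 62, 64, 65, 67]))  # ascending C-D-E-F-G (MIDI pitches)
```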
- The wrist rotations' initial values are computed using pre-computed weights (page 15, line 10); how are these weights pre-computed?

The computation of these weights is now discussed in the second paragraph of Section 4.1.3.

- The hand model is not sufficiently described. It would be beneficial to point out the 21 joints and to use clear notation for how the rotation values for the 21 joints are articulated (or which specifically are articulated by the system and which are not).

A detailed description of the hand model used in our system has been added to the second paragraph of Section 4, Finger and Hand Pose Calculation for Chords.

- The hand in the video looks very uncomfortable and not human-like. It looks like some proportion compromises had to be made in order to facilitate a solution that will hit the target.

We have tried to improve the realism of our system. Our hand model is based on the real hand of a piano player, so that we can evaluate and regulate the hand parameters more effectively and efficiently. The same hand model is used to generate all the demos, which cover a comprehensive range of normal piano performance, including scales, chords and a complete music piece, and we have also generated video comparisons between real playing and the generated animation in the demos named "C Major skill" and "Chord". The viewpoints, the mesh skinned to the hand skeleton, or some unnatural hand poses may affect how the hand appears during playing. In addition, one important future improvement is to generate standard piano playing for hand models of various sizes.

- It's very nice that you tackled the crossover of the thumb problem.

Thank you for your appreciation of this point.

- It sounds like the motion capture data is only used to improve the quality of the animation in the wrist's translational motion. If so, why not use the motion capture data for more realistic rotational components as well (in both the wrist and the fingers)? This has been done before as well.

In fact, the motion capture data is used extensively in our system to extract the parameters that drive both the translational and rotational components of the hand motion, but the data are usually not used directly (some details of how the motion data are used are not included in the paper, because they are not part of its main contribution). A few examples may clarify this:

1. The motion capture data is used directly to improve the wrist's translational motion, as discussed in Section 6.1, Wrist motion between chords.

2. The motion capture data is used indirectly to evaluate the weights used to generate hand poses. The Vicon camera system is used to capture the wrist rotation for the key poses of different finger crossovers, chords and arpeggios: markers are attached to the wrist and to the base of the middle finger, additional markers are attached to the surface of the piano keyboard to set up a local coordinate system, and the maximum wrist rotation angle in this keyboard-local coordinate system is then evaluated for the various key poses. The Vicon camera system is used in a similar way to capture the rotation from each finger base to its fingertip, in order to evaluate how finger rotation about the vertical axis influences the wrist rotation. This is discussed in more detail in Section 4.1.3, Initiate wrist orientation.

Therefore, the captured data is also used to improve the realism of the rotational components, although it is not used directly to drive the animation. We do not drive the animation directly from captured data because we want to generate hand motion for arbitrary music at various speeds (which the current paper achieves) and for hand models of various sizes (which we consider important future work); piano animation driven only by captured data cannot meet these goals.

- In the video the wrist translation looks linear and unnatural when large gaps are crossed.

This has been improved in the new demo by applying spline interpolation to the wrist translation component of the key hand poses.
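For concreteness, the kind of smoothing involved can be sketched as follows, using SciPy's general-purpose cubic spline as a stand-in; the key-pose times and positions below are made-up placeholders, and the actual key poses and spline formulation in our system differ.

```python
# Illustrative sketch: smooth a wrist trajectory by cubic-spline interpolation
# of the translation components of a few key hand poses. The key-pose data
# below are invented placeholders, not values from our system.
import numpy as np
from scipy.interpolate import CubicSpline

# times (seconds) and wrist translations (x, y, z) of the key hand poses
key_times = np.array([0.0, 0.4, 0.8, 1.5])
key_positions = np.array([
    [0.00, 0.10, 0.00],   # resting over the starting keys
    [0.12, 0.12, 0.02],   # start of the leap
    [0.35, 0.15, 0.03],   # mid-leap
    [0.50, 0.10, 0.00],   # landing on the target chord
])

# one spline per axis; sampling on a dense time grid gives a smooth,
# non-linear wrist path instead of straight-line interpolation
spline = CubicSpline(key_times, key_positions, axis=0)
dense_times = np.linspace(key_times[0], key_times[-1], 60)
wrist_path = spline(dense_times)   # shape (60, 3), one sample per frame
```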
- Very good that you tackled the non-instructed finger interdependence problem. However, it sounds like you only consider the Y translation component. The fingers are interdependent in rotation as well, among fingers and within the joints of a finger itself.

Yes. It would be worthwhile to improve the quality of the finger animation by evaluating how the music volume and speed influence the interdependent rotation between fingers and within the joints of an instructed/non-instructed finger. We consider this future work and now discuss it in the last paragraph of the conclusion.

- Though its aim is to be an instructional tool, no mention was made as to its effectiveness in such a role (e.g., as determined through experiment and analysis by piano teachers). It would be great to show it to a piano teacher and ask how well this virtual student did; those results would be interesting.

While one of the motivations for this work is to employ it in piano tutoring, the main focus of the current paper is on animation generation, and we have tried to adjust the positioning of the paper accordingly. Evaluating the system as a tutoring tool is future work. We do intend to show the work to a piano teacher and agree that this would provide interesting feedback.

- Overall, a good start at an interesting project. Though, I think, much is needed in terms of polish and execution.

Thank you very much for the many valuable comments, which have helped greatly to improve our work.

Response to Reviewer 2
Comments to the Author

The paper is interesting and technically sound. There are many problems with the English, both spelling and grammar, and these must be corrected in order to make the paper acceptable for journal publication.

Thank you for your appreciation of our work! The paper has been extensively rewritten to improve grammar and clarity.

- The paragraph on page 12, lines 16-36, has inconsistent directions. It states that -Z is "to the left" but then says "-0.5 (move it to the right)".

This was an error; the words "-0.5 (move it to the right)" have been replaced with "0.5 (move it to the right)".

- Various costs are defined, for example Cost(a,b,d) on page 9, graphed in Figure 3. How was the data obtained for this graph? It is uneven, which suggests some kind of experimental measurement rather than a formula. Similarly, it would be useful to know how other costs were estimated.

An explanation of how the costs are obtained has been added to Sections 3.1.2 and 3.1.3. In addition, to make the various costs easier to understand, we have rewritten Sections 3.1.2 and 3.1.3.

- The discussion focuses mainly on melodies and chords. Does the program work in other situations? For example, a pianist often has to play both melody and (partial) accompaniment with the right hand.

This is a good suggestion: playing melody together with a partial accompaniment in one hand is a common advanced technique, but we have not yet considered it. We have added this limitation as future work, in the fifth point of the conclusion and future work section.

Response to Reviewer 3
Comments to the Author

- A bit more elaboration on previous / related work would help in better understanding the explanations that follow in the paper.

To enrich the research background, six additional papers are now added and discussed in the paper.

- Including some figures / images would be helpful to visualize what is happening.

All of the figures have been regenerated, and three new figures (Figures 6, 10 and 11) have been added to help explain the important processes.

- A few instances of minor grammatical errors (i.e., 'relax' instead of 'relaxed').

These errors have now been corrected, and the paper has been rewritten for better clarity.

- Overall: very interesting application of ideas, intriguing blend of computational techniques, music and virtual animation.

Thank you very much for your feedback! We hope that, in the near future, this research can help students teach themselves to play the piano.