>> Zhengyou Zhang: Okay. So let's get started. I'm Zhengyou Zhang. It's really my great
pleasure to introduce Professor Minoru Asada. I have known Minoru for many, many years, probably
25 years. A long time ago he used to work in computer vision, but in the early '90s he moved on
to work on robotics and became a very influential figure in the robotics community.
I don't know how many people know RoboCup. RoboCup is a competition in which robots play soccer. He
was the president of the RoboCup Federation. And his research, I think from very early on, has been
very biologically inspired, trying to develop robot skills using principles learned from animals,
including humans.
And he is an associate editor of the IEEE Transactions on Autonomous Mental Development, of which
I'm the editor-in-chief. That journal is really multidisciplinary: it studies how animals
develop skills by interacting with the world, and then applies those lessons to robotics,
so that robots develop new skills by interacting with the real world.
And today he will talk about how to enable robots to speak.
>> Minoru Asada: Not speak --
>> Zhengyou Zhang: [inaudible] I guess the first step, the first step to enable a robot to speak. Minoru,
please.
>> Minoru Asada: Thanks. Okay. Thanks, Zhengyou. So the title of my talk today is
Development of Speech Perception and Vocalization of Infants and Robots. Zhengyou
mentioned speaking, but we are not there yet, just vocalization, so the very early development of speech
vocalization.
Okay. So as Zhengyou introduced me, I'm a professor at the Graduate School of
Engineering at Osaka University, and I'm also the former president of the RoboCup Federation,
2002-2008, and was the research director of the JST ERATO project until it ended.
And the unique point is that I'm the vice president of the Japanese Society of
Baby Science. Many people ask me, why a baby science society? My answer is very clear:
we would like to make robots much more intelligent. But what is intelligence?
That's a very big question.
So we look at babies: how they grow up, develop their capabilities, and learn things.
That's a big, big mystery. The big question is that we still do not know the exact design
policy that would make a robot much more intelligent, because we do not know how a baby
develops its intelligence and so on.
Okay. So, on the other hand, we suppose that by using robots we may approach this big
mystery.
And also we have just started a new project. Okay. Anyhow, that's what I'm saying.
>>: [inaudible] money.
>> Minoru Asada: Yeah.
>>: [inaudible] U.S. dollars for you [inaudible] doesn't know how to --
>> Minoru Asada: Yeah. In this project I get almost 4 million U.S. dollars a year for a five-year
project, but that is across four or five groups. My own group alone gets 1 million U.S.
dollars a year, and the five-year project has just started.
Okay. So I skip this one. And this is actually not the latest issue; sorry, this is last
year's, that's why there was some -- anyhow, these are our robots, the baby robots. And
ACM featured our robots and asked me just to send a picture. We didn't know how
they would use our robots, but this is a typical American way [inaudible] robots, many, many.
There are two -- these robots are called Affetto; it's an Italian name. Anyhow, just two of the
Affettos are different from the rest of them. One is here, and the other one I forget.
>>: [inaudible].
>> Minoru Asada: Oh, yeah, [inaudible]. Right. This guy. Okay.
And this special issue focuses on the [inaudible] challenges ahead for bio-inspired
soft robotics and so on. Actually this robot is very, very [inaudible], with a very realistic baby face and so on.
I will mention why we developed this kind of robot, but at the beginning I'd like to start with
RoboCup.
[video playing: [inaudible] four robot sizes, small size, middle size, standard platform, and
humanoid. In the small size league, one of the challenges is building the robot hardware within
strict size constraints and detecting and tracking the robots with an overhead camera in real time
based on colored markings on the robots.
The global view of the complete field is passed to an off-board computer which aims at
generating sophisticated teamwork strategies in the form of efficient positioning and passing.
Remote control commands are then sent wirelessly to the robots with no human involvement.
Complex action sequences emerge in real time in response to the behaviors of different
opponents, such as this four-way pass.
Today robots in the small size league move very quickly and deal with high ball velocities. They
precisely manipulate the ball in three dimensions on fields and in teams of increasing size.
In the middle size league, robots of larger size are constructed by the teams with onboard
computation, actuation, and sensing. In this league the robots manipulate a regulation size
soccer ball on a larger field. Like in all the leagues, the robots are fully autonomous.
As the league evolved, walls around the field were removed, colored markings around the field were
removed, and omnidirectional cameras were banned. Distributed state assessment was
accomplished via inter-robot communication, and localization relied on accurate motion models.
As in all the leagues, the teams devise efficient, robust multi-robot path planning algorithms with
vision-based obstacle avoidance.
>>: So you have a similar type of robot --
[video playing: The level of play has increased to the point that in 2007 --]
>> Minoru Asada: Yeah, we have some regulations on the size, the weight, the height, and so on
[inaudible].
[video playing: [inaudible] four-legged robots available to researchers offered the opportunity
for teams to focus on the locomotion, vision, localization, and teamwork algorithms using a
standard hardware platform.
One technical challenge in this league is defining fine motor skills that precisely manipulate an
object given uncertain sensor information with articulated bodies. Frame-based motion
sequences enable precise and effective control.
Teams develop localization algorithms capable of addressing the challenges of sensitive cameras
with limited fields of view, ambiguous landmarks, occasional repositioning of the robots and the
need to actively decide whether to focus on the ball or on landmarks.
As AIBOs became unavailable, the league transitioned to the NAO robots from Aldebaran.
Biped locomotion and upright kicking became a primary technical focus still on the standard
robot platform.
RoboCup has had a long-standing commitment to fostering the development of humanoid robots.
At first, simply standing and kicking was an ambitious challenge.
Today teams of small humanoids play in fast-paced games, and researchers continue to advance
the state of the art in teen- and adult-size robots where mechanical and power challenges abound.
Overall in RoboCup, thousands of researchers from more than 40 countries are engaged in
developing teams of autonomous robots towards RoboCup's specific long-term challenge of
beating the human World Cup champion by the year 2050.
>> Minoru Asada: Yeah. This gives a kind of exhibition [inaudible], like a match between the robots
and humans. I'm guessing the robot team lost the game, but still it's a very nice
one. Okay.
So I guess I could spend more than maybe two hours explaining RoboCup and its
activities and so on. But today [inaudible] the title [inaudible].
So, anyhow, this is the grand challenge: to build a team of 11 humanoids that can win
against the FIFA World Cup champion team by 2050. So only 37 years are left.
And the great successes of RoboCup are in mobile robotics, humanoid robotics, machine
learning applications, and education, especially RoboCupJunior. Today I have shown
only RoboCup, the senior leagues, but actually half of the participants are in RoboCupJunior,
from elementary school to high school students, with competitions of not only [inaudible]
but also [inaudible] robots and so on. So to many people [inaudible] some of our activities
from the viewpoint of education.
But how about artificial intelligence, not as an academic society, but in general? Because
actually, as in the video clips, many teams design the behavior of the robots from the
designer's inspiration, or more correctly, the researchers write down the exact behavior,
how to do it. So the intelligence is on the designer's side, not the robot's side.
That's the problem: the explicit specification of the behavior by the designer, the
limited capabilities, and the little room for learning and development.
Then much more fundamental research issues should be attacked: how do humans
and humanoids develop their cognitive functions to adapt themselves to complex and
dynamic environments? That's my question now.
So this is the outline of my talk today. First I'd like to explain how humans and humanoids
develop. Second, I'd like to explain the cognitive developmental robotics that we have
[inaudible]. And [inaudible] towards language acquisition by CDR, especially through
vocalization: how do robots vocalize vowels like infants? That's the outline of today's
topic.
So what is human development? That's a big question, of course. I was born in 1953. And
this is me as an elementary school student in Japan, and as a junior high school and high school student.
And I got [inaudible] students. And I stayed one year at the University of Milan in 1986-87. This
is the first RoboCup in '97. And I enjoyed karaoke in Shanghai [inaudible]. This is a typical
human development. Okay, I'm just joking.
Okay. So this shows some early brain development of human beings. After
conception, this is an embryo at 18 days, 20 days, 22, 24 days, so every two days. At the
beginning it is a very, very simple shape, but gradually it forms [inaudible] some kind of brain
structure, like this one, very, very complicated, every two days.
In these days maybe the genetic information is very dominant in specifying the structure of the
brain cells, though I'm not sure exactly what happens [inaudible]. And after 20 weeks or 25 weeks,
the brain structure is almost the same as an adult's, but the connections are not complete.
By the way, I was impressed by a textbook of neuroscience by U.S. researchers. I cannot
find this kind of book in Japan, because Japanese research is rather idle about this kind of education. But,
anyhow, this is a very nice book about neuroscience.
And this shows what is going on: this axis indicates the number of weeks from conception,
and the [inaudible] indicates the different kinds of behavior. For example, sucking and
swallowing are around here, 10 or 11 or 12 weeks, and there were fetal studies
on this kind of behavior. About sensation, the touch and taste sensations come first, around 10 weeks,
so the fetus senses touch and taste information. And also around 20 weeks, audition and
vision start.
Of course it's not so [inaudible], but the baby will [inaudible] stimulus [inaudible]. For example,
[inaudible] shining some light, even through the mother's body, the fetus reacts to the light
stimulation. Therefore they already perceive some kind of light stimulus.
And this shows the inside of the womb. Nowadays we can see the real movements of the
fetus in the womb. For example, at 26 weeks, you can see that [inaudible] they touch their faces very often.
And this shows 33 weeks, just before birth, and we can see them open the mouth and
open the eyes and so on. So before birth, they have already practiced many different kinds of
behaviors.
And after birth, the newborn baby learns different kinds of behaviors. For
example, at the fifth month there is hand regard: the baby just looks at its
hand.
So this means, as a learning target from the viewpoint of robotics, that's the forward and the
inverse model of the hand. That is, for example, given the joint angles here, here,
and here, we can obtain the posture of the hand. This is the forward model.
On the other hand, if I try to catch this one or this one, then I need to find the posture,
what angle here and here. This is the inverse model. So the hand regard of the baby
[inaudible] means that they learn some kind of forward and inverse model of
their hand.
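To make the forward and inverse model idea concrete, here is a minimal sketch, assuming a hypothetical two-joint planar arm with made-up link lengths (an illustration of mine, not anything from the talk): the forward model maps joint angles to the hand position, and the inverse model recovers angles for a desired position by gradient descent on the forward model.

```python
# Minimal sketch of a forward and an inverse model for a two-joint planar arm.
import numpy as np

L1, L2 = 0.3, 0.25  # link lengths in meters (illustrative values only)

def forward(theta):
    """Forward model: joint angles -> hand (x, y) position."""
    t1, t2 = theta
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                     L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

def inverse(target, theta0=(0.3, 0.3), lr=2.0, steps=500):
    """Inverse model: find joint angles whose forward prediction reaches the
    target, by gradient descent on the squared position error."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        err = forward(theta) - np.asarray(target)
        # Numerical Jacobian of the forward model (columns = d position / d angle).
        J = np.column_stack([
            (forward(theta + d) - forward(theta - d)) / 2e-4
            for d in (np.array([1e-4, 0.0]), np.array([0.0, 1e-4]))
        ])
        theta -= lr * J.T @ err  # push the position error back onto the angles
    return theta

# Example: predict the hand position for given angles, then recover angles
# that reach that same position.
print(forward([0.5, 0.8]))
print(inverse(forward([0.5, 0.8])))
```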
Or at six months they finger the other's face. For adults it's no wonder that visual information
and tactile information go together. Okay. But for the baby, it's not intuitive yet. So this indicates
some integration of the visual and tactile sensation of the face.
Or at seven months they drop an object and observe the result. That's the causality and the permanency
of the object.
And eight months, nine months, 10 months, 12 months. At 12 months, this one here. So there
are many different kinds of behaviors. For example, at 10 months they imitate movements. So
this is imitation. It's very, very important for the baby in order to grasp different kinds of
behaviors. And to grasp and pretend, especially pretend. This kind of mental simulation is
a very, very difficult issue for robots.
So in just one year the newborn baby acquires different kinds of behaviors. And I'm sure
that we, the robot researchers, cannot build a robot that can acquire these different kinds of
behaviors in just one year. So it means there is a big, big mystery: why, or how, does the newborn baby
acquire these kinds of behaviors?
Some people say that, okay, there's no learning; from the beginning they have some capability
to do that. But that does not answer for us how to design, how to embed, these kinds of structures.
So there is a book, Nature Via Nurture: it is no longer nature versus nurture
but nature via nurture. Some people say that everything is determined by the genes. Or, on the
other hand, that everything is determined after birth, by the environment or society, everything.
But nowadays the two schools or two ideas are closely [inaudible] in different [inaudible], so it
is no longer nature versus nurture but nature via nurture.
So a balance between the nature side, let's say the embedded structure, and the nurture side,
learning and development, is an issue in designing artifacts like humans.
So this is our robot, the Affetto I just mentioned. This is without the skin, and this is with
the skin. We have a special material to make this kind of baby-robot skin. I suppose maybe on
video all you can see is [inaudible] some feeling or something. But actually if you touch this
robot, it really feels like a real baby. Very, very special materials.
And, sorry, this does not show the movement. But, anyhow, also for the movement we have some
special [inaudible] actuators to realize very, very soft movement like a baby's.
Okay.
So the reason for such a baby robot is that we'd like to make robots much more intelligent.
But what is intelligence, and how can humans be intelligent? Those are big questions. In order to
answer them, only one discipline is not sufficient; interdisciplinary approaches seem
promising. And one approach is to include the process of building robots that develop like
humans as a pathway to making robots more intelligent.
And especially we focus on the kind of interaction between the baby, the infant, and the
caregivers. In these interactions, some kind of emotional interaction, with the affirmative
bias of the caregiver, is important.
Before Affetto there were several baby robots, but they were sometimes mechanical, not
so realistic. To study patterns of emotional interaction between the infant and the
caregiver, we need a much more realistic baby. That's the reason why we built a
realistic baby, Affetto.
Okay. So we suppose that the body shapes the brain, but how does the development of the body
affect the development of the brain? This is in contrast to the brain scientist's approach,
because brain scientists all the time only look at the brain, brain, brain.
But we focus much more on the body, because we suppose that the body shapes the brain
in the early developmental phase of the infant.
And nature versus nurture, as I already mentioned: we try to reduce the nature side and
maximize the nurture side.
In other words, we make an effort to explain as much as possible by learning and development,
in order to acquire the blueprint for designing intelligence. That's why we call it cognitive
developmental robotics.
So what is cognitive developmental robotics? Cognitive developmental robotics aims
at understanding the human cognitive developmental process by synthetic or constructive
approaches using robots. And its core ideas are physical embodiment and social
interaction, which enable information structuring through interaction with the environment,
including other agents.
So physical embodiment means the body structure, some kind of musculoskeletal
system, and also the neural system, the neural architecture of the brain itself; everything is part of the
physical embodiment.
And the social interaction focuses especially on the interaction between the infant
and the caregiver [inaudible] to develop behaviors. And also the caregiver's side changes
due to the big changes in the infant's behavior and so on. So mutual interaction or mutual
imitation happens.
About physical embodiment: this was before my birth, okay. Sperry, the very famous researcher
of the split brain, mentioned that to understand the mind, we should begin with the patterns of the
motor activities and derive the underlying mental structures from them. Anyhow, more
than 60 years ago somebody already mentioned the importance of the physical embodiment.
And this is about the social interaction. This guy is very young, first of all. The mind cannot be
understood except in terms of the interaction of the whole organism with the external
environment, especially the social environment.
So the title of his book is Out of Our Heads. As I mentioned, brain research just focuses
on the brain. But the mind does not come only from inside one brain; between brains there
is some interaction. So he focuses on this kind of idea. And we may have a similar idea,
because interaction or social interaction may happen, may coexist in some kinds of situations,
and we suppose that something like a mind may emerge through the interaction. This is
kind of the idea of the --
>>: How broad is the mind [inaudible]? The concept of mind is [inaudible]. Can you talk about
consciousness?
>> Minoru Asada: Consciousness? Oh, consciousness.
>>: [inaudible] motor control aspect of the mind?
>> Minoru Asada: But consciousness is also a [inaudible] issue, and at this moment, of course,
with consciousness also I suppose maybe some kind of [inaudible] during the interaction or in the
videos. Of course we can [inaudible] define consciousness or unconsciousness. But our point is
how to design, or how to create, this kind of situation through the interaction.
So, for example, another example is Professor [inaudible]'s android. An android is an
android, not [inaudible], but we feel something about his android; we can share
something. Even in the case of interaction, for example communication, sometimes it is
[inaudible] ambiguous whether it is a human or not human. But the important thing is that we
have some capability to communicate. In that case, consciousness is a sort of this kind of
phenomenon. Yeah, just my personal idea, right? But, anyhow, that's a big issue. And also,
as you mentioned, the sensorimotor activity is a fundamental one. That's a very, very basic
requirement.
>>: Yeah, but is it clear that motor skills require a body [inaudible]?
>> Minoru Asada: Yes.
>>: But for some other skills, for example speech, maybe vision, do they require
the body in order to --
>> Minoru Asada: I suppose maybe, in a way. It depends on the [inaudible], especially the
motor activity. For example, physical interaction is one typical kind of motor
activity. But vocalization is also a motor activity, so there is the motor theory.
In that case, I suppose maybe, for example, for physical movement the sense of
motion is very important to realize the self or something. But in the case of conversation or
vocalization, communication, the motor activity of the vocalization and its auditory perception
are both together and close to each other, so that interaction is very important for thinking about
communication.
Also I suppose maybe it is a weight on some kind of consciousness. Yeah, of course,
consciousness can be some state of only the individual.
But to have that, I suppose maybe some experience of interaction with other people is
necessary.
A very simple idea: if we send one single robot to the moon or some planet,
this robot does not need a mind to emerge. But if you put two robots or three robots, or one
human or two more humans, in that case they need some communication, someone to
communicate with. In that case there is some possibility for some kind of mind to emerge.
Today I have a big mouth, okay. Not so scientific, but that's my idea.
So, anyhow, this shows some achievements of my previous project, starting from the
fetus simulations and also the motor skill learning, not with [inaudible] but with pneumatic
actuators, some kind of artificial muscles, because the artificial muscle gives very soft
movement, like humans, and this kind of actuator [inaudible] allows a more natural interaction.
For example, in the case of the Honda [inaudible], it's electric motors; the motor does the
control, like this one, when walking. All the time the motor is driving. So that's very,
very -- but in the case of humans, most of the time we are idle, and then at one point
we just work like this, so it is much, much more energy saving. So in that case, with the muscle,
much more dynamic interaction is possible due to the compliant movement and so on.
Okay. And also the research issues include self-other discrimination and some kinds of
interaction. For example, today I focus on vocalization, and there are also attention or
imitation for social communication and so on.
So this is summarized from my previous project. But, anyhow, we focus on these research
issues during the developmental process of the cognitive functions.
And many times I was asked: can a robot physically grow? Yes, of course, that's our
desire. We would like to design a robot that physically grows up, but it is still very, very difficult. And
some researchers focus on robots that grow, but they still need more time and
also money and researchers too.
So instead we designed different kinds of research platforms depending on the age,
and depending on the age we focus on different research issues. For
example, before birth, the fetus simulation, like this one, with a very simple body structure
and muscle system and also a neural system and so on. And also at seven or eight months,
standing up or crawling, walking and so on. Okay.
>>: I was wondering what sort of software architecture is underneath all of these? Is it like a fixed
program to do certain things, or is it more of a machine learning, error correction kind of
thing?
>> Minoru Asada: Usually we use two kinds of learning methods. One is Hebbian
learning, very, very simple Hebbian learning: fire together, wire together. The other one
is the self-organizing map. It is a kind of clustering, a very simple one.
So today, or in my project, almost all the learning methods [inaudible] are based on Hebbian
learning and self-organizing maps.
The [inaudible] is a kind of [inaudible] I have used to give some learning to the [inaudible]
robots. But this time the research issues are more fundamental, at the level of the neural system.
Therefore Hebbian learning and self-organizing maps are the most fundamental learning methods. And
we share, we meaning the robots and the humans, this kind of learning method.
So actually today I have not shown the neural fetus simulation. But in the fetus
simulation there is a neural system, and Hebbian learning and self-organizing maps are the
main learning methods. Did that answer the question?
>>: Yes, yes.
>> Minoru Asada: Okay.
>>: Hebbian learning often is unstable.
>> Minoru Asada: Oh, yeah. So some covariance rule or something, some modification of the
Hebbian learning, yeah, okay. The very simple Hebbian learning always has that problem,
so we modify it sometimes.
>>: So this Hebbian learning, do you use it for movement, motion, or --
>> Minoru Asada: Yeah. I focus on doing the --
>>: [inaudible].
>> Minoru Asada: No, no, no. Okay. Actually, in our group we have done the
fetus simulations, [inaudible] Tokyo in Professor [inaudible]'s group and
[inaudible]. Now he's an assistant professor [inaudible]. Anyhow, they prepared
a simple [inaudible] of the left hemisphere and right hemisphere [inaudible], and they
prepared some two hundred muscles. Each muscle has muscle spindles, and there is some structure
of [inaudible] neurons and gamma neurons to control each muscle. And each
muscle has one [inaudible], a CPG. Okay. So there are the two hundred muscles and some body
structure, and also the neural system, and the environment is [inaudible] the muscle [inaudible].
So there are three kinds: the environment, the musculoskeletal system, and the neural system. All
three systems interact together. And each muscle has its own observations. And the two hundred
muscles interact with each other through the musculoskeletal system and the environment.
So the Hebbian learning was for a kind of mapping of where my body is, whether it is the arm or
leg or something. Because, for example, the fetus sometimes kicks something, but the
environment is limited, therefore the neck bends, like this one. And this kind of constraint
is captured by the mapping learned with the Hebbian learning, okay? So the mapping at the
beginning is random, you know, randomly distributed, and the tactile sensations get paired by
the learning, the Hebbian learning: here is a leg, here is a torso, and so on.
The Hebbian learning does the work for these kinds of observations. And the self-organizing
map, together with the Hebbian learning, gives a kind of image of the behavior. For
example, after birth, when they are exposed to the outside and to gravity, behaviors
like crawling or turning over happen.
So one point is that we did not write down any exact behavior; this kind of behavior emerged
from the Hebbian learning and the self-organizing map. But the important thing is the
physical embodiment: the body structure, the musculoskeletal system, and also the neural system.
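The body-map idea described here can be sketched roughly as follows (my own toy version with made-up sizes and contact rules, not the actual fetus simulation): tactile units and posture units that happen to be co-active get their connection strengthened, so each skin region gradually becomes associated with the limb configuration that produces contact there.

```python
# Toy sketch of a Hebbian "where is my body" map: co-active tactile and
# posture units are wired together.
import numpy as np

rng = np.random.default_rng(1)
n_touch, n_posture = 6, 4             # made-up numbers of units
W = np.zeros((n_touch, n_posture))    # tactile-to-posture associations

for _ in range(2000):
    posture = np.zeros(n_posture)
    posture[rng.integers(n_posture)] = 1.0        # current limb configuration
    # Assume the constrained environment makes contact depend on posture:
    # posture k tends to touch skin region 2k or 2k+1 (an arbitrary toy rule).
    touch = np.zeros(n_touch)
    touch[min(2 * int(np.argmax(posture)) + rng.integers(2), n_touch - 1)] = 1.0
    W += 0.01 * np.outer(touch, posture)          # fire together, wire together
    W *= 0.999                                    # slow decay keeps W bounded

# After learning, each tactile unit is most strongly linked to the posture
# that tends to cause contact there: a crude body map.
print(W.argmax(axis=1))
```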
>>: One last question on that subject. Is there any sort of motivation?
>> Minoru Asada: Motivation? We'd like to know everything. Yeah. Because actually
one of the questions is, as I mentioned, the balance between the nature side and the nurture
side: to what extent do we prepare in advance, and how much do we expect from the
learning? So the age is a very big issue.
For example, if you start at 10 months, the baby is already crawling and trying to start
walking, but from here it is not crawling yet. So the starting point is very important for us. At the
beginning all we did was [inaudible] three- or four-year-old kids, one-year-old babies,
10 months, six months, newborn babies, and back to the fetus.
So the idea is that we'd like to know everything, especially from the beginning. Of course
we know -- actually going back, back, back, back. Yeah. This simulation here is very,
very [inaudible], so it is still very difficult to simulate. But after 25
or 30 weeks -- the fetus simulation starts from this age, because the brain structure is
almost the same as a human adult's; the connections are not complete.
Of course, our desire would be to start from this point, and then here and here and here.
But it is still very, very difficult, and we do not know how to design
the program. Even though the genes carry some information for development, it is still
not so clear how this one is actually formed, and so on.
So of course we'd like to attack this kind of simulation, but there is still a big
mystery. So in the case of the fetus simulation, we started around 25 or 30 weeks. But
[inaudible] we will go back to this one, maybe in the future.
Did that answer the question? Is it okay? So our desire is that we'd like to know everything in
order to design something.
So okay. Today I have not explained the details, but this one is a [inaudible] robot
with 52 degrees of freedom, and it has vision, audition, and tactile sensing, 200 tactile sensors. In
this case it's a kind of co-learning or co-development between the robot and the caregiver. So
depending on the change in the robot's behavior, the caregiver also changes how to teach.
So both sides try to have some model of the opponent, and the interaction between the models
enables this kind of interaction. But this one is still underway. And
this one is a tabletop humanoid. And you can see that some kind of
communication seems to happen, but actually there is almost nothing underneath.
Sometimes one robot tries to imitate the other one's behavior [inaudible] and so on, and also
the two eyes are fake; it can really only hear. But this kind of gaze-like movement is
very, very important for learning some communication.
And we brought this robot to the hospital to see how autistic children, or
children with Asperger's syndrome, react to these robots and how they try to control
something. But this is also still underway. And this robot is just for imitation. And this is
already down. This is --
>>: So [inaudible] kind of learning, so I suppose that you put a lot of sensors throughout the
body so [inaudible] processes of learning and modify the behavior afterward? What's going on?
>> Minoru Asada: Actually, it's still underway, and one thing is that unfortunately this student
graduated and moved on. Therefore this stopped partway. So actually we have not got
exact results on this one.
So one attempt is that the robot tries to change its behavior according
to the teaching method. This person tries to take care of the robot, and the robot's
initial program is just a very [inaudible] walking, and through the interaction the robot
changes how to walk. And then, if the robot changes the way it walks, the caregiver perceives
the change in the robot's walking. At the beginning the walking is very, very poor, but gradually
it becomes skillful. And depending on the level of the skill, the caregiver changes how to teach: at
the beginning very, very carefully, very, very slowly, and so on, but then gradually, you know,
with less support or something.
That's the story.
>>: So the information -- do you collect all this information --
>> Minoru Asada: Yeah, in this case --
>>: -- and use it to update whatever motions they are --
>> Minoru Asada: Okay, actually let me explain. What the robot does is collect the data of its
own movements, and depending on the teaching it modifies its own behavior, so it's some kind of
mutual interaction.
I'm sorry [inaudible]. We stopped because this student is gone.
>>: The point is that there is a learning method --
>> Minoru Asada: Yes.
>>: -- that makes use of the information you collected through --
>> Minoru Asada: Yeah.
>>: -- to update [inaudible].
>> Minoru Asada: Not everything. But in this case just only, how do
you say, the posture, the posture information.
>>: [inaudible].
>> Minoru Asada: Yeah. Actually, you know, my personal desire is some eye contact. The
eye contact is very, very important to verify, okay, I'm okay or not, or something like this
one. But at this time it's not included; only the posture information is updated.
Okay? I have spent so much time already. Okay.
So the next topic: towards language acquisition by CDR. So I suppose maybe --
>> Zhengyou Zhang: Yeah, it's probably 40 minutes.
>> Minoru Asada: 40 minutes, okay.
>> Zhengyou Zhang: Maximum 40 minutes.
>> Minoru Asada: Okay. This shows the developmental changes in the perception,
articulation, and interaction of infants. This axis indicates the months: first month, second,
third, fourth, fifth, sixth month. It's the first half year of the infants' vocalization and
interaction.
For perception, in the first month there is discrimination between the mother's voice and others'.
Before birth, the fetus already listens to the mother's voice. Therefore some kind
of auditory template already exists. So after birth, they already show some preference for the
mother's voice.
And also they prefer the motherese. Motherese is a very special way of speaking to the
kids. For example, hi, hi, hi, hi, hi, like this one. Okay, so that the infant or the
baby can anticipate easily and so on.
And also the [inaudible] this one, this one, this one. So the development of perception comes
earlier, and then the vocalization itself. The vocalization starts around three or four months. The
cooing is a kind of practice, coo, coo, coo, coo, coo, like this one, and the babbling is around here.
So the perception comes earlier and then the vocalization, but there is also some kind of
interaction. The interaction is the motherese, where the mother uses a special way to
speak to the kids, and the infant's preference for the motherese, or the caregiver's imitation with
high frequency; okay, around here the imitation may happen.
And also the infant's promotion of the mother's contingent response. Contingent means some
kind of related response. And the contingency is very important for the developmental
process of the infant-caregiver interaction.
And the infant's vowel-like cooing promotes the mother's imitation. So it's a mutual imitation.
Okay? The mother is very happy if her baby imitates [inaudible] her own voice. And also
the baby is very happy if the mother imitates the baby's voice. So the mutual imitation
increases the frequency of occurrence on both sides. It's very, very good practice for the
vocalization on both sides. And also words, okay, cooing and the motherese and so on. Okay.
So actually there are some theories about the development of speech perception and
vocalization. How do infants develop their faculties of speech perception and
vocalization through the interaction with a caregiver, in an environment that varies due
to factors such as speaking rates and the voices of the speakers, in addition to noise and
so on?
And there are several theories about this: the invariant acoustic features, the motor theory,
and so on. So today I just pick up a very important one, the motor theory,
by Liberman, 1967. One solution to the problem of the lack of invariance in the
acoustic signal is to suggest that speech perception may be aligned with the embodied articulation
process. So this is the motor theory of 1967.
And after that some modified versions appeared, and our standing point is very close
to this. So our physical embodiment is one of the constraints on the vocalization, and this
makes, how to say, some principle of the vocalization, or of the development of vocalization itself.
Another one is a modification, and also -- I'd like to skip this one. Another idea is that
speech and gesture momentarily activate and entrain one another as coupled
oscillators, a dynamic process. We applied this kind of idea to the dynamic process
between the infant's and the caregiver's interaction: not a fixed theory of the interaction, but
changing all the time and so on.
And also there are neurophysiological and neuropsychological studies of speech and gestures
which show that, for example, gesture and speech are tightly linked, or that some common brain
mechanism exists for language and sequential motor functions, specifically in the lateral
perisylvian cortex of the dominant hemisphere. And high levels of EEG activity are
found in the motor areas of the brain when people are asked to read silently, especially when
the words are verbs.
So these studies indicate that the vocalization is also related to the motor activities, and
especially linked to some sequential motor functions and so on.
And also some studies show high activity in the left premotor cortex when people have to
retrieve the words for tools but not the words for other conceptual categories. And there is
a strong connection between the cerebellum and traditional language areas of the brain such as
Broca's area.
Or some functional MRI studies show that Broca's area is activated when people just
think about moving their hands; just thinking, but still this kind of area is activated.
And now there is a very famous theory, the [inaudible] system, so the activation and the
perception are tied to the [inaudible]. And also this is related to the [inaudible] and so on.
So I suppose maybe I'd like to skip these ones. But, anyhow, one thing is that before
six months, babies can perceive all kinds of vowels around the world. Or, in
other words, they perceive all kinds of vowels continuously, as they are. But after the sixth
month, due to the effect of the native language -- for example, in Japanese the vowels
are a, i, u, e, o, like this one -- their capability of perceiving the vowels changes, and
some kind of perceptual categorization happens. Even though I change [making sound]
continuously like this one, before six months the baby perceives the continuous change, but
after six months they perceive only [speaking Japanese vowels], like this one. This is the
[inaudible], or the magnet effect.
So these kinds of studies indicate this; not exactly the same one, [inaudible] this kind of
finding: before six months they have a more general capability of speech perception, but
after six months they are tuned.
A typical example is that for Asian people, especially Japanese people, it's very difficult to
discriminate L and R. It's very easy for American people. Because after six
months -- or in the case of L and R, maybe eight or nine months, I forget exactly -- anyhow, the
newborn baby, the Japanese newborn baby, can discriminate L and R correctly. But after maybe
eight months, nine months, their perception shifts and they do not need to
discriminate L and R anymore in Japanese. So they lose the capability. So [inaudible]
happens in the early days. Okay. So I skip this one.
So imaging studies give the neural verification of the behavioral data, the brain regions
and so on, but do they lead to the mechanism? Not yet. So what kind of learning methods and what
kind of interaction between the caregiver and the infant enable them to share the phonemes
regardless of the different utterance regions?
Okay. So one of the big, big issues is this. This is the formant space; the formants are
the peak frequencies of the spectrum, counted from the lowest one. They are supposed to be a very
good feature to discriminate one vowel from another. Anyhow, this axis is the first formant and
this is the second formant. So you can see that the adults are here and here. The infant is here.
And this is the robot. Okay.
So the point is that, as sounds, the adult's and the infant's are quite different. I mentioned
that the mutual imitation is very important. But the imitation is not of the sound as it is. We need
some kind of bias or something to go beyond the difference in these sound properties.
So the physical embodiment is learning the relationship between one's own vocalization and its
perception. Its own. The social interaction involves the two roles of the mother's imitation. One
is to show the correct example of the correspondence between the infant's vocalization and her
own one. That's the general one. It's no wonder.
The more important thing is to entrain the infant's vocalization toward her own one,
unconsciously, since she cannot perfectly imitate the infant's vocalization.
So even though the sounds are different, there is some kind of entrainment between them: the
caregiver tries to entrain the infant's vocalization toward her own vocalization area [inaudible]
unconsciously, because she cannot perfectly imitate the infant's vocalization.
So in the imitation between the infant and the caregiver, as I mentioned, the sounds are
quite different; okay, is this a disadvantage or not? Sometimes, you know, this kind of
disadvantage has a very good effect on the interaction, because it's a kind of handicap.
Therefore the caregiver tries to entrain the infant's vocalization toward his or her own area.
Okay. To show these kinds of things, we have done a series of works. Okay. So this
shows the computational models for speech perception and vocalization. I skip the
details of the existing ones, but one important thing is, as I mentioned, the mutual
interaction between agents with different body structures, different vocalizations with different
sound properties. Before this one, there were several computational models of perception and
vocalization, but they supposed that the agents are very homogeneous agents:
they have the capability to reproduce the same or a similar sound. But these
studies focus on the mutual interaction between agents with different body
structures, different vocal systems.
Okay. I also skip this one. This is some earlier work: this is actually the Waseda
University group's vocal system. They tried to reproduce the vocal system mechanically, with an
artificial lung and also some kind of vocal tract and tongue and so on. A very, very
sophisticated one. I hope there is some sound [playing sound]. You can hear, right? Yeah, the
Japanese a, i, u, e, o. Okay? This is [speaking Japanese]. Okay.
So today I showed in the earlier activities only the vowels, but the consonants
are much, much more difficult. Okay. So many people ask me why not use a speaker;
speakers are much easier, but that's against our philosophy. The physical embodiment is very,
very important.
Anyhow, this is one of the examples. They had very, very good skill to design
this kind of thing; they are good at [inaudible] this one. Another point: the
human vocal tract has a very complicated shape. But actually we can simplify it to just
a silicone tube. So some groups simplified the vocal tract to just a cone or
cylinder-type silicone tube like this one. And also we [inaudible] this one. Okay. I
skip some details of this one. But okay.
So this is our first trial from ten years ago: the vowel acquisition by maternal
imitation, the vowel imitation between agents with different articulation parameters,
by parrot-like teaching.
So infants seem to acquire, to imitate, the phonemes without any explicit knowledge about the
relationship between their sensorimotor system and the phonemes. Ten years ago we supposed
that, but [inaudible] it was found that the fetus in the womb opens the mouth when it
listens to the mother's voice. So there is some kind of relationship to the vocalization. We didn't
know anything about why this happens, but somehow this is the observation of the fetus's behavior:
they listen to their mother's voice, and they open the mouth. I don't know; I
have no idea. But ten years ago we supposed that relationship, that the fetus may know something.
Anyhow. And also, given the capability to reproduce adult sounds, how can we develop that?
That's the question.
So the purpose is to build a robot that can acquire the vowels of the human caregiver. The
design issues are what kind of mechanism should be embedded, that's the robot's
mechanism, and also what should be the behavior of the caregiver.
So I mentioned that the key ideas of cognitive developmental robotics are physical embodiment
and the social interaction. So for the physical embodiment, what kind of mechanism is embedded
beforehand, and what kind of behavior enables the robot to learn something?
Look at the real thing, the observation of human infants. As I mentioned, the mutual
imitation increases the frequency of the utterances. Then we conjecture that it
reinforces the infant's speech-like cooing, and it helps to find the correspondence between the
cooing and the phonemes. Okay?
So we designed a very similar but very simple silicone tube, okay, and we are using a real
medical tool for patients who [inaudible], an artificial vocal cord. So you just put it here,
and the patient changes the lip shape and then the sound changes.
So we applied this mechanism. Okay. The sound source is like this one, but the resonance, as
the shape changes, alters the outcome. This is the generation process.
For listening, we focus on a formant extractor, the formant features as the sound
features. As I said, this is the sound spectrum, and the formants are the peak frequencies from
the lowest one: F1, F2, F3, and so on.
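For readers who want to see what such a formant extractor might look like, here is a minimal sketch using standard LPC analysis (my own illustration; the talk does not specify the actual method): the peaks of the linear-prediction spectral envelope are taken as F1, F2, and so on.

```python
# Minimal sketch of LPC-based formant estimation: peak frequencies of the
# spectral envelope, counted from the lowest (F1, F2, ...).
import numpy as np

def estimate_formants(frame, sr, order=12):
    """Return candidate formant frequencies (Hz) for one audio frame."""
    frame = frame * np.hamming(len(frame))                       # taper
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis

    # Autocorrelation (Yule-Walker) estimate of the LPC coefficients.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-6 * r[0] * np.eye(order)                             # regularize
    a = np.linalg.solve(R, r[1:order + 1])                       # predictor

    # Roots of the prediction-error filter give the vocal-tract resonances.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]         # keep one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    return sorted(f for f in freqs if 90 < f < sr / 2 - 50)

# Example with a made-up frame containing two resonance-like tones.
sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 2400 * t)
print(estimate_formants(frame, sr))
```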
So this is the formant space of the resonant frequencies --
>>: So can you go back to the earlier slide? You use a very strange [inaudible]. So you get a
[inaudible] analysis and then you run that through whatever PC processing [inaudible].
>> Minoru Asada: Okay.
>>: So now, but that --
>> Minoru Asada: [inaudible].
>>: -- your philosophy.
>> Minoru Asada: Just a moment, just a moment. So we just pick the sound source and
change it, and this is the formant space, just the fundamental structure of the robot.
>>: Yeah.
>> Minoru Asada: What is the question?
>>: Well, my question is that in order to make the robot speak back, as I understand it,
what I see here is that you actually do some conversion from the [inaudible] picked up by the
microphone into articulation.
>> Minoru Asada: Uh...
>>: And then the articulation --
>>: He's talking about the [inaudible], then he [inaudible].
>> Minoru Asada: Yeah. I'm still not sure of your question.
>>: I don't know if this [inaudible] formant [inaudible] information.
>>: Correct, correct.
>> Minoru Asada: This one.
>>: So the question really is, how does this kind of interaction reinforce your original belief that
you need to have interaction in terms of motor control, in order for the robot to speak
back to you?
>> Minoru Asada: Um-hmm.
>>: So you control the robots --
>> Minoru Asada: Yes.
>>: [inaudible] early on so that the robot can [inaudible].
>> Minoru Asada: Okay. So I will explain a little bit more about our system. But at this moment
I may answer that we did not explicitly control the vocal system. Through the
interaction we may change something, the robot may change something, but not explicitly through
the parameters themselves.
>>: Okay.
>> Minoru Asada: Okay. I will explain, and then ask me again, okay? What were your
two questions, okay? Okay. So, anyhow, why are we using the formant space? We suppose that it is
the vocal feature for the vowel discrimination, and also non-human primates and birds utilize
it as a perceptual cue. Therefore we suppose that the formant structure is okay as the
embedded structure.
But, as you ask me, it is still not clear whether the human auditory system utilizes
the formant or not physiologically. As a processing step it might be okay. Maybe
four years ago I checked with a brain scientist: the auditory system does not exactly
have some [inaudible] that processes the formant itself.
Could you just reiterate, you have --
>>: No, I just [inaudible]. You're not studying the ear in this case [inaudible] microphone
[inaudible].
>> Minoru Asada: That's right. That's right. Yeah.
>>: And you focus on the vocal.
>> Minoru Asada: On the vocalization. Sorry. Yeah. Okay. That's a point [inaudible].
>>: [inaudible].
>> Minoru Asada: I see. I see. Okay. I got it, yeah. Of course, ideally we'd like to also model
[inaudible] the auditory system too. But in this case it's a very simple formant extraction. So sorry.
And we added a very simple model here for the brain-like processing: the auditory layer and,
you know, the articulation layer. And this time the behavior of the caregiver is parrot-like
teaching, okay. As real infants show some cooing at the third or fourth
month [inaudible], coo, coo, coo, like this one, the robot tries to generate
some sounds randomly.
So it randomly controls -- sorry -- controls the change of the shape of
the tube, the cylinder, and different sounds happen.
If the caregiver supposes that, okay, this sound is one of my own vowels, for example a, o, e, in
that case the caregiver returns [inaudible] his or her a, o, e to the robot. And the two layers store
the data.
So this is the articulation layer and this is the auditory layer, and the formant vector and
the articulation vector are connected by Hebbian learning. So there are two kinds
of learning: the Hebbian learning between the layers and the self-organized mapping of
the [inaudible], this one.
So as a result, okay, this is an experiment by a former student, and this is a very, very simple
robot, the vocal robot. Okay. So this shows the regions in the articulation vectors corresponding
to the variation of the caregiver's vowels. For example, [speaking Japanese vowels], like this
one; depending on the robot's sound, the caregiver returns his or her own vowels
at a different rate depending on the situation, [making sound], like this one. So therefore some
kind of distribution happens.
And then we introduced some kind of physical constraint, so that the subjective criterion is that
more facile articulation is better: less torque and less deformation change. Then we modified the
Hebbian learning, like this one -- of course this is kind of well known -- and then the result is
like this one, much more compact, [inaudible] the physical constraint. And
[playing sounds] can you hear? Is it okay? Okay?
>>: This is after training, right?
>> Minoru Asada: After training.
>>: So you need to learn the same sound, right, so the caregiver say ah [inaudible] --
>> Minoru Asada: Yes. Yes.
>>: -- they basically would say ah.
>> Minoru Asada: Yeah. But it depends on the situation [inaudible] like this one.
>>: [inaudible].
>>: So how do you get feedback information to the model?
>> Minoru Asada: Feedback --
>>: Feedback [inaudible].
>> Minoru Asada: Okay. That's a good point. Okay. So the feedback is not auditory, nor
[inaudible]. That's a big mystery. In the case of humans, without auditory feedback you cannot
speak anymore. For this robot, the signal just comes from the caregiver, this one. So this
robot [inaudible] does not hear its own voice, but it knows the mother's voice. The connection
[inaudible] gives the correspondence. So hearing the 'ah' means, okay, my own 'ah', like this one.
It's like a [inaudible] system, so the [inaudible] and the [inaudible] are the same, or there is
some strong connection.
>>: But somehow you have to know what the [inaudible] layer is in order to -- that
corresponds to the [inaudible]. So if you don't have an [inaudible], how do you -- what do you
use [inaudible]?
>> Minoru Asada: This time it's just all or nothing. In the [inaudible] teaching, when the
caregiver returns a vowel, the corresponding action vector is selected: this one, this one, this
one, and also the formant vector; of course [inaudible] may happen. During
this process it is Hebbian learning and self-organized mapping. For example, in this case, the
caregiver supposes just five categories, a, i, u, e, o, but if the [inaudible] signal
happens, the self-organized mapping may give just four categories or six categories. In this case
it happened.
But, anyhow, in principle this is a very simple binary selection, on or off,
depending on the decision of the caregiver.
>>: I see. Okay. So you have to have preknowledge about what are the [inaudible] layer that
can generate the [inaudible] or how do you learn it?
>>: [inaudible] random try them?
>> Minoru Asada: [inaudible] Actually, the action vector has five dimensions: we have five wires
to deform the tube, so five dimensions. And the [inaudible] is just clustering depending on the
response of the caregiver.
>>: But the caregiver does not say this [inaudible] --
>>: But you have to know that, right? If you don't know, how do you -- on what signal do you
rely to learn [inaudible]?
>> Minoru Asada: [inaudible].
>>: But does the caregiver say this is [inaudible] caregiver say that?
>> Minoru Asada: Yes.
>>: [inaudible].
>> Minoru Asada: In some sense. Yeah. Right, right, right. That's right. So only the
good responses: the caregiver picks up the good sound every time, so the robot selects,
oh, this is good, this is bad. It abandons the bad ones and selects only the good ones. Yeah.
[multiple people speaking at once].
>> Minoru Asada: Of course, you know, yeah. That's -- so a [inaudible] response from the
caregiver means good. No response from the caregiver means bad.
>>: Okay.
[multiple people speaking at once].
>> Minoru Asada: Ah. Okay. That was your question. Oh, sorry, I didn't understand. Okay.
Anyhow, this is some -- okay. Actually this sound is not a baby-like voice, right? It's like
middle-aged, after college, yeah. And then we change it [playing sounds]. Yeah. So we
changed the vocal cord and also the air supply, like an artificial lung or something. And also the
relationship imitation -- oh, the batteries... Okay, I suppose maybe I can finish now
[playing sounds]. Okay? More like --
>>: So you have -- in your system you have a mapping between the kind of control signal that
controls the mechanical device and what humans [inaudible].
>> Minoru Asada: Yeah, yeah.
>>: Okay.
>> Minoru Asada: Okay. So what is missing? The first thing, which I have not mentioned, is that there is no auditory feedback. Yeah, that is a very difficult problem. So we suppose the following one. But anyhow -- this one, okay? Sorry. You suppose that this is going to be okay, but actually you are tricked.
So now I should mention that I have ten minutes left -- is that okay?
>> Zhengyou Zhang: Yeah.
>> Minoru Asada: Okay. Actually I have two more stories, but I will give only one more. This research is about sharing perceptual and behavioral primitives between the caregiver and her infant across their different bodies. That is our question. The physical qualities of their producible vowels are already different, yet infants' audition and articulation adapt to the mother tongue. So this is a dynamic process including interpersonal interaction, social engagement, and perceptual development. And this just shows some kind of emotional affirmation from the caregiver's side.
So our anticipations bias our perception, and the caregivers' anticipation for their infants can bias
their perception and imitations.
Okay. So in the previous video clips, with just a, i, u, e, o, you had already anticipated the next vowel [speaking Japanese vowels]. That anticipation biased your perception. This kind of situation is very, very important.
So our method provides unconscious guidance in mutual imitation for infant development, based on biasing elements with two different kinds of modules.
One is the perceptual magnet, the magnet effect, which I have already mentioned.
The other is the auto-mirroring bias, by which the heard vowel is perceived as much closer to the expected vowel, because the other's utterance is an imitation of the listener's own utterance. This is the kind of automatic bias I mentioned: you already expect the next vowel [speaking Japanese vowels] or something like it, and this anticipation biases your perception itself.
I will show some of the evidence. For example, we generated some kinds of sounds [inaudible] around the [inaudible] primitives, and we asked some students, some subjects, to just copy the sound, not the vowel. The outcome is like this: the subjects did not hear the sound as it is; they heard it as their own vowels and also reproduced it with their own vowels. Therefore the variance is much smaller. This is some evidence of the magnet effect.
The other one is the auto-mirroring bias. The situation is very similar: the [inaudible] group is told, please imitate the computer sound, and the experimental [inaudible] group is told, please imitate the sound -- sometimes the robot imitates your [inaudible] voice. The stimuli are actually the same. But this is not that kind of bias; it is a kind of [inaudible] effect. The [inaudible] effect is that, of the two groups, one group shows some [inaudible] effect [inaudible]. The dynamic group is not so effective, but both groups improve better and better.
So some kind of plausible effect happens. We ask the control subjects, please copy the sound, but the experimental group is told, please copy the sound -- sometimes the robot will imitate your own voice. So the subject anticipates that the robot may imitate his own voice. Therefore, [inaudible] the subject says "ah," the robot reacts with something like "ah," and then the subject says "ah" again. So the difference between time T and T plus 2 is not so large in the experimental group.
But for the other group the voice is changing all the time. Therefore the difference -- like this one, control group versus experimental group -- the voice difference is smaller in the experimental group than in the control group. This indicates that our anticipation affects our perception and our response, too.
So we simulated this -- okay, I skip this part. As [inaudible], we model the auto-mirroring bias like this, and for the magnet effect we use a kind of normalized Gaussian network to control the parameters of the magnet effect and so on.
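For reference, a normalized Gaussian network of the general kind mentioned here computes Gaussian activations over the input, normalizes them to sum to one, and mixes per-unit outputs with those weights; the centers, widths, and weights in this sketch are illustrative, not the model's actual parameters.

    import numpy as np

    class NormalizedGaussianNetwork:
        # Minimal normalized Gaussian (normalized RBF) network sketch.
        def __init__(self, centers, widths, weights):
            self.c = np.asarray(centers, dtype=float)   # (n_units, in_dim) Gaussian centers
            self.s = np.asarray(widths, dtype=float)    # (n_units,) Gaussian widths
            self.w = np.asarray(weights, dtype=float)   # (n_units, out_dim) per-unit outputs

        def __call__(self, x):
            d2 = np.sum((self.c - np.asarray(x, dtype=float)) ** 2, axis=1)
            g = np.exp(-d2 / (2.0 * self.s ** 2))
            g = g / g.sum()                             # normalization: activations sum to one
            return g @ self.w                           # smoothly interpolated output

    # Hypothetical usage: map a formant vector to a magnet-effect strength.
    net = NormalizedGaussianNetwork(centers=[[850.0, 1600.0], [300.0, 2300.0]],
                                    widths=[300.0, 300.0],
                                    weights=[[0.8], [0.4]])
    print(net([700.0, 1800.0]))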
So these are the transfer functions, and these are some results from Tom's paper, this one. This shows the blue indicator as the mother robot's response and the red indicator as the infant robot's response. You can see that even though [inaudible] happens, the baby correctly [inaudible] like this one with both biases, the magnet effect and the auto-mirroring bias.
But this one has no auto-mirroring bias: it very quickly converges, due to the magnet effect, on the wrong phrase.
In the case of the [inaudible] bias, the location of the distribution is almost okay, but it is not the correct expression, and the variance is almost random, like this one. Okay.
But actually we need some kind of balance between the magnet effect and the auto-mirroring bias. We experimentally checked what kind of balance is suitable for acquiring the best solution, but we still need some more refined analysis to obtain the best parameters. Okay.
I do not have much time, so I skip this one. Actually, in [inaudible] only some 20 percent of the mother's reactions are imitations, fewer than her other kinds of reactions. But even with the [inaudible] remaining, non-imitative situations, we can still manage, with the second story. But I skip the details. I skip this one. Sorry, no time.
So the question is, when we do not use a loudspeaker, we focus on the physical embodiment. The physical embodiment lets us introduce subjective criteria such as less torque and less deformation. And we introduce respiration-based turn taking with the caregiver. Breathing supplies the air force for speaking and also paces the turn taking: speak and rest, speak and rest, and so on. Turn taking is also really important for any kind of communication.
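Purely as a sketch of those two ideas (the cost weights, phase lengths, and candidate values are assumptions, not the implemented controller), one can score candidate articulations by torque and deformation and let the robot vocalize only during its exhalation phase, which yields the speak-and-rest rhythm described above:

    # Sketch only: weights, phase lengths, and candidate actions are assumptions.
    W_TORQUE, W_DEFORM = 1.0, 0.5

    def subjective_cost(torque, deformation):
        # Prefer articulations that need less torque and deform the tract less.
        return W_TORQUE * torque + W_DEFORM * deformation

    def respiration_phases(exhale_steps=3, inhale_steps=2):
        # Endless exhale/inhale cycle; the robot can only vocalize while exhaling.
        while True:
            for _ in range(exhale_steps):
                yield "exhale"
            for _ in range(inhale_steps):
                yield "inhale"

    candidates = [{"vowel": "a", "torque": 0.6, "deformation": 0.2},
                  {"vowel": "i", "torque": 0.9, "deformation": 0.7}]

    for step, phase in zip(range(10), respiration_phases()):
        if phase == "exhale":
            best = min(candidates, key=lambda c: subjective_cost(c["torque"], c["deformation"]))
            print(step, "robot vocalizes:", best["vowel"])
        else:
            print(step, "robot rests; caregiver's turn")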
So lastly -- sorry [playing sounds]. We designed this kind of robot, and we would still like to show some of its capability, these kinds of things in the pictures here, so that we can produce a baby-like voice. The next challenge is how to control it by using some [inaudible] and so on.
Okay.
>>: [inaudible].
[laughter].
>> Minoru Asada: [inaudible]. So to summarize: cognitive developmental robotics is a promising approach to a new science of human cognition based on design theory. Learning and development of speech perception and vocalization is a typical issue that involves the core ideas of CDR, cognitive developmental robotics: physical embodiment, social interaction, and cognitive development.
The caregiver's explicit and implicit scaffolding leads the learner's capabilities from language-general to language-specific ones.
Much more realistic situations should be attacked, not only for practicing vocalization but also for other perception and action, for object categorization, lexicon acquisition, and language communication.
Collaboration with behavioral and neuroscientific studies, using more realistic platforms for interaction experiments, is essential.
So this is an acknowledgment of the group leaders of the [inaudible] project: Professor Hiroshi Ishiguro, Professor Koh Hosoda, Professor Yasuo Kuniyoshi, and Professor Toshiro Inui. The first three are roboticists, but the last one is not; he is a cognitive neuroscientist doing imaging studies and so on.
And for the vocal system, Dr. Yuichiro Yochikawa, Dr. Katsushi Miura, Ph.D. candidate Hisahi Ishihara, and Ph.D. candidate Yuki Sasamoto. Okay.
And thank you for your attention.
>> Zhengyou Zhang: Thank you.
[applause].
>> Minoru Asada: Sorry. I skipped those -- many more.
>> Zhengyou Zhang: Just in time.
>> Minoru Asada: Just in time. The questions [inaudible] -- we were already interactive.
>> Zhengyou Zhang: That's very [inaudible]. Thank you very much.
>> Minoru Asada: Yeah, thank you. Thank you all for your attention.
[applause]