Applying Robotics and Genetic Algorithms to Synchronize Speech


Using Quantum Fuzzy Logic to learn facial gestures of a Schrödinger Cat puppet for Robot Theater

Arushi Raghuvanshi

Prof. Marek Perkowski

24 May 2008


Background: Quantum Robots

[Figure: Quantum Braitenberg vehicle diagram; labels A, B, H, P, Q, S1, S2, L1, L2, M1 to M6]

Mr PotatoHead

(ISMVL 2007)

Old Duck Biped

Schrödinger's Cat

Programming Robot Behaviors

Simple sequential flow with no feedback

[Diagram labels: Theatre Director; Behavior Selection; sound; Input; Initialization; Quantum or other logic controller; Measurement; Effectors]

Programming Robot Behaviors

Adding emotions and environmental feedback

[Diagram labels: Theatre Director; Behavior Selection; sound; Input; Initialization; Theatre Director; emotion; Quantum or other logic controller; Environment including human audience; Measurement; Effectors]

Programming Robot Behaviors

Emotional Interactive Robots with Sensors and Feedback

Modifying the Behavior

[Diagram labels: Theatre Director; Behavior Selection; sound; Theatre Director; emotion; Input; Initialization; sensors; Quantum or other logic controller; Environment including human audience; Measurement; Effectors]

Quantum & Fuzzy Logic

Quantum Circuit (can be transformed into Quantum Fuzzy Logic by replacing gates)

NOT -> Fuzzy NOT

OR -> MAX

AND -> MIN

Fuzzy Logic with MIN & MAX operators

New operators and literals can be defined for Quantum Fuzzy Logic

Fuzzy Logic Example

[Figure: example fuzzy logic circuit with signal values 0.3 and 0.7 on a scale from 0 to 1]

Fuzzy Logic Operations

• Multiple ways to create Fuzzy operations

• Two examples below

• Example 1

– NOT (a) = (1 – a)

• e.g. NOT (0.34) = 0.66

– MIN (a, b) = if (a < b) then a else b

• e.g. MIN (0.3, 0.75) = 0.3

– MAX (a, b) = if (a > b) then a else b

• e.g. MAX (0.63, 0.83) = 0.83

• Example 2

– NOT (a) = (1 – a)

• e.g. NOT (0.34) = 0.66

– MIN (a, b) = a * b

• e.g. MIN (0.3, 0.7) = 0.21

– MAX (a, b) = (a + b) – a*b

• e.g. MAX (0.3, 0.7) = 0.3+0.7-0.21 = 0.79

• Derivation of MAX in Example 2 from De Morgan's rule:

a MAX b = NOT ( NOT (a) MIN NOT (b))

= NOT ((1-a)*(1-b))

= NOT (1-a-b+a*b)

= 1-1+a+b-a*b

= a+b-a*b

• As in Example 2, MAX and MIN may be misnomers. They can be called OR and AND operations instead
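The two operator families above can be checked numerically. The following is a minimal Python sketch, not code from the presentation; the function names are illustrative assumptions. It reproduces the slide's worked values and confirms that MAX in Example 2 follows from De Morgan's rule.

```python
def fuzzy_not(a: float) -> float:
    """Fuzzy NOT used in both examples: NOT(a) = 1 - a."""
    return 1.0 - a


def min_and(a: float, b: float) -> float:
    """Example 1 AND: the smaller of the two values."""
    return a if a < b else b


def max_or(a: float, b: float) -> float:
    """Example 1 OR: the larger of the two values."""
    return a if a > b else b


def product_and(a: float, b: float) -> float:
    """Example 2 AND (called MIN on the slide): a * b."""
    return a * b


def probabilistic_or(a: float, b: float) -> float:
    """Example 2 OR (called MAX on the slide): a + b - a*b."""
    return a + b - a * b


def de_morgan_or(and_op, a: float, b: float) -> float:
    """Build OR from any AND via a OR b = NOT(NOT(a) AND NOT(b))."""
    return fuzzy_not(and_op(fuzzy_not(a), fuzzy_not(b)))


if __name__ == "__main__":
    print(product_and(0.3, 0.7))
    print(probabilistic_or(0.3, 0.7))
    print(de_morgan_or(product_and, 0.3, 0.7))
```

Passing `min_and` to `de_morgan_or` gives back the Example 1 MAX in the same way, up to floating-point rounding.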


Representing Fuzzy Values on

Bloch Sphere

[Figure: Bloch sphere with X, Y, and Z axes from -1 to 1, |0⟩ at Z = 1 and |1⟩ at Z = -1; measurement values 0, 0.15, 0.5, 0.8, and 1 are marked along the meridian]

• Fuzzy values can be represented in different ways on the Bloch Sphere

• The simplest way to represent them is along the meridian (as shown on left)

• After measurement, the value can be 0, 1, or anywhere in between

• Other mechanisms (e.g. values inside the Bloch Sphere, parallels of latitude, etc.) can also be used
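One possible reading of the meridian representation is sketched below in Python. This mapping is an assumption rather than something the slides spell out: a fuzzy value v is treated as the probability of measuring |1⟩, so the state sits at polar angle θ = 2·asin(√v) on the X-Z meridian. The function names are illustrative.

```python
import math


def fuzzy_to_qubit(value: float) -> tuple[float, float]:
    """Return real amplitudes (for |0>, for |1>) with P(|1>) = value.

    Assumption: the fuzzy value is encoded as a measurement probability.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("fuzzy value must lie in [0, 1]")
    theta = 2.0 * math.asin(math.sqrt(value))
    return math.cos(theta / 2.0), math.sin(theta / 2.0)


def bloch_xz(value: float) -> tuple[float, float]:
    """Return the (x, z) Bloch coordinates of the encoded state (y = 0)."""
    theta = 2.0 * math.asin(math.sqrt(value))
    return math.sin(theta), math.cos(theta)


def measured_fuzzy_value(amplitudes: tuple[float, float]) -> float:
    """Recover the fuzzy value as the probability of measuring |1>."""
    return abs(amplitudes[1]) ** 2


if __name__ == "__main__":
    for v in (0.0, 0.15, 0.5, 0.8, 1.0):
        print(v, bloch_xz(v), measured_fuzzy_value(fuzzy_to_qubit(v)))
```

Under this assumption the z coordinate equals 1 - 2v, so the value 0 sits at |0⟩ (z = 1) and the value 1 at |1⟩ (z = -1), matching the axis labels in the figure.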


Quantum Fuzzy Literals

[Figure: Bloch sphere with X, Y, and Z axes illustrating three literals]

• Rotation Around Y Axis

• Rotation Around X Axis

• Phase Shift (270 degree rotation around Z axis)

We use this to define the Fuzzy NOT operations (other literals can be used as well).

Quantum Fuzzy ‘NOT’ operator

The inverter is defined in exactly the same way as in quantum logic:

\[
\text{Fuzzy Quantum NOT}\,(\alpha|0\rangle + \beta|1\rangle) = \beta|0\rangle + \alpha|1\rangle
\]

where the squared magnitude of the (in general complex) amplitude associated with ket $|1\rangle$ is the equivalent of a fuzzy value in the interval [0, 1].

Quantum Fuzzy ‘MIN’ operator

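\[
\mathrm{Min}\big(\alpha_1|0\rangle + \alpha_2|1\rangle,\ \beta_1|0\rangle + \beta_2|1\rangle\big)
= \mathrm{Davio}\big(\alpha_1|0\rangle + \alpha_2|1\rangle,\ \beta_1|0\rangle + \beta_2|1\rangle,\ |0\rangle\big)
\]

[Circuit: three parallel input lines, $\alpha_1|0\rangle + \alpha_2|1\rangle$, $\beta_1|0\rangle + \beta_2|1\rangle$, and $|0\rangle$, feed a Toffoli gate; the third line carries the output R (Davio)]

The input is the Kronecker product of the 3 parallel input lines:

\[
\begin{pmatrix}\alpha_1\\ \alpha_2\end{pmatrix}\otimes
\begin{pmatrix}\beta_1\\ \beta_2\end{pmatrix}\otimes
\begin{pmatrix}1\\ 0\end{pmatrix}
=
\begin{pmatrix}\alpha_1\beta_1\\ \alpha_1\beta_2\\ \alpha_2\beta_1\\ \alpha_2\beta_2\end{pmatrix}\otimes
\begin{pmatrix}1\\ 0\end{pmatrix}
=
\begin{pmatrix}\alpha_1\beta_1\\ 0\\ \alpha_1\beta_2\\ 0\\ \alpha_2\beta_1\\ 0\\ \alpha_2\beta_2\\ 0\end{pmatrix}
\]

Toffoli Gate applied to the input (rows and columns ordered 000, 001, 010, 011, 100, 101, 110, 111):

\[
\begin{pmatrix}
1&0&0&0&0&0&0&0\\
0&1&0&0&0&0&0&0\\
0&0&1&0&0&0&0&0\\
0&0&0&1&0&0&0&0\\
0&0&0&0&1&0&0&0\\
0&0&0&0&0&1&0&0\\
0&0&0&0&0&0&0&1\\
0&0&0&0&0&0&1&0
\end{pmatrix}
\begin{pmatrix}\alpha_1\beta_1\\ 0\\ \alpha_1\beta_2\\ 0\\ \alpha_2\beta_1\\ 0\\ \alpha_2\beta_2\\ 0\end{pmatrix}
=
\begin{pmatrix}\alpha_1\beta_1\\ 0\\ \alpha_1\beta_2\\ 0\\ \alpha_2\beta_1\\ 0\\ 0\\ \alpha_2\beta_2\end{pmatrix}
\]

Written as a state, the output is

\[
\alpha_1\beta_1|000\rangle + \alpha_1\beta_2|010\rangle + \alpha_2\beta_1|100\rangle + \alpha_2\beta_2|111\rangle
= \big(\alpha_1\beta_1|00\rangle + \alpha_1\beta_2|01\rangle + \alpha_2\beta_1|10\rangle\big)\otimes|0\rangle
+ \big(\alpha_2\beta_2|11\rangle\big)\otimes|1\rangle
\]

=> The probability of measuring '1' on the output line is $|\alpha_2\beta_2|^2$.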

Quantum Fuzzy ‘MAX’ operator

The definition of the Fuzzy Quantum Maximum operator is calculated from the De Morgan rule:

A max B = NOT ( NOT (A) min NOT (B)).
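The MIN construction and the De Morgan MAX can be simulated classically for small cases. The sketch below is a plain-Python illustration under two stated assumptions: amplitudes are real, and a fuzzy value is read as the probability of measuring '1'. The helper names are hypothetical and do not come from the presentation's program.

```python
import math


def kron(u, v):
    """Kronecker product of two state vectors given as lists."""
    return [x * y for x in u for y in v]


def toffoli(state):
    """Apply the 3-qubit Toffoli gate: swap the |110> and |111> amplitudes."""
    out = list(state)
    out[6], out[7] = out[7], out[6]
    return out


def prob_output_one(state):
    """Probability that the third (output) qubit measures 1 (odd basis indices)."""
    return sum(abs(amp) ** 2 for index, amp in enumerate(state) if index % 2 == 1)


def qubit_from_fuzzy(value):
    """Assumed encoding: real amplitudes with P(|1>) equal to the fuzzy value."""
    return [math.sqrt(1.0 - value), math.sqrt(value)]


def quantum_not(qubit):
    """Quantum inverter: swap the |0> and |1> amplitudes."""
    return [qubit[1], qubit[0]]


def quantum_fuzzy_min(a, b):
    """MIN via Toffoli with the third line initialised to |0>."""
    state = kron(kron(a, b), [1.0, 0.0])
    return prob_output_one(toffoli(state))


def quantum_fuzzy_max(a, b):
    """MAX via De Morgan: NOT(NOT(a) min NOT(b)); NOT on the output flips P(1)."""
    return 1.0 - quantum_fuzzy_min(quantum_not(a), quantum_not(b))


if __name__ == "__main__":
    a, b = qubit_from_fuzzy(0.3), qubit_from_fuzzy(0.7)
    print(quantum_fuzzy_min(a, b), quantum_fuzzy_max(a, b))
```

Because the output probability is |α2β2|² = |α2|²·|β2|², this quantum MIN behaves like the product-style AND of Example 2 rather than the pick-the-smaller MIN of Example 1.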


Quantum Fuzzy Logic in Robots

Fuzzy Value Sensors

Light Sensors

0 = completely dark

0.5 = semi-dark

1 = completely bright

Sound Sensors

0 = pin-drop silence

0.5 = normal noise (people talking)

1 = loud crash

Image Sensors

[Diagram: fuzzy sensor values -> Quantum Fuzzy Logic -> Motor Controls causing output behaviors]
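A sensor reading has to be scaled into [0, 1] before it can serve as a fuzzy value like the ones listed above. The sketch below shows one simple way to do that in Python; the linear scaling and the 0 to 1023 raw range are illustrative assumptions, not details taken from the robot.

```python
def to_fuzzy(raw: float, raw_min: float, raw_max: float) -> float:
    """Linearly scale a raw sensor reading into a fuzzy value in [0, 1].

    Readings outside [raw_min, raw_max] are clamped to the nearest end.
    """
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    scaled = (raw - raw_min) / (raw_max - raw_min)
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    # Hypothetical 10-bit light sensor: 0 = completely dark, 1023 = completely bright.
    for reading in (0, 300, 1023):
        print(reading, to_fuzzy(reading, 0, 1023))
```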


Back to Robot Theatre….

Combination of Genetic Algorithm and Quantum Fuzzy Logic


Synchronizing Lips with Speech

Want This

Not This


Traditional Methods

• Use mapping of phonetic symbol to a lip shape (as shown on left)

• Sound streams can be parsed to generate phonetic symbols

• The methods are language dependent (i.e. a different mapping is needed for each language)

• Need to be modified for speed and style of speaking


Using Genetic Algorithms

[Diagram: GA-based lip synchronization]

• A: Sound input

• Initial set of genomes representing lip movements (the initial population for the GA); these are dynamically generated by the program

• GA Engine

• B: Input to the fitness function (user evaluation, interactive)

• Output: sequence representing lip movements matching input stream ‘A’

• ESRA Robot shows the lip movements

*** The matching function is dynamic, so it doesn’t matter if people have different accents, talk slower/faster, etc.

Genome

• A Genome (or a chromosome) is a pattern that corresponds to a behavior.

• A possible solution to the given problem can be encoded to create a genome.

• In genetic algorithms, a set of random genomes is created.

• When decoded these genomes represent possible solutions to the given problem.

• In my experiment, a genome is an encoded string that represents a sequence of lip movements. For example:

49__9__31__9__46_1640__

• When decoded, this code represents the lip motion for the phrase “Hi I am a robot.”


Encoding Lip Shapes for Defining the Genome

Code      Upper   Lower
0, 1      127     127
2         87      173
3         170     120
4         140     56
5         0       0
6         0       167
7, 8      80      45
9         100     45
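The lip-shape codes above can be used to decode a genome string into motor positions. The following Python sketch assumes that each character of the genome is one code and that a pause '_' means "no new motor command"; both are assumptions for illustration, as is the name `decode_genome`.

```python
# (upper, lower) motor positions for each lip-shape code, from the encoding slide.
LIP_CODES = {
    "0": (127, 127), "1": (127, 127),
    "2": (87, 173),
    "3": (170, 120),
    "4": (140, 56),
    "5": (0, 0),
    "6": (0, 167),
    "7": (80, 45), "8": (80, 45),
    "9": (100, 45),
}
PAUSE = "_"


def decode_genome(genome: str):
    """Turn a genome string into a list of (upper, lower) positions.

    Assumption: a pause is represented as None (no new motor command).
    """
    positions = []
    for symbol in genome:
        if symbol == PAUSE:
            positions.append(None)
        else:
            positions.append(LIP_CODES[symbol])
    return positions


if __name__ == "__main__":
    for step in decode_genome("49__9__31__9__46_1640__"):
        print(step)
```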


Fitness Function

• The better the robot solves the problem, the higher the fitness score.

• When synchronizing sound and lip motion, the fitness score would come from user input.

• To test the Genetic Algorithm, I calculated the fitness function by comparing the genomes to the best solution.

• The best solution was determined by the traditional method.


Fitness Function Algorithm

Best Genome (for calculating Fitness Score):   1 4 9 5 7 _ 3 8

Genome Under Test:                             5 3 _ 8 3 _ 3 8

Find the difference X for each corresponding element

• Closeness implies a better match (4-3 is better than 1-5)

• Pauses ‘_’ must match in position to get any score, so the score for a pause is either 0 or 9

X   =   4 1 9 3 4 0 0 0

9-X =   5 8 0 6 5 9 9 9   (higher number is better now!)

Total Score = 5+8+0+6+5+9+9+9 = 51

Fitness Score % = (Total/TotalPossible)*100 = 51/72 * 100 = 70.83%
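The scoring rule in this worked example can be written as a short function. This Python sketch follows the rules stated on the slide (9 minus the absolute difference for digits, all-or-nothing for pauses); the function names are illustrative and the code is not the author's C++ implementation.

```python
PAUSE = "_"
MAX_SCORE = 9  # best possible score for one element


def element_score(best: str, test: str) -> int:
    """Score one position: pauses must match exactly, digits score 9 - |difference|."""
    if best == PAUSE or test == PAUSE:
        return MAX_SCORE if best == test else 0
    return MAX_SCORE - abs(int(best) - int(test))


def total_score(best_genome: str, test_genome: str) -> int:
    """Sum the per-element scores of two equal-length genomes."""
    if len(best_genome) != len(test_genome):
        raise ValueError("genomes must have the same length")
    return sum(element_score(b, t) for b, t in zip(best_genome, test_genome))


def fitness_percent(best_genome: str, test_genome: str) -> float:
    """Fitness Score % = (Total / TotalPossible) * 100."""
    possible = MAX_SCORE * len(best_genome)
    return 100.0 * total_score(best_genome, test_genome) / possible


if __name__ == "__main__":
    print(total_score("14957_38", "53_83_38"))
    print(round(fitness_percent("14957_38", "53_83_38"), 2))
```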

Selection

• The higher the fitness score, the higher the probability of being selected.

• Selection methods include the Roulette Wheel, Tournament Selector, and Truncation Selection

• In my experiment, I used a Roulette Wheel for selection.
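Roulette wheel selection gives each genome a slice of the wheel proportional to its fitness score. The sketch below is a generic Python version written for illustration; it is not GAlib's selector, and the function name and the fallback for an all-zero population are assumptions.

```python
import random


def roulette_select(population, scores, rng=random):
    """Pick one individual with probability proportional to its score.

    Scores are assumed to be non-negative. If every score is zero, fall back
    to a uniform random choice.
    """
    total = sum(scores)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)  # where the wheel stops
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if pick < running:
            return individual
    return population[-1]  # guards against floating-point rounding at the top end


if __name__ == "__main__":
    rng = random.Random(0)
    genomes = ["53_83_38", "14957_38"]
    picks = [roulette_select(genomes, [51, 72], rng) for _ in range(1000)]
    print(picks.count("14957_38"), "of 1000 picks went to the fitter genome")
```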


Crossover

When two chromosomes from the group are selected, they are combined to create a new genome.

Depending on the crossover rate, the bits of the two chosen genomes are exchanged at a crossover point.

The higher the crossover rate, the more likely it is that a crossover will occur.

The crossover point is chosen at random within the genome.
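A single-point crossover with a crossover rate can be sketched as follows. This is an illustrative Python version, not the GAlib operator used in the experiment; the default rate of 0.9 mirrors the 90 percent figure quoted on the later crossover slide, and the function name is an assumption.

```python
import random


def single_point_crossover(parent_a: str, parent_b: str, rate: float = 0.9, rng=random):
    """Return two children made by swapping the tails of the parents.

    With probability (1 - rate) no crossover happens and the parents are
    returned unchanged.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2 or rng.random() >= rate:
        return parent_a, parent_b
    point = rng.randint(1, len(parent_a) - 1)  # randomly chosen crossover point
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


if __name__ == "__main__":
    rng = random.Random(1)
    print(single_point_crossover("14957_38", "53_83_38", rng=rng))
```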


Mutation

• Depending on the mutation rate, chosen bits of the genome are changed.

• The higher the mutation rate, the more likely it is that a bit will be changed.

• Shown to the right are many types of mutation


Mutation

• In my experiment I used two different mutation functions

– Swap mutation

– myMutator

• I created my own mutator which changes a single bit, rather than swapping two bits.
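The difference between the two mutation styles can be illustrated with a short Python sketch. The `point_mutation` function below is only a guess at the general shape of a single-element mutator; the actual myMutator code is not shown in the slides, and the symbol alphabet is assumed from the genome examples.

```python
import random

ALPHABET = "0123456789_"  # assumed genome symbols: lip-shape codes plus pause


def swap_mutation(genome: str, rate: float, rng=random) -> str:
    """With probability `rate` per position, swap that element with a random one."""
    symbols = list(genome)
    for i in range(len(symbols)):
        if rng.random() < rate:
            j = rng.randrange(len(symbols))
            symbols[i], symbols[j] = symbols[j], symbols[i]
    return "".join(symbols)


def point_mutation(genome: str, rate: float, rng=random) -> str:
    """With probability `rate` per position, replace that element with a random symbol."""
    symbols = list(genome)
    for i in range(len(symbols)):
        if rng.random() < rate:
            symbols[i] = rng.choice(ALPHABET)
    return "".join(symbols)


if __name__ == "__main__":
    rng = random.Random(2)
    print(swap_mutation("14957_38", 0.5, rng))
    print(point_mutation("14957_38", 0.5, rng))
```

A swap only rearranges symbols the genome already contains, while a point change can introduce a symbol that was missing entirely.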


Terminating Conditions

This generational process is repeated until a termination condition has been reached. Common terminating conditions are

* A solution is found that satisfies minimum criteria

* Fixed number of generations reached

* Allocated budget (computation time/money) reached

* The highest ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results

* Manual inspection

* Combinations of the above.

I used a fixed number of generations as the terminating condition. The default was 4,000 generations; I also experimented with changing the number of generations.

Basic Genetic Algorithm Flow

1. Initialize population

2. Select individuals for mating based on the fitness function

3. Mate individuals to produce offspring

4. Mutate offspring

5. Insert offspring into population

6. Are stopping criteria satisfied? If not, return to step 2

7. Finish
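The flow above can be condensed into a small, self-contained loop. The Python sketch below is a simplified illustration rather than the GAlib-based program from the experiment: it uses fitness-proportional selection, single-point crossover, per-element mutation, and a fixed number of generations as the stopping criterion. All names and default parameter values are assumptions.

```python
import random

ALPHABET = "0123456789_"  # assumed genome symbols: lip-shape codes plus pause


def run_ga(fitness, genome_length, population_size=20, generations=50,
           crossover_rate=0.9, mutation_rate=0.01, seed=None):
    """Run a basic generational GA and return the best genome seen.

    `fitness` must return a non-negative number; higher is better.
    """
    rng = random.Random(seed)
    # Initialize population with random genomes.
    population = ["".join(rng.choice(ALPHABET) for _ in range(genome_length))
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # stopping criterion: fixed generation count
        # A tiny offset keeps the selection weights valid when all scores are zero.
        weights = [fitness(g) + 1e-9 for g in population]
        offspring = []
        while len(offspring) < population_size:
            # Select two parents with probability proportional to fitness.
            mother, father = rng.choices(population, weights=weights, k=2)
            child = mother
            # Mate: single-point crossover at a random point.
            if genome_length > 1 and rng.random() < crossover_rate:
                point = rng.randint(1, genome_length - 1)
                child = mother[:point] + father[point:]
            # Mutate: each element may be replaced by a random symbol.
            child = "".join(rng.choice(ALPHABET) if rng.random() < mutation_rate else c
                            for c in child)
            offspring.append(child)
        population = offspring  # insert offspring into the population
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    target = "14957_38"
    score = lambda g: sum(1 for a, b in zip(g, target) if a == b)
    print(run_ga(score, len(target), generations=200, seed=0))
```

Here the whole population is replaced each generation and the best genome is tracked separately; other replacement schemes are equally valid variants of the same flow.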


GA for Lip Synchronization

[Diagram: GA lip synchronization in Automated Mode and Interactive Mode]

• A: Test sound input (length)

• Automated Mode: matching sequence for automating fitness function evaluation

• Interactive Mode: B, interactive input to the fitness function

• Original sound input

• Initial set of genomes representing lip movements (the initial population for the GA); these are dynamically generated by the program

• GA Engine

• Output: sequence representing lip movements matching input stream ‘A’

• ESRA Robot shows the lip movements

In a real application, the input to the fitness function is dynamic and language independent, and it doesn’t matter if people have different accents, talk slower/faster, etc.

Genetic Algorithm Behaviors

[Figures: seven plots of GA behavior; the plotted data points are not recoverable from the extracted text]

• Input Length vs. Time (My Mutator); x-axis: input length (number of characters), 1 to 128

• Mutation Rate, Swap Mutator; x-axis: mutation rate (0-1)

• Mutation Rate (My Mutator); x-axis: mutation rate (0-1)

• Crossover Rate vs. Objective Score; x-axis: crossover rate (0-1)

• Number of Generations vs. Objective Score; x-axis: number of generations

• Number of Generations vs. Avg. Time; x-axis: number of generations

• Population Size vs. Avg. Time; x-axis: population size, 1 to 512

GA Results Thus Far

• Created a self-learning robot that can learn how to synchronize sounds and words with appropriate facial expressions.

• Finding the best solution depends on different conditions. In general, I noticed that the functions that gave the higher objective scores tended to take more time to complete 4,000 generations.


Ongoing work

• Combining Quantum Fuzzy Logic with Robot Theatre.

• Modify the body language (hand and arm movements) based on environmental sensors

– Sound sensors (fuzzy value input) to detect noisy or quiet environments and modify behavior

– Light sensors (fuzzy value input) to detect day and night and modify behavior

• Quantum Fuzzy Schrödinger Cat sitting on a Quantum Fuzzy Braitenberg vehicle, arguing with Einstein, singing a song, and going crazy.

Cat Singing

A lively little quantum went darting through the air, Just as happy quanta go speeding everywhere

………..

Thank You


Genetic Algorithms

A genetic algorithm is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are a particular class of evolutionary algorithms that use techniques such as inheritance, mutation, selection, and crossover.


Traditional Method

(Without Genetic Algorithms)

[Diagram: traditional lip synchronization pipeline]

• Audio

• Speech Recognition

• Phonetic letters, punctuation, and syllables (language dependent)

• Matches input to the correct lip motion: static

• Sequence representing lip movements matching the audio input string

• ESRA Robot shows the lip movements

*** Since the matching function is static, it will have to be entirely recoded for different people: they have different accents, talk slower/faster, etc.

ESRA Robot Facial Expressions

• ESRA Robot has several motors for lips, eyelids and arm movements

• I am primarily using lip motors for my experiment

• The specific positions of the lip motors define the shape of the lips

• The shape can be matched with speech

[Figure labels: Motor for Eyelids; Motor for Upper Lip; Motor for Lower Lip]

Crossover

• Single Point Crossover

• Double Point Crossover gives any two points on each genome an equal chance of being split up.

• In my experiment, I used a single point crossover with a 90 percent crossover rate.


Procedure

1. Create a robot with a face, a mouth, and two motors for lip movement.

2. Assign shapes of the mouth for every sound/syllable.

3. Encode these shapes using numbers and characters.

4. Create a random set of genomes for a given input.

5. Depending on the number of encodings that match the appropriate sound, a fitness score will be assigned to each genome.

6. Using a Roulette Wheel, genomes will be selected for reproduction. The higher the fitness score, the higher the probability of being selected for reproduction.

7. To create a new set of offspring, one random crossover point will be chosen for each pair of genomes.

8. There will also be a 1% mutation rate.

9. A new set of genomes (the offspring) is created.

10. Repeat steps 5-9 for a fixed number of generations.

11. Change the Genetic Algorithm parameters and record the dependent variables.

Program

• I used GAlib, a genetic algorithm library from MIT, in my program.

• I designed my own genome

• Defined my fitness function

• Created an initializer function

• Created a mutator function

• Program link - Project file

• EsraGA - Main C++ source code


Data

Data Tables with swap mutator

Data Tables with my mutator


Abstract

The purpose of this project is to create efficient Genetic Algorithms for robotic learning and the synchronization of speech and visual expressions.

This experiment uses an ESRA robot, which has a set of motors to control facial expressions including lip motion and eyebrow motion. Emotions can be created using facial expressions and arm motion; however, for the simplicity of this experiment, the focus is on lip motion. Various shapes of the mouth are assigned to the appropriate sounds and encoded. Using these encodings, I create a random set of chromosomes. I then use Genetic Algorithms so the robot can develop the lip motion to correspond with spoken text. Next, I use the Genetic Algorithm to test how long it takes to synchronize text and lip motion for varying length, crossover rate, mutation rate, number of generations, population size, and number of offspring.

Overall, I concluded that my hypothesis was supported: using genetic algorithms for behavioral evolution, I was able to create a robot that can learn how to synchronize sounds and words with appropriate facial expressions. After testing various parameters, I concluded that functions that return higher objective scores take a longer time to complete. Some applications of this project include translating text into lip motion for animated movies and humanoid robots. The next step in this project would be to try different parameters such as convergence and migrating populations. I could also develop body language as well as lip motion.

Applications

• With a program using genetic algorithms, matching lip movements to speech is language independent. Also, one can use the same program for different people. In the traditional style, the tables would have to be recoded because everyone has an individual accent, body language, and speaking speed.

• This program can be used to match text and lip motion for movie animation and humanoid robots.

• Animation industries don’t have to hand draw lip motion or use a databank of words. This would be most effective if I used a combination of pre-programmed lip codes and user inputs.

• This could be used to convert sounds into lip motion so deaf people can understand what is being said in situations in which they can’t see the person who is speaking.

• It could also be used in reverse to convert lip motion into text. This could be useful in documenting presentations, speeches, and even court cases. It could also be used to create subtitles in movies.

Representing Fuzzy Values on

Bloch Sphere

• Show L1 through L5 options
