What is lipsync?

  • The process of matching lip movement to pre-recorded dialogue so that a character appears to talk, sing or otherwise communicate.


 

Methods to Animate Lipsync

  • Animated texture maps.

  • Bones

  • FFD (Free-Form Deformation)

  • Weighted Morphing

  • Mocap (motion capture)

  • vTracker


 

Lipsync Software


 

2-Frame Rule

  • The sound should lag 2 frames behind the key frames, i.e. the action should happen first, followed by the sound.

    Most novices are misled here by the "light travels faster than sound" theory.

    The reasons for the 2-frame lag are:

    • You need to form the shape of the mouth before you force air through it to create the desired sound.

    • It lets you hold the shape for at least 2 frames so that it reads better in the animation. Tip: all consonants should hold for 2 frames to get a good read.

    These two reasons debunk the old "sound is slower than light" theory for lipsync. Besides, conversation happens at such close distance that the difference between the speed of light and the speed of sound is not noticeable.

  • Example 1

    Try saying "you". Notice that you pucker up before your lungs push air out to create the sound. This is true for most, if not all, phonemes!

  • Example 2

    Now try saying the letter "k". Notice that you first make a "kh" mouth shape; then, as the air comes out of your lungs, your mouth opens slightly and transforms into an "ay" shape.
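The 2-frame rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the (frame, shape) tuples and the clamping at frame 0 are hypothetical and not tied to any animation package. Each mouth shape is keyed 2 frames before its sound starts and held for those 2 frames.

```python
LEAD_FRAMES = 2  # the mouth shape is keyed this many frames before the sound
HOLD_FRAMES = 2  # each shape is held at least this long so that it reads


def schedule_keys(sound_events):
    """Turn (audio_frame, shape) pairs into (key_frame, hold_until, shape).

    audio_frame is the frame on which the sound starts in the dialogue track.
    Clamping at frame 0 is an assumption for sounds at the very start.
    """
    keys = []
    for audio_frame, shape in sound_events:
        key_frame = max(0, audio_frame - LEAD_FRAMES)
        hold_until = key_frame + HOLD_FRAMES - 1
        keys.append((key_frame, hold_until, shape))
    return keys


# A "k" heard on frame 10 is keyed on frame 8 and held through frame 9.
print(schedule_keys([(10, "k"), (14, "ay")]))
```

With these values, the "k" shape is already formed when the sound arrives on frame 10, which matches the first reason given above.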


 

Phonemes

  • The smallest phonetic unit distinguished by the speakers of a particular language. Every phoneme dictates a unique mouth shape to produce a specific sound.
      

  • To animate lipsync in 3D animation, phonemes are modelled as mouth shapes, often referred to as sound shapes or morph targets. These morph targets are animated against the dialogue to make the character appear to talk.

    Most people use 9 basic phoneme shapes for English. However, some animators break the phonemes down further into 13 or more.
     

  • Vowels

  • Consonants


 

Common Approach

  1. Create morph targets.

  2. Listen to the dialogue.

  3. For every phoneme you hear, move the slider of the morph target to 100%.

  4. Make a preview of the lipsync animation.

  5. Watch the mouth flap out of control and wonder what went wrong.

Compare [ Novice ] [ Impressionism ] [ Professional ] [ Professional (Rendered) ] - DivX compression


 

Impressionism Application

  • In lipsync animation, words are represented not by their letters but by the shapes of their sounds. Beginners who follow the common approach tend to associate letters with sounds.

  • A better approach is to FEEL and INTERPRET the speech, as an impressionist would.
     

  • Example 1

"What do you think you're doing?"
Novice approach:
w
Ah
t
d
ooo
y (pucker)
ooo
th
i
n
k
y (pucker)
oo
r
d
oo
i
n
g

The result above would be very flappy because it tries to hit every letter in the words.

A more impressionistic interpretation would be to emphasize the following major accents:
wha-do-ooo-theenk-ooo-ing. So a good mix for the sound shapes would be something like this:

Impressionist approach:
w (pucker)
ah
d
oo
oo
th
ee
n
oo
r
d
oo
ee
n

The 'i' in 'think' and the 'i' in 'doing' are actually 'eee' sounds. There is no need to put the Y shape in 'do you'. If you watch yourself saying 'do you' in a mirror, you will notice that the mouth hardly moves to denote the 'y', because you can form it mostly with the tongue inside the mouth. So you can keep the 'ooo' shape right through the 'do you' part. Likewise, the Y of 'you're' after 'think' is lost in the transition from the 'k' in 'think': it happens inside the mouth, so the lips do not really need to show it. There is also no need to do the 'g' at the end of 'doing', because it looks much the same as staying on the 'n'. The 'n' of 'doing' will be mostly an 'eee' shape, because it is strongly influenced by the 'eee' sound of the 'i' that precedes it.
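The reduction from the novice list to the impressionist list in Example 1 can be written out as a small Python sketch. The set of dropped shapes and the renaming table are hand-picked for this one sentence (an assumption for illustration, not a general rule), and the function name is hypothetical.

```python
# Sound shapes keyed by the novice approach, one per letter-like sound.
NOVICE = ["w", "ah", "t", "d", "ooo", "y", "ooo", "th", "i", "n", "k",
          "y", "oo", "r", "d", "oo", "i", "n", "g"]

# Shapes skipped for THIS sentence only: they are formed mostly inside the
# mouth or are absorbed by a neighbouring shape.
DROPPED = {"t", "y", "k", "g"}

# Letters that are really a different sound shape.
RESHAPED = {"i": "ee", "ooo": "oo"}


def impressionist_pass(shapes):
    """Drop the shapes the lips do not need to show, then rename the rest."""
    kept = (s for s in shapes if s not in DROPPED)
    return [RESHAPED.get(s, s) for s in kept]


print(impressionist_pass(NOVICE))
```

The novice list has 19 keys and the reduced list has 14, in the same order as the impressionist breakdown above. In practice the choice of what to drop comes from watching a mirror, not from a lookup table.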

  • Example 2

In front of a mirror, say "I love you". Then say "Elephant shoes".
Notice how similar the two look?



 

Facial Expression (Basic)

No matter how good a lipsync animation may be, it will not work well if inappropriate facial expressions are used. A good understanding of the facial muscles is required to animate facial expressions.

  • Facial Muscles

Out of 26 or so muscles that move the face, 11 are responsible for facial expression.



  • There are 6 basic expressions to convey emotions: 

Example [ Expression Test ] - DivX compression


 

Facial Expression (Additional Morph Targets)

Apart from the 6 basic expressions, additional morph targets should be created. There is no need to create every single morph target below. A simple guideline, however, is to separate the left, right, top and bottom of the face when you create your morph targets. This gives you superior control over the facial movement of your character: for example, you can make your character grin left or right, sneer left or right, or blink left or right. It also lets you break the symmetry of the movement, which is crucial for good animation.
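To show why separate left and right targets give more control, here is a minimal Python sketch of weighted morphing. It assumes the usual additive blend (base shape plus weighted offsets towards each target); the two-value "mesh" and the target names grin_left and grin_right are hypothetical stand-ins for real vertex data.

```python
def blend(base, targets, weights):
    """Weighted morph: result = base + sum(weight * (target - base))."""
    result = list(base)
    for name, weight in weights.items():
        target = targets[name]
        for i, (b, t) in enumerate(zip(base, target)):
            result[i] += weight * (t - b)
    return result


# Toy mesh: [height of left mouth corner, height of right mouth corner].
BASE = [0.0, 0.0]
TARGETS = {
    "grin_left": [1.0, 0.0],   # only the left corner rises
    "grin_right": [0.0, 1.0],  # only the right corner rises
}

# An asymmetric grin: full on the left, a quarter on the right.
print(blend(BASE, TARGETS, {"grin_left": 1.0, "grin_right": 0.25}))
```

A single symmetric "grin" target could only ever produce equal values for both corners, so splitting the target is what makes the lopsided result possible.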



 

English Vs Japanese

  • English has a wide variety of phonemes, and its mouth shapes can be used to animate most languages.
     

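  • For example, in Japanese there are 5 vowels and 46 basic phonemes, which can be further broken down into 104 phonemes:

    Table 1: columns a, i, u, e, o; rows ', k, s, t, n, h, m, y, r, w, n

    Table 2: columns a, i, u, e, o; rows g, z, d, b, p

    Table 3: columns ya, yu, yo; rows k, s, t, n, h, m, g, z, d, b, p
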
  • However, English phonemes are more than enough to represent these 104 different sounds. The good news is that you can use the 5 Japanese vowels to represent all 104 phonemes, because in Romaji each phoneme is made up of one consonant and one vowel, and the vowel dictates the mouth shape of the phoneme.
     

  • Vowels


 

    • a (pronounced AH) is similar to the English phonemes A, I.

    • i (pronounced YEE) is similar to the English phoneme E.

    • u (pronounced WOO) is similar to the English phonemes W, OO, Q.

    • e (pronounced AYE) is similar to the English phonemes A, I. It is similar to a, except that the tongue position is different. Unless you are doing close-ups, you can get away with using the a phoneme.

    • o (pronounced O) is similar to the English phoneme O.

  • Japanese Accent 
     

    • The Japanese accent is a pitch accent (high or low in fundamental frequency).

    • The English accent is a stress accent (strong or weak in speech power).

    • A high-pitched sound requires more air to be forced out through the throat; a low-pitched sound requires less. This generally affects the variation of the mouth shape.

     

  • Example 

    Locus of fundamental frequencies for the word "hashi" (graphs not reproduced):

    • HL(L): chopsticks

    • LH(L): bridge

    • LH(H): edge

    Example and graphs quoted from http://sp.cis.iwate-u.ac.jp/sp/lesson/j/doc/accent.html

  • Interesting Facts

    1) The Japanese language does not natively have the English sounds R, F, TH or V.

    2) Japanese speakers often cannot differentiate between R and L.

    3) Japanese speakers do not move their lips very much when talking. A lot is done by tongue movement.
      

  • Is Japanese lipsync technique feasible for English language animation?
    YES, it is possible, but you need to know how Japanese speakers pronounce English.
     

  • Example 1

The English word "weekend" becomes a six-syllable word when pronounced in Japanese - "u-ii-ku-e-n-do" (oo-ee-koo-en-doh). 

  • Example 2

The two popular search engines Google and Yahoo are pronounced "goh-guru" and "yah-hoe" in Japanese.

  • It is fun, but difficult, to work out how Japanese speakers adapt English words according to phonetic rules that make sense to them.
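The idea that the vowel dictates the mouth shape can be tried on the "weekend" example with a short Python sketch. The function name and the handling of a syllable with no vowel (the lone "n" holds the previous shape) are assumptions made for illustration; the section above does not say how that case should be animated.

```python
VOWELS = "aiueo"  # the five Japanese vowel shapes


def vowel_shapes(syllables):
    """Map each Romaji syllable to the vowel shape that dictates its look."""
    shapes = []
    for syllable in syllables:
        vowels = [ch for ch in syllable if ch in VOWELS]
        if vowels:
            shapes.append(vowels[-1])
        elif shapes:
            # Assumption: a vowel-less sound such as "n" holds the last shape.
            shapes.append(shapes[-1])
        else:
            # Hypothetical rest shape when nothing has been spoken yet.
            shapes.append("closed")
    return shapes


weekend = "u-ii-ku-e-n-do".split("-")
print(vowel_shapes(weekend))
```

Six syllables reduce to a sequence drawn from only five morph targets, which is why a small set of vowel shapes goes a long way for Japanese-style lipsync.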


 

Conclusion

Lipsync is only one part of facial animation. Good lipsync must be combined with good facial expression and body language to get the message across to the audience. The bottom line: if it looks good, then it is good.


 

References

Online

Michael B. Comet (1998). Lip Sync - Making Characters Speak [online]. http://www.comet-cartoons.com/toons/3ddocs/lipsync/lipsync.html

Gary C Martin (1997). Lipsynch: [online]. http://www.geocities.com/~gcmartin/mouth_shapes.html

Gary C Martin (1998). Lipsynch: Phoneme Examples [online]. http://www.geocities.com/~gcmartin/phoneme_examples.html

Henk Dawson. Jack [online]. http://d3d.com/heads/Art/round_4/jack.html

Keith Lango (2001). Principles of Lip Sync Animation [online]. http://www.keithlango.com/lipSync.html

Jouji Miwa (2000). Language Education System for Speech on an On-demand Network (LESSON) [online]. http://sp.cis.iwate-u.ac.jp/sp/lesson/j/doc/kana.html

Jouji Miwa (2000). Fundamentals of Experimental Phonetics [online]. http://sp.cis.iwate-u.ac.jp/sp/jp/phonetics.html

Yoshiko (1999). Linguistic Technical Terms [online]. http://www.sfo.com/~ucathinker/earth/english/phone/tech.htm

 

Books

Gary Faigin (1990). The Artist's Complete Guide to Facial Expression. Watson-Guptill Publications.

Andras Szunyoghy, Dr. Gyorgy Feher (1999). Human Anatomy for Artists. Konemann.

Preston Blair (1994). Cartoon Animation. Walter Foster Publishing.

Frederick I. Parke, Keith Waters (1996). Computer Facial Animation. A K Peters Ltd.

Bill Fleming, Darris Dobbs (1999). Animating Facial Features and Expressions. Charles River Media.

Tadashi Ozawa (2001). How to Draw Anime & Game Character Vol 2: Expressing Emotion. Nippan IPS.

 

Special Thanks

Keith Lango, 3D Animator, Big Idea

Sonny, 3D Freelancer, Orange3d

Marc Tan, 3D Freelancer, The Hand

Julian Khor, 3D Modeller (Lead), Typhoon Digital