FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability

Linze Li1, Sunqi Fan2, Hengjun Pu2, Zhaodong Bing1, Yao Tang1, Tianzhu Ye2, Tong Yang1, Liangyu Chen1, Jiajun Liang1,†
1MEGVII Technology 2Tsinghua University

Abstract

In recent years, diffusion models have driven significant advances in video generation. Yet the generation of face-related videos still faces issues such as low facial fidelity, lack of frame consistency, limited editability, and uncontrollable head poses. To address these challenges, we introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities while ensuring frame consistency. Our approach introduces the concept of an anchor frame to counteract the degradation of generative ability that occurs when a motion module is added to an original text-to-image model. We propose two strategies toward this objective: a training-free and a training-based anchor frame method. Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models, delivering substantial improvements over the original outcomes in terms of facial fidelity, text-to-image editability, and video motion. Moreover, we introduce conditional control using a 3D parametric face model to capture accurate facial movements and expressions. This solution expands the creative possibilities of facial animation generation through the integration of multiple control signals.
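The anchor-frame idea described above can be illustrated with a minimal sketch. This is our own simplification, not the paper's implementation: each frame's tokens attend to their own tokens plus the anchor (first) frame's tokens, so identity and editing cues fixed in the anchor propagate to the rest of the clip. Q/K/V projections are omitted (identity maps) for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anchor_frame_attention(frames):
    """Cross-frame attention conditioned on an anchor frame (sketch).

    frames: (T, N, D) array of per-frame token features; frame 0 is
    treated as the anchor. Each frame attends to the concatenation of
    its own tokens and the anchor's tokens, so the anchor's content
    influences every generated frame.
    Returns: (T, N, D) attended features.
    """
    T, N, D = frames.shape
    scale = 1.0 / np.sqrt(D)
    anchor = frames[0]                                    # (N, D) anchor tokens
    out = np.empty_like(frames)
    for t in range(T):
        q = frames[t]                                     # queries: current frame
        kv = np.concatenate([frames[t], anchor], axis=0)  # keys/values: self + anchor
        attn = softmax((q @ kv.T) * scale, axis=-1)       # (N, 2N) attention weights
        out[t] = attn @ kv                                # (N, D) attended output
    return out
```

Because the anchor's tokens appear in every frame's key/value set, editing the anchor (e.g., with a stronger text-to-image model) changes all frames consistently, which is the intuition behind both the training-free and training-based variants.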

🎬   Personalized Facial Animation   🎬


Row 1: "A close-up of a guy with a relaxed and easygoing expression" · "A man, wearing a hat, against a beach background" · "A man with a confident smile, making eye contact with the camera"
Row 2: "A portrait of a woman in a natural setting, wearing a headband" · "A close-up shot of a girl with glasses and earrings" · "A girl adorned with a colorful tattoo on the cheek"
Row 3: "A man with a goatee and a leather cowboy hat" · "A man with a piercing gaze against a backdrop of mountain peaks" · "A man with a surprised and delighted expression, showcasing a moment of unexpected joy"
Row 4: "A beautiful woman starts to talk, bright, beautiful face, realistic, solo" · "A close-up shot of a woman with elegant pearl earrings and a vintage hairstyle" · "A close-up shot of a woman with vibrant, multicolored hair"

Our main results. We select four representative LoRA characters and perform personalized facial animation. Each row represents a different LoRA character (from top to bottom: Liam Neeson, Asian Girl, Suriya Sivakumar, Angelina Jolie). For each LoRA character, we employ three prompts to generate facial animations; the prompt corresponding to each video clip is annotated below it. Our FAAC works effectively and generates frame-consistent results across diverse characters and prompts. Moreover, two advantages of FAAC are reflected in the results above: (1) face identity fidelity and (2) editability via text prompts. Thanks to our anchor-frame strategy, we can better maintain fidelity to both the character and the text prompt, without being compromised by the motion module.

🎞   Control Results   🎞


Row 1: Driving Template · "a close-up shot of a woman with vibrant, multicolored hair" · "A man with a goatee against a backdrop of mountain peaks"
Row 2: Driving Template · "a woman, elegant angel, crown on the head, wings on the back, in a floral garden setting" · "A portrait of a face against a brick wall, wearing a bandana"
Row 3: Driving Template · "A portrait of a man with a goatee, wearing a plaid shirt and surrounded by autumn leaves" · "A portrait of a face against a graffiti-covered wall, wearing a necklace"
Row 4: Driving Template · "A girl is smiling, bright, beautiful-face, realistic, solo" · "A portrait of a guy with a handlebar mustache, wearing a vintage pilot's jacket"

Our control results. We demonstrate FAAC with conditional control. We select four representative driving templates, covering common facial movements such as opening the mouth, blinking, turning the head, and their combinations. Each row represents a different driving template, which is used to control the facial animation results together with the text prompt. The specific control method is described in the paper. The text prompt corresponding to each video clip is annotated below it. The results above show that we can generate controllable facial animations with targeted motion patterns.

BibTeX


      @misc{li2023faac,
        title={FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability}, 
        author={Linze Li and Sunqi Fan and Hengjun Pu and Zhaodong Bing and Yao Tang and Tianzhu Ye and Tong Yang and Liangyu Chen and Jiajun Liang},
        year={2023},
        eprint={2312.03775},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
      }