Recently, we have witnessed a boom in applications for 3D talking face generation. However, most existing 3D face generation methods can only produce faces with a static head pose, which is inconsistent with how humans perceive faces. Only a few works address head pose generation, and even these ignore personality. In this paper, we propose a unified audio-driven approach that endows 3D talking faces with personalized pose dynamics. To achieve this goal, we build an original person-specific dataset providing corresponding head poses and face shapes for each video. Our framework consists of two modules: PoseGAN and PGFace. Given input audio, PoseGAN first produces a head pose sequence for the 3D head; PGFace then uses the audio and pose information to generate natural face models. Combining the two yields a 3D talking head with dynamic head movement. Experimental evidence indicates that our method generates person-specific head pose sequences that are in sync with the input audio and best match human perception of talking heads.
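To make the two-stage pipeline concrete, here is a minimal PyTorch-style sketch of the audio-to-pose-to-face flow the abstract describes. The module names PoseGAN and PGFace come from the paper, but everything else, the GRU backbones, the audio feature dimension, the 6-DoF pose representation, and the vertex count, is an illustrative assumption, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PoseGAN(nn.Module):
    """Hypothetical stage-1 generator: maps an audio feature sequence
    to a per-frame head-pose sequence (assumed 6-DoF here)."""
    def __init__(self, audio_dim=80, hidden_dim=256, pose_dim=6):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, pose_dim)

    def forward(self, audio_feats):            # (B, T, audio_dim)
        h, _ = self.encoder(audio_feats)
        return self.head(h)                    # (B, T, pose_dim)

class PGFace(nn.Module):
    """Hypothetical stage-2 module: fuses audio and pose to predict
    per-frame 3D face vertices (vertex count is an example value)."""
    def __init__(self, audio_dim=80, pose_dim=6, hidden_dim=256, n_verts=5023):
        super().__init__()
        self.fuse = nn.GRU(audio_dim + pose_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, n_verts * 3)

    def forward(self, audio_feats, poses):
        x = torch.cat([audio_feats, poses], dim=-1)
        h, _ = self.fuse(x)                    # (B, T, hidden_dim)
        return self.decoder(h).view(*h.shape[:2], -1, 3)  # (B, T, V, 3)

# Two-stage inference as described in the abstract:
audio = torch.randn(1, 100, 80)   # e.g., 100 frames of mel-style features
poses = PoseGAN()(audio)          # stage 1: audio -> head pose sequence
faces = PGFace()(audio, poses)    # stage 2: audio + poses -> 3D face models
```

The key design point carried over from the abstract is the decoupling: pose is predicted from audio alone, then the face module conditions on both audio and the generated poses, so head movement and facial articulation stay synchronized with the same input signal.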