Given a face image, most previous works in gaze estimation infer gaze with a model trained under full supervision. However, the distribution of the test data may differ substantially from that of the training data, since samples can be corrupted in real-world scenarios (e.g., a photo taken in strong light). This leads to a gap between the source domain (i.e., training data) and the target domain (i.e., test data). In this paper, we first introduce self-supervised learning into our method to address such challenging situations in gaze estimation. Moreover, existing appearance-based gaze estimation methods focus on developing powerful regressors that use face and eye images simultaneously, or face (eye) images only; the interplay between face and eye features has been largely overlooked. To this end, we propose a novel Modulation-based Adaptive Network (MANet) for gaze estimation, which uses high-level knowledge to filter out distractive information and to bridge the intrinsic relationship between face and eye features. Further, we combine self-supervised learning with MANet to adapt to challenging cases, such as abnormal lighting conditions and poor-quality images, by jointly minimizing a self-supervised loss and a supervised loss. Experimental results on several datasets demonstrate the effectiveness of our approach, which runs in real time at 900 fps on a PC with an NVIDIA Titan RTX GPU.
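The abstract does not include implementation details; the sketch below is a minimal, hypothetical illustration of the two ideas it describes: a modulation step in which face features filter eye features, and a joint supervised plus self-supervised objective. All names and hyperparameters here (`ModulationBlock`, `augment`, `lambda_ssl`) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ModulationBlock(nn.Module):
    """Toy modulation: high-level face features gate (filter) eye features,
    suppressing distractive channels. An assumed form, not the paper's exact design."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, face_feat, eye_feat):
        # Element-wise gating conditioned on face knowledge.
        return eye_feat * self.gate(face_feat)

def train_step(model, face, eye, gaze_gt, augment, lambda_ssl=0.1):
    """One joint optimization step (assumed form of the combined objective)."""
    # Supervised branch: regress gaze from clean inputs.
    pred = model(face, eye)
    loss_sup = nn.functional.l1_loss(pred, gaze_gt)

    # Self-supervised branch: predictions should stay consistent under a
    # corruption-style augmentation (e.g., a lighting change), mimicking
    # the target-domain degradations mentioned in the abstract.
    pred_aug = model(augment(face), augment(eye))
    loss_ssl = nn.functional.l1_loss(pred_aug, pred.detach())

    return loss_sup + lambda_ssl * loss_ssl
```

In this reading, the self-supervised term is what lets the model adapt to corrupted target-domain conditions (strong light, poor image quality) without extra labels, while the supervised term anchors it to the labeled source domain.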