Egocentric vision holds great promise for increasing access to visual information and improving the quality of life for blind people. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Moreover, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users tend to include their hand in the frame, either interacting with the object they wish to recognize or simply placing it in proximity for better camera aiming. We propose a method that leverages the hand as contextual information for recognizing an object of interest. In our method, the output of a pre-trained hand segmentation model is infused into later convolutional layers of our object recognition network, which has separate output layers for localization and classification. Using egocentric datasets from sighted and blind individuals, we show that hand-priming achieves more accurate localization than other approaches that encode hand information. Given only object centers along with labels, our method achieves classification performance comparable to that of the state-of-the-art method that uses bounding boxes with labels.
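To make the idea of infusing a hand segmentation output into a later convolutional layer more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how the infusion is performed, so this sketch assumes that the hand mask is downsampled by block averaging and appended to the feature map as one extra channel. The names `avg_pool_mask` and `infuse_hand` are hypothetical, and a real system would use a deep learning framework rather than nested lists.

```python
from typing import List

Grid = List[List[float]]   # H x W
FeatureMap = List[Grid]    # C x H x W


def avg_pool_mask(mask: Grid, out_h: int, out_w: int) -> Grid:
    """Downsample a hand mask to the feature-map resolution by block averaging.

    Assumption for this sketch: the mask size is an integer multiple of the
    feature-map size, so each output cell averages one rectangular block.
    """
    in_h, in_w = len(mask), len(mask[0])
    if in_h % out_h != 0 or in_w % out_w != 0:
        raise ValueError("sketch assumes an integer downsampling stride")
    sh, sw = in_h // out_h, in_w // out_w
    pooled: Grid = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            block = [mask[i * sh + a][j * sw + b]
                     for a in range(sh) for b in range(sw)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def infuse_hand(features: FeatureMap, hand_mask: Grid) -> FeatureMap:
    """Append the resized hand mask as an extra channel (assumed fusion rule).

    The returned map would then feed the remaining layers and, eventually,
    the separate localization and classification output layers.
    """
    h, w = len(features[0]), len(features[0][0])
    return features + [avg_pool_mask(hand_mask, h, w)]


# Toy input: one 2x2 feature channel and a 4x4 mask with the hand at top-left.
features = [[[1.0, 2.0], [3.0, 4.0]]]
hand_mask = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
fused = infuse_hand(features, hand_mask)
```

In this toy case the fused map has two channels: the original features and a 2x2 hand channel whose only non-zero cell is the top-left one, which marks where the hand lies at feature-map resolution. Channel concatenation is only one plausible fusion rule; element-wise weighting of the features by the mask would be another.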
               