Human vision is still largely unexplained. Computer vision made impressive progress on this front, but it is still unclear to which extent artificial neural networks approximate human object vision at… Click to show full abstract
Human vision is still largely unexplained. Computer vision made impressive progress on this front, but it is still unclear to which extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained in object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set thus highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in VTC, its information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems. Author Summary Computational object vision represents the new frontier of brain models, but do current artificial visual systems known as deep convolutional neural networks (DCNNs) represent the world as humans do? Our results reveal that DCNNs are able to capture important representational aspects of human vision both at the behavioral and neural levels. At the behavioral level, DCNNs are able to pick up contextual regularities of objects and scenes thus mimicking human high-level semantic knowledge such as learning that a polar bear “lives” in ice landscapes. At the neural representational level, DCNNs capture the representational hierarchy observed in the visual cortex all the way up to frontoparietal areas. Despite these remarkable correspondences, the information processing strategies implemented differ. In order to aim for future DCNNs to perceive the world as humans do, we suggest the need to consider aspects of training and tasks that more closely match the wide computational role of human object vision over and above object recognition.
               
Click one of the above tabs to view related content.