Social media has drawn scientists’ attention because of its potential value for health, marketing, and other applications. User profiles on social media sites are often incomplete, for example because of privacy settings, so exploiting these huge online resources effectively makes it worthwhile to explore user profile identification methods. In this paper, we focus on gender identification using reposting behaviors on social networks. Whereas most existing work relies on purely statistical methods, we propose a scheme underpinned by homophily, along with four intuitive methods based on it that combine knowledge from statistics and sociology. On our data set, which was obtained from Sina Weibo and contains 1039 test samples and 528k user profiles, our methods achieve 86.7% accuracy. We explore the sensitivity of our methods to the scale of the data set and find surprisingly competitive results that surpass the binary classification baseline. Finally, we suggest possible extensions to our methods.
               