Eating disorders (EDs) are characterised by abnormal eating habits and obsessive thoughts about food, weight, shape, and body image. EDs affect a significant portion of the population. Social media has been identified as a possible source of influence for EDs, and there is growing evidence of a large volume of ED-related discussion on the Web via social media platforms such as Twitter. With this growing trend, automatic content analysis for EDs is becoming increasingly important. To date, no comprehensive benchmark ED lexicon exists for identifying ED-related conversations, which would in turn facilitate these content analysis tasks. In this paper, we propose a novel method for generating a lexicon base for ED language, called EDBase. The method starts by collecting over 3.7 million ED-focused tweets. To represent potential ED terminology semantically in a vector space, an ED word embedding model (EDModel) is trained. We then develop a novel multi-seeded hierarchical density-based algorithm with contrasting corpora for ED lexicon expansion. The proposed lexicon expansion algorithm queries the EDModel to expand the seed terms into a comprehensive lexicon base. Our EDBase consists of a (further expandable) list of 3794 high-quality ED terms, each quantified by an ED score and linked to its parent term. The proposed method significantly outperforms all existing alternative baseline methods and models, by over 25% in precision and 1500 in true positives. This research is expected to be impactful in the health data science and healthcare communities.
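The general idea of expanding seed terms by querying an embedding model can be illustrated with a minimal sketch. This is not the multi-seeded hierarchical density-based algorithm proposed here, and it omits the contrasting corpora entirely: it swaps in a plain breadth-first expansion with a cosine-similarity threshold. The function name `expand_lexicon`, the `threshold` and `max_depth` parameters, the use of cosine similarity as the score, and the toy three-dimensional vectors are all assumptions made for illustration; a real run would use vectors from a trained model such as EDModel.

```python
from collections import deque

import numpy as np


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def expand_lexicon(seeds, vectors, threshold=0.8, max_depth=2):
    """Hypothetical breadth-first seed expansion (illustrative only).

    A candidate joins the lexicon when its cosine similarity to an
    already-accepted term meets `threshold`. Each entry records the
    parent term it was reached from and that similarity as its score.
    Seeds are assumed to be present in `vectors`.
    """
    lexicon = {s: {"parent": None, "score": 1.0} for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        term, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the allowed depth
        for cand, vec in vectors.items():
            if cand in lexicon:
                continue  # already accepted via an earlier parent
            sim = cosine(vectors[term], vec)
            if sim >= threshold:
                lexicon[cand] = {"parent": term, "score": sim}
                queue.append((cand, depth + 1))
    return lexicon


# Toy embeddings (made-up values, not taken from any trained model).
toy_vectors = {
    "anorexia": np.array([1.0, 0.0, 0.0]),
    "bulimia": np.array([0.9, 0.1, 0.0]),
    "purging": np.array([0.8, 0.3, 0.0]),
    "football": np.array([0.0, 0.0, 1.0]),
}

lexicon = expand_lexicon(["anorexia"], toy_vectors)
for term, info in lexicon.items():
    print(term, info["parent"], round(info["score"], 3))
```

In this toy run the unrelated term "football" stays out of the lexicon, while each accepted term keeps a link to the term it was expanded from, mirroring the parent-term links and per-term scores described above.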
               