OBJECTIVE Recent advances in Web 2.0 technologies have enabled significant strides towards utilizing patient-generated content for pharmacovigilance. Social media-based pharmacovigilance has great potential to augment current efforts and provide regulatory authorities with valuable decision aids. Among the various pharmacovigilance activities, identifying adverse drug events (ADEs) is very important for patient safety. However, in health-related discussion forums, ADEs may be confounded with drug indications, beneficial effects, and other semantic types. The focus of this study is therefore to develop a strategy that distinguishes ADEs from other semantic types and, at the same time, determines the drug with which an ADE is associated.

MATERIALS AND METHODS In this study, two groups of features are explored: shallow linguistic features and semantic features. Motivated by the characteristics of these two feature categories for social media-based ADE identification, an improved random subspace method, called Stratified Sampling-based Random Subspace (SSRS), is proposed. Unlike the conventional random subspace method, which applies random sampling for subspace selection, SSRS adopts a stratified sampling-based subspace selection strategy.

RESULTS A case study on heart disease discussion forums is performed to evaluate the effectiveness of the SSRS method. Experimental results reveal that the proposed SSRS method significantly outperforms the other ensemble methods compared and existing approaches for ADE identification.

DISCUSSION AND CONCLUSION The proposed method is easy to implement because it is based on two feature sets that can be derived naturally, which avoids the effort of generating strata artificially. Moreover, SSRS has great potential for application to other high-dimensional problems in which the original data can be represented from two different aspects.
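
The stratified subspace selection described in the methods can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how many features are drawn from each stratum, so the proportional allocation used here is an assumption, and the function names, feature indices, and sizes are hypothetical. The sketch only shows how drawing from each feature group separately guarantees that every subspace contains both shallow linguistic and semantic features, which plain random sampling over all features does not.

```python
import random


def stratified_subspace(strata, subspace_size, rng):
    """Draw one feature subspace by sampling from every stratum.

    strata: dict mapping a stratum name to its list of feature indices.
    Allocation is proportional to stratum size (an assumption of this
    sketch), so the result has approximately subspace_size features.
    """
    total = sum(len(indices) for indices in strata.values())
    chosen = []
    for indices in strata.values():
        # Every stratum contributes at least one feature.
        k = max(1, round(subspace_size * len(indices) / total))
        k = min(k, len(indices))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


def build_subspaces(strata, subspace_size, n_members, seed=0):
    """Draw one subspace per ensemble member."""
    rng = random.Random(seed)
    return [stratified_subspace(strata, subspace_size, rng)
            for _ in range(n_members)]


# Hypothetical layout: features 0-5 are shallow linguistic, 6-9 are semantic.
strata = {"shallow": list(range(0, 6)), "semantic": list(range(6, 10))}
subspaces = build_subspaces(strata, subspace_size=5, n_members=3)
```

In a full random subspace ensemble, each member would then train a base classifier on its own subspace and the members' predictions would be combined, for example by majority vote; that step is omitted here.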
               