Healthcare disparities in multiethnic medical data are a major challenge; the main cause is the unequal representation of ethnic groups across data cohorts. Biomedical data collected from different cancer genome research projects may consist mainly of one ethnic group, such as people with European ancestry. In contrast, other ethnic groups, such as African, Asian, Hispanic, and Native American populations, are far less represented. This data inequality is an important research problem in the biomedical field: it causes machine learning models to perform unevenly across groups, which in turn creates healthcare disparities. Previous research has addressed these disparities using only a limited set of data distributions. In our study, we fine-tune deep learning and transfer learning models with different multiethnic data distributions for the prognosis of 33 cancer types. Previous studies used only a single ethnic cohort as the target domain, with one major source domain. In contrast, we use multiple ethnic cohorts as the target domain in transfer learning, drawing on the TCGA and MMRF CoMMpass study datasets. In a performance comparison on the new data distributions, our proposed model shows promising performance for transfer learning schemes compared to the baseline approach, in both the old and new data distribution experiments.
               