0431 How can we avoid re-identification risk in big-data analysis? Proposition of a new strategy of geographical subdivisions using GIS tools

To look for signals relevant to the detection of emerging occupational diseases among agricultural workers, we developed a data-mining approach applied to health-insurance data (see C. Maugard communication). Applied to the databases of the French dedicated social security system (MSA), this approach first aims to look for associations between chronic diseases and occupational activities (recorded as activity sector codes in the MSA contributors database). To avoid re-identification, workers' location was not provided, although it is recognised as closely related to cropping practices. It was therefore not possible to estimate directly each individual's involvement in specific crops (through the existing parcel register and agricultural census, for instance), and ultimately to use crop x pesticide data to estimate pesticide exposure. To deal with this issue, we used an innovative approach that divides the national territory into "meshes", yielding a geographical variable precise enough to assess crop types while keeping enough agricultural workers per mesh to avoid re-identification. The approach is an iterative process that divides each geographical unit into 4 parts while respecting a minimum threshold of workers in each mesh. The process continues until each mesh contains a homogeneous number of individuals. Taking into account the prevalence of the chronic diseases of interest and the typology of crops, we defined a minimum number of individuals per mesh (n=1500). This methodological development allows us to obtain from the MSA indirect information about location at a level fine enough to identify crops (a proxy for pesticide use) while restricting the possibility of re-identifying individuals.
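The iterative four-way subdivision described in the abstract resembles a quadtree split with a minimum-occupancy constraint. The abstract does not give implementation details, so the following Python sketch is only an illustration under stated assumptions: workers are represented by hypothetical (x, y) coordinates, each unit is a rectangular bounding box cut at its midpoint, and a split is accepted only if all four resulting parts hold at least `min_count` workers. The function name `split_into_meshes` and the `max_depth` safeguard are introduced here for illustration and are not taken from the source.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def split_into_meshes(
    points: List[Point],
    box: Box,
    min_count: int = 1500,  # threshold reported in the abstract
    max_depth: int = 12,    # hypothetical safeguard against endless recursion
) -> List[Tuple[Box, int]]:
    """Recursively cut `box` into 4 parts; return (mesh box, worker count) pairs.

    Assumption: a split is kept only if every one of the 4 parts still holds
    at least `min_count` workers; otherwise the current box becomes a mesh.
    If the initial box itself holds fewer than `min_count` workers, it is
    returned unchanged as a single mesh.
    """
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants = [
        (xmin, ymin, xmid, ymid),  # lower left
        (xmid, ymin, xmax, ymid),  # lower right
        (xmin, ymid, xmid, ymax),  # upper left
        (xmid, ymid, xmax, ymax),  # upper right
    ]

    # Assign each worker to exactly one quadrant.
    children: List[List[Point]] = [[] for _ in quadrants]
    for x, y in points:
        idx = (1 if x >= xmid else 0) + (2 if y >= ymid else 0)
        children[idx].append((x, y))

    # Stop when a further cut would create a mesh below the threshold.
    if max_depth == 0 or any(len(c) < min_count for c in children):
        return [(box, len(points))]

    meshes: List[Tuple[Box, int]] = []
    for quadrant, child_points in zip(quadrants, children):
        meshes.extend(
            split_into_meshes(child_points, quadrant, min_count, max_depth - 1)
        )
    return meshes
```

Under this stopping rule, densely populated areas are cut into small meshes and sparsely populated areas stay large, which is one way to read the abstract's aim of a homogeneous number of individuals per mesh. Other rules (for example, merging an under-populated part with a neighbour) would also be compatible with the description.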


Journal Title: Occupational and Environmental Medicine
Year Published: 2017
