
Bayesian Networks for Data Integration in the Absence of Foreign Keys


In the era of open data, a single data source rarely contains all of the attributes we need for inference in specific applications. For example, a marketing department may aim to integrate retailer-specific purchase data with separate demographic data for purposes of targeted advertising – a capability not possible with either dataset alone. In this work, we address two key desiderata of an automated framework for probabilistic data integration over multiple data sources: (1) we require that each relational data source share at least one attribute with another relational data source, but we do not require these attributes to be foreign keys (e.g., attributes such as gender, age, and postal code are not foreign keys because they do not uniquely identify individuals in a data source) and (2) we require inference to be probabilistic to reflect inherent uncertainty in population-level predictions given the absence of foreign keys. While some frameworks such as Probabilistic Relational Models (PRMs) address point (2), they do not address point (1) since they rely on foreign keys to link tables. To achieve both desiderata simultaneously, we develop an automated framework to construct Bayesian networks for data integration capable of answering any probabilistic query spanning the attributes of multiple relational data sources. We demonstrate that our framework is able to closely approximate the inference of a global Bayesian network over a single relation that has been projected onto multiple local relations and further investigate properties of local relations such as the number of shared attributes and their cardinality to understand how these properties affect the quality of inference.
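The idea of answering a probabilistic query across two sources that share only a non-key attribute can be illustrated with a small sketch. This is not the authors' framework: it is a minimal illustration that assumes the two non-shared attributes are conditionally independent given the shared attribute, and every table, attribute name, and value below is hypothetical.

```python
from collections import Counter

# Hypothetical toy relations. Neither contains a key that identifies
# individuals, so rows cannot be joined one-to-one.
# purchases: (age_group, bought)
purchases = [("young", 1), ("young", 1), ("young", 0),
             ("old", 0), ("old", 0), ("old", 1)]
# demographics: (age_group, income)
demographics = [("young", "low"), ("young", "low"), ("young", "high"),
                ("old", "high"), ("old", "high"), ("old", "low")]


def conditional(rows):
    """Estimate P(value | shared) from (shared, value) rows by relative frequency."""
    joint = Counter(rows)
    shared_totals = Counter(s for s, _ in rows)
    return {(s, v): n / shared_totals[s] for (s, v), n in joint.items()}


def shared_prior(*tables):
    """Estimate P(shared) by pooling the shared column of all tables (an assumption)."""
    counts = Counter(s for rows in tables for s, _ in rows)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}


def query(target, evidence, p_shared, p_target, p_evidence):
    """P(target | evidence), marginalizing over the shared attribute s:
    sum_s P(s) P(target|s) P(evidence|s) / sum_s P(s) P(evidence|s).
    Valid only under the assumed conditional independence given s."""
    num = sum(p * p_target.get((s, target), 0.0) * p_evidence.get((s, evidence), 0.0)
              for s, p in p_shared.items())
    den = sum(p * p_evidence.get((s, evidence), 0.0) for s, p in p_shared.items())
    return num / den if den > 0 else float("nan")


p_age = shared_prior(purchases, demographics)
p_buy = conditional(purchases)
p_inc = conditional(demographics)

p_high = query(1, "high", p_age, p_buy, p_inc)  # P(bought=1 | income=high)
p_low = query(1, "low", p_age, p_buy, p_inc)    # P(bought=1 | income=low)
print(round(p_high, 3), round(p_low, 3))
```

With this toy data the two estimates come out at 4/9 and 5/9 (about 0.44 and 0.56). The answer is a population-level probability rather than a record-level match, which is the kind of uncertainty the abstract attributes to the absence of foreign keys.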

Keywords: absence of foreign keys; data sources; Bayesian networks; foreign keys; data integration

Journal Title: IEEE Transactions on Knowledge and Data Engineering
Year Published: 2020
