Technical perspective: Entity matching with Magellan

…vices ecosystem. PyMatcher is intended for a “power user” who has knowledge of entity matching, programming, and basic machine learning, while CloudMatcher is targeted at “lay users” who may not know how to program or may lack machine learning knowledge. PyMatcher provides how-to guides that describe how to approach the development of entity matching workflows. These guides explain how to develop a solution on a small sample of the data (by downsampling, blocking, and training a matcher) and then how to scale that solution to production data.

The entity matching workflow for CloudMatcher is similar to that of PyMatcher, except that CloudMatcher actively learns from the user how to block tuples. It then executes the learned blocking rules to obtain a set of candidate tuple pairs, and again actively learns from the user which candidate pairs match and which do not, before deriving a model that can be applied to match tuples across the two tables.

In short, Magellan makes it easy to develop an entity matching solution and to interoperate with other tools, forming a larger data integration pipeline that solves bigger problems. It is a showcase of practical software development tools that originate in data management research. It has been successfully applied to multiple real-world entity matching problems, is used in production at many data science groups and companies, and is now being commercialized, demonstrating that building entity matching systems on data science ideas is highly promising. For more details, see Magellan’s website at https://sites.google.com/site/anhaidgroup/projects/magellan.
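The small-sample development loop that the PyMatcher guides describe (downsample, block, train a matcher, then apply it) can be sketched in plain Python as below. This is a toy illustration under stated assumptions, not Magellan's API: the tables, the labels, and every function name (`downsample`, `block_on_attribute`, `jaccard`, `train_threshold`, `match`) are hypothetical, the blocker is a simple attribute-equivalence rule on `city`, and a single Jaccard-similarity threshold on `name` stands in for the machine-learning matcher a real workflow would train.

```python
import random

# Hypothetical toy tables A and B; every tuple is a dict with a unique "id".
TABLE_A = [
    {"id": "a1", "name": "Dave Smith", "city": "Madison"},
    {"id": "a2", "name": "Joe Wilson", "city": "San Jose"},
    {"id": "a3", "name": "Dan Smith", "city": "Middleton"},
]
TABLE_B = [
    {"id": "b1", "name": "David D. Smith", "city": "Madison"},
    {"id": "b2", "name": "Joe Wilson", "city": "San Jose"},
    {"id": "b3", "name": "Daniel Smith", "city": "Middleton"},
    {"id": "b4", "name": "Joan Wilson", "city": "Madison"},
]
# Hypothetical labels a user would supply for the candidate pairs.
LABELS = {("a1", "b1"): True, ("a1", "b4"): False, ("a2", "b2"): True, ("a3", "b3"): True}


def downsample(table, n, seed=0):
    """Reproducible uniform random sample of at most n tuples (a naive stand-in)."""
    return random.Random(seed).sample(table, min(n, len(table)))


def block_on_attribute(table_a, table_b, attr):
    """Attribute-equivalence blocking: keep only pairs that agree exactly on attr."""
    return [(a, b) for a in table_a for b in table_b if a[attr] == b[attr]]


def jaccard(s, t):
    """Jaccard similarity of the lowercase whitespace-token sets of two strings."""
    x, y = set(s.lower().split()), set(t.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def train_threshold(labeled):
    """Fit a one-feature matcher: the observed name-similarity cut point with the best accuracy."""
    scores = sorted({jaccard(a["name"], b["name"]) for a, b, _ in labeled})

    def accuracy(th):
        return sum((jaccard(a["name"], b["name"]) >= th) == y for a, b, y in labeled)

    return max(scores, key=accuracy)


def match(pairs, threshold):
    """Apply the trained matcher to candidate pairs and return the matching id pairs."""
    return [(a["id"], b["id"]) for a, b in pairs if jaccard(a["name"], b["name"]) >= threshold]


if __name__ == "__main__":
    # 1. Downsample (keeps every tuple here, since the toy tables are already tiny).
    sample_a, sample_b = downsample(TABLE_A, 3), downsample(TABLE_B, 4)
    # 2. Block: only tuples from the same city become candidate pairs.
    candidates = block_on_attribute(sample_a, sample_b, "city")
    # 3. Label the candidates and train the matcher.
    labeled = [(a, b, LABELS[(a["id"], b["id"])]) for a, b in candidates]
    threshold = train_threshold(labeled)
    # 4. Apply the matcher (on production data this step would run over the full tables).
    print(sorted(match(candidates, threshold)))  # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
```

Blocking comes before matching because comparing every tuple of A with every tuple of B grows as |A|·|B|; in this toy example the city rule cuts the 12 possible pairs down to 4 candidates, and only those need labels and matcher scores.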

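CloudMatcher differs from PyMatcher in that it asks the user questions rather than expecting labeled data up front. The loop below is a hedged sketch of that idea using uncertainty sampling, a standard active-learning strategy; it is not CloudMatcher's actual algorithm, which is not described here. It covers only the stage that learns which candidate pairs match (not the learning of blocking rules), assumes each candidate pair already has a similarity score in [0, 1] where higher means more likely a match, and uses a simulated user in place of a person. All names (`fit_threshold`, `active_learn`, `ask_user`) and the starting cut point of 0.5 are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Pair = TypeVar("Pair")


def fit_threshold(labeled: List[Tuple[float, bool]]) -> float:
    """Midpoint between the highest-scoring non-match and the lowest-scoring match.

    Assumes (hypothetically) that a single cut point on the score separates
    matches from non-matches, with scores in [0, 1].
    """
    lo = max((s for s, y in labeled if not y), default=0.0)
    hi = min((s for s, y in labeled if y), default=1.0)
    return (lo + hi) / 2


def active_learn(pairs: Sequence[Pair], score: Callable[[Pair], float],
                 ask_user: Callable[[Pair], bool], budget: int,
                 start: float = 0.5) -> Tuple[float, int]:
    """Uncertainty sampling: repeatedly ask about the pair the model is least sure of."""
    pool = list(pairs)
    labeled: List[Tuple[float, bool]] = []
    threshold = start
    while pool and len(labeled) < budget:
        # The most uncertain pair is the one whose score is closest to the current cut point.
        pair = min(pool, key=lambda p: abs(score(p) - threshold))
        pool.remove(pair)
        labeled.append((score(pair), ask_user(pair)))
        # Refit the model after every answer, so the next question targets the new boundary.
        threshold = fit_threshold(labeled)
    return threshold, len(labeled)


if __name__ == "__main__":
    # Candidate pairs represented directly by their (hypothetical) similarity scores.
    scores = [0.05, 0.2, 0.33, 0.48, 0.58, 0.64, 0.77, 0.9]
    # Simulated user: a hidden rule standing in for human judgment.
    oracle = lambda s: s >= 0.6
    th, asked = active_learn(scores, score=lambda s: s, ask_user=oracle, budget=4)
    print(f"threshold={th:.2f} after {asked} questions")  # threshold=0.61 after 4 questions
```

With a budget of four questions, the sketch settles on a cut point (0.61) that labels all eight toy pairs the way the simulated user would. That is the payoff active learning aims for: the user answers a few well-chosen questions instead of labeling every candidate pair.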
Journal Title: Communications of the ACM
Year Published: 2020
