Supervised machine learning models are trained on historical data to learn a static mapping between their input and output variables. However, when they are deployed on continuously streamed data, whose… Click to show full abstract
Supervised machine learning models are trained on historical data to learn a static mapping between their input and output variables. However, when they are deployed on continuously streamed data, whose nature is likely to change over time (data or concept drift), model performance may suddenly and substantially degrade, forcing practitioners to continuously update the models to reflect the new data distribution. Few methods, however, are available to reliably detect data drift on heterogeneous data types (structured and unstructured), possibly without requiring labeled data at inference time. In this article, we review existing methods for dataset drift detection, discuss their applicability to deep neural networks, and experiment on a practical case study related to semistructured document analysis.
               
Click one of the above tabs to view related content.