AI teams across verticals vehemently agree that their data and models must be monitored in production. Yet many teams struggle to define exactly what to monitor: what data to collect at inference time, what metrics to track, and how to analyze those metrics.
As AI systems become increasingly ubiquitous across industries, the need to monitor them grows. AI systems, far more than traditional software, are hypersensitive to changes in their input data. Consequently, a new class of monitoring solutions has emerged at the data and functional levels (rather than the infrastructure or application levels). These solutions aim to detect the issues unique to AI systems, such as concept drift and bias.
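To make the data-level idea concrete, here is a minimal sketch of one common drift check: comparing a feature's live distribution against its training-time baseline with the Population Stability Index (PSI). The function name, the synthetic data, and the 0.1/0.25 thresholds are illustrative conventions, not part of any specific tool.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a live feature distribution ('actual') against a
    training-time baseline ('expected').

    Common rules of thumb: PSI < 0.1 suggests the feature is stable,
    PSI > 0.25 suggests significant drift. These cutoffs are
    conventions, not a standard.
    """
    # Bin edges come from the baseline; widen the outer edges so
    # live values outside the training range are still counted.
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf

    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Clip empty buckets to avoid log(0) / division by zero.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)

    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

# Synthetic example: a feature that stays stable vs. one that drifts.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)  # values seen during training
stable = rng.normal(0.0, 1.0, 5000)    # inference data, same distribution
drifted = rng.normal(0.8, 1.0, 5000)   # inference data, shifted mean

print(population_stability_index(baseline, stable))   # small (stable)
print(population_stability_index(baseline, drifted))  # large (drift)
```

In practice a check like this would run on a schedule over logged inference data, per feature, with alerts wired to the thresholds; PSI is only one option among many (KS tests, divergence measures, and model-based detectors are common alternatives).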
If you’ve been struggling to gain transparency into your AI models’ performance in production, you’re in good company. Monitoring complex systems is always a challenge. Monitoring complex AI systems, with their built-in opacity, is a radical challenge.
Below, we use the term AI system. By this, we mean any software system that incorporates at least one predictive model, leveraging machine learning, statistical modeling, or other AI techniques. A few examples: an automatic fraud detection system, a recommendation system, an image classification system, and a sentiment analyzer for social media posts.