Solutions
- Data Maturity Assessment
  
  Roadmap to improve data & AI maturity.
  
  Pulse
  
  Supercharge workflows with Gen AI.
  
  Arc
  
  AI & ML workload management
  
  Solutions catalogue
  
  Purpose-built generative AI solutions.
Services
- Data modernisation
  
  Create intelligent infrastructure.
  
  Data strategy
  
  Modernise your data & AI strategy.
  
  Generative AI
  
  Leveraging generative AI on AWS.
  
  AIOps
  
  Manage AI & ML workload lifecycles.
Industries
- Retail
  
  Drive retail success with AWS.
  
  Sports
  
  Enhancing how sport is experienced.
  
  Telcom
  
  Transforming telecommunications.
  
  Finance
  
  Reinventing the future of finance.
  
  Media & Entertainment
  
  Unlock the future of media.
  
  Healthcare & Life Sciences
  
  Innovating healthcare with AWS.
  
  View all
  
  Applying expertise across industries.
Content hub
- Case studies
  
  Learn about customer projects.
  
  Insights
  
  Find out the latest insights.
  
  Events
  
  The latest Firemind & AWS events.
  
  Press
  
  Latest news & achievements.
About us
- About us
  
  Who we are and what we do.
  
  Careers
  
  Discover your next opportunity.
  
  Partnerships
  
  Working together to achieve more.
  
  Blog
  
  Latest news from Firemind.
  
  Podcast
  
  Find all episodes from the Full Circle Podcast.
Get in touch

Amazon SageMaker model monitor – Detecting language data drift

It’s been a few months since we last spoke about language data and the AWS tools made to navigate NLP (Natural Language Processing) data drift. Within that time, the specialist AI/ML Solutions Architects within the Amazon SageMaker team have refined Amazon SageMaker Model Monitor to better detect and combat this drift. In this AWS insight, we look at an AWS approach to detecting data drift in text data, as well as running through some of the basic terminology.

What’s Natural Language Processing (NLP)?

NLP is a field of computer science that deals with applying linguistic and statistical algorithms to text in order to extract meaning in a way that is very similar to how the human brain understands language.

What’s data drift?

So, the short and skinny version would be, imagine looking at two versions of the same or similar datasets, both taken at different times. When you do a comparative review of each of these datasets, if the data histograms do not overlap significantly, we say that data has drifted.

So what have AWS been working on to help?

Amazon SageMaker Model Monitor is helping to continuously monitor the quality of your ML models in real time! With Model Monitor, you can configure alerts to notify and trigger actions if any drift in model performance is observed. Early and proactive detection of these deviations allows you to take corrective actions, such as collecting new ground truth training data, retraining models, auditing upstream systems, without having to manually monitor models or build additional tooling.

Model Monitor currently offers four different types of monitoring capabilities to detect and mitigate model drift in real time:

• Data quality – Helps detect change in data schemas and statistical properties of independent variables and alerts when a drift is detected.
• Model quality – For monitoring model performance characteristics such as accuracy or precision in real time – Model Monitor allows you to ingest the ground truth labels collected from your applications. Model Monitor then automatically merges the ground truth information with prediction data to compute the model performance metrics.
• Model bias – Model Monitor is integrated with Amazon SageMaker Clarify to improve visibility into potential bias. Although your initial data or model may not be biased, changes in the world may cause bias to develop over time in a model that has already been trained.
• Model explainability – Drift detection alerts you when a change occurs in the relative importance of feature attributions.

Data drift classifications

Covariate Shift

In a covariate shift, the distribution of inputs changes over time, but the conditional distribution P(y|x) doesn’t change. This type of drift is called Covariate shift because the problem arises due to a shift in the distribution of the covariates (aka features).

Label Shift

While Covariate shift focuses on changes in the feature distribution, label shift focuses on changes in the distribution of the class variable. This type of shifting is essentially the reverse of Covariate shift. An intuitive way to think about it might be to consider an unbalanced dataset. If the spam to non-spam ratio of emails in our training set is 50%, but in reality 10% of our emails are non-spam, then the target label distribution has shifted 👍.

Concept Shift

Concept shift is different from covariate and label shift in that it’s not related to the data distribution or the class distribution, but instead relates to the relationship between the two variables. For example, email spammers often use a variety of concepts to pass the spam filter models and the concept of emails used during training may change as time goes by.

Follow the full solution and step by steps

If you’d like to see the granular steps and follow the guide through creating baselines, evaluating scripts, using Model Monitor custom containers and more, click here.

If you’re in a position where you have exemplary amounts of data and no idea how to collect, archive and retrieve (whilst minimising drift) in a secure, fast and cost efficient manner, reach out to us today!

Get in touch

Want to learn more?

Seen a specific case study or insight and want to learn more? Or thinking about your next project? Drop us a message!

Solutions

Accelerating your Generative AI journey

Data Maturity Assessment

Pulse

Arc

Solutions Catalogue

Services

Unlock long-lasting business value

Data modernisation

Data strategy

Generative AI

AIOps

Industries

Driving innovation across sectors

Retail

Sports

Telcom

Financial Sevices

Media & Entertainment

Healthcare & Life Sciences

Case studies

Insights

Events

Press

Firemind Launches Pulse and Arc at AWS London

About us

Careers

Partnerships

Blog

Podcast

Cassio Milani joins Firemind as an Senior Solutions Architect