Build event-driven data quality pipelines with AWS Glue DataBrew

As businesses collect more and more data to drive core processes like decision making, reporting and machine learning (ML), they keep running into the same hurdle: data quality. Ensuring data is fit for use, with no missing, malformed or incorrect content, is the first priority for many of these companies, and that’s where AWS Glue DataBrew steps in!

Let’s build a fully automated, end-to-end event driven pipeline for data quality validation.

Now that’s a subtitle that gets us excited! As with any data journey, it’s a rocky road to forming structured, intelligent and readily queryable data.

AWS Glue DataBrew is a visual data preparation tool that makes it easy to find data quality statistics such as duplicate values, missing values and outliers in your data. You can also set up data quality rules in DataBrew to perform conditional checks based on unique business needs.
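To make the idea of data quality rules a little more concrete, here is a minimal sketch of a ruleset definition built in Python. The `Name`, `TargetArn`, `Rules`, `CheckExpression`, `SubstitutionMap` and `ColumnSelectors` fields follow the boto3 DataBrew `create_ruleset` request shape. The helper names, the rule names and the two check expression strings are our own illustration, so treat them as assumptions and verify the expressions against the DataBrew documentation before using them.

```python
from typing import Any


def build_ruleset_request(name: str, dataset_arn: str) -> dict[str, Any]:
    """Build keyword arguments for the DataBrew create_ruleset call.

    ASSUMPTION: the rule names and check expressions are illustrative only.
    """
    return {
        "Name": name,
        "TargetArn": dataset_arn,  # ARN of the DataBrew dataset the rules apply to
        "Rules": [
            {
                # Fails if the dataset contains any duplicate rows.
                "Name": "No duplicate rows",
                "CheckExpression": "AGG(DUPLICATE_ROWS_COUNT) == :val1",
                "SubstitutionMap": {":val1": "0"},
            },
            {
                # Fails if any selected column has missing values.
                "Name": "No missing values",
                "CheckExpression": "AGG(MISSING_VALUES_PERCENTAGE) == :val1",
                "SubstitutionMap": {":val1": "0"},
                "ColumnSelectors": [{"Regex": ".*"}],  # apply to every column
            },
        ],
    }


def create_ruleset(request: dict[str, Any]) -> None:
    """Send the request to DataBrew (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above runs without AWS access

    boto3.client("databrew").create_ruleset(**request)
```

Calling `create_ruleset(build_ruleset_request("my-ruleset", dataset_arn))` with a real dataset ARN would register the rules. Keeping the request builder separate from the AWS call makes the rule definitions easy to review and test offline.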

[Figure: AWS infrastructure diagram. High-level architecture highlighting the AWS Step Functions workflow.]

Step by step

The solution workflow contains the following steps:

1. When you upload new data to your Amazon Simple Storage Service (Amazon S3) bucket, events are sent to EventBridge.

2. An EventBridge rule triggers a Step Functions state machine to run.

3. The state machine starts a DataBrew profile job, configured with a data quality ruleset and rules. If you’re considering building a similar solution, the DataBrew profile job output location and the source data should be in separate S3 buckets. This prevents recursive job runs, because job output written to the source bucket would itself raise a new upload event. We deploy our resources with an AWS CloudFormation template, which creates separate S3 buckets.

4. A Lambda function reads the data quality results from Amazon S3 and returns a Boolean response into the state machine. The function returns false if one or more rules in the ruleset fail and returns true if all rules succeed.

5. If the Boolean response is false, the state machine sends an email notification with Amazon SNS and ends in a failed status. If the Boolean response is true, the state machine ends in a succeeded status. You can also extend the solution in this step to run other tasks on success or failure. For example, if all the rules succeed, you can send an EventBridge message to trigger another transformation job in DataBrew.
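Two pieces of the workflow above can be sketched in Python: the EventBridge event pattern behind steps 1 and 2, and the Boolean check the Lambda function performs in step 4. This is a minimal sketch, not the code from the AWS example. The function names, the `report_bucket` and `report_key` event keys, and the report layout (`rulesetResults`, `ruleResults`, `status`) are all assumptions, so adjust them to the validation report your profile job actually writes.

```python
import json
from typing import Any


def s3_upload_event_pattern(source_bucket: str) -> dict[str, Any]:
    """EventBridge pattern matching new objects in the source bucket (steps 1-2)."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [source_bucket]}},
    }


def all_rules_passed(report: dict[str, Any]) -> bool:
    """Return True only if every rule in every ruleset succeeded (step 4).

    ASSUMPTION: the report layout used here is hypothetical.
    """
    for ruleset in report.get("rulesetResults", []):
        for rule in ruleset.get("ruleResults", []):
            if rule.get("status") != "SUCCEEDED":
                return False
    return True


def lambda_handler(event: dict[str, Any], context: Any) -> bool:
    """Read the report from S3 and return the Boolean the state machine branches on."""
    import boto3  # imported here so the helpers above run without AWS access

    # ASSUMPTION: the state machine passes the report location under these keys.
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=event["report_bucket"], Key=event["report_key"])
    report = json.loads(obj["Body"].read())
    return all_rules_passed(report)
```

Note that Amazon S3 only sends events to EventBridge once EventBridge notifications are enabled on the bucket. Keeping `all_rules_passed` as a pure function means the pass or fail logic can be unit tested with sample reports before anything is deployed.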

You can follow the exact technical steps in the original AWS blog post. AWS have put together a thorough and detailed approach to testing this solution, even providing an AWS Serverless Application Model (AWS SAM) template and example code.

Metin Alisho, Data Scientist at Firemind, says: “Event driven pipelines make up the ‘bread and butter’ of many of our customer solutions. This AWS Glue DataBrew driven solution makes finding quality data between the outliers and duplicate values simple.”

Transforming your data, one column at a time!

Solutions built on tools such as AWS Glue DataBrew are just the beginning of what’s possible for data management and interpretation. Get in touch with us today to look at how we can work with your data.
