Awesome Data Quality Resources

A curated list of resources for testing, monitoring, and improving data quality across various data environments.

Frameworks and Libraries

Open Source

  • elementary - Data monitoring and observability tailored to dbt. GitHub
  • mobydq - Tool for data engineering teams to run & automate data quality checks on their data pipeline. GitHub
  • ydata-quality - Python library for assessing data quality throughout the stages of data pipeline development. GitHub
  • great-expectations - Tool for data testing, documentation, and profiling. GitHub
  • deequ - Library by Amazon for defining unit tests for data with a focus on large datasets. Based on Apache Spark. GitHub
  • soda - Enables data testing through extended SQL queries. GitHub
  • dqm - Another data quality monitoring tool implemented using Spark. GitHub
  • owl-sanitizer - Lightweight data validation framework based on Spark. GitHub
  • griffin - Data quality solution for distributed data systems at any scale, in both batch and streaming contexts. GitHub
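
Several of the libraries above describe themselves as "unit tests for data". As a rough illustration of that idea only, the sketch below checks completeness and uniqueness using plain Python. It does not use the API of any listed library; the helper names (`completeness`, `is_unique`) and the sample `orders` rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def completeness(rows: list[Row], column: str) -> float:
    """Fraction of rows whose `column` is present and not None."""
    if not rows:
        return 1.0  # an empty dataset has no missing values
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def is_unique(rows: list[Row], column: str) -> bool:
    """True when no non-null value of `column` appears more than once."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return len(values) == len(set(values))


# Hypothetical sample data: one missing email and one duplicated order_id.
orders = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 2, "email": "c@example.com"},
]

report = {
    "email_completeness": completeness(orders, "email"),
    "order_id_unique": is_unique(orders, "order_id"),
}
print(report)
```

The dedicated tools listed above typically add declarative check definitions, scheduling, and reporting on top of checks like these.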

Commercial

  • Bigeye - Continuous data quality monitoring and anomaly detection. Website
  • Soda - Data testing and monitoring platform. Website
  • Databand - Data pipeline observability and monitoring. Website
  • Monte Carlo - Data observability platform. Website
  • Sifflet - Data quality monitoring and observability. Website
  • Validio - Real-time data quality monitoring. Website
  • Lightup - Data quality checks and monitoring. Website
  • Lantern - Data quality and observability. Website
  • Metaplane - Data quality monitoring for data teams. Website
  • Datafold - Proactive data quality platform. Website
  • Acceldata - Data observability and quality management. Website
  • Anomalo - Automated data quality monitoring. Website
  • Marquez - Open source (not commercial) metadata service for collecting, aggregating, and visualizing a data ecosystem's metadata. GitHub
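
Many of the monitoring platforms above advertise anomaly detection. A very simplified version of one such check, flagging an unusual daily row count with a z-score, might look like the sketch below. This is an illustrative assumption, not how any listed vendor implements it; the function name `is_volume_anomaly`, the threshold of 3.0, and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def is_volume_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag `today` when it lies more than `threshold` standard deviations
    from the mean of the historical row counts."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All historical counts are identical, so any deviation is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical row counts for the previous five daily loads.
daily_row_counts = [1000, 1020, 980, 1010, 990]

print(is_volume_anomaly(daily_row_counts, 400))   # large drop in volume
print(is_volume_anomaly(daily_row_counts, 1005))  # within the normal range
```

Production systems generally account for seasonality and trends, which a fixed z-score threshold does not.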

Books and Methodologies

  • Complete Data Quality Methodology (CDQM) - By Carlo Batini and Monica Scannapieco. Book
  • Data Quality Assessment Framework - By Arkady Maydanchik. Book
  • CIHI Information Quality Framework - From the Canadian Institute for Health Information. Resource
  • Enterprise Knowledge Management - By David Loshin. Book
  • MIKE2.0 - Open Source initiative for Enterprise Information Management. Website
  • Ten Steps to Quality Data and Trusted Information - By Danette McGilvray. Book
  • Total Information Quality Management (TIQM) - By Larry English. Book

Tools

Open Source Tools

  • Deequ - For defining unit tests for data. GitHub
  • dbt Core - Data transformation tool with built-in testing capabilities. GitHub
  • MobyDQ - Automates data quality checks. GitHub
  • Great Expectations - Data validation and profiling. GitHub
  • Soda Core - Python library for data reliability. GitHub
  • Cucumber - General-purpose behavior-driven development (BDD) testing tool that can be applied to data quality testing. GitHub

Commercial Tools

  • Ataccama - Comprehensive data quality and catalog suite. Website
  • Informatica - Data quality and observability platform. Website
  • Talend - Data quality solutions with real-time monitoring. Website
  • IBM InfoSphere QualityStage - Data quality and governance. Website
  • Precisely Trillium Quality - Enterprise data quality tool. Website
  • Adverity - Marketing data integration with data quality management. Website
  • Oracle Enterprise Data Quality - Robust data profiling and cleansing. Website

Articles and Guides

  • A Guide to Data Quality Tools: The 4 Leading Solutions - Zendata. Article
  • Top Data Quality Management Tools to Choose in 2024 - Mad Devs. Article
  • Data Quality Management: Tools, Pillars, and Best Practices - lakeFS. Article
  • Best Data Quality Tools for 2024: Top 10 Choices - Adverity. Article
  • The 8 Best Data Quality Management Tools and Software for 2025 - Solutions Review. Article
  • 9 Best Tools for Data Quality in 2024 - Datafold. Article
  • Data Quality Management Best Practices: A Short Guide - Zendata. Article