
Awesome Data Quality Resources
A curated list of resources for testing, monitoring, and improving data quality across various data environments.
Frameworks and Libraries
Open Source
- elementary - Data monitoring and observability tailored to dbt. GitHub
- mobydq - Tool for data engineering teams to run and automate data quality checks on their data pipelines. GitHub
- ydata-quality - Python library for assessing data quality throughout the stages of data pipeline development. GitHub
- great-expectations - Tool for data testing, documentation, and profiling. GitHub
- deequ - Library by Amazon for defining unit tests for data with a focus on large datasets. Based on Apache Spark. GitHub
- soda - Enables data testing through extended SQL queries. GitHub
- dqm - Another data quality monitoring tool implemented using Spark. GitHub
- owl-sanitizer - Lightweight data validation framework based on Spark. GitHub
- griffin - Data quality solution for distributed data systems at any scale, in both streaming and batch contexts. GitHub
Commercial
- Bigeye - Continuous data quality monitoring and anomaly detection. Website
- Soda - Data testing and monitoring platform. Website
- Databand - Data pipeline observability and monitoring. Website
- Monte Carlo - Data observability platform. Website
- Sifflet - Data quality monitoring and observability. Website
- Validio - Real-time data quality monitoring. Website
- Lightup - Data quality checks and monitoring. Website
- Lantern - Data quality and observability. Website
- Metaplane - Data quality monitoring for data teams. Website
- Datafold - Proactive data quality platform. Website
- Acceldata - Data observability and quality management. Website
- Anomalo - Automated data quality monitoring. Website
- Marquez - Open source metadata service for collecting, aggregating, and visualizing a data ecosystem's metadata. GitHub
Books and Methodologies
- Complete Data Quality Methodology (CDQM) - By Carlo Batini and Monica Scannapieco. Book
- Data Quality Assessment Framework - By Arkady Maydanchik. Book
- CIHI Information Quality Framework - From the Canadian Institute for Health Information. Resource
- Enterprise Knowledge Management - By David Loshin. Book
- MIKE2.0 - Open source initiative for Enterprise Information Management. Website
- Ten Steps to Quality Data and Trusted Information - By Danette McGilvray. Book
- Total Information Quality Management (TIQM) - By Larry English. Book
Tools
Open Source Tools
- Deequ - For defining unit tests for data. GitHub
- dbt Core - Data transformation tool with built-in testing capabilities. GitHub
- MobyDQ - Automates data quality checks. GitHub
- Great Expectations - Data validation and profiling. GitHub
- Soda Core - Python library for data reliability. GitHub
- Cucumber - General-purpose behavior-driven development tool that can be used to express data quality tests. GitHub
Commercial Tools
- Ataccama - Comprehensive data quality and catalog suite. Website
- Informatica - Data quality and observability platform. Website
- Talend - Data quality solutions with real-time monitoring. Website
- IBM InfoSphere QualityStage - Data quality and governance. Website
- Precisely Trillium Quality - Enterprise data quality tool. Website
- Adverity - Marketing data integration with data quality management. Website
- Oracle Enterprise Data Quality - Robust data profiling and cleansing. Website
Articles and Guides
- A Guide to Data Quality Tools: The 4 Leading Solutions - Zendata. Article
- Top Data Quality Management Tools to Choose in 2024 - Mad Devs. Article
- Data Quality Management: Tools, Pillars, and Best Practices - lakeFS. Article
- Best Data Quality Tools for 2024: Top 10 Choices - Adverity. Article
- The 8 Best Data Quality Management Tools and Software for 2025 - Solutions Review. Article
- 9 Best Tools for Data Quality in 2024 - Datafold. Article
- Data Quality Management Best Practices: A Short Guide - Zendata. Article