Data Quality Evaluation Tool – Holistic Data Quality Assessment based on ISO 25012/24

Robust AI models depend on precise and consistent training and test data. Image datasets, however, often suffer from incomplete annotations, duplicates or contradictory labels. Fraunhofer FOKUS’ Data Quality Evaluation Tool automatically analyses image data and rates it against the international standards ISO/IEC 25012 and ISO/IEC 25024.

Built on the FiftyOne API, the tool first applies metrics and heuristics such as Intersection over Union (IoU) for overlap checks, image hashes for duplicate detection and semantic tests for faulty annotations. These results are then translated into ISO-compliant quality metrics covering attributes like accuracy, consistency and completeness.
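To make two of these heuristics concrete, the following is a minimal sketch in plain Python. It is not the tool's own code and does not use FiftyOne. The function names, the box format `(x_min, y_min, x_max, y_max)` and the `duplicate_free_ratio` score are assumptions chosen for illustration. The duplicate check hashes raw file bytes with SHA-256, so it only finds byte-identical copies; a perceptual hash would be needed to catch near-duplicates.

```python
import hashlib
from collections import defaultdict


def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this format is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def find_exact_duplicates(images):
    """Group image names by the SHA-256 digest of their raw bytes.

    `images` maps a name to its bytes. Only groups with more than one
    member are returned, i.e. byte-identical copies.
    """
    groups = defaultdict(list)
    for name, data in images.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def duplicate_free_ratio(images):
    """Illustrative score: share of images that are not redundant copies."""
    if not images:
        return 1.0
    redundant = sum(len(group) - 1 for group in find_exact_duplicates(images))
    return 1.0 - redundant / len(images)
```

An IoU close to 1 between two boxes carrying the same label on one image hints at a doubled annotation, while a ratio such as `duplicate_free_ratio` shows how a raw finding could be condensed into a single score between 0 and 1.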

The application can be launched as a stand-alone Python script via a YAML configuration or as a subprocess within external workflows, e.g. as a recurring step in an MLOps pipeline. Results are delivered as a JSON file accompanied by detailed logs that enable visual inspection of problematic data.
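The subprocess integration could look roughly like the sketch below. The tool's actual command line and report schema are not described here, so everything specific is hypothetical: the script name `dq_eval.py`, the `--config` flag, the file names, and the assumption that the JSON report is a flat mapping from metric name to a score between 0 and 1.

```python
import json
import subprocess
from pathlib import Path


def run_quality_check(command, report_path, timeout=3600):
    """Run the evaluation as a child process and return the parsed JSON report.

    `command` is the full argument list; `report_path` is where the child
    process is expected to write its JSON result.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    if completed.returncode != 0:
        raise RuntimeError(f"quality check failed: {completed.stderr.strip()}")
    return json.loads(Path(report_path).read_text(encoding="utf-8"))


def passes_gate(report, thresholds):
    """True if every metric named in `thresholds` meets its minimum score.

    A metric missing from the report counts as 0.0 and therefore fails.
    """
    return all(
        report.get(name, 0.0) >= minimum for name, minimum in thresholds.items()
    )


# Hypothetical invocation; script name, flag and file names are placeholders:
# report = run_quality_check(
#     ["python", "dq_eval.py", "--config", "config.yaml"], "report.json"
# )
# if not passes_gate(report, {"accuracy": 0.95, "completeness": 0.90}):
#     raise SystemExit("data quality gate failed, skipping training")
```

Placing such a gate in front of the training step lets a pipeline stop early when the dataset falls below agreed thresholds.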

By detecting data issues early and in a standardised manner, the Data Quality Evaluation Tool lowers training costs, boosts model quality and provides machine-readable reports for downstream applications.