---
title: "MONAI - Evaluation and Benchmarking Working Group"
description: "Providing guidelines, infrastructure, and practical tools for quality-controlled validation and benchmarking of medical image analysis methods."
canonical: https://project-monai.github.io/wg_evaluation.html
audience: [engineer]
last_updated: 2026-06-11
source: wg_evaluation_benchmark.html
---
# Evaluation and Benchmarking Working Group

## Mission Statement

The MONAI Evaluation and Benchmarking working group aims to provide guidelines, infrastructure, and practical tools for evaluating and benchmarking medical image analysis methods. It works to lead the community towards identifying and adopting best practices for evaluation and benchmarking, and towards practical solutions that improve reproducibility.

## Highlights

### Recommendations

-   [Metrics Reloaded: recommendations for image analysis validation](https://www.nature.com/articles/s41592-023-02151-z)
-   [Understanding metric-related pitfalls in image analysis validation](https://www.nature.com/articles/s41592-023-02150-0)
-   [Metrics Reloaded Toolkit](https://metrics-reloaded.dkfz.de/)

### Implementation of recommendations

-   [MONAI Evaluation Metrics](https://github.com/Project-MONAI/MetricsReloaded/)
-   [Metrics Documentation](https://monai.readthedocs.io/en/latest/metrics.html)

### Related resources

-   [Biomedical Image Analysis ChallengeS (BIAS) Initiative](https://www.dkfz.de/en/imsy/research/biomedical-image-analysis-challenges-bias-initiative)
-   [Rankings Reloaded](https://www.rankings-reloaded.de/)

## Group Leads

![Dr. Annika Reinke](/assets/img/people/annika-reinke.jpg)

Dr. Annika Reinke

Deputy Head and Group Lead Validation of Intelligent Systems

German Cancer Research Center (DKFZ)

Benchmark Working Group Chair

[View Profile](https://www.dkfz.de/en/employees/annika-reinke)

![Dr. Carole Sudre](/assets/img/people/carole-sudre.jpg)

Dr. Carole Sudre

Associate Professor

University College London

Benchmark Working Group Chair

[View Profile](https://profiles.ucl.ac.uk/39648-carole-sudre)

## Meeting Notes

### GitHub Wiki

-   [Access all meeting notes](https://github.com/Project-MONAI/MONAI/wiki/Evaluation-and-Benchmarks-Working-Group-Meeting-Notes)

## Ongoing Projects

### Reporting Guidelines Taskforce (Lead - Olivier Colliot)

-   Surveying current reporting practices and identifying areas for improvement
-   Developing guidelines for reporting results, with a focus on statistical aspects
-   Identifying appropriate calculation methods for statistical procedures (e.g., confidence intervals) across different tasks and validation metrics
-   Implementing the recommended calculations for MONAI users
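
As an illustration of the kind of calculation this taskforce is concerned with, the sketch below computes a percentile bootstrap confidence interval for the mean of per-case metric values. It is a minimal example under stated assumptions, not the taskforce's recommended procedure and not a MONAI API: the function name `bootstrap_ci`, its parameters, and the Dice scores are all hypothetical, and the percentile bootstrap is just one common choice among several interval methods.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-case scores.

    Hypothetical helper for illustration; not part of MONAI.
    Returns (mean, lower_bound, upper_bound).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample cases with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    # Indices of the lower and upper percentiles in the sorted resample means.
    lower_idx = int((alpha / 2) * (n_resamples - 1))
    upper_idx = int((1 - alpha / 2) * (n_resamples - 1))
    return statistics.fmean(scores), means[lower_idx], means[upper_idx]


# Hypothetical per-case Dice scores, for illustration only.
dice_scores = [0.91, 0.85, 0.78, 0.88, 0.93, 0.64, 0.89, 0.82, 0.90, 0.76]
mean, low, high = bootstrap_ci(dice_scores)
print(f"mean Dice = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Resampling is done at the case level, which assumes the cases are independent. With only a handful of cases, as in this toy input, the percentile interval can be unreliable, which is one reason the choice of method depends on the task and metric.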

### Benchmarking Datasets Taskforce (Lead - Michela Antonelli)

-   Reviewing data quality for the MICCAI 2025 lighthouse challenges
-   Identifying key characteristics of benchmarking datasets
-   Encouraging the development of new datasets that follow best practices
-   Identifying relevant historical datasets to be used for benchmarking
-   Implementing guidelines for upcoming datasets

## Collaboration Opportunities

### Community Engagement

-   Join our regular surveys
-   Contribute to evaluation metrics testing
-   Share your expertise in validation and benchmarking
-   Participate in standards development