Evaluation Process

The PUMA Challenge consists of two evaluation tracks.

For both tracks, submissions must include algorithms for Task 1 (tissue segmentation) and Task 2 (nuclei segmentation). Example evaluation code, JSON annotations, and TIFF annotations are available at:

  • PUMA Challenge Evaluation - Track 1
  • PUMA Challenge Evaluation - Track 2
  • PUMA Challenge Baseline - Track 1
  • PUMA Challenge Baseline - Track 2

The prediction outputs should include:

  • Task 1: One .tif file with segmentation results and metadata, including XResolution, YResolution, SMinSampleValue (excluding background), and SMaxSampleValue.
  • Task 2: One .json file containing nuclei segmentation results.
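For illustration, here is a minimal, unofficial sketch of writing the Task 1 .tif with the tifffile library. The file name, image size, resolution values, and exact tag encodings are assumptions; consult the example repositories above for the authoritative output format.

```python
# Minimal sketch (not the official submission code) of writing the Task 1 output
# with tifffile. Shapes, resolution values, and tag values are illustrative only.
import numpy as np
import tifffile

segmentation = np.zeros((1024, 1024), dtype=np.uint8)  # per-pixel tissue labels

tifffile.imwrite(
    "tissue_segmentation.tif",
    segmentation,
    resolution=(300, 300),  # writes XResolution / YResolution (assumed values)
    extratags=[
        # (tag code, dtype, count, value, writeonce)
        (340, "H", 1, (1,), False),  # SMinSampleValue: lowest label, background excluded
        (341, "H", 1, (5,), False),  # SMaxSampleValue: highest label
    ],
)
```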

The labels in the tissue segmentation output should follow the class map:

Class                      Label
tissue_white_background    0
tissue_stroma              1
tissue_blood_vessel        2
tissue_tumor               3
tissue_epidermis           4
tissue_necrosis            5
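For convenience, the same class map expressed as a Python dictionary (the variable name is illustrative):

```python
# Task 1 tissue class map; label values taken from the table above.
TISSUE_CLASS_MAP = {
    "tissue_white_background": 0,
    "tissue_stroma": 1,
    "tissue_blood_vessel": 2,
    "tissue_tumor": 3,
    "tissue_epidermis": 4,
    "tissue_necrosis": 5,
}
```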

Ranking Metrics

Submissions are evaluated on two metrics:

  • Task 1 (Tissue Segmentation): The micro Dice score is calculated by concatenating all segmentation results along one axis and then averaging the Dice score across all tissue classes; tissue_white_background is excluded from the metric calculation. A sketch of this computation follows this list.

  • Task 2 (Nuclei Segmentation): The macro F1 score is determined using a hit criterion based on confidence score and centroid distance for each nucleus class; a matching sketch also follows this list. The evaluation process is as follows:

  1. Extract annotations: Nuclei predictions and ground-truth nuclei are extracted from JSON files, with centroids calculated for each polygon.
  2. Filter predictions: Predictions outside a 15-pixel radius of any ground-truth nucleus are censored; only predictions within this distance are considered for further matching.
  3. Match predictions to ground-truth nuclei based on:
    • The highest confidence score, if available.
    • Otherwise, the nearest ground-truth nucleus.
  4. Censor matched ground truth: Once a match is made, the corresponding ground-truth nucleus is marked as used and not considered for further matches.
  5. Class alignment: Check if the prediction and ground-truth nuclei classes align.
    • If aligned, count as a True Positive (TP).
    • If not aligned or no match is found, count as a False Positive (FP).
  6. Remaining unmatched ground-truth nuclei are counted as False Negatives (FN).
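To make the Task 1 metric concrete, here is a minimal, unofficial sketch of a micro Dice computation over concatenated label masks with the background class excluded. The array shapes, the concatenation axis, and the handling of classes absent from both masks are assumptions; refer to the evaluation repository for the exact implementation.

```python
# Unofficial sketch of the Task 1 micro Dice score (assumed shapes and edge cases).
import numpy as np

def micro_dice(predictions, ground_truths, num_classes=6, background_label=0):
    """Average Dice over tissue classes, computed on concatenated masks."""
    # Concatenate all cases along one axis so the score is "micro" over pixels.
    pred = np.concatenate([p.ravel() for p in predictions])
    gt = np.concatenate([g.ravel() for g in ground_truths])

    scores = []
    for label in range(num_classes):
        if label == background_label:
            continue  # tissue_white_background is excluded from the metric
        p = pred == label
        g = gt == label
        denom = p.sum() + g.sum()
        if denom == 0:
            continue  # class absent in both prediction and ground truth (assumed handling)
        scores.append(2.0 * np.logical_and(p, g).sum() / denom)
    return float(np.mean(scores)) if scores else 0.0
```

Likewise, the Task 2 hit criterion can be illustrated with a rough centroid-matching sketch. The input data structures, the tie-breaking order, and how censored (out-of-radius) predictions are counted are assumptions rather than the official evaluation logic.

```python
# Rough, unofficial sketch of the Task 2 hit criterion (15-pixel centroid distance).
import numpy as np

def match_nuclei(preds, gts, radius=15.0):
    """preds/gts: lists of dicts with 'centroid' (x, y), 'class', and optional 'score'."""
    tp = fp = 0
    used = [False] * len(gts)

    # Consider high-confidence predictions first when scores are available.
    preds = sorted(preds, key=lambda p: p.get("score", 0.0), reverse=True)

    for pred in preds:
        # Distances from this prediction's centroid to every ground-truth centroid.
        dists = [np.hypot(pred["centroid"][0] - gt["centroid"][0],
                          pred["centroid"][1] - gt["centroid"][1]) for gt in gts]
        # Candidates: unused ground-truth nuclei within the 15-pixel radius.
        candidates = [i for i, d in enumerate(dists) if d <= radius and not used[i]]
        if not candidates:
            fp += 1  # no match found
            continue
        best = min(candidates, key=lambda i: dists[i])  # nearest ground truth
        used[best] = True  # censor the matched ground truth
        if gts[best]["class"] == pred["class"]:
            tp += 1  # classes align
        else:
            fp += 1  # matched, but classes do not align
    fn = used.count(False)  # remaining unmatched ground-truth nuclei
    return tp, fp, fn
```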

Final Ranking

Final rankings are based on the mean of each submission's ranks across the two tasks.

Good luck to all participants!