Submission
Please upload your predictions as a single CSV file (the filename is arbitrary) containing the columns question-id and response. Depending on the task, the submission will include between 272 and 300 question IDs, and the CSV file must contain exactly one row for each required question ID.
Important: For the neoplastic status and neoplastic behavior tasks, responses must exactly match one of the allowed values listed below; otherwise, the submission will fail validation. Participants whose models do not always produce valid responses for these tasks may choose to apply imputation strategies prior to submission, following the approach used in our paper for handling invalid or missing pathologist responses (see our preprint). For the tissue recognition and multiple-choice VQA tasks, empty or malformed responses will still be evaluated but will be treated as incorrect. For the diagnosis and free-response VQA tasks, all responses will be evaluated; empty or low-quality responses will not cause submission failure but will receive a correspondingly low score.
Our GitHub repository contains the same submission validation and evaluation code used by this platform, except for the reference standard and additional evaluation resources (e.g., the hierarchical tissue taxonomy). Participants are encouraged to consult this repository for submission validation and evaluation logic.
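As a quick local check before uploading, the CSV format described in this section can be sketched as follows. This is a minimal illustration, not the platform's validator: the function names `write_submission` and `check_submission` are hypothetical, the list of required question IDs must come from the task data, and treating extra question IDs as a problem is an assumption. The validation code in the repository remains the authoritative check.

```python
import csv
from collections import Counter

# Column names as stated in the submission instructions.
REQUIRED_COLUMNS = ["question-id", "response"]


def write_submission(predictions, path):
    """Write a {question_id: response} mapping to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(REQUIRED_COLUMNS)
        for question_id, response in predictions.items():
            writer.writerow([question_id, response])


def check_submission(path, required_ids):
    """Return a list of problems; an empty list means these basic checks passed."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        missing_cols = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
        if missing_cols:
            return [f"missing columns: {missing_cols}"]
        # Count how often each question ID appears in the file.
        counts = Counter((row["question-id"] or "") for row in reader)

    problems = []
    required = set(required_ids)
    missing = sorted(required - set(counts))
    extra = sorted(set(counts) - required)  # assumption: extra IDs are not allowed
    duplicated = sorted(q for q, n in counts.items() if n > 1)
    if missing:
        problems.append(f"missing question IDs: {missing}")
    if extra:
        problems.append(f"unexpected question IDs: {extra}")
    if duplicated:
        problems.append(f"duplicated question IDs: {duplicated}")
    return problems
```

For example, `check_submission("predictions.csv", required_ids)` returns an empty list when every required question ID appears exactly once.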
Per-task response requirements
Tissue recognition
Each response must be a single tissue or organ (e.g., breast). During evaluation, responses are preprocessed by (1) removing all parenthetical content, and (2) matching against a hierarchical tissue taxonomy using predefined synonym patterns. Matches are ranked first by match length (longer matches are preferred), and then by text position (earlier matches are preferred). Only the highest-ranking match is evaluated against the reference standard.
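The preprocessing and ranking rule described above can be illustrated with a small sketch. The synonym patterns below are invented for demonstration only; the actual hierarchical taxonomy and its patterns are not part of the public repository, and the platform's implementation may differ in details such as case handling or nested parentheses.

```python
import re

# Hypothetical synonym patterns, for illustration only.
SYNONYM_PATTERNS = {
    "breast": r"\b(?:breast|mammary gland)\b",
    "colon": r"\b(?:colon|large intestine)\b",
    "intestine": r"\bintestine\b",
    "lung": r"\blung\b",
    "skin": r"\b(?:skin|cutaneous)\b",
}


def strip_parentheticals(text):
    """Remove parenthetical content such as '(left side)'; nested parentheses are not handled."""
    return re.sub(r"\([^()]*\)", " ", text)


def best_tissue_match(response):
    """Return the tissue whose match ranks highest, or None if nothing matches."""
    cleaned = strip_parentheticals(response).lower()
    candidates = []
    for tissue, pattern in SYNONYM_PATTERNS.items():
        for m in re.finditer(pattern, cleaned):
            length = m.end() - m.start()
            # Sort key: longer matches first, then earlier text position.
            candidates.append((-length, m.start(), tissue))
    if not candidates:
        return None
    return min(candidates)[2]
```

Under these assumed patterns, `best_tissue_match("skin of the large intestine")` selects `colon`, because "large intestine" is the longest match, while `best_tissue_match("skin and lung")` selects `skin`, because the two matches have equal length and "skin" appears first.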
Neoplastic status
Each response must be one of: yes, no.
Neoplastic behavior
Each response must be one of: benign, uncertain, in situ, malignant.
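Because invalid values for the neoplastic status and neoplastic behavior tasks cause the submission to fail validation, it may help to flag them before uploading. The following is a minimal sketch: the task keys and the function name `invalid_responses` are hypothetical, and the case-sensitive comparison is an assumption based on the requirement that responses match the allowed values exactly.

```python
# Allowed values as listed in the per-task requirements.
ALLOWED_VALUES = {
    "neoplastic status": {"yes", "no"},
    "neoplastic behavior": {"benign", "uncertain", "in situ", "malignant"},
}


def invalid_responses(task, responses):
    """Return the {question_id: response} pairs that are not allowed for the task."""
    allowed = ALLOWED_VALUES[task]
    # Exact, case-sensitive comparison (assumption): "Yes" and "" are both flagged.
    return {qid: r for qid, r in responses.items() if r not in allowed}
```

Any question IDs returned here would need a valid value, for example via an imputation strategy, before the file is submitted.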
Diagnosis
Each response must be a single diagnosis (e.g., ductal carcinoma in situ). Submissions are evaluated as provided, without additional diagnosis extraction. For reference, our preprint describes how diagnoses were extracted from longer responses in our experimental setting.
Multiple-choice VQA
Each response must be a single letter from A to F. If multiple letters are provided, only the first letter is used for evaluation.
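Participants whose models produce longer answers may prefer to reduce them to a single letter themselves rather than rely on the first-letter rule. The sketch below is one possible participant-side normalization, not the platform's parsing logic: the function name `first_choice_letter` is hypothetical, and it only considers standalone uppercase letters A to F so that words such as "Definitely" are not mistaken for an answer.

```python
import re


def first_choice_letter(response):
    """Return the first standalone uppercase letter A-F in the text, or None."""
    # \b boundaries skip letters inside words; a leading article "A" would still match.
    match = re.search(r"\b[A-F]\b", response)
    return match.group(0) if match else None
```

A return value of `None` indicates that no usable letter was found; such a response would be evaluated as incorrect if submitted unchanged.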
Free-response VQA
Each response must be a free-form text string.
Supplementary materials (optional)
Please provide a PDF file describing your method (required) and a URL linking to your publication or GitHub repository (optional).
Submission frequency
Currently, at most four submissions are allowed per four-week period. If you need to submit more frequently, please contact us.