Preventing Proteomics Data Tombs Through Collective Responsibility and Community Engagement

Uladzislau Vadadokhau et al.

Sci Data. 2026 Jan 22. doi: 10.1038/s41597-026-06614-8. Online ahead of print.

Published on January 22, 2026

ABSTRACT

Public proteomics repositories now host vast amounts of mass spectrometry data, yet much of it remains difficult to reuse, risking “data tombs” that are open access but not practically re-analyzable. In spring 2025, a graduate-level course at the University of Helsinki tasked six student teams with reanalyzing six projects from the Proteomics Identifications Database (PRIDE), restricted to label-free quantification, using a common R-based workflow (the rpx, mzR, QFeatures, and DEP/MSqRob2/limma/OmicsQ packages). The teams reproduced identification, optional quantification, normalization, imputation, and differential expression analyses, and compared the outcomes to the original studies. As expected, systemic barriers recurred across cases: (i) no sample and data relationship format for proteomics (SDRF-Proteomics) metadata in any of the cases; (ii) missing details regarding decoy sets for false discovery rate assessment; (iii) proprietary-only outputs or software (e.g., Thermo .msf, Progenesis) that impeded open reanalysis in interoperable, community-standard formats; (iv) missing data-independent acquisition (DIA) spectral libraries or protein sequence database (FASTA) files; (v) absent or vague normalization, imputation, and statistical parameters; (vi) inconsistent file naming; and (vii) insufficient biological or technical replication in at least one project. These shortcomings yielded large discrepancies in the analysis results (e.g., 13,068 vs. 4,923 proteins; 108 vs. 11 differentially expressed proteins), and, in one instance, a highlighted protein lacked robust support in the deposited identifications. We observed that reproducibility in mass spectrometry-based proteomics hinges less on instruments than on transparent metadata, open formats, and executable analysis provenance.
We propose that data creators provide a minimum re-analysis package, including raw data and open-format files, adherence to community standards, basic quality control summaries, DIA spectral libraries, and complete parameter and code sets with pinned versions or containers. Moreover, we recommend repository-level nudges toward making such packages mandatory. This educational exercise both trains students and stress-tests community data practices, helping to prevent proteomics “data tombs”.
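
The minimum re-analysis package proposed above can be read as a checklist that a deposited file listing either satisfies or does not. The following is a minimal sketch of that idea in Python, not the authors' workflow and not a repository tool. The checklist entries, the file endings chosen for each entry, and the function name `missing_items` are all illustrative assumptions; quality control summaries and spectral libraries are left out for brevity.

```python
# Hypothetical checklist: item -> file name endings that would satisfy it.
# The endings are illustrative assumptions, not a repository requirement.
CHECKLIST = {
    "raw data": (".raw", ".wiff", ".d"),
    "open-format spectra": (".mzml",),
    "sample metadata (SDRF)": (".sdrf.tsv",),
    "sequence database": (".fasta",),
    "parameters or code": (".r", ".rmd", ".yaml", ".yml"),
    "pinned environment": ("renv.lock", "dockerfile"),
}


def missing_items(filenames, checklist=CHECKLIST):
    """Return checklist items that no deposited file name satisfies."""
    names = [name.lower() for name in filenames]  # compare case-insensitively
    return [
        item
        for item, endings in checklist.items()
        if not any(name.endswith(endings) for name in names)
    ]


# A made-up listing resembling a deposit with a proprietary-only result file.
deposited = ["sample01.raw", "sample01.mzML", "search_db.fasta", "results.msf"]
print(missing_items(deposited))
```

For this made-up listing the sketch reports the sample metadata, the parameters or code, and the pinned environment as missing. A check based on file names only shows that a file is present, not that its contents are complete or valid.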

PMID:41571719 | DOI:10.1038/s41597-026-06614-8
