Bridge2AI Cell Maps 4 AI (CM4AI) dataset
Do you want to explore the relationship between cell morphology or protein localization and perturbations? Or differentiation processes? Tired of getting data sets without metadata that nobody understands? Looking for more comprehensive information than what you get when you are just handed a random folder of images without labels? Don't worry: some of us have got your back!
This growing document will guide you through all aspects of the Bridge2AI Cell Maps 4 AI data set.
History and Authors
The data set was created in 2023-2025 within the Lundberg Lab as part of a large consortium effort called "Cell Maps 4 AI" (CM4AI), part of the NIH Common Fund's Bridge2AI program. The data generation was organized by Jan Hansen from our lab, with Ulrika Axxelsson, Jenny Fall, Viviana Macarelli, and Marina Brogley as the data generators.
For more information on the larger CM4AI project and its goals, see the webpage CM4AI.org.
Design aim behind this data set
The aim of this project was to map protein localization in:
- Triple-negative breast cancer cells (MDA-MB-468), untreated or treated with Paclitaxel (a common chemotherapy drug applied in breast cancer treatment) or Vorinostat (a less common drug)
- KOLF2.1J stem cells and derived neurons as well as cardiomyocytes
The image data were generated in HPA style and will possibly be contributed to future HPA versions. The basic KOLF2.1J stem cell data are already included in the Human Protein Atlas, where all images have been manually annotated.
Experimental details
MDA-MB-468:
Cells were seeded onto fibronectin-coated glass-bottom 96-well plates, grown for one to multiple days, and then fixed with 4% paraformaldehyde (PFA) solution for 15 min (as in the common HPA Subcellular section workflow). Cells were then stained following the common HPA immunofluorescence workflow (permeabilization with 0.1% Triton X-100 for 15 min; staining in PBS with 4% FBS: primary antibody overnight, secondary antibody for 1.5 h; DAPI staining for 10 min).
KOLF2.1J and derived cell types:
See the Benchling folder "Projects/Bridge2AI Project - KOLF2.1J Stem Cells".
Microscopy:
- MDA-MB-468: Leica SP8 microscope in Sweden, used to create high-resolution HPA-style images:
  - 63x magnification, water-immersion objective
  - pixel size 0.07 micron/px
  - field of view 2048x2048 px
  - 3 z-planes, to make sure a good plane is covered (since imaging was automated)
  - inter-z-plane spacing of 2 microns
  - recordings were performed at several individual random positions in the well
- KOLF2.1J: Operetta Sonata high-content imager in Sweden, used to create HPA-style images:
  - 63x magnification, immersion objective
  - pixel size 0.0941867 micron/px
  - field of view 2160x2160 px
  - many z-planes, to cover the whole cells from bottom to top
  - inter-z-plane spacing of X microns (value to be confirmed; believed to be 0.07)
  - recordings were performed at several individual positions in the well. The microscope searches for DAPI regions to make sure it finds regions with cells.
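As a quick sanity check on the imaging parameters above, the physical edge length of one field of view is the pixel size times the number of pixels per side. The sketch below is plain arithmetic on the values listed in this section; the names `ImagingSetup`, `fov_microns`, and `z_range_microns` are hypothetical helpers and not part of any project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagingSetup:
    """Imaging parameters as listed in this document (hypothetical helper)."""
    name: str
    pixel_size_um: float  # microns per pixel
    fov_px: int           # square field of view: fov_px x fov_px pixels


def fov_microns(setup: ImagingSetup) -> float:
    """Edge length of one field of view in microns."""
    return setup.pixel_size_um * setup.fov_px


def z_range_microns(n_planes: int, spacing_um: float) -> float:
    """Distance between the first and the last z-plane of a stack."""
    return (n_planes - 1) * spacing_um


MDA = ImagingSetup("MDA-MB-468 (Leica SP8)", 0.07, 2048)
KOLF = ImagingSetup("KOLF2.1J (Operetta)", 0.0941867, 2160)

if __name__ == "__main__":
    for setup in (MDA, KOLF):
        print(f"{setup.name}: {fov_microns(setup):.1f} um per side")
    print(f"MDA-MB-468 z range: {z_range_microns(3, 2.0):.1f} um")
```

With the listed values, one SP8 field of view is roughly 143 microns per side and one Operetta field of view roughly 203 microns per side; the three SP8 z-planes span 4 microns. The z range for KOLF2.1J is left out because the plane count and spacing are not fixed in this document.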
Structuring of data folders
Raw data:
MDA-MB-468:
These are typically zip-compressed raw .lif files from the microscope.
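A minimal sketch of how the .lif files could be pulled out of such an archive, using only the Python standard library. The function names and any paths passed to them are hypothetical, and the layout inside the archives is an assumption: the code simply selects every member whose name ends in `.lif`.

```python
import zipfile
from pathlib import Path


def list_lif_members(zip_path):
    """Return the names of all .lif files inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [n for n in archive.namelist() if n.lower().endswith(".lif")]


def extract_lif_files(zip_path, out_dir):
    """Extract only the .lif members of the archive into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = [n for n in archive.namelist() if n.lower().endswith(".lif")]
        archive.extractall(out_dir, members=members)
    # Paths of the extracted files, in archive order
    return [out_dir / m for m in members]
```

The extracted files can then be opened with any LIF-capable reader.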
Stem cells / KOLF2.1J:
These are Operetta export folders containing images and XML metadata files.
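To get an overview of one export folder, the images and XML files can be inventoried with a few lines of Python. This is only a sketch: the function name `inventory_export` is hypothetical, and the file name pattern (row, column, field, z-plane, channel encoded as `r..c..f..p..-ch..`) is an assumption about the export naming that should be checked against a real folder. Files that do not match are reported separately rather than dropped.

```python
import re
from collections import defaultdict
from pathlib import Path

# ASSUMPTION: image names look like r01c02f03p04-ch1sk1fk1fl1.tiff
# (row, column, field, z-plane, channel). Verify against a real export.
NAME_PATTERN = re.compile(
    r"r(?P<row>\d+)c(?P<col>\d+)f(?P<field>\d+)p(?P<plane>\d+)-ch(?P<channel>\d+)"
)


def inventory_export(folder):
    """Collect XML metadata files and group images by (row, col, field)."""
    folder = Path(folder)
    xml_files = sorted(folder.rglob("*.xml"))
    stacks = defaultdict(list)
    unmatched = []
    for path in sorted(folder.rglob("*.tif*")):  # matches .tif and .tiff
        match = NAME_PATTERN.match(path.name)
        if match is None:
            unmatched.append(path)
            continue
        key = tuple(int(match[k]) for k in ("row", "col", "field"))
        stacks[key].append(path)
    return xml_files, dict(stacks), unmatched
```

Each key of the returned dictionary then corresponds to one imaged position, with all its z-planes and channels listed together.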