Cell cycle dataset

Do you want to explore the relationship between cell morphology, cell lines, tissues, and cell cycle position? Tired of getting data sets without metadata that nobody understands? Looking for more comprehensive information than what you get when you are just handed a random folder of images without labels? Don't worry: some of us have got your back!

This growing document is going to guide you through all the aspects of the cell cycle data set.

Authors

The data set was created in 2025 by Jan N. Hansen and William D. Leineweber, on a joint initiative with Ankit Gupta. The data set is not yet published, but publication is planned. Please let us know if you are going to publicly use anything from this data set so that we can align publication strategies.

Design aim behind this data set

So far, most cell cycle characterization and analysis in our lab has been based on U2OS FUCCI cells. U2OS is an osteosarcoma cancer cell line. FUCCI is a cell cycle marker combination of CDT1 and GMNN, whose expression is high in G1 or S/G2 phase, respectively. In typical FUCCI cells, fusion proteins of CDT1 or GMNN with fluorescent proteins are expressed so that the fluorescence of the cells serves as a readout for these two cell cycle markers. In the common HPA FUCCI data set, only microtubules, these two proteins (GMNN and CDT1), and the protein of interest were stained, and only in U2OS FUCCI cells.

What we wanted to do: develop a data set with FUCCI markers across many different cell types, both primary and cancer cells, that features (at least) microtubules, nuclei, CDT1, and GMNN. We later extended this data set to also include data acquired from FFPE tissues and an ER marker.

Available cell lines and tissues included

Cell lines:

  • KOLF2.1J, an induced pluripotent stem cell line
  • hTERT-RPE1, serum-starved for 48 hours prior to fixation and thus NON-CYCLING. It is a primary, immortalized, non-cancer cell line from the retina and can be used as a control (https://www.atcc.org/products/crl-4000)
  • CL40, a colorectal cancer cell line
  • SW1417, a colorectal cancer cell line
  • CACO2, a colorectal cancer cell line
  • RPTEC/TERT1, a primary, immortalized, non-cancer cell line from the proximal kidney tubule
  • HeLa, a common cancer cell line
  • U2OS, an osteosarcoma cell line, best covered in the HPA
  • A-549, a cancer cell line from the lung (https://www.atcc.org/products/ccl-185)
  • HK2, an immortalized proximal tubule kidney cell line (https://www.atcc.org/products/crl-2190)
  • MDA-MB-468, a triple-negative breast cancer cell line (https://www.atcc.org/products/htb-132), also studied in the Bridge2AI CM4AI data set

Tissues:

  • Tonsil tissue

Experimental details

The full wet-lab protocols for acquiring this data are stored in Benchling under Projects/Cell_Cycle_Predictor/.

Cell line data

Images of the same samples were acquired on different microscopes to provide different data modalities.

  • Stellaris 8 microscope, to create high-resolution HPA style images:
    • 63x magnification, water-immersion objective
    • pixel size 0.07 micron/px
    • field of view 2048x2048
    • multiple z planes = 3D image to cover the whole cells from top to bottom
    • inter-z-plane spacing of 0.7 micron
    • recordings were performed at hundreds of individual random positions in the well
  • DMi8 Thunder microscope
    • 20x magnification, non-immersion objective
      • Whole well was recorded using a tile-scan with overlap between tiles
      • Each tile was saved as an individual image
      • Multiple widely-spaced z planes were recorded
      • pixel size 0.3233943332 micron/px
      • inter-z-plane spacing of 1.000715 micron
    • 40x magnification, non-immersion objective
      • Whole well was recorded using a tile-scan with overlap between tiles
      • Each tile was saved as an individual image
      • Multiple z planes were recorded
      • pixel size 0.1616971666 micron/px
      • inter-z-plane spacing of 1.000715 micron
  • Cephla Squid microscope
    • 10x magnification
      • Whole well was recorded using a tile-scan with overlap between tiles
      • Multiple z planes were recorded
      • pixel size 0.752 micron/px
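The pixel sizes and z spacings listed above can be turned into physical dimensions with a few lines of arithmetic. The following is a minimal sketch: the numeric values are copied from the list, while the dictionary keys and function names are illustrative assumptions and not identifiers used anywhere in the data set.

```python
# Pixel sizes (micron/px) and inter-z-plane spacings (micron) copied from the list above.
# The keys are hypothetical labels chosen for this sketch.
ACQUISITION = {
    "63x_stellaris": {"pixel_size_um": 0.07, "z_step_um": 0.7},
    "20x_thunder": {"pixel_size_um": 0.3233943332, "z_step_um": 1.000715},
    "40x_thunder": {"pixel_size_um": 0.1616971666, "z_step_um": 1.000715},
    "10x_squid": {"pixel_size_um": 0.752, "z_step_um": None},  # z spacing not stated above
}


def field_of_view_um(modality, width_px, height_px):
    """Return the physical (width, height) of an image in micron."""
    pixel_size = ACQUISITION[modality]["pixel_size_um"]
    return width_px * pixel_size, height_px * pixel_size


def z_anisotropy(modality):
    """Return z spacing divided by lateral pixel size, or None if z spacing is unknown."""
    entry = ACQUISITION[modality]
    if entry["z_step_um"] is None:
        return None
    return entry["z_step_um"] / entry["pixel_size_um"]
```

For the 63x images, `field_of_view_um("63x_stellaris", 2048, 2048)` gives roughly 143 micron per side, and `z_anisotropy("63x_stellaris")` is about 10, which is worth keeping in mind when applying 3D operations that assume isotropic voxels.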

Tissue data

FFPE tissue was processed as described in the Benchling protocol (e.g., JNH057).

Technical image information that only dry-lab people care about

Image types:

  • 20x_thunder
    • pixel size 0.3233943332 micron/px
    • inter-z-plane spacing of 1.000715 micron
    • Channels
      • 1: Brightfield
      • 2: DAPI
      • 3: GMNN
      • 4: Microtubules
      • 5: CDT1
  • 40x_thunder
    • widefield epifluorescence microscope
    • pixel size 0.1616971666 micron/px
    • inter-z-plane spacing of 1.000715 micron
    • Channels
      • 1: Brightfield
      • 2: DAPI
      • 3: GMNN
      • 4: Microtubules
      • 5: CDT1
  • 63x (no ER marker or tissue)
    • confocal microscope 3D image stacks
    • pixel size 0.07 micron/px
    • inter-z-plane spacing of 0.7 micron
    • Channels
      • 1: DAPI
      • 2: GMNN
      • 3: Transmitted light = bright field kind of image
      • 4: Microtubules
      • 5: CDT1
  • 63x (with ER marker, applicable to cell line images for A-549, HK2, MDA-MB-468 = experiment id JNH057)
    • confocal microscope 3D image stacks
    • pixel size 0.07 micron/px
    • inter-z-plane spacing of 0.7 micron
    • Channels
      • 1: DAPI
      • 2: Transmitted light = bright field kind of image
      • 3: GMNN (G2 cell cycle marker)
      • 4: ER
      • 5: Microtubules
      • 6: CDT1 (G1 cell cycle marker)
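Because the channel order differs between image types (for example, CDT1 is channel 5 in most images but channel 6 in the 63x images with ER marker), looking up channels by marker name can help avoid off-by-one mistakes. The sketch below encodes the channel lists above; the keys `63x` and `63x_ER` and the function names are illustrative assumptions, and whether a loaded array keeps this channel order depends on the reader you use.

```python
# Channel order per image type, copied from the lists above (1-based in the text).
# "63x" and "63x_ER" are hypothetical labels for the two 63x variants.
CHANNELS = {
    "20x_thunder": ["Brightfield", "DAPI", "GMNN", "Microtubules", "CDT1"],
    "40x_thunder": ["Brightfield", "DAPI", "GMNN", "Microtubules", "CDT1"],
    "63x": ["DAPI", "GMNN", "Transmitted light", "Microtubules", "CDT1"],
    "63x_ER": ["DAPI", "Transmitted light", "GMNN", "ER", "Microtubules", "CDT1"],
}


def channel_number(image_type, marker):
    """Return the 1-based channel number as listed in this document."""
    return CHANNELS[image_type].index(marker) + 1


def channel_index(image_type, marker):
    """Return the 0-based index, e.g. for indexing a NumPy-style array."""
    return CHANNELS[image_type].index(marker)
```

With this, `channel_index("63x_ER", "CDT1")` returns 5 while `channel_index("63x", "CDT1")` returns 4, so the same analysis code can handle both 63x variants.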

Structuring of data folders

Raw data: This is typically zip-compressed raw .lif files from the microscopes.

Exported ome-tif or tif data: This is exported from the LASX software to allow easier access to the individual images. The folders contain subfolders by microscope or magnification, such as:

  • 20x_Thunder
  • 40x_Thunder
  • 63x
  • 63x_Tissue

In general, identifiers for the sample type can be extracted from the file names.

  • Example 63x
    • 20250730_JNH047_CL40_001.lif => <date>_<experiment/staining id linkable to Benchling>_<cell line, here CL40>_<recording file nr, e.g., 001>.lif
    • 20250709_JNH044_KOLF21J_001.lif => <date>_<experiment/staining id linkable to Benchling>_<cell line, here KOLF2.1J>_<recording file nr, e.g., 001>.lif
  • Example 40x/20x (sorry, these were recorded by Will and thus don't follow a consistent nomenclature)
    • 2025-07-11-RPE_40x_JNH044.lif contains the date, the cell line (hTERT-RPE1 cells = RPE), the magnification (40x), and the experiment / staining identifier (JNH044)
    • 2025-07-10-40x_HeLA-cell_cycle_JNH044.lif contains the date, the cell line (HeLa cells), and the experiment / staining identifier (JNH044); the magnification (40x) can be extracted from the parent folder name
    • 2025-07-15-KOLF-JNH044.lif contains the date, the cell line (KOLF2.1J), and the experiment / staining identifier (JNH044); the magnification (20x) can be extracted from the parent folder name
  • As the cell line identifiers in the raw files are a bit confusing, here is a key to unlock them:
    • KOLF or KOLF21J refers to KOLF2.1J
    • RPE or RPE1 or RPE1_Starved refers to hTERT-RPE1 cells that were serum-starved for 48 hours prior to fixation and thus do not cycle
    • CL40 refers to CL40 cells
    • SW1417 refers to SW1417 cells
    • CACO2 refers to CACO2 cells
    • RPTEC or RPTEC_1 or RPTEC_2 refers to RPTEC/TERT1 cells, a primary, immortalized, non-cancer cell line from the proximal kidney tubule
    • HeLa or HeLA refers to HeLa cells, a common cancer cell line
    • U2OS or u2os or U2-OS or U-2OS refers to U2OS cells, an osteosarcoma cell line
    • A-549
    • HK2
    • MDA-MB-468
    • more cell lines may be added in the future
  • For tissues, a tissue type is referenced instead of a cell line:
    • Tonsil
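The 63x naming scheme and the identifier key described above can be sketched as a small parser. This is a minimal sketch under stated assumptions: it assumes that 63x file names always follow `<date>_<experiment id>_<sample>_<recording nr>.lif` with an 8-digit date and an experiment id of the form `JNH` plus digits (as in the examples), and the function names are hypothetical. The 20x/40x names are too irregular for this pattern, so the parser returns `None` for them.

```python
import re

# Lower-cased raw identifiers mapped to canonical names, following the key above.
_ALIASES = {
    "kolf": "KOLF2.1J", "kolf21j": "KOLF2.1J",
    "rpe": "hTERT-RPE1", "rpe1": "hTERT-RPE1", "rpe1_starved": "hTERT-RPE1",
    "cl40": "CL40", "sw1417": "SW1417", "caco2": "CACO2",
    "rptec": "RPTEC/TERT1", "rptec_1": "RPTEC/TERT1", "rptec_2": "RPTEC/TERT1",
    "hela": "HeLa",
    "u2os": "U2OS", "u2-os": "U2OS", "u-2os": "U2OS",
    "a-549": "A-549", "hk2": "HK2", "mda-mb-468": "MDA-MB-468",
    "tonsil": "Tonsil",
}

# Assumed pattern for 63x files: <date>_<experiment id>_<sample>_<recording nr>.lif
_PATTERN_63X = re.compile(
    r"^(?P<date>\d{8})_(?P<experiment>JNH\d+)_(?P<sample>.+)_(?P<recording>\d+)\.lif$"
)


def normalize_sample(token):
    """Map a raw identifier to its canonical name; unknown tokens pass through unchanged."""
    return _ALIASES.get(token.lower(), token)


def parse_63x_name(filename):
    """Split a 63x file name into its parts, or return None if it does not match."""
    match = _PATTERN_63X.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["sample"] = normalize_sample(info["sample"])
    return info
```

For example, `parse_63x_name("20250709_JNH044_KOLF21J_001.lif")` yields the date `20250709`, the experiment `JNH044`, the sample `KOLF2.1J`, and the recording `001`. Unknown identifiers are passed through rather than rejected, so newly added cell lines do not break the parser.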

How to access the data?

Some of the images (63x cell line images without ER marker) have been further processed to make them readily available for ML development and other analyses. How to access those is outlined here. All other images are so far only available in raw data format (ome-tif or tif) as described above.

cellcycle_image.csv file

This file contains all relevant information about the dataset images in CSV format.
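Since the columns of cellcycle_image.csv are not listed here, a reasonable first step is to inspect the header and row count before building anything on top of it. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and no column names are assumed.

```python
import csv


def summarize_csv(lines):
    """Return column names and row count for CSV content given as an iterable of lines."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    return {"columns": list(reader.fieldnames or []), "n_rows": len(rows)}


# Usage with the real file (the path is an assumption, adjust it to your copy):
# with open("cellcycle_image.csv", newline="") as handle:
#     print(summarize_csv(handle))
```

Passing an open file handle works because a file object iterates over its lines; any other iterable of strings works as well.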

Additional image works

There are other folders where you can find several items produced from the original images:

  • segmentations: this folder contains all nuclei and cell segmentation masks created using the HPACellSegmentator software.
  • crops: this folder contains a single-cell crop for each cell from the above-mentioned cell segmentations. In detail:
    • ...
    • A general crop_info.csv in the crops base folder where you can find the cell position in the crop, for all cells.

Utility scripts

WebDAV (lab storage) scripts

Additional resources

You might want to check these other resources related to several items produced in the HPA dataset: