Storage design study

This is a brief study conducted on 2023-07-14 to select a suitable storage option for the lab. These are the gathered requirements:

Hard requirements

  • H1) Needs to be accessible through Linux, Windows and MacOS
  • H2) Needs to be accessible by any lab member regardless of whether they are at Stanford or Scilifelab (the required steps may vary, but they need to be feasible for all lab members)
  • H3) Needs to provide "automated" access (mapped network drive, SSH access, API, etc.) so we can set up scripts/programs to store instrument/service outputs directly
  • H4) Scalable up to at least 1000TB. We are most probably never going to need that much, but it's a good upper limit
  • H5) Solid backups

Soft requirements

  • S1) User space granularity: a different limit per user and/or a way to monitor user usage to limit misuse/abuse
  • S2) Project space granularity: option to create accounts/limits per project, separated from the lab members' personal files.
  • S3) Good transfer speed
  • S4) Possibility to share files with other lab members and guests (non-authenticated)
  • S5) Allow simple access from any computer (SSH, web application, etc...)
  • S6) Low cost

The current WebDAV over HTTPS solution was chosen due to its high level of compliance with the requirements:

Current setup at Stanford

  • H1) Mostly compliant (might need third-party software like WinSCP or Cyberduck)
  • H2) Compliant
  • H3) Fully compliant on Linux; compliant on Windows most of the time
  • H4) Compliant with some minor logistics/infrastructure issues (that's a lot of hard drives)
  • H5) Compliant
  • S1) Mostly compliant (script monitoring)
  • S2) Compliant
  • S3) Good speed at Stanford, regular internet speed from Scilifelab
  • S4) Compliant
  • S5) Mostly compliant (no direct SSH, but can be mounted as a network drive)
  • S6) Lowest possible price option
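The "script monitoring" mentioned under S1 could be as simple as a periodic disk-usage scan. A minimal sketch, assuming a du-based check (DAV_ROOT and the demo directory are placeholders; the real tree lives under /data/dav):

```shell
# Hypothetical S1 soft-quota monitor: flag member directories over 1 TB.
DAV_ROOT="${DAV_ROOT:-/tmp/dav-quota-demo}"
LIMIT_KB=$((1024 * 1024 * 1024))    # 1 TB soft cap, expressed in KiB

mkdir -p "$DAV_ROOT/alice"          # demo member directory so the loop has input
for dir in "$DAV_ROOT"/*/; do
    used_kb=$(du -sk "$dir" | cut -f1)
    if [ "$used_kb" -gt "$LIMIT_KB" ]; then
        echo "OVER QUOTA: $dir uses ${used_kb} KiB"
    fi
done
```

A cron job running such a check can then notify the member or the admin instead of enforcing a hard filesystem quota.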

Storage solution

Hardware

The current solution's hardware consists of:

  • A regular workstation:

    • Intel Core i7-12700 processor
    • 128GB RAM
    • NVIDIA RTX2000 ADA GPU
    • 1TB NVMe OS partition
    • 20TB local SATA HDD for working space
    • Ubuntu Server 24.04 as the OS
  • 2 Sabrent 10-bay USB HDD enclosures holding 20x 20TB HDDs

    • /data partition with a 200TB virtual group (10 disks)
    • /deep partition with a 200TB virtual group (10 disks)
  • A UPS battery for surge and power off protection

The current enclosures are full, but there are enough high-speed USB ports left to add 2 more enclosures, reaching 800TB in the future (if required).
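The document does not record how each 10-disk, 200TB "virtual group" was assembled. A plausible sketch using LVM follows; the device names, group names, and filesystem choice are assumptions for illustration, not the recorded setup:

```shell
# Hypothetical LVM layout for one 10-disk enclosure pool (e.g. /data).
sudo pvcreate /dev/sd[b-k]                      # mark the 10 enclosure disks for LVM
sudo vgcreate vg_data /dev/sd[b-k]              # one volume group spanning all 10 disks
sudo lvcreate -l 100%FREE -n lv_data vg_data    # single logical volume using all space
sudo mkfs.ext4 /dev/vg_data/lv_data
sudo mount /dev/vg_data/lv_data /data
```

The same steps, with different names, would produce the /deep pool from the second enclosure.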

Software

Software-wise, the solution is a regular Nginx installation with WebDAV support. Basically:

  • Every lab member has a space under /data/dav/{username} with www-data:www-data permissions (protected by http basic authentication)
  • Every lab member has a /data/dav/{username}/www directory that is open to everyone (given the exact URL)
  • Every lab member has a /data/dav/{username}/web symlink to their /data/dav/{username}/www directory (protected by http basic authentication)
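The per-member layout above can be sketched as a small provisioning script. BASE and the username are placeholders (the real base is /data/dav, owned by www-data):

```shell
# Hypothetical provisioning sketch for a new lab member's WebDAV space.
BASE="${BASE:-/tmp/dav-demo}"
MEMBER="${MEMBER:-alice}"
HOME_DIR="$BASE/$MEMBER"

mkdir -p "$HOME_DIR/www"                  # public directory, reachable with the exact URL
ln -sfn "$HOME_DIR/www" "$HOME_DIR/web"   # authenticated view of the same files
# In production the tree is owned by the webserver user:
# chown -R www-data:www-data "$HOME_DIR"
```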

Refer to the /etc/nginx/sites-enabled/services file for the webserver-specific configuration.
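For the H3-style automated access, an instrument script only needs an HTTP PUT against the WebDAV endpoint. A sketch using curl; the host, path, and credential variables are placeholders, not the lab's real endpoint:

```shell
# Hypothetical automated upload over WebDAV (curl -T performs an HTTP PUT).
curl --fail -u "$DAV_USER:$DAV_PASS" \
     -T instrument-output.csv \
     "https://storage.example.org/dav/$DAV_USER/instrument-output.csv"
```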

Each lab member's space is their own personal storage, which is usually soft-capped to 1TB (see the Backup section below). They can also work on specific projects (shared with other lab members or not) that might require more storage. Projects are stored under the /data/dav/projects/ directory, each in its own subdirectory, and the lab members involved get a symlink to it in their own personal storage.
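The project layout described above can be sketched as follows. ROOT, the project name, and the member names are examples for illustration (the real layout lives under /data/dav):

```shell
# Hypothetical shared project space with per-member symlinks.
ROOT="${ROOT:-/tmp/dav-project-demo}"
PROJECT="example-project"

mkdir -p "$ROOT/projects/$PROJECT"
for member in alice bob; do
    mkdir -p "$ROOT/$member"
    ln -sfn "$ROOT/projects/$PROJECT" "$ROOT/$member/$PROJECT"
done
```

Because the project data exists only once under projects/, removing a member's symlink revokes their shortcut without touching the shared files.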

Finally, there are several symlinks pointing to /deep/external/ that are used for different purposes:

  • To share complete and constantly changing directories with external collaborators (like the Beacon project)
  • To be used as upload directories for third parties (like the COMAS dataset ingestion).
  • The /data/dav/public directory, which is a symlink to the space on the deep partition that stores the links for published papers. Important! Since this data is physically allocated on the deep partition, it is not part of the weekly /data/dav backup.