
Data Transfer to Sherlock

This guide covers best practices for transferring data to and from scratch storage on Stanford's Sherlock HPC cluster.

1. Using Rclone with ELL Vault

Rclone provides a robust way to transfer data between your local machine and Sherlock via the ELL Vault WebDAV storage.

Initial Setup

  1. Load the rclone module on Sherlock:

    module load rclone
    

  2. Configure rclone (if not already done):

    rclone config
    

  3. Set up a remote called ell_vault with the following settings:

     • Protocol: WebDAV
     • URL: https://ell-vault.stanford.edu/dav/{your_username}/

     Replace {your_username} with your ELL Vault username.
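After you finish the prompts, rclone writes the remote to its configuration file (typically ~/.config/rclone/rclone.conf). A configured entry would look roughly like the sketch below; the vendor value and the obscured credentials are assumptions, and your generated file may differ:

```ini
# ~/.config/rclone/rclone.conf (illustrative; written by `rclone config`)
[ell_vault]
type = webdav
url = https://ell-vault.stanford.edu/dav/{your_username}/
vendor = other
user = {your_username}
# rclone stores the password in obscured form, not plain text
pass = <obscured>
```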

Using Rclone

Once configured, you can copy files in either direction:

Copy from local to ELL Vault:

rclone copy /path/to/local/file ell_vault:remote/path --progress

Copy from ELL Vault to local:

rclone copy ell_vault:remote/path /path/to/local/destination --progress

Useful flags:

  • --progress: shows real-time transfer progress
  • --dry-run: previews what would be transferred without actually copying
  • -v: verbose output for debugging

Example Workflow

# Load rclone module
module load rclone

# Copy data to ELL Vault with progress indicator
rclone copy $SCRATCH/experiment_data ell_vault:projects/experiment_2024 --progress

# Sync directories (only copies new/changed files)
rclone sync /local/data ell_vault:backup --progress

2. Using Sherlock's Data Transfer Nodes

For large-scale data transfers, Sherlock provides dedicated Data Transfer Nodes (DTNs) that offer better performance and reliability than login nodes.

Why Use DTNs?

  • Dedicated resources: No competition with interactive workloads
  • Better performance: Optimized for large-scale transfers
  • Reliability: Won't disconnect during long transfers
  • Recommended for: Substantial data movement (multi-GB files, large datasets)

Accessing DTNs

Hostname: dtn.sherlock.stanford.edu

Important notes:

  • DTNs do not provide interactive shell access
  • The default destination path is $SCRATCH, not $HOME
  • Explicitly specify paths when needed

Supported Transfer Methods

Using SCP

# Copy to Sherlock scratch (default)
scp large_dataset.tar.gz username@dtn.sherlock.stanford.edu:

# Copy to specific location
scp large_dataset.tar.gz username@dtn.sherlock.stanford.edu:/scratch/users/username/data/

# Copy from Sherlock
scp username@dtn.sherlock.stanford.edu:/scratch/users/username/results.tar.gz .

Using Rsync

# Sync directory to Sherlock with progress
rsync -avz --progress /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/

# Resume interrupted transfer
rsync -avz --progress --partial /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/

Using SFTP

# Interactive SFTP session
sftp username@dtn.sherlock.stanford.edu

# In SFTP session:
put local_file.txt
get remote_file.txt
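For scripted transfers, sftp can also read its commands from a batch file via the -b flag instead of an interactive session. A sketch (file names are hypothetical; batch mode requires non-interactive authentication such as SSH keys, so the transfer command itself is shown commented out):

```shell
# Write the SFTP commands to a batch file
cat > transfer_batch.txt <<'EOF'
put local_file.txt
get remote_file.txt
bye
EOF

# Run the batch non-interactively (requires key-based authentication)
# sftp -b transfer_batch.txt username@dtn.sherlock.stanford.edu
```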

Best Practices

  1. Use DTNs for large transfers: Login nodes are fine for small files, but use DTNs for significant data movement
  2. Specify full paths: Remember that DTN default path is $SCRATCH, not $HOME
  3. Use rsync for resumability: --partial flag allows resuming interrupted transfers
  4. Monitor progress: Use --progress flag to track long transfers
  5. Compress data when appropriate: Use tar/gzip before transfer for many small files
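Point 5 pairs well with a quick integrity check: compute a checksum before the transfer and re-verify it on the other side. A local sketch with throwaway files (names are hypothetical; sha256sum is one common choice):

```shell
# Create demo data standing in for a directory of many small files
mkdir -p experiment_demo
printf 'sample\n' > experiment_demo/run1.txt

# Bundle into a single compressed archive before transfer
tar -czf experiment_demo.tar.gz experiment_demo/

# List the archive contents to confirm everything was captured
tar -tzf experiment_demo.tar.gz

# Record a checksum; after transferring both files, re-run
# 'sha256sum -c' on the destination to verify integrity
sha256sum experiment_demo.tar.gz > experiment_demo.tar.gz.sha256
sha256sum -c experiment_demo.tar.gz.sha256
```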

Example: Complete Transfer Workflow

# 1. Prepare data locally (compress if many files)
tar -czf experiment_data.tar.gz experiment_data/

# 2. Transfer to Sherlock scratch using DTN
rsync -avz --progress experiment_data.tar.gz username@dtn.sherlock.stanford.edu:/scratch/users/username/

# 3. Log in to Sherlock, then extract the data
ssh username@login.sherlock.stanford.edu
cd $SCRATCH
tar -xzf experiment_data.tar.gz

Additional Resources