
Data Transfer to Sherlock

This guide covers best practices for transferring data to and from scratch storage on Stanford's Sherlock HPC cluster.

1. Using Rclone with ELL Vault

Rclone provides a robust way to transfer data between your local machine and Sherlock via the ELL Vault WebDAV storage.

Initial Setup

  1. Load the rclone module on Sherlock:

    module load rclone
    

  2. Configure rclone (if not already done):

    rclone config
    

  3. Set up a remote called ell_vault with the following settings:

     • Protocol: WebDAV
     • URL: https://ell-vault.stanford.edu/dav/{your_username}/

     Replace {your_username} with your ELL Vault username.
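After you finish the prompts, rclone writes the remote to its configuration file (typically ~/.config/rclone/rclone.conf). A configured entry would look roughly like the sketch below; the vendor value and the obscured credentials are assumptions, and your generated file may differ:

```ini
# ~/.config/rclone/rclone.conf (illustrative; written by `rclone config`)
[ell_vault]
type = webdav
url = https://ell-vault.stanford.edu/dav/{your_username}/
vendor = other
user = {your_username}
# rclone stores the password in obscured form, not plain text
pass = <obscured>
```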

Using Rclone

Once configured, you can copy files in either direction:

Copy from local to ELL Vault:

rclone copy /path/to/local/file ell_vault:remote/path --progress

Copy from ELL Vault to local:

rclone copy ell_vault:remote/path /path/to/local/destination --progress

Useful flags:

  • --progress: shows real-time transfer progress
  • --dry-run: previews what would be transferred without actually copying
  • -v: verbose output for debugging

Example Workflow

# Load rclone module
module load rclone

# Copy data to ELL Vault with progress indicator
rclone copy $SCRATCH/experiment_data ell_vault:projects/experiment_2024 --progress

# Sync directories (only copies new/changed files)
rclone sync /local/data ell_vault:backup --progress

2. Using Sherlock's Data Transfer Nodes

For large-scale data transfers, Sherlock provides dedicated Data Transfer Nodes (DTNs) that offer better performance and reliability than login nodes.

Why Use DTNs?

  • Dedicated resources: No competition with interactive workloads
  • Better performance: Optimized for large-scale transfers
  • Reliability: Won't disconnect during long transfers
  • Recommended for: Substantial data movement (multi-GB files, large datasets)

Accessing DTNs

Hostname: dtn.sherlock.stanford.edu

Important notes:

  • DTNs do not provide interactive shell access
  • The default destination path is $SCRATCH, not $HOME
  • Explicitly specify paths when needed

Supported Transfer Methods

Using SCP

# Copy to Sherlock scratch (default)
scp large_dataset.tar.gz username@dtn.sherlock.stanford.edu:

# Copy to specific location
scp large_dataset.tar.gz username@dtn.sherlock.stanford.edu:/scratch/users/username/data/

# Copy from Sherlock
scp username@dtn.sherlock.stanford.edu:/scratch/users/username/results.tar.gz .

Using Rsync

# Sync directory to Sherlock with progress
rsync -avz --progress /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/

# Resume interrupted transfer
rsync -avz --progress --partial /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/

Using SFTP

# Interactive SFTP session
sftp username@dtn.sherlock.stanford.edu

# In SFTP session:
put local_file.txt
get remote_file.txt
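For scripted transfers, sftp can also read its commands from a batch file via the -b flag instead of an interactive session. A sketch (file names are hypothetical; batch mode requires non-interactive authentication such as SSH keys, so the transfer command itself is shown commented out):

```shell
# Write the SFTP commands to a batch file
cat > transfer_batch.txt <<'EOF'
put local_file.txt
get remote_file.txt
bye
EOF

# Run the batch non-interactively (requires key-based authentication)
# sftp -b transfer_batch.txt username@dtn.sherlock.stanford.edu
```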

Best Practices

  1. Use DTNs for large transfers: Login nodes are fine for small files, but use DTNs for significant data movement
  2. Specify full paths: Remember that DTN default path is $SCRATCH, not $HOME
  3. Use rsync for resumability: --partial flag allows resuming interrupted transfers
  4. Monitor progress: Use --progress flag to track long transfers
  5. Compress data when appropriate: Use tar/gzip before transfer for many small files
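Point 5 pairs well with a quick integrity check: compute a checksum before the transfer and re-verify it on the other side. A local sketch with throwaway files (names are hypothetical; sha256sum is one common choice):

```shell
# Create demo data standing in for a directory of many small files
mkdir -p experiment_demo
printf 'sample\n' > experiment_demo/run1.txt

# Bundle into a single compressed archive before transfer
tar -czf experiment_demo.tar.gz experiment_demo/

# List the archive contents to confirm everything was captured
tar -tzf experiment_demo.tar.gz

# Record a checksum; after transferring both files, re-run
# 'sha256sum -c' on the destination to verify integrity
sha256sum experiment_demo.tar.gz > experiment_demo.tar.gz.sha256
sha256sum -c experiment_demo.tar.gz.sha256
```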

Example: Complete Transfer Workflow

# 1. Prepare data locally (compress if many files)
tar -czf experiment_data.tar.gz experiment_data/

# 2. Transfer to Sherlock scratch using DTN
rsync -avz --progress experiment_data.tar.gz username@dtn.sherlock.stanford.edu:/scratch/users/username/

# 3. Log in to Sherlock, then extract the data
ssh username@login.sherlock.stanford.edu
cd $SCRATCH
tar -xzf experiment_data.tar.gz

Additional Resources