Data Transfer to Sherlock
This guide covers best practices for transferring data to and from scratch storage on Stanford's Sherlock cluster.
1. Using Rclone with ELL Vault
Rclone provides a robust way to transfer data between your local machine and Sherlock via the ELL Vault WebDAV storage.
Initial Setup
- Load the rclone module on Sherlock:
  module load rclone
- Configure rclone (if not already done):
  rclone config
- Set up a remote called ell_vault with the following settings:
  - Protocol: WebDAV
  - URL: https://ell-vault.stanford.edu/dav/{your_username}/
  - Replace {your_username} with your ELL Vault username
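For scripted setups, the same remote can probably be created without stepping through the interactive prompts. The following is a minimal sketch under stated assumptions: ell_vault_url and create_ell_vault_remote are hypothetical helper names, the "vendor other" setting is a guess at what suits this WebDAV server, and credentials are not handled here, so a password or token would still need to be set afterwards (for example through rclone config).

```shell
# Hypothetical helper: build the ELL Vault WebDAV URL for one username.
ell_vault_url() {
    printf 'https://ell-vault.stanford.edu/dav/%s/\n' "$1"
}

# Hypothetical helper: create the ell_vault remote non-interactively.
# Assumptions: rclone is already on PATH (module load rclone) and
# "vendor other" is appropriate; credentials are NOT configured here.
create_ell_vault_remote() {
    vault_user="$1"
    rclone config create ell_vault webdav \
        url "$(ell_vault_url "$vault_user")" \
        vendor other \
        user "$vault_user"
}

# Usage (after module load rclone):
#   create_ell_vault_remote your_username
#   rclone lsd ell_vault:    # list top-level directories to confirm access
```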
Using Rclone
Once configured, you can copy files in either direction:
Copy from local to ELL Vault:
rclone copy /path/to/local/file ell_vault:remote/path --progress
Copy from ELL Vault to local:
rclone copy ell_vault:remote/path /path/to/local/destination --progress
Useful flags:
- --progress: Shows real-time transfer progress
- --dry-run: Preview what would be transferred without actually copying
- -v: Verbose output for debugging
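These flags combine naturally: a dry run first, then the real copy. The sketch below wraps that pattern in a hypothetical helper named preview_then_copy; the name and the two-pass structure are illustrative choices, not part of rclone itself.

```shell
# Hypothetical helper: preview a copy with --dry-run, then run it for real.
preview_then_copy() {
    src="$1"
    dst="$2"
    # First pass: report what would be transferred without copying anything.
    rclone copy "$src" "$dst" --dry-run -v || return 1
    # Second pass: perform the transfer with a live progress display.
    rclone copy "$src" "$dst" --progress
}

# Usage:
#   preview_then_copy /path/to/local/file ell_vault:remote/path
```

Stopping when the dry run fails means configuration problems, such as a misspelled remote name, surface before any data moves.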
Example Workflow
# Load rclone module
module load rclone
# Copy data to ELL Vault with progress indicator
rclone copy $SCRATCH/experiment_data ell_vault:projects/experiment_2024 --progress
# Sync directories: makes the destination match the source.
# Copies new/changed files AND deletes destination files missing from the source.
rclone sync /local/data ell_vault:backup --progress
2. Using Sherlock's Data Transfer Nodes
For large-scale data transfers, Sherlock provides dedicated Data Transfer Nodes (DTNs) that offer better performance and reliability than login nodes.
Why Use DTNs?
- Dedicated resources: No competition with interactive workloads
- Better performance: Optimized for large-scale transfers
- Reliability: Better suited than login nodes to long-running transfers
- Recommended for: Substantial data movement (multi-GB files, large datasets)
Accessing DTNs
Hostname: dtn.sherlock.stanford.edu
Important notes:
- DTNs do not provide interactive shell access
- Default destination path is $SCRATCH, not $HOME
- Explicitly specify paths when needed
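Because the default destination is easy to forget, it can help to always spell out the full target. The sketch below uses a hypothetical helper, dtn_target, that assembles an explicit destination string; it assumes scratch lives under /scratch/users/<username>, matching the paths used elsewhere in this guide.

```shell
# Hypothetical helper: build an explicit scp/rsync target on the DTN.
# Assumption: scratch is located at /scratch/users/<username>.
dtn_target() {
    sherlock_user="$1"
    subpath="${2:-}"   # optional path below the user's scratch directory
    printf '%s@dtn.sherlock.stanford.edu:/scratch/users/%s/%s\n' \
        "$sherlock_user" "$sherlock_user" "$subpath"
}

# Usage:
#   scp large_dataset.tar.gz "$(dtn_target username data/)"
```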
Supported Transfer Methods
Using SCP
# Copy to Sherlock scratch (default)
scp large_dataset.tar.gz username@dtn.sherlock.stanford.edu:
# Copy to specific location
scp large_dataset.tar.gz username@dtn.sherlock.stanford.edu:/scratch/users/username/data/
# Copy from Sherlock
scp username@dtn.sherlock.stanford.edu:/scratch/users/username/results.tar.gz .
Using Rsync
# Sync directory to Sherlock with progress
rsync -avz --progress /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/
# Resume interrupted transfer
rsync -avz --progress --partial /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/
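For unattended transfers, the resume command can be wrapped in a retry loop. This is a sketch rather than a recommended tool: rsync_with_retry, MAX_ATTEMPTS, and RETRY_DELAY are hypothetical names introduced here, and each attempt may prompt for authentication again depending on how your SSH access is set up.

```shell
# Hypothetical helper: retry an rsync transfer until it succeeds.
# MAX_ATTEMPTS and RETRY_DELAY (seconds) are illustrative knobs,
# not rsync options.
rsync_with_retry() {
    src="$1"
    dst="$2"
    max_attempts="${MAX_ATTEMPTS:-5}"
    delay="${RETRY_DELAY:-30}"
    attempt=1
    while :; do
        # --partial keeps partially transferred files so the next
        # attempt can continue from them instead of starting over.
        if rsync -avz --progress --partial "$src" "$dst"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "rsync failed after $attempt attempts; giving up" >&2
            return 1
        fi
        echo "rsync attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage:
#   rsync_with_retry /local/data/ username@dtn.sherlock.stanford.edu:/scratch/users/username/data/
```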
Using SFTP
# Interactive SFTP session
sftp username@dtn.sherlock.stanford.edu
# In SFTP session:
put local_file.txt
get remote_file.txt
Best Practices
- Use DTNs for large transfers: Login nodes are fine for small files, but use DTNs for significant data movement
- Specify full paths: Remember that the DTN default path is $SCRATCH, not $HOME
- Use rsync for resumability: The --partial flag allows resuming interrupted transfers
- Monitor progress: Use the --progress flag to track long transfers
- Compress data when appropriate: Use tar/gzip before transfer for many small files
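One rough way to decide whether a directory counts as "many small files" is to count its files before transferring. The sketch below does that with a hypothetical helper, should_archive; the default threshold of 1000 files is an arbitrary illustration, not a Sherlock guideline.

```shell
# Hypothetical helper: succeed when a directory holds more files than
# a threshold, suggesting it is worth archiving before transfer.
# The default of 1000 is an arbitrary illustration.
should_archive() {
    dir="$1"
    threshold="${2:-1000}"
    # Count regular files at any depth; tr strips padding some wc builds add.
    file_count=$(find "$dir" -type f | wc -l | tr -d ' ')
    [ "$file_count" -gt "$threshold" ]
}

# Usage:
#   if should_archive experiment_data; then
#       tar -czf experiment_data.tar.gz experiment_data/
#   fi
```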
Example: Complete Transfer Workflow
# 1. Prepare data locally (compress if many files)
tar -czf experiment_data.tar.gz experiment_data/
# 2. Transfer to Sherlock scratch using DTN
rsync -avz --progress experiment_data.tar.gz username@dtn.sherlock.stanford.edu:/scratch/users/username/
# 3. On Sherlock, extract the data
ssh username@sherlock.stanford.edu
cd $SCRATCH
tar -xzf experiment_data.tar.gz
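An optional extra step, not part of the workflow above, is to confirm that the archive arrived intact before extracting it. The sketch below assumes GNU coreutils sha256sum is available on both the local machine and Sherlock; write_checksum and verify_checksum are hypothetical helper names.

```shell
# Hypothetical helper: record a SHA-256 checksum next to an archive.
# Assumes GNU coreutils sha256sum is installed.
write_checksum() {
    archive="$1"
    # Work from the archive's directory so the .sha256 file stores a bare
    # filename and can be verified from any location after transfer.
    ( cd "$(dirname "$archive")" && \
      sha256sum "$(basename "$archive")" > "$(basename "$archive").sha256" )
}

# Hypothetical helper: verify an archive against its .sha256 file.
verify_checksum() {
    archive="$1"
    ( cd "$(dirname "$archive")" && \
      sha256sum -c "$(basename "$archive").sha256" )
}

# Usage:
#   write_checksum experiment_data.tar.gz          # locally, before transfer
#   (transfer both the archive and its .sha256 file)
#   verify_checksum experiment_data.tar.gz         # on Sherlock, before extracting
```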