Running Jobs with Restricted Data
Job submission and management use the same commands and syntax as the HPC cluster, and the same partitions (queues), fairshare policies, and resource limits apply to jobs. However, there are some key differences in workflow when using restricted datasets.
This page covers running jobs, workflow features unique to ResHPC, and pointers to the general documentation.
Submitting Jobs
Job submission and management use the same commands and syntax as the HPC cluster. For details on submitting and managing SLURM jobs, see Submitting SLURM Jobs.
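For reference, a minimal batch script might look like the following sketch. The job name, resource requests, and analysis command are illustrative placeholders, not site defaults; adjust them for your project.

```shell
# Write a minimal SLURM batch script (all values are illustrative):
cat > job.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=restricted-demo
#SBATCH --ntasks=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# The analysis should read only the decrypted copies on /scratch
# (hypothetical path and program name):
./analyze /scratch/myproject/data.csv
EOF
```

The script is then submitted exactly as on the HPC cluster, e.g. `sbatch job.slurm`.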
Encrypting Restricted Data
Restricted source data must be encrypted, and remain encrypted when stored in the project's /group directory. Source data may only be decrypted after it is copied to the project's /scratch directory, and only as needed for analysis.
Please note that many restricted datasets are delivered to you in an encrypted format, and some genomics software can work directly with these encrypted files. If you need to encrypt data yourself, a variety of encryption programs are available.
Please contact help-rcc@mcw.edu with questions.
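As one example of such a program, GnuPG supports symmetric (passphrase-based) encryption. The sketch below creates a demo file and passphrase purely for illustration; the file names are assumptions, and your project may use a different tool or key-management scheme.

```shell
# Create a demo plaintext file and passphrase (illustrative only):
mkdir -p demo
echo "sample restricted record" > demo/data.csv
echo "example-passphrase" > demo/pass.txt

# Encrypt (writes demo/data.csv.gpg) and remove the plaintext,
# as you would before storing data under /group:
gpg --batch --yes --pinentry-mode loopback \
    --passphrase-file demo/pass.txt --symmetric demo/data.csv
rm demo/data.csv

# Decrypt on demand, as you would on /scratch before analysis:
gpg --batch --yes --pinentry-mode loopback \
    --passphrase-file demo/pass.txt \
    --output demo/data.csv --decrypt demo/data.csv.gpg
```

Keep the passphrase out of the dataset's directory in practice; it is stored alongside the data here only to keep the example self-contained.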
Data Staging and Workflow
Staging data for a job is very similar to using the HPC cluster. A key difference is the need to encrypt restricted data when not in use.
- Copy encrypted data from the project's /group source directory to the /scratch directory.
- Decrypt files in the project's /scratch directory and submit jobs that use the unencrypted data.
- Run jobs and copy results from the /scratch directory back to the project's /group results directory.
- Continue with further computations using the unencrypted data in the project's /scratch directory.
- Finish the workflow and delete files from the project's /scratch directory.
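The staging steps above can be sketched as shell functions. The project directories, file names, and passphrase location are hypothetical, and the example assumes a GnuPG-encrypted input file; adapt each piece to your own project and job script.

```shell
# Hypothetical project directories (override as needed):
GROUP=${GROUP:-/group/myproject}        # encrypted source + results
SCRATCH=${SCRATCH:-/scratch/myproject}  # working copies for jobs

stage_in() {   # steps 1-2: copy encrypted data to scratch, then decrypt
  cp "$GROUP/source/data.csv.gpg" "$SCRATCH/"
  gpg --batch --yes --pinentry-mode loopback \
      --passphrase-file "$GROUP/pass.txt" \
      --output "$SCRATCH/data.csv" --decrypt "$SCRATCH/data.csv.gpg"
}

stage_out() {  # step 3: copy job results back to the /group results directory
  cp "$SCRATCH/results.txt" "$GROUP/results/"
}

clean_up() {   # step 5: delete everything on scratch, decrypted data included
  rm -f "$SCRATCH"/data.csv "$SCRATCH"/data.csv.gpg "$SCRATCH"/results.txt
}

# The job itself (e.g. `sbatch analyze.slurm`) runs between stage_in and
# stage_out and reads only the decrypted copy on $SCRATCH.
```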
Warning
Do not leave unencrypted data in the project's /scratch directory when it is not in active use. For example, if you are going on vacation, delete the unencrypted data before you leave.