Setup

It is advisable to start with a clean shell (no Offline setup), because the jobsub_client package requires a different Python version than Mu2e Offline.

  setup mu2e
  setup mu2egrid

Preparing a list of fcl files

One has to make a list of the fcl files to run, with their full pathnames. If the files are kept in their original bluearc location, just "ls" them, for example:

ls /mu2e/data/users/gandr/fclds/20161121-ce-reco/000/*.fcl > ce-reco-jobs.txt

For files uploaded to a /pnfs location and registered in SAM, one can use mu2eDatasetFileList. Continuing with the example from the fcl upload page:

setup mu2efiletools
mu2eDatasetFileList cnf.`whoami`.my-test-s1.v0.fcl > pion-jobs.txt

Job submission

The mu2eprodsys command is used to submit grid jobs. A single execution of mu2eprodsys submits a single Condor cluster of jobs. We have been asked to limit the size of clusters to 10,000 jobs. If the total number of jobs to be run exceeds 10,000, the list of fcl files should be split into chunks with each chunk not exceeding 10,000 files. One can use the Linux split command to do this.
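As a sketch of the splitting step (the file names and the 25,000-job count here are made up for illustration), the Linux split command produces numbered chunks of at most 10,000 lines each:

```shell
# Hypothetical illustration: make a synthetic list of 25,000 fcl paths
# (in practice this would be your real ce-reco-jobs.txt or pion-jobs.txt).
seq -f "/pnfs/mu2e/example/cnf.%05g.fcl" 25000 > all-jobs.txt

# Split into chunks of at most 10,000 lines: chunk.00 and chunk.01
# get 10,000 lines each, chunk.02 gets the remaining 5,000.
split --lines=10000 --numeric-suffixes all-jobs.txt chunk.
```

Each chunk file is then passed to its own mu2eprodsys invocation via --fcllist.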

The required parameters are --setup, --fcllist, and --dsconf; they are illustrated in the example below. Run mu2eprodsys --help to see all the options.

Outstage location

By default the outputs will be placed into

  /pnfs/mu2e/scratch/users/$USER/workflow/$WFPROJECT/outstage/

where $USER is the submitter user name, and $WFPROJECT is specified by the --wfproject parameter (the default is "default").

If --role=Production is used (default for the mu2epro user), the outputs will go into

  /pnfs/mu2e/persistent/users/$USER/workflow/$WFPROJECT/outstage/

instead.
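The path selection above can be sketched as a small shell fragment (the variable names here are illustrative, not part of mu2egrid):

```shell
# Illustrative only: reproduce the default outstage path selection.
USERNAME=$(whoami)                # the submitter's user name
WFPROJECT=${WFPROJECT:-default}   # from --wfproject; "default" if not given
ROLE=${ROLE:-Analysis}            # from --role; mu2epro defaults to Production

if [ "$ROLE" = "Production" ]; then
    OUTSTAGE=/pnfs/mu2e/persistent/users/$USERNAME/workflow/$WFPROJECT/outstage
else
    OUTSTAGE=/pnfs/mu2e/scratch/users/$USERNAME/workflow/$WFPROJECT/outstage
fi
echo "$OUTSTAGE"
```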

Example

  mu2eprodsys --setup=/cvmfs/mu2e.opensciencegrid.org/Offline/v5_6_7/SLF6/prof/Offline/setup.sh \
      --wfproject=pion-test \
      --fcllist=pion-jobs.txt \
      --dsconf=v567 \
      --expected-lifetime=5h

Monitoring

You can use the mu2e_clusters script from the mu2egrid package, or the jobsub_q command, to check the status of your jobs. If some jobs are on hold (the "H" state), look for HoldReason in the output of jobsub_q --long --jobid xxx.y@fifebatchN.fnal.gov (substitute an appropriate job ID). A PERIODIC_HOLD message means the job tried to use more resources (memory, disk, or time) than you requested; jobsub_rm the offending jobs and re-submit after adjusting the requirements. If HoldReason is not a PERIODIC_HOLD, open a servicedesk ticket.
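As a sketch of the triage step (the captured output below is made up; the exact wording of real HoldReason messages may differ):

```shell
# Hypothetical saved output of
#   jobsub_q --long --jobid 12345.0@fifebatch1.fnal.gov
# for a held job; real output contains many more ClassAd attributes.
cat > jobsub_q_long.txt <<'EOF'
JobStatus = 5
HoldReason = "PERIODIC_HOLD: memory usage 2500MB exceeded request of 2000MB"
RequestMemory = 2000
EOF

# Decide what to do based on the hold reason.
if grep -q 'HoldReason.*PERIODIC_HOLD' jobsub_q_long.txt; then
    echo "over requested resources: jobsub_rm, adjust requirements, resubmit"
else
    echo "not a PERIODIC_HOLD: open a servicedesk ticket"
fi
```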

Next: check results.



For web related questions: Mu2eWebMaster@fnal.gov.
For content related questions: gandr@fnal.gov
This file last modified Monday, 21-Nov-2016 19:04:33 CST