It is advisable to start with a clean shell (no Offline setup), because the jobsub_client package requires a different Python version than Mu2e Offline.

setup mu2e
setup mu2egrid
One has to make a list of fcl files to run, with their full pathnames. If the files were kept in their original bluearc location, just "ls" them, like:
ls /mu2e/data/users/gandr/fclds/20161121-ce-reco/000/*.fcl > ce-reco-jobs.txt
For files uploaded to a /pnfs location and registered in SAM, one can use the mu2eDatasetFileList command from the mu2efiletools package. Continuing with the fcl upload example:

setup mu2efiletools
mu2eDatasetFileList cnf.`whoami`.my-test-s1.v0.fcl > pion-jobs.txt
The mu2eprodsys command is used to submit grid jobs.
A single execution of
mu2eprodsys submits a single
Condor cluster of jobs. We have been asked to limit the size of
clusters to 10,000 jobs. If the total number of jobs to be run
exceeds 10,000, the list of fcl files should be split into chunks
with each chunk not exceeding 10,000 files. One can use the
split command to do this, as shown below.
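For instance, a minimal sketch of checking the list size and splitting it (the chunk-file prefix here is illustrative, and GNU split is assumed):

wc -l pion-jobs.txt                         # how many jobs the list would submit
split -l 10000 -d pion-jobs.txt pion-jobs.  # writes chunks pion-jobs.00, pion-jobs.01, ...

Each chunk can then be submitted with its own mu2eprodsys invocation via --fcllist.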
The required parameters are:

--setup : the setup script that determines which version of Offline to use. To run jobs offsite, the Offline release must reside in cvmfs; for on-site running the code can also be placed on a locally mounted disk.
--fcllist : the list of uploaded fcl files in /pnfs.
--dsconf : an arbitrary string that will be used for the "configuration" field of output files.
Some of the optional parameters:

--dsowner : the "owner" (in the dataset naming convention sense) of the output datasets. Defaults to the user who runs mu2eprodsys, unless the user is "mu2epro"; in the latter case --dsowner is set to "mu2e" to produce "official" datasets.
--wfproject : can be used to group results in the outstage area. The use of this option is highly recommended; for example, the submission command below uses --wfpro=pion-test.
--expected-lifetime : should be used to set an upper limit on the wallclock job duration. The value is passed directly to jobsub_submit; see its documentation for more information.
Run mu2eprodsys --help to see all the options.
By default the outputs will be placed into an outstage directory whose path includes $USER and $WFPROJECT, where $USER is the submitter user name and $WFPROJECT is specified by the --wfproject parameter (the default is "default"). If --role=Production is used (the default for the mu2epro user), the outputs go into a separate production outstage area instead.
mu2eprodsys --setup=/cvmfs/mu2e.opensciencegrid.org/Offline/v5_6_7/SLF6/prof/Offline/setup.sh \
    --wfpro=pion-test \
    --fcllist=pion-jobs.txt \
    --dsconf=v567 \
    --expected-lifetime=5h
You can use the
mu2e_clusters script from the
mu2egrid package, or
the jobsub_q command, to check the status of your jobs. If you see that some jobs
are on hold (the "H" state), look for
HoldReason in the output of
jobsub_q --long --jobid xxx.y@fifebatchN.fnal.gov
(substitute an appropriate job ID). If you see a PERIODIC_HOLD message, that means the job
tried to use more resources (memory/disk/time) than you asked for.
Use jobsub_rm to remove the offending jobs and re-submit after
adjusting the requirements. If HoldReason is not a
PERIODIC_HOLD, open a ServiceDesk ticket.
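For instance, a minimal sketch of this hold check and cleanup (the job ID below is a hypothetical placeholder; substitute your own):

jobsub_q --user=`whoami`                                                   # list your jobs and their states
jobsub_q --long --jobid 12345678.0@fifebatch1.fnal.gov | grep HoldReason   # inspect why a job is held (placeholder ID)
jobsub_rm --jobid 12345678.0@fifebatch1.fnal.gov                           # remove the held job before re-submitting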