It is advisable to start with a clean shell (no Offline setup), because the jobsub_client package requires a different Python version than Mu2e Offline.
setup mu2e
setup mu2egrid
One has to make a list of fcl files to run, with their full pathnames. If the files were kept in their original bluearc location, just "ls" them, like:
ls /mu2e/data/users/gandr/fclds/20161121-ce-reco/000/*.fcl > ce-reco-jobs.txt
For files uploaded to a /pnfs location and registered in SAM, one can use mu2eDatasetFileList. Continuing with the fcl upload page example:
setup mu2efiletools
mu2eDatasetFileList cnf.`whoami`.my-test-s1.v0.fcl > pion-jobs.txt
The mu2eprodsys
command is used to submit grid jobs.
A single execution of mu2eprodsys
submits a single
Condor cluster of jobs. We have been asked to limit the size of
clusters to 10,000 jobs. If the total number of jobs to be run
exceeds 10,000, the list of fcl files should be split into chunks
with each chunk not exceeding 10,000 files. One can use the
Linux split
command to do this.
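For instance, assuming the combined list is in ce-reco-jobs.txt (the file names here are only illustrative), one possible invocation is:
split -d -l 10000 ce-reco-jobs.txt ce-reco-chunk-
which writes the chunks to ce-reco-chunk-00, ce-reco-chunk-01, and so on; each chunk can then be given to a separate mu2eprodsys submission.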
The required parameters are:
--setup : the setup script that determines which version of Offline to use. To run jobs offsite, the Offline release must reside in cvmfs. For on-site running one can put code on the /mu2e/app disk.
--fcllist : the list of uploaded fcl files in /pnfs.
--dsconf : an arbitrary string that will be used for the "configuration" field of output files.
Some of the options:
--dsowner : the "owner" (in the dataset naming convention sense) of output datasets. Defaults to the user who runs mu2eprodsys, unless the user is "mu2epro". In the latter case --dsowner is set to "mu2e" to produce "official" datasets.
--wfproject : can be used to group results in the outstage area. The use of this option is highly recommended. For example, --wfproject=beam or --wfproject=cosmic.
--expected-lifetime : should be used to set an upper limit on the wallclock job duration. The value is passed directly to jobsub_submit; see its documentation for more information.
Run mu2eprodsys --help
to see all the options.
By default the outputs will be placed into
/pnfs/mu2e/scratch/users/$USER/workflow/$WFPROJECT/outstage/
where $USER is the submitter user name, and $WFPROJECT is specified by
the --wfproject
parameter (the default is "default").
If --role=Production
is used (default for the mu2epro user),
the outputs will go into
/pnfs/mu2e/persistent/users/$USER/workflow/$WFPROJECT/outstage/
instead.
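For example, after a submission with --wfproject=pion-test by a regular user (the project name here simply matches the example below), the results would land in the scratch outstage and could be listed with:
ls /pnfs/mu2e/scratch/users/`whoami`/workflow/pion-test/outstage/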
mu2eprodsys --setup=/cvmfs/mu2e.opensciencegrid.org/Offline/v5_6_7/SLF6/prof/Offline/setup.sh \
    --wfpro=pion-test \
    --fcllist=pion-jobs.txt \
    --dsconf=v567 \
    --expected-lifetime=5h
You can use the mu2e_clusters
script from the mu2egrid
package, or
the jobsub_q
command, to check the status of your jobs. If you see that some jobs
are on hold (the "H" state), look for HoldReason
in the output of
jobsub_q --long --jobid xxx.y@fifebatchN.fnal.gov
(substitute an appropriate job ID). If you see a PERIODIC_HOLD message, that means the job
tried to use more resources (memory/disk/time) than you asked for.
Remove the offending jobs with jobsub_rm and re-submit after adjusting the requirements. If HoldReason is not a PERIODIC_HOLD, open a servicedesk ticket.
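For example, a held job could be removed with the following command (using the same placeholder job ID as above; see the jobsub documentation for details):
jobsub_rm --jobid xxx.y@fifebatchN.fnal.gov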