Introduction to Mu2e grid jobs
Mu2e uses the jobsub_client package provided by the Fermilab Computing Division to run jobs on the grid. Therefore, general information about running grid jobs and jobsub applies.
The mu2egrid package provides the Mu2e-specific code required for using jobsub. All scripts in the package support the --help option; if it is present on the command line, all other options are ignored. All scripts also support the --dry-run and --verbose options to show what will be done without performing the action. Basic support for G4beamline and MARS jobs is described here.
The rest of this documentation focuses on running framework (art) jobs.
Please consider the following points before running grid jobs:
- Is there a standard dataset that you can use instead of
starting from scratch? Some are listed
here.
- The length of the job. It is recommended to have jobs shorter
than about 8 hours but longer than 15 minutes.
- The total size of the job's input and output files. Worker nodes at
Fermilab have 30 GB of disk space per core.
- The size of those output files that are intended for tape
storage. It is inefficient to store small files on tape. Good
file sizes are from a few hundred MB to a few GB. If jobs are
too slow to produce sufficiently large files, one should
consider concatenating outputs
before writing them to tape.
- Memory required to run the job. See
Estimating per-job resources.
- If the current job writes out framework files for use by
subsequent job stages, the above criteria should be applied to
analyze the whole processing chain. It is important to remember
that later stages can easily read multiple input files per job,
but cannot "split" an existing art file. For example, the
configuration to produce standard g4s4 conversion electron
datasets cnf.mu2e.....fcl runs only 1000 events per job,
resulting in short (XXX minutes) jobs that produce small (XXX
MB) output files. However, digi+tracking jobs with background
mixing would be too slow for significantly larger g4s4 inputs.
- Before submitting any jobs to the grid, make sure that
tape-based input datasets, if any,
are pre-staged to disk.
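The job length, disk space, and tape file size guidelines above can be checked with simple arithmetic before any fcl files are generated. The following is a minimal sketch, not part of mu2egrid: the function names, the per-event timing and size inputs, and the numeric stand-ins for "a few hundred MB" (0.3 GB) and "a few GB" (1 GB target) are assumptions chosen for illustration. The per-event numbers would have to come from a short local test run.

```python
import math
from dataclasses import dataclass, field

# Guidelines taken from the list above.
MIN_JOB_SECONDS = 15 * 60      # jobs should run longer than about 15 minutes
MAX_JOB_SECONDS = 8 * 3600     # ... and shorter than about 8 hours
DISK_PER_CORE_GB = 30.0        # worker-node disk space per core at Fermilab

# Assumption: numeric stand-in for "a few hundred MB" as the smallest
# file size worth writing to tape without concatenation.
MIN_TAPE_FILE_GB = 0.3


@dataclass
class JobEstimate:
    events: int
    wall_seconds: float
    output_gb: float
    problems: list = field(default_factory=list)


def check_job(events, sec_per_event, output_mb_per_event, input_gb=0.0):
    """Compare one job's estimated length and file sizes to the guidelines.

    All arguments are hypothetical measurements supplied by the user.
    """
    wall = events * sec_per_event
    out_gb = events * output_mb_per_event / 1000.0
    problems = []
    if wall < MIN_JOB_SECONDS:
        problems.append("too_short")
    if wall > MAX_JOB_SECONDS:
        problems.append("too_long")
    if input_gb + out_gb > DISK_PER_CORE_GB:
        problems.append("exceeds_disk")
    if out_gb < MIN_TAPE_FILE_GB:
        problems.append("small_for_tape")
    return JobEstimate(events, wall, out_gb, problems)


def files_to_concatenate(output_gb, target_gb=1.0):
    """Number of job outputs to merge so the merged file reaches target_gb.

    target_gb is an assumed target within "a few hundred MB to a few GB".
    """
    return max(1, math.ceil(target_gb / output_gb))


if __name__ == "__main__":
    # Hypothetical numbers: 1000 events, 2 s and 0.05 MB per event.
    est = check_job(events=1000, sec_per_event=2.0, output_mb_per_event=0.05)
    print(f"{est.wall_seconds / 60:.0f} min, {est.output_gb:.2f} GB, {est.problems}")
    if "small_for_tape" in est.problems:
        print("merge about", files_to_concatenate(est.output_gb), "outputs per tape file")
```

A job flagged only as small_for_tape is still reasonable to run when a later stage needs small inputs; the flag indicates that its outputs should be concatenated before they are written to tape.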
To run some quick tests, it is sufficient to:
- Generate a set of fcl files.
- Submit mu2eprodsys grid jobs.
To be able to use the full functionality of the system, such as
recovering failed jobs and/or storing outputs on tape, more steps are required:
- Generate a set of fcl files that completely define the jobs to be run.
- Register the fcl dataset with SAM, and make fcl files available on /pnfs.
- Make sure that input data files, if any, are pre-staged to disk.
- Submit mu2eprodsys grid jobs.
- Check results, identify and re-run failed jobs.
- Store outputs on tape:
This file last modified Thursday, 15-Nov-2018 12:06:52 CST