Deprecated art Services

This page describes services that are distributed as part of art and which are now deprecated. Please switch to using their replacements as soon as possible. The replacements are described on the page describing services provided by art.
  1. Timing
  2. SimpleMemoryCheck


Timing

This service is deprecated. For its replacement see the documentation for the TimeTracker service.

There are two sources of timing information available in art, the Timing service and timers that are part of the scheduler. The Timing service is off by default but can be turned on in the .fcl file. The timers in the scheduler are always on and the detail-level in the printout can be controlled in the .fcl files.

By default, the only timing information printed out is a short summary at the end of the job, for example:

TimeReport ---------- Time  Summary ---[sec]----
TimeReport CPU = 83.994231 Real = 83.998495
This printout comes from the scheduler and is obtained using the system call getrusage. The printed numbers represent the sum of times spent in the per-event methods of each module, including producers, filters, analyzers, and output modules. I believe, but am not certain, that the per-event time in the source module is also included. Another way to say this is that the time spent in constructors and in the begin/end Job/Run/SubRun methods is not counted in these numbers.

It is not completely clear to me what the precision of the underlying clocks is. The underlying system call, getrusage, reports times to the microsecond, but it is not clear which of the low-order bits are significant. A survey of web sites has only confused the matter. My current best guess is that the precision is sub-millisecond and may be on the order of tens of microseconds.

Much more detailed end-of-job timing information is available from the scheduler by adding the following fragment after the close of the services block in your .fcl file:

services.scheduler.wantSummary: true

Much more detailed, but less precise, information is available by using the Timing service. This service is off by default but, when enabled, it prints crude timing information after each call to each module's per-event member function; it reports elapsed wall-clock time, not CPU time. The underlying system call is gettimeofday, which reports times to 1 microsecond but which, according to the man page, has a resolution of 10 ms. Moreover, the underlying timer is subject to jumps or drift induced by NTP synchronization events. There is no way to enable the Timing service from the command line. To enable the default behaviour, put the following fragment in the services section of your .fcl file:

services : {
   Timing : {}
}
To suppress the information printed after each call to each module, and to get only an end-of-job summary, set the summaryOnly parameter to true:
services : {
   Timing : { summaryOnly : true }
}
The source code that produces the scheduler's printout can be found at:

If you want more details about what the times reported by the scheduler actually mean, that code is a good place to start exploring.

The source code for the timing service is available at:

At present the timing information is only available in the log file, so if you want a histogram of time per event within a particular module, you will have to parse the log file. A request has been submitted to provide an API for accessing this information from within a user-written module or service.
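Parsing the log file might look like the following sketch. Only the end-of-job summary format shown above ("TimeReport CPU = ... Real = ...") is assumed; the per-module and per-event lines printed by the Timing service would need their own patterns, which are not shown on this page.

```python
import re

# Hypothetical sketch: extract CPU and wall-clock times from "TimeReport"
# lines in an art log file. The regular expression matches only the summary
# format quoted above; other TimeReport line shapes are ignored.
TIMEREPORT_RE = re.compile(
    r"TimeReport\s+CPU\s*=\s*([\d.]+)\s+Real\s*=\s*([\d.]+)"
)

def parse_timereport(lines):
    """Return a list of (cpu_seconds, real_seconds) tuples."""
    times = []
    for line in lines:
        m = TIMEREPORT_RE.search(line)
        if m:
            times.append((float(m.group(1)), float(m.group(2))))
    return times

log = [
    "TimeReport ---------- Time  Summary ---[sec]----",
    "TimeReport CPU = 83.994231 Real = 83.998495",
]
print(parse_timereport(log))  # [(83.994231, 83.998495)]
```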

Caveats About Event Timing

There are two important caveats about event timing. First, when running Mu2e GEANT4 simulations, there are outlier events that take up to a few hundred times longer than the median event. These events occur at the percent level. When you run a short grid job in order to estimate the time for a much longer job, be aware that your estimate could be off because too many or too few of these outliers landed in the small sample. Be sure to run test jobs long enough to average this out; a test job that executes for 30 minutes should usually be safe.
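A toy simulation illustrates the point. This is not Mu2e code; the event-time distribution below (1 s median, ~1% of events taking 100x longer) is a made-up stand-in for the behaviour described above.

```python
import random
import statistics

# Illustrative only: rare outlier events ~100x longer than the median,
# occurring at the ~1% level.
random.seed(42)

def event_time():
    # median event ~1 s; roughly 1% of events take ~100x longer
    return 100.0 if random.random() < 0.01 else 1.0

short_job = [event_time() for _ in range(50)]     # short test job
long_job  = [event_time() for _ in range(50000)]  # long production-like job

# The long-job mean converges near 1 + 0.01*99 ~ 2 s/event, while the
# short-job mean swings widely depending on how many outliers it caught.
print(statistics.mean(short_job), statistics.mean(long_job))
```

Running this with different seeds shows the short-job estimate ranging from about 1 s/event (no outliers caught) to several seconds per event, while the long-job estimate stays close to 2 s/event.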

I presume that a similar issue exists when running full reconstruction on simulated events, but I have not yet quantified it.

A corollary of the above is that the numbers from the timing report should be more robust than the estimate you would get by running two test jobs, one with few events and one with many, in order to measure the startup overhead; the latter method is vulnerable to one of the events in the first job being a long-CPU-time outlier.

The second issue is that different grid slots have different CPU power. Most of the GP Fermigrid nodes have the same CPU power per slot as do detsim or mu2egpvm*. But there are some older machines around, some of which are slower per core and some of which are faster per core (but with fewer cores per chip). If you run jobs opportunistically, which the mu2egrid scripts do by default, then you may land on machines with either more or less CPU power than those on which you ran your test jobs.

Taken together, these two effects can lead to a factor of two difference in execution time among grid processes in one grid job cluster.


SimpleMemoryCheck

This service is deprecated. For its replacement see the documentation for the MemoryTracker service.

The SimpleMemoryCheck service monitors memory use either at the end of each module or at the end of each event; it reports whenever memory use increases. Memory use may increase for legitimate reasons, such as encountering a big event, or because of a memory leak. This service does not distinguish between these reasons but we should usually be able to identify memory leaks by plotting memory size vs number of processed events for a long enough job.
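The "plot memory vs events" idea can be reduced to a slope fit. The sketch below is hypothetical: it assumes you have already extracted a per-event memory-size series (e.g. from the SimpleMemoryCheck printout) and simply fits a least-squares line. A steadily positive slope suggests a leak; a flat trend with isolated jumps suggests a few big events.

```python
# Hypothetical sketch: distinguish a steady leak from occasional big events
# by fitting the slope of memory size vs. event number.
def fitted_slope(mem_mb):
    """Least-squares slope (MB per event) of memory vs. event number."""
    n = len(mem_mb)
    xs = range(n)
    mean_x = (n - 1) / 2.0
    mean_y = sum(mem_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, mem_mb))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

leaky  = [100.0 + 0.5 * i for i in range(200)]   # grows every event: a leak
steady = [100.0] * 200
steady[50] = 250.0                               # one big event, no leak

print(fitted_slope(leaky))   # ~0.5 MB/event: leak-like growth
print(fitted_slope(steady))  # near zero: isolated spike, not a leak
```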

This service can be enabled or disabled from the command line:

  > mu2e --memcheck   -c input.fcl
  > mu2e --nomemcheck -c input.fcl

The service may also be configured in a .fcl file, which allows more detailed control of the printout:

services :
{
  SimpleMemoryCheck :
  {
    oncePerEventMode    : true
    showMallocInfo      : false
    ignoreTotal         : 5
    moduleMemorySummary : false
  }
}
If the service is enabled or disabled both in the .fcl file and on the command line, the command line takes precedence.

If oncePerEventMode is false, information is printed at the end of each module; if it is true, information is printed only at the end of each event. If showMallocInfo is true, additional information is added to each printout. The ignoreTotal parameter specifies the number of events to skip before making reports; this is useful to suppress startup noise. If moduleMemorySummary is true, additional summary information is printed at the end of the job.

The source code for the SimpleMemoryCheck service is available at:

This service monitors memory allocated to the program, not memory actually used by the program; this means that two runs of the same program need not yield identical results! When we need a more powerful tool, our colleagues who develop and maintain art recommend valgrind, which can distinguish between memory leaks and other (mis)uses of memory. They are able to build the Mu2e code and run it under valgrind.



For web related questions: Mu2eWebMaster@fnal.gov.
For content related questions: kutschke@fnal.gov
This file last modified Wednesday, 29-Jun-2016 11:57:26 CDT