Reconstruction Modes
This page discusses the two methods of reconstruction in art: scheduled reconstruction, which uses trigger paths, and unscheduled reconstruction, which is usually called reconstruction on demand. It is presumed that the reader is reasonably familiar with the art run-time configuration system.
You may also want to look at the art wiki page on art framework parameters.
This page illustrates the use of paths with an example: how to use filters to write one subset of events to one output file and a different subset of events to a second output file. The two subsets may be disjoint or they may contain events in common; it is also legal for some events to be written to no output file. The example writes two output files, but art allows one to write many output files in one job.
Consider the following problem. You wish to run a job that has:
    process_name: filter1

    source: {
      # Configure some services here.
    }

    physics: {
      producers : {
        aProducer: { module_type: MakeA }
        bProducer: { module_type: MakeB }
      }

      analyzers : {
        checkAll: { module_type: CheckAll }
      }

      filters : {
        selectMode0: { module_type: Filter1 mode: 0 }
        selectMode1: { module_type: Filter1 mode: 1 }
      }

      mode0: [ aProducer, bProducer, selectMode0 ]
      mode1: [ aProducer, bProducer, selectMode1 ]
      analyzermods: [ checkAll ]
      outputFiles:  [ out0, out1 ]

      trigger_paths : [ mode0, mode1 ]
      end_paths     : [ analyzermods, outputFiles ]
    }

    outputs: {
      out0: {
        module_type: RootOutput
        fileName: "file0.root"
        SelectEvents: { SelectEvents: [ mode0 ] }
      }
      out1: {
        module_type: RootOutput
        fileName: "file1.root"
        SelectEvents: { SelectEvents: [ mode1 ] }
      }
    }

The following names are identifiers reserved to art: process_name, source, physics, producers, analyzers, filters, trigger_paths, end_paths, outputs. To be a little more precise, FHiCL names obey scoping rules similar to C++; therefore the identifier process_name is really only reserved to art within the outermost scope. Even so, it would be needlessly confusing to use process_name as the name of a parameter within some other scope. The names trigger_paths and end_paths are artifacts of the first use of the CMS framework, which was to simulate the several hundred parallel paths within the CMS trigger; their meaning should become clear after reading the remainder of this page.
The following are module labels: aProducer, bProducer, checkAll, selectMode0, selectMode1, out0, out1. For a module label you may choose any name so long as it is unique within a job and is not one of the names reserved to art.
The following are names of paths: mode0, mode1, outputFiles, analyzermods. For the name of a path you may choose any name so long as it is unique within a job and is not one of the names reserved to art. Any name that is a top level name inside of the physics parameter set is either a reserved name or it is the name of a path.
When understanding a FHiCL document it is important to recognize which identifiers are module labels and which are path names. It is also important to recognize that paths are lists of module labels, while the two reserved names, trigger_paths and end_paths are lists of paths. Finally, it is important to distinguish between a class that is a module and instances of that module class, each uniquely identified by a module label.
Art has several rules that were recommended practices in the old framework but which were not strictly enforced by that framework. Art enforces some of these rules and will, soon, enforce all of them:
This example happens to separate the analyzer modules and the output modules into separate paths; that might be convenient at some times but it is not necessary. One would also get the same behaviour from:

    xxx: [ checkAll, out0, out1 ]
    end_paths : [ xxx ]

On the other hand, keeping trigger paths separate has real meaning.
Art's scheduling strategy is described below. Some of the details are remnants of compromises and conflicting interests with CMS. One of the top level rules in the scheduler is that all producers and filters should be run first, using the ordering rules specified below. After that, all analyzer and output modules will be run. Moreover, analyzer modules and output modules may not modify the event and may not have side effects that influence the behaviour of other analyzer or output modules. Therefore art is free to run all analyzer and output modules in any order. The full description of the scheduler strategy is given below:
For simple cases, in which there is one trigger path with only a few modules in the path, and one end path with only a few modules in the path, the extra level of bookkeeping is just extra typing with no obvious benefit. The benefit comes when many work groups wish to run their modules on the same events during one art job; perhaps this is a job skimming off many different calibration samples or perhaps it is a job selecting many different streams of interesting Monte Carlo events. In such a case, each work group needs only to define its own trigger path and its own end path, without regard for the requirements of other work groups; each work group also needs to ensure that its paths are added to the end_paths and trigger_paths variables. Art will then automatically, and correctly, schedule the work without doing any work twice and without skipping work that must be done. This feature came for free with art and, while it imposes a small burden for novice users doing simple jobs, it provides an enormously powerful feature for advanced users. Therefore it was retained in art when some other features were removed.
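As an illustration of the multi-group case, the following is a minimal sketch, not a real Mu2e configuration. It assumes two hypothetical work groups (calorimeter and tracker calibration); every module label and module_type in it is a placeholder invented for this example. Both trigger paths name the same producer, makeHits, and, per the scheduling rules described on this page, art runs it only once per event.

```fhicl
# Hypothetical sketch: two work groups sharing one art job.
# All labels and module_type values are placeholders, not real Mu2e modules.
process_name: calibSkim

physics: {
  producers : {
    makeHits: { module_type: MakeHits }          # needed by both groups
  }

  filters : {
    caloSelect: { module_type: CaloCalibFilter } # calorimeter group
    trkSelect:  { module_type: TrkCalibFilter }  # tracker group
  }

  analyzers : {
    caloCheck: { module_type: CaloCalibCheck }
    trkCheck:  { module_type: TrkCalibCheck }
  }

  # Each group defines its own trigger path and its own end path.
  caloPath: [ makeHits, caloSelect ]
  caloEnd:  [ caloCheck, caloOut ]

  trkPath:  [ makeHits, trkSelect ]
  trkEnd:   [ trkCheck, trkOut ]

  # Each group adds its paths to the two reserved lists.
  trigger_paths : [ caloPath, trkPath ]
  end_paths     : [ caloEnd, trkEnd ]
}

outputs: {
  caloOut: {
    module_type: RootOutput
    fileName: "caloCalib.root"
    SelectEvents: { SelectEvents: [ caloPath ] }
  }
  trkOut: {
    module_type: RootOutput
    fileName: "trkCalib.root"
    SelectEvents: { SelectEvents: [ trkPath ] }
  }
}
```

Neither group needs to know about the other's filter or output file; the only shared edits are the two entries each group appends to trigger_paths and end_paths.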
The art operating mode described above is known as scheduled reconstruction, in which the order of modules is given by a user supplied schedule, the trigger_paths. There is a second operating mode, reconstruction on demand, also called unscheduled reconstruction. This mode is not currently used by Mu2e but we might decide to use it at a later date. Some of the features of reconstruction on demand are critical for dealing with some of the practical problems faced by a large experiment like CMS. It is not yet clear if the tradeoffs will result in the same decision for a much smaller experiment such as Mu2e.
In reconstruction on demand, it is not necessary to provide any trigger_paths. It is necessary to provide:
    services.scheduler.allowUnscheduled: true

In this mode art behaves as follows:
One of the big advantages of reconstruction on demand is that the end user never needs to declare the required order of producers; art can figure it out on its own. Therefore the concern about two trigger_paths having an inconsistent order of modules is moot. On a small experiment such as Mu2e, which might only ever have a handful of trigger_paths within one art job, this is a small win. For large experiments that may have many trigger_paths within one job, it is a big win.
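For comparison with the scheduled example, here is a rough sketch of what an on-demand configuration might look like. It reuses the module labels from the example earlier on this page and rests on two assumptions: that declaring the producers in the producers block is enough to register them, and that an end path is still needed to list the analyzers. The process name demand1 is a placeholder.

```fhicl
# Sketch only: reconstruction on demand, assuming the producers block
# registers the producers and an end path still lists the analyzers.
process_name: demand1

services: {
  # Equivalent to: services.scheduler.allowUnscheduled: true
  scheduler: { allowUnscheduled: true }
}

physics: {
  producers : {
    aProducer: { module_type: MakeA }
    bProducer: { module_type: MakeB }
  }

  analyzers : {
    checkAll: { module_type: CheckAll }
  }

  # No trigger_paths: the order of aProducer and bProducer is not declared.
  analyzermods: [ checkAll ]
  end_paths : [ analyzermods ]
}
```

The point of the sketch is what is absent: no path states whether aProducer runs before bProducer, so there is no ordering for two paths to disagree about.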
If we take one step further, and ask that all EDAnalyzers and all output modules declare in their constructors what data products they require as inputs, then it is possible for reconstruction on demand to identify opportunities for parallel evaluation of EDProducers. If sub-event multithreading evolves into a useful feature, this extension to reconstruction on demand would allow it naturally.
Consider again the case in which a requested data product both exists in the input event and can be produced by a registered EDProducer. After the producer has run, both data products will be present in the event but they are easily distinguished because art labels each data product with a four part data product ID and one part of this ID is the process_name. If an art process reads an input file, and if any of the data products from that input file have an ID with a process_name field that matches the process_name of the current process, then art will throw an exception. Therefore the two data products in question are guaranteed to have IDs that differ at least in their process_name field.
The CLEO III experiment also had a reconstruction on demand system and their system also had the rule that if an appropriate EDProducer was available, it would be run to supersede data already in the input file. This way of thinking is essentially: if you don't want an EDProducer to run, don't configure it into the producer set. If you do put it in the producer set, it will take priority over existing data products.
CMS experience has shown that, in order for reconstruction on demand to work well, it is important that modules ask for their input data products using a well qualified name. Normally this is done with the getByLabel method of art::Event. Consider the case of writing a cluster finder for the calorimeter system; the input to this system will be a list of hit Avalanche Photo Diodes (APDs). There might be several different modules that can produce a list of hit APDs; perhaps one module makes simulated APD hits from MC truth information while another module unpacks raw data to produce APD hits. Perhaps there might be several standard configurations of this last module, one with tight pedestal cuts and one with loose pedestal cuts. The same cluster finder module can run on all of these different inputs, without recompilation, using the following pattern in the cluster finder module:
    // In the member data.
    std::string _caloReadoutModuleLabel;

    // In the initializer list of the constructor:
    _caloReadoutModuleLabel(pset.get<std::string>("caloReadoutModuleLabel","CaloReadoutHitsMaker")),

    // In the analyze method
    art::Handle<CaloHitCollection> caloHits;
    event.getByLabel(_caloReadoutModuleLabel,caloHits);

The default behaviour of this code fragment is to get the following object from the event: an object of type CaloHitCollection that was produced by a module whose label is "CaloReadoutHitsMaker". In this pattern the label of the EDProducer module is run-time configurable so exactly the same module can be run on different inputs. The art metadata system will store the information about the source of the input hits that were used by the cluster finder.
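On the configuration side, the pattern above might be used as in the following sketch. Only the parameter name caloReadoutModuleLabel and its default value "CaloReadoutHitsMaker" come from the code fragment; the module_type names, the label looseHits, the pedestalCut parameter and its values are hypothetical, and the cluster finder is assumed here to be configured as a producer.

```fhicl
# Sketch only: one cluster finder class, two instances, two different inputs.
# module_type names, looseHits and pedestalCut are hypothetical placeholders.
physics: {
  producers : {
    # Two instances of a hypothetical hit maker with different cuts.
    CaloReadoutHitsMaker: { module_type: MakeCaloReadoutHits pedestalCut: 5.0 }
    looseHits:            { module_type: MakeCaloReadoutHits pedestalCut: 2.0 }

    # Uses the default input label, "CaloReadoutHitsMaker".
    clustersDefault: { module_type: MakeCaloCluster }

    # Same module class, pointed at the loose hits at run time.
    clustersLoose: {
      module_type: MakeCaloCluster
      caloReadoutModuleLabel: "looseHits"
    }
  }

  clusterPath: [ CaloReadoutHitsMaker, looseHits, clustersDefault, clustersLoose ]
  trigger_paths : [ clusterPath ]
}
```

Because the input label is an ordinary parameter, switching the cluster finder to a different source of hits is a one-line configuration change with no recompilation.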
This note needs to be extended to include the use of filters when reconstruction on demand is enabled. It also needs to talk about the meaning of "keep *_*_*_*" in an output module when reconstruction on demand is enabled: this will trigger running all of the registered producers and all of their data products will be written to the output file.
For a discussion about the keep/drop syntax when using Scheduled Reconstruction, see the discussion of configuring output modules.