Making Data Products


Introduction

A Data Product is anything that you can add to an event or see in an event. Examples include the generated particles, the simulated particles produced by Geant4, the hits produced by Geant4, tracks found by the reconstruction algorithms, clusters found in the calorimeters and so on.

This page contains a short description of how to add a data product to an event and how to define a new type of data product. For more complete information, consult the CMS documentation for making new products.

A Minimal Module

The code fragment below shows a minimal example of an EDProducer module that adds a StrawHitCollection to the event. If you look back through the nested header files, you will see the StrawHitCollection is just a typedef for std::vector<mu2e::StrawHit> and that StrawHit is a very simple class, really nothing more than a simple struct.

#include <memory>

#include "art/Framework/Core/EDProducer.h"
#include "art/Framework/Core/ModuleMacros.h"
#include "art/Framework/Principal/Event.h"
#include "fhiclcpp/ParameterSet.h"
#include "MCDataProducts/inc/StrawHitCollection.hh"

namespace mu2e{

 class MyClass : public art::EDProducer {

  public:
    explicit MyClass(fhicl::ParameterSet const& pSet)
    {
      produces<StrawHitCollection>();
    }
    virtual ~MyClass() { }

    virtual void produce(art::Event& e );


  };

  void MyClass::produce(art::Event& event ) {

    std::unique_ptr<StrawHitCollection> p(new StrawHitCollection);

    // Some sort of loop to fill the collection:
    for ( int i=0; i<10; ++i){
       p->push_back(StrawHit(...));
    }

    event.put(std::move(p));
  }

} // end of namespace mu2e

using mu2e::MyClass;
DEFINE_ART_MODULE(MyClass);

In the above fragment there is a member function of MyClass named produce (singular) and a member function of the base class named produces (plural); the second function is called in the constructor of MyClass. The following text refers to both - so pay attention to which of the two is being discussed. The following pattern describes any producer module:
  1. The class must inherit from art::EDProducer.
  2. The constructor must tell the framework what it produces; it does so via the call to produces<StrawHitCollection>(). This is described in more depth below.
  3. Data products are added to an event inside the produce method. A three step pattern is used:
    1. Create a std::unique_ptr to an empty object.
    2. Fill the object.
    3. Give the unique_ptr to the event.
    It might be possible to create a fully formed object in step 1; in that case there is no step 2.
  4. The code must invoke the macro DEFINE_ART_MODULE as shown in the last two lines. This line may appear anywhere in the file after the definition of the class. Mu2e has adopted the convention of putting it at the end of the file.
  5. After the call to event.put(...), the variable p no longer owns anything: it is a null pointer, and dereferencing it is undefined behavior that usually ends in a crash. Therefore you should run diagnostics and any other code that reads your data product before the call to event.put(...).
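
Item 5 can be checked without the framework. The following is a minimal sketch in plain C++ with no art dependency: Hit, HitCollection, FakeEvent, PutReport and fillAndPut are hypothetical stand-ins invented for this illustration, and FakeEvent::put only mimics the transfer of ownership, not anything else the real event does.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for StrawHit and StrawHitCollection.
struct Hit { int id; };
using HitCollection = std::vector<Hit>;

// Hypothetical stand-in for the event: it only takes ownership.
struct FakeEvent {
  std::unique_ptr<HitCollection> held;
  void put(std::unique_ptr<HitCollection> p) { held = std::move(p); }
};

struct PutReport {
  std::size_t sizeBeforePut;  // diagnostic taken while p was still valid
  bool nullAfterPut;          // true if p was empty after the put
};

PutReport fillAndPut(FakeEvent& event, int nHits) {
  std::unique_ptr<HitCollection> p(new HitCollection);
  for (int i = 0; i < nHits; ++i) {
    p->push_back(Hit{i});
  }

  // Run diagnostics BEFORE handing the pointer over.
  std::size_t const n = p->size();

  event.put(std::move(p));

  // p is now null; calling p->size() here would be undefined behavior.
  return PutReport{n, p == nullptr};
}
```

Comparing p with nullptr after the move is safe; only dereferencing it is not.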

You might try the following: call event.put(...) and then get the data product out of the event using one of the get methods. This will not work, because a data product is not actually registered with the event until the produce method of the module returns. The logic behind this restriction is that if a module fails, then none of its data products should be available via the get interface; therefore event.put(...) only schedules the data product for addition to the event, and that addition occurs when the module returns from the produce call.
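
The schedule-then-commit behavior described above can be pictured with a toy model. This is not art's implementation: ToyEvent, runModule, commit and rollback are hypothetical names made up for this sketch, and products are reduced to named integers to keep it short.

```cpp
#include <map>
#include <string>

// Toy model only: this is NOT how art is implemented.
class ToyEvent {
public:
  // Schedule a product; it is not yet visible to get().
  void put(std::string const& name, int value) { pending_[name] = value; }

  // Returns nullptr if no committed product has this name.
  int const* get(std::string const& name) const {
    auto it = committed_.find(name);
    return it == committed_.end() ? nullptr : &it->second;
  }

  // Called by the toy framework after produce() returns normally.
  void commit() {
    for (auto const& kv : pending_) committed_[kv.first] = kv.second;
    pending_.clear();
  }

  // Called by the toy framework if produce() throws.
  void rollback() { pending_.clear(); }

private:
  std::map<std::string, int> pending_;
  std::map<std::string, int> committed_;
};

// Toy framework driver: run one module, then commit or discard its products.
template <typename Produce>
bool runModule(ToyEvent& event, Produce produce) {
  try {
    produce(event);
  } catch (...) {
    event.rollback();
    return false;
  }
  event.commit();
  return true;
}
```

In this model a get issued inside the produce callable returns nullptr for a product that the same callable has just put, which is the situation the paragraph above warns about.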

More about produces<T>();

In the constructor, there is a call to a function template produces<T>(). This tells the framework that when the produce method of this class is called, it is expected to add a data product of type T to the event. If the produce method is expected to add more than one data product to the event, then there must be a corresponding call to produces for each data product.

If the produce method tries to add a product for which it did not make a produces<T>() call, then the framework will throw an exception. The default response to this exception is to stop event processing and to shut down as gracefully as possible; normally this means that your histogram files and log files will be flushed and closed properly.

One natural question is "what should I do if this particular event has no StrawHits?" One needs to distinguish two cases here. If it is perfectly normal that some events will produce no StrawHits, then you should put an empty StrawHitCollection into the event. The event data model is perfectly happy to hold empty collections. If it is an error for any event to produce no StrawHits, then you should issue an appropriate error message using the message logger. If it is a sufficiently severe error, then you should throw an appropriate exception.

In an earlier version of this document it was stated that the framework would throw an exception if a module failed to produce one of the data products it advertised via produces calls. This is not true and never was true; the older document was wrong. It remains true that, for objects that are collection types, the recommended procedure is to put an empty collection into the event rather than to put nothing into the event; this greatly simplifies code that reads your output.
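
The simplification for downstream code can be seen in a framework-free sketch. EnergyHit, EnergyHitCollection and both functions below are hypothetical names invented for this illustration; a null pointer stands in for "the product is absent from the event".

```cpp
#include <vector>

// Hypothetical stand-ins for a hit class and its collection typedef.
struct EnergyHit { double energy; };
using EnergyHitCollection = std::vector<EnergyHit>;

// Reader when the producer always puts a collection, possibly empty:
// no special case is needed, an empty collection gives a sum of zero.
double totalEnergy(EnergyHitCollection const& hits) {
  double sum = 0.0;
  for (EnergyHit const& h : hits) {
    sum += h.energy;
  }
  return sum;
}

// Reader when the product may be missing altogether:
// every consumer has to carry an extra check like this one.
double totalEnergyIfPresent(EnergyHitCollection const* hits) {
  if (hits == nullptr) {
    return 0.0;
  }
  return totalEnergy(*hits);
}
```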

If you are wondering where the produces function lives, it comes from deep down in an inheritance chain. First look in the header file for the base class, EDProducer. That class inherits from some other class; check its header file. After several levels you will find the base class that defines produces.

If one module wishes to produce two or more data products of the same data type, these can be distinguished using the instance name argument to produces and put:

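SampleProducer(fhicl::ParameterSet const& ps){
 // One produces call per product; the instance name tells them apart.
 produces<SampleCollection>("version1");
 produces<SampleCollection>("version2");
}

void SampleProducer::produce(art::Event& e ){
   std::unique_ptr<SampleCollection> result1(new SampleCollection);
   std::unique_ptr<SampleCollection> result2(new SampleCollection);
   // ... fill the collections ...
   // The instance names must match those given to produces.
   e.put(std::move(result1),"version1");
   e.put(std::move(result2),"version2");
}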


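where the text strings must be unique within the module and, because they become part of the product identifier described under Identifiers of a Data Product, must not contain underscores.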
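
One way to picture the bookkeeping is as a set of (type, instance name) declarations that each put is checked against. The sketch below is a toy model under that assumption, not art's implementation; ToyRegistry and checkPut are hypothetical names, and std::logic_error stands in for whatever exception the framework really throws.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Toy model only: a declaration is a (type, instance name) pair.
class ToyRegistry {
private:
  using Key = std::pair<std::type_index, std::string>;

  template <typename T>
  static Key key(std::string const& instance) {
    return Key(std::type_index(typeid(T)), instance);
  }

  std::set<Key> declared_;

public:
  // Plays the role of produces<T>(instance) in a module constructor.
  template <typename T>
  void produces(std::string const& instance = "") {
    declared_.insert(key<T>(instance));
  }

  // Plays the role of the check made when a product is put:
  // throws if no matching produces<T>(instance) call was made.
  template <typename T>
  void checkPut(std::string const& instance = "") const {
    if (declared_.count(key<T>(instance)) == 0) {
      throw std::logic_error("no produces call matches instance '" +
                             instance + "'");
    }
  }
};
```

Two declarations of the same type with different instance names are distinct keys, while a put with an undeclared instance name or an undeclared type is rejected.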

Declaring new Data Products

In the above description it was presumed that the class to be added to the event was already known to the framework. A class is made known to the framework using the genreflex system from ROOT, as described below.

Declaring a data product to the system uses two files, named classes_def.xml and classes.h. By convention these files are located in the src subdirectory of each data product package, for example RecoDataProducts/src/classes_def.xml and RecoDataProducts/src/classes.h. In principle every cvs module could define its own data products but we have chosen, instead, to segregate the data products in a small number of packages. This enforces the separation of data classes and algorithm classes and makes it possible to load the data product libraries without having to load the much more complex algorithm classes.

If the only data product we had were StrawHitCollection, then classes_def.xml would look like:

<lcgdict>
 <class name="mu2e::StrawHit"/>
 <class name="mu2e::StrawHitCollection"/>
 <class name="art::Wrapper<mu2e::StrawHitCollection>"/>
</lcgdict>

and classes.h would look like:

#include ...

#include "ToyDP/inc/StrawHitCollection.hh"

template class art::Wrapper<mu2e::StrawHitCollection>;

The rule for classes_def.xml is that the Wrapper line must be present for every class that can be given to the event using a call to event.put(...); in this case that is just the StrawHitCollection. The non-wrapper lines must be present for that class, StrawHitCollection, and for all of the classes that are among the persistent data of StrawHitCollection, either directly or indirectly. This applies recursively until only primitive types are found (that is, we do not need lines for int, double, float, char and so on).

There is one exception to the rule that you must recursively declare all classes that are data members of your class. You must not declare them if they are already found in another dictionary that is known to art. For example none of the Mu2e dictionaries includes a reference to CLHEP::Hep3Vector or CLHEP::HepLorentzVector; these are found in

$ART_DIR/source/art/art/Persistency/CLHEPDictionaries/classes_def.xml.

Other dictionaries defined in the art source area:

art/Framework/IO/ProductMix/classes_def.xml
art/Persistency/CetlibDictionaries/classes_def.xml
art/Persistency/WrappedStdDictionaries/classes_def.xml
art/Persistency/CLHEPDictionaries/classes_def.xml
art/Persistency/FhiclCppDictionaries/classes_def.xml
art/Persistency/Common/classes_def.xml
art/Persistency/StdDictionaries/classes_def.xml
art/Persistency/Provenance/classes_def.xml

If we had decided that it made sense to add a single StrawHit as a data product, then we would also need to write the wrapper line for StrawHit. Instead we decided that if you would like to store a single StrawHit, you need to store it by creating a collection with only one member and storing that collection.

Every class for which there is a wrapper line in classes_def.xml must also be declared in classes.h, but classes from the non-wrapper lines of classes_def.xml should not be present in classes.h. The appropriate #include must also be present for the header file of each class that appears in the dictionary section of classes.h.

There is a second class of things that must be present in classes.h. If any data product has a data member that is an instantiation of a templated class, then the templated class must be present in classes.h. Look, for example, at Offline/ToyDP/src/classes.h. The class mu2e::SimParticleCollection has a data member of type std::map<MapVectorKey,mu2e::SimParticle>; that class has a data member of type std::pair<MapVectorKey,mu2e::SimParticle>. Both of these classes must be declared in classes.h.
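
The reason a pair type shows up at all is that std::map stores its elements as pairs. The following sketch demonstrates that with standard C++ only; ToyKey, ToyParticle and ToyParticleMap are hypothetical stand-ins for MapVectorKey, mu2e::SimParticle and the map inside the collection. Note that the standard spells the element type with a const key; how the pair must be spelled in classes.h is best checked against the existing file rather than inferred from this sketch.

```cpp
#include <map>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for MapVectorKey and mu2e::SimParticle.
struct ToyKey { int v; };
inline bool operator<(ToyKey const& a, ToyKey const& b) { return a.v < b.v; }
struct ToyParticle { int pdgId; };

using ToyParticleMap = std::map<ToyKey, ToyParticle>;

// The element type stored by std::map is a std::pair with a const key,
// so instantiating the map also instantiates this pair type.
static_assert(std::is_same<ToyParticleMap::value_type,
                           std::pair<ToyKey const, ToyParticle>>::value,
              "std::map stores std::pair<const Key, T>");

// Small helper so that the pair type is visibly used at run time.
inline int firstPdgId(ToyParticleMap const& m) {
  ToyParticleMap::value_type const& element = *m.begin();
  return element.second.pdgId;
}
```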

There is a syntax to make only a subset of the data members of a class persistent. There is also a syntax to tell the framework to make a data product purely transient: that is, it can be added to the event so that other modules may use it, but it will never be written out. For details see the next two sections and see also the CMS documentation for making new products.

Transient Data Products

It is possible to tell the framework that it should allow data products of a certain type to be added to the event but that it should never write out data products of that type. This is useful, for example, for data products that are full of bare pointers. To declare the class MyClass as a transient data product you need to add one line to classes_def.xml,

 <class name="MyClass" persistent="false"/>

and one line to classes.h,

#include "MyClass.hh"

One should not provide the lines for the art::Wrapper in either of these files. Moreover it is not necessary to provide lines in classes_def.xml that describe the classes used as data members inside MyClass. When an output module encounters this data product it will not try to persist the data product.

See also the CMS documentation for making new products.

Transient Data Members within a Persistable Class

It is also possible to declare that a data member of a class is transient. This is done in classes_def.xml. Suppose that the class MyClass has a data member with the name _field of type T. The data member can be declared transient using the syntax:

 <class name="MyClass">
    <field name="_field" transient="true"/>
 </class>

In this case the data member _field will not be written to the output file but the remaining data members of MyClass will be. When these objects are read back, the data member _field will be invalid and the user needs to know not to access this data member until it can be properly initialized by some other method. Ideally MyClass should protect against illegal access either by initializing on demand or by throwing.
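
The two protections mentioned above, throwing and initializing on demand, might look like the sketch below. GuardedProduct and its members are hypothetical; the sketch assumes that the transient members come back in their default-initialized state when the object is read, and that both _cache and _cacheValid are declared transient in classes_def.xml.

```cpp
#include <stdexcept>

// Hypothetical class; _cache plays the role of the transient _field.
class GuardedProduct {
public:
  explicit GuardedProduct(double x = 0.0) : _x(x) {}

  // Option 1: refuse access until the transient member has been set.
  double cacheOrThrow() const {
    if (!_cacheValid) {
      throw std::logic_error("transient member not initialized");
    }
    return _cache;
  }

  // Option 2: rebuild the transient member from persistent data on demand.
  double cacheOnDemand() const {
    if (!_cacheValid) {
      _cache = 2.0 * _x;  // placeholder for the real computation
      _cacheValid = true;
    }
    return _cache;
  }

private:
  double _x;                         // persistent data member
  mutable double _cache = 0.0;       // transient: derived from _x
  mutable bool _cacheValid = false;  // transient: tracks whether _cache is set
};
```

Initializing on demand keeps callers simple, while throwing is the safer choice when the transient member cannot be rebuilt from the persistent data alone.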

If there are no persisted objects of type T in any of the data products, then it is not necessary to declare the type T in classes_def.xml.

See also the CMS documentation for making new products.

Identifiers of a Data Product

Each data product within an event is uniquely identified by a four-part identifier, with the parts separated by underscore characters:
 DataType_ModuleLabel_InstanceName_ProcessName
  1. DataType is a "friendly" version of the name of the data type that is stored in the product. The name includes all namespace information. The friendly part is the way that it deals with collection types:
    • If a product is of type T, then the friendly name is "T".
    • If a product is of type mu2e::T, then the friendly name is "mu2e::T".
    • If a product is of type std::vector<mu2e::T>, then the friendly name is "mu2e::Ts".
    • If a product is of type std::vector< std::vector<mu2e::T> >, then the friendly name is "mu2e::Tss".
    • If a product is of type cet::map_vector<mu2e::T>, then the friendly name is "mu2e::Tmv". See below for a discussion about where underscores may not be used; this example is safe because of the substitution of mv for map_vector.
  2. ModuleLabel identifies the module that created the product; this is the module label, which distinguishes multiple instances of the same module within a process; it is not the class name of the module.
  3. InstanceName is a label for the data product that distinguishes two or more data products of the same type that were produced by the same module, in the same process. If a data product is already unique within this scope, it is legal to leave this field blank. The instance label is the optional argument of the call to "produces" in the constructor of the module (xxxx below):
          produces<T>("xxxx");
          
  4. ProcessName is the name of the process that created this product. It is specified in the fcl file that specifies the run time configuration for the job (ReadBack02 below):
          process_name : ReadBack02
          

Because the full name of the product uses the underscore character to delimit fields, it is forbidden to use underscores within any of the fields. Therefore none of the following may contain underscores: the friendly name of the data type, module labels, instance names and process names.
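
The four-part identifier and the underscore restriction can be expressed as a small check. The helper names requireNoUnderscore and productName are hypothetical and not part of art; the sketch only concatenates the four fields as described above and rejects any field that contains an underscore.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: reject a field that contains the delimiter.
inline void requireNoUnderscore(std::string const& field,
                                std::string const& what) {
  if (field.find('_') != std::string::npos) {
    throw std::invalid_argument(what + " must not contain '_': " + field);
  }
}

// Hypothetical helper: build DataType_ModuleLabel_InstanceName_ProcessName.
// An empty instance name is allowed and leaves two adjacent underscores.
inline std::string productName(std::string const& friendlyType,
                               std::string const& moduleLabel,
                               std::string const& instanceName,
                               std::string const& processName) {
  requireNoUnderscore(friendlyType, "data type");
  requireNoUnderscore(moduleLabel, "module label");
  requireNoUnderscore(instanceName, "instance name");
  requireNoUnderscore(processName, "process name");
  return friendlyType + "_" + moduleLabel + "_" + instanceName + "_" +
         processName;
}
```

For example, with a made-up module label makeSH, an empty instance name and the process name ReadBack02, a std::vector<mu2e::StrawHit> would be named mu2e::StrawHits_makeSH__ReadBack02 under this scheme.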

You can also read about which names need to match each other.

Writing only Selected Events and Selected Data Products

It is possible to configure an art job so that it writes selected events to one or more different output files. It is also possible to configure each output file so that only selected data products are written to that file. These operations are described in the web page on configuring output files.



For web related questions: Mu2eWebMaster@fnal.gov.
For content related questions: kutschke@fnal.gov
This file last modified Thursday, 15-Nov-2018 12:06:53 CST