Inter-Product References


Introduction

One of the longstanding technical problems facing HEP software has been how to make pointers persistable; that is, if my code uses a pointer to express a relationship between two objects, how can I write those two objects to disk or tape, read them back in and have the pointer restored so that it correctly points at the other object? The issue is that the pointee will have one address in memory when the job is first run but it will have a different address in memory when it is read back in a later job; so the value of the pointer must be reset to match the new location of the pointee. Ideally the pointer should be restored in an automatic way that requires no special action on the part of the physicist reading the persistent pointer and only minimal special action on the part of the physicist creating the persistent pointer.

Since the adoption of C++ by the HEP community, many HEP experiments have developed their own solutions to this problem, all of which work at some level but none of which are entirely satisfactory. The art team, working with Mu2e, NOvA and the Liquid Argon TPC experiments, has developed a solution that builds on the experiences, good and bad, of many previous experiments. This solution has two main components, each described below: art::Ptr<T> and art::Assns<A,B,D>. There is also a special case of art::Ptr<T> that, in some cases, allows a more compact persistent form, art::PtrVector<T>. A critical feature of all of these tools is that, when the pointee exists in memory, the persistent pointer behaves as a naive user would expect; if, for any reason, the pointee does not exist, then the persistent pointer will throw an exception. It is possible to test for the existence of the pointee.


art::Ptr<T>

The art::Ptr<T> will be introduced by way of a concrete example from Mu2e code.

When Mu2e runs Geant4, it exports information out of Geant4 and stores that information as art data products. One of the data products is a full mother-daughter history of all particles known to Geant4; this includes all of the particles created by event generators and imported into Geant4, and all of the particles created by Geant4. The information about one such particle is stored as an object of type SimParticle, MCDataProducts/inc/SimParticle.hh. The ensemble of all such particles is stored as an object of type SimParticleCollection, MCDataProducts/inc/SimParticleCollection.hh. A SimParticleCollection is an art data product and can be obtained directly from the art::Event. An equivalent statement is that it is a first tier object within the event-data model.

Another data product is a collection of objects that record that a particular SimParticle took a step of some length inside a particular sensitive volume. One such step is stored as an object of type StepPointMC, MCDataProducts/inc/StepPointMC.hh. Each StepPointMC records the starting point of the step, the length of the step, the energy deposited in the material during the step and so on. It also records the identity of the SimParticle making the step, which is recorded as a persistable pointer to the appropriate SimParticle. If one has a StepPointMC object, one may access the SimParticle that took the step as follows:

   StepPointMC const& step = ....; // get this from somewhere
   SimParticle const& sim  = *step.simParticle();
As usual in most Mu2e code, objects in the event have a lifetime that is long compared to your code that is reading the event, so it is recommended to receive these objects by const reference, not by value.

The class MCDataProducts/inc/StepPointMC.hh has a data member and an accessor:

  private:
  art::Ptr<SimParticle> _track;

  public:
  art::Ptr<SimParticle> const& simParticle() const { return _track; }
The data member is the persistable pointer and the template argument tells us that it points to an object of type SimParticle. To learn what else one can do with an art::Ptr, inspect the header file, art/Persistency/Common/Ptr.h. Some examples include:
   StepPointMC const& step = ....; // get this from somewhere
   art::Ptr<SimParticle> const & simPtr(step.simParticle());

   if ( simPtr.isAvailable() ) {
      // Do something only if the pointee is available
   } else {
      // Do something else if the pointee is not available
   }

   // These accessors throw if the pointee does not exist.
   SimParticle const&  sim   = *simPtr;
   SimParticle const*  sim1  = simPtr.operator->();

   art::ProductID const& id            = simPtr.id();
   art::Ptr<SimParticle>::key_type key = simPtr.key();

   // Return a pointer to the pointee, if it exists;
   // if it does not exist, return a null pointer.
   // It is the end user's responsibility to check for non-null.
   SimParticle const* sim2(simPtr.get());

The SimParticle class also uses art::Ptr's to implement its mother/daughter history and to link to generated particles. Each SimParticle either has a mother SimParticle or it has an associated GenParticle, always one but never both.


   private:
     // Data members
     art::Ptr<GenParticle>               _genParticle;
     art::Ptr<SimParticle>               _parentSim;
     std::vector<art::Ptr<SimParticle> > _daughterSims;

   public:
     // Accessors
     art::Ptr<GenParticle> const&               genParticle() const { return _genParticle;  }
     art::Ptr<SimParticle> const&               parent()      const { return _parentSim;    }
     std::vector<art::Ptr<SimParticle> > const& daughters()   const { return _daughterSims; }

     // Where was this particle created: in the event generator or in G4?
     bool isSecondary()   const { return _parentSim.isNonnull(); }
     bool isPrimary()     const { return _genParticle.isNonnull(); }

     // Some synonyms for the previous two accessors.
     bool hasParent()     const { return _parentSim.isNonnull(); }
     bool fromGenerator() const { return _genParticle.isNonnull(); }
     bool madeInG4()      const { return _genParticle.isNull();    }

An art::Ptr has an operator->() that behaves just like that of any other pointer type:

   if ( step.simParticle()->isPrimary() ){
     cout << "Generator id is: " << step.simParticle()->genParticle()->generatorId() << endl;
   }

The two code fragments above illustrate two different notions of validity. The test simPtr.isAvailable() checks both that the requested data product is available and that the requested key is present in the data product. The tests _genParticle.isNull() and _genParticle.isNonnull() only check whether or not the value of the key is the reserved value used to indicate a default constructed art::Ptr. This latter check is sufficient for the isPrimary() and isSecondary() methods of SimParticle; the isAvailable() test must be used when the existence of the data product, or of the key within the data product, is in doubt.
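The difference between the two tests can be sketched with toy stand-ins. Everything in the block below (ToyEvent, ToyPtr, kInvalidKey, exampleEvent) is a hypothetical name invented for illustration; it is not the art implementation, only a minimal model of a cheap "is the key the reserved value" test next to a more expensive "can the pointee actually be found" test.

```cpp
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Toy event: maps a product ID to a collection of ints.
using ToyEvent = std::map<int, std::vector<int>>;

// Reserved key value that marks a default constructed toy pointer.
constexpr std::size_t kInvalidKey = std::numeric_limits<std::size_t>::max();

// Toy persistent pointer (hypothetical; not art::Ptr).
struct ToyPtr {
  int productId = 0;
  std::size_t key = kInvalidKey;

  // Cheap tests: they look only at the stored key.
  bool isNull() const { return key == kInvalidKey; }
  bool isNonnull() const { return !isNull(); }

  // Full test: the product must be in the event and must contain the key.
  bool isAvailable(ToyEvent const& event) const {
    if (isNull()) return false;
    auto it = event.find(productId);
    return it != event.end() && key < it->second.size();
  }
};

// One product, with ID 7, holding three elements.
inline ToyEvent exampleEvent() { return { {7, {10, 20, 30}} }; }
```

In this model a pointer with product ID 7 and key 9 is non-null but not available, which is exactly the case that only the isAvailable() style of test can detect.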

In the examples shown so far, all of the art::Ptr objects have been found inside data products. But this is not necessary; one may create and use an art::Ptr as a variable or argument in any code; the art::Ptr need not be in a data product but the object to which it points must be.
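A typical use of a local pointer variable is to walk the mother/daughter chain from a secondary particle back to its primary ancestor. The sketch below models that walk with toy types; ToyParticle, findPrimary and exampleSims are hypothetical names, and an integer index stands in for the art::Ptr<SimParticle> that real code would obtain from SimParticle::parent().

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a SimParticle: parentIndex of -1 means "no parent".
// (Hypothetical type; the real class holds an art::Ptr<SimParticle>.)
struct ToyParticle {
  int id;
  int parentIndex;
  bool hasParent() const { return parentIndex >= 0; }
};

// Walk from the particle at 'start' back to its primary ancestor and
// return the index of that ancestor within the collection.
inline std::size_t findPrimary(std::vector<ToyParticle> const& sims,
                               std::size_t start) {
  std::size_t current = start;
  while (sims.at(current).hasParent()) {
    current = static_cast<std::size_t>(sims.at(current).parentIndex);
  }
  return current;
}

// Example: element 0 is a primary, 1 is a daughter of 0,
// 2 is a daughter of 1, and 3 is a second primary.
inline std::vector<ToyParticle> exampleSims() {
  return { {1, -1}, {2, 0}, {3, 1}, {4, -1} };
}
```

With real data products the loop body would follow the parent pointer with operator->() instead of indexing, but the termination condition is the same: stop at the first particle that has no parent.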

An art::Ptr object is copyable so one may create a collection of art::Ptr's, for example:

  std::vector<art::Ptr<T> >
In such a collection, the pointees may live in many different data products ( all of which must be of the same data type ). This is done, for example, in /HitMakers/src/MakeStrawHit_module.cc, which looks at many StepPointMCCollections and forms one StrawHitCollection from the ensemble of all StepPointMC.

The art::Ptr technology has, by design, several limitations. One can write an art::Ptr to an object if and only if two things are true:

  1. The object is an element of one of the supported collection types, such as std::vector, std::map and cet::map_vector.
  2. The collection type is a data product; that is, it is a first tier object within the art::Event.
An equivalent statement is that one can only write an art::Ptr to a second tier object and one may do so only if the first tier object is an appropriate collection type. One cannot write an art::Ptr that points to a first tier object; instead one should get the data product directly from the event. And one cannot write an art::Ptr that points to a third or lower tier event-data object; to access such objects, follow the art::Ptr to the second tier object and call the appropriate method of the second tier object; and so on to access lower tier objects.

These limitations are present in order to keep art::Ptr simple enough that it is robust and is maintainable by a small staff. Moreover, it is straightforward for user code to access objects that cannot be directly accessed using an art::Ptr.

There is an important new feature planned for art::Ptr. In the near future it will be possible to write an art::Ptr that points at a second tier object that lives within either the Run or SubRun objects associated with the current art::Event object. At present, the pointee must live within the art::Event object.

Creating an art::Ptr

There are two distinct use cases for making an art::Ptr object.

Both use cases are illustrated in the saveSimParticleStart method of
    Mu2eG4/inc/TrackingAction.hh
    Mu2eG4/src/TrackingAction.cc
This method is called from the PreUserTrackingAction method and it creates a new SimParticle inside the SimParticleCollection. If a SimParticle is a primary particle, then it will hold an art::Ptr<GenParticle> that points to the corresponding GenParticle in the GenParticleCollection, which is already in the event. If a SimParticle is a secondary particle it will have an art::Ptr<SimParticle> that points to its mother particle within the same SimParticleCollection; however the SimParticleCollection will not exist as a data product until after G4 is finished with the event. Therefore two different constructors are used. If a particle is a primary particle, it will have a null art::Ptr<SimParticle>; if it is a secondary particle, it will have a null art::Ptr<GenParticle>.

After the ... lines, the following code is taken from TrackingAction.cc:

   // Passed in as an argument on each call to saveSimParticleStart
   const G4Track* trk = ...;

   // The next three items are computed elsewhere and passed into
   // the TrackingAction class at the start of each event.

   // The productId for the SimParticleCollection was reserved in the constructor of
   // G4_module.cc before event processing began.
   art::ProductID _simID = ...;

   art::Event const * _event = ...;
   art::Handle<GenParticleCollection> const* _gensHandle = ...;

   // The remainder of this example is taken verbatim ( but with irrelevant code deleted ):

   int id       = trk->GetTrackID();
   int parentId = trk->GetParentID();

   // GenParticle numbers start at 0 but G4 track IDs start at 1.
   int generatorIndex = ( parentId == 0 ) ? id-1: -1;

   art::Ptr<GenParticle> genPtr;
   art::Ptr<SimParticle> parentPtr;
   if ( parentId == 0 ){
     genPtr = art::Ptr<GenParticle>(*_gensHandle,generatorIndex);
   } else{
     parentPtr = art::Ptr<SimParticle>( _simID, parentId, _event->productGetter(_simID));
   }

Note that genPtr and parentPtr require different constructors because there is no handle to the SimParticleCollection at this stage in the execution of the program. One should prefer the genPtr form of the constructor because, as soon as genPtr is instantiated, it is fully formed and is available to be used. However parentPtr, as constructed, is not usable and it will not be usable until the start of the next module to be executed.

If you create a class that has an art::Ptr as a data member, and if that class will be part of a data product, remember to add the required lines to classes_def.xml and classes.h, as discussed in the instructions for making data products. For examples look at MCDataProducts/src/classes_def.xml and MCDataProducts/src/classes.h; remember that the Wrapper lines are only needed for data products ( i.e. first tier objects ), not for objects within data products.

art::PtrVector<T>

The previous section noted that one may create a std::vector of art::Ptr<T> objects. In the general case, each art::Ptr object in the vector may point to an object in a different data product. In the special case that all of the art::Ptr objects point to objects in a single data product, art provides a specialized class, art::PtrVector<T>. This class has a persistent representation that is smaller than that of a std::vector<art::Ptr<T> >; it is smaller because it only needs to store the art::ProductID once. The transient representation does not have a smaller memory footprint than does a std::vector<art::Ptr<T> >; indeed, under the covers, an art::PtrVector<T> holds a std::vector<art::Ptr<T> >. For more details see art/Persistency/Common/PtrVector.h.
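The space saving can be illustrated with a toy model. The names below (ToyRef, ToyCompactRefs, pack, unpack) are hypothetical and the layout is an assumption made for illustration, not the actual on-disk format used by art; the point is only that, when every reference shares one product ID, the ID needs to be written once and each element reduces to its key.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy persistent reference: which product, and which element within it.
struct ToyRef {
  int productId;
  std::size_t key;
};

// Toy compact form: the product ID is stored once, followed by the keys.
struct ToyCompactRefs {
  int productId;
  std::vector<std::size_t> keys;
};

// Pack a list of references; throws if they do not all share one product.
inline ToyCompactRefs pack(std::vector<ToyRef> const& refs) {
  ToyCompactRefs out{refs.empty() ? 0 : refs.front().productId, {}};
  for (ToyRef const& r : refs) {
    if (r.productId != out.productId) {
      throw std::invalid_argument("references span more than one product");
    }
    out.keys.push_back(r.key);
  }
  return out;
}

// Expand the compact form back into individual references.
inline std::vector<ToyRef> unpack(ToyCompactRefs const& c) {
  std::vector<ToyRef> out;
  for (std::size_t k : c.keys) {
    out.push_back(ToyRef{c.productId, k});
  }
  return out;
}
```

The unpack step mirrors the statement above that the transient form is no smaller than a vector of individual pointers: once in memory, each element again carries its own product ID.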

In an earlier implementation of art::PtrVector, the one inherited from CMS, the transient representation also benefited from a reduced memory footprint, but at the expense of greater execution time. When the art development team started to add new features to art::Ptr and art::PtrVector, they decided to sacrifice the transient memory footprint in favour of faster execution. Mu2e signed off on this trade-off.


art::Assns<A,B,D>

This class template will be introduced by reference to a use case that will soon come up in the Mu2e reconstruction code.

Consider a reconstruction job in which one module finds and fits tracks in the TTracker, another module finds and classifies clusters in the calorimeter and a third module determines if any of the tracks, when extrapolated to the calorimeter, intersect any of the calorimeter clusters. Two key features of this use case are that the track-cluster match is done in a separate module and that the module operates on track and cluster data products already present in the event. It would be convenient to represent track-cluster matches using some sort of bidirectional persistable pointer.

Art provides a solution in the art::Assns class template. For definiteness of notation, let's presume that the track and cluster classes produced by the first two modules are named RecoTrack and RecoCalCluster; and also presume that these objects live in data products named RecoTrackCollection and RecoCalClusterCollection; none of these classes currently exist in the Mu2e code. The module that computes track-cluster matches can choose to store its output as a data product of type

art::Assns<RecoTrack,RecoCalCluster>
This data product is a collection of objects, each of which expresses an association between one RecoTrack and one RecoCalCluster; under the covers it holds pairs of art::Ptr<RecoTrack> and art::Ptr<RecoCalCluster>. The art::Assns class template supports 1:1, 1:many, many:1 and many:many associations; it implicitly supports 1:0 and 0:1 associations via the absence of an association object.

A physicist who wants to inspect track-cluster matches can choose to do so in one of three ways. One may loop over all associations; one may write an outer loop over RecoTracks and then an inner loop over all matched RecoCalClusters; or one may write an outer loop over RecoCalClusters and then an inner loop over all matched RecoTracks. When writing these loops, it is not important whether the art::Assns object was declared as shown above or with its template arguments reversed:

art::Assns<RecoCalCluster,RecoTrack>
When using an art::Assns object, one asks for each side of the relationship by type, not by ordinal number of template argument; that is, unlike std::pair, it does not have the notion of first and second.
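The three looping styles can be sketched with a toy association list. ToyMatch, clustersByTrack and tracksByCluster are hypothetical names; indices stand in for the art::Ptr objects that an art::Assns holds, and the grouping functions only illustrate the kind of bookkeeping that the art::Assns tools are described as doing for the user.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// One toy association: a track index linked to a cluster index.
// (Hypothetical stand-in; art::Assns holds pairs of art::Ptr objects.)
struct ToyMatch {
  std::size_t trackIndex;
  std::size_t clusterIndex;
};

using IndexGroups = std::map<std::size_t, std::vector<std::size_t>>;

// Outer loop over tracks, inner loop over matched clusters.
inline IndexGroups clustersByTrack(std::vector<ToyMatch> const& matches) {
  IndexGroups out;
  for (ToyMatch const& m : matches) {
    out[m.trackIndex].push_back(m.clusterIndex);
  }
  return out;
}

// Outer loop over clusters, inner loop over matched tracks.
inline IndexGroups tracksByCluster(std::vector<ToyMatch> const& matches) {
  IndexGroups out;
  for (ToyMatch const& m : matches) {
    out[m.clusterIndex].push_back(m.trackIndex);
  }
  return out;
}
```

Looping over the vector of ToyMatch directly is the first style, a loop over all associations. A track or cluster with no match simply has no entry in the corresponding map, which is the toy analogue of expressing 1:0 and 0:1 relationships by the absence of an association object.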

The art::Assns class template provides one more important feature. The module that computes track-cluster matches could instead have chosen to store its output as a data product of type

art::Assns<RecoTrack,RecoCalCluster,MatchInfo>
where MatchInfo is an arbitrary user defined class. Presumably one would use it to store information such as the footprint of the track on the calorimeter, the chi-squared of the match and so on. The third template argument is optional.
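As an illustration of what such a payload makes possible, the sketch below attaches a hypothetical MatchInfo to each toy association and picks the best match for a given track. The member names chi2 and footprintArea, the ToyMatch struct and the function bestMatchForTrack are all assumptions made for this example; no such class exists in the Mu2e code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical payload describing the quality of one match.
struct MatchInfo {
  double chi2;           // chi-squared of the track-cluster match
  double footprintArea;  // assumed measure of the track footprint
};

// Toy association carrying a payload; indices stand in for art::Ptr objects.
struct ToyMatch {
  std::size_t trackIndex;
  std::size_t clusterIndex;
  MatchInfo info;
};

// Return the position, within 'matches', of the match with the smallest
// chi2 for the given track; return matches.size() if the track is unmatched.
inline std::size_t bestMatchForTrack(std::vector<ToyMatch> const& matches,
                                     std::size_t track) {
  std::size_t best = matches.size();
  for (std::size_t i = 0; i < matches.size(); ++i) {
    if (matches[i].trackIndex != track) continue;
    if (best == matches.size() || matches[i].info.chi2 < matches[best].info.chi2) {
      best = i;
    }
  }
  return best;
}
```

Keeping the quality information in the association, rather than in the track or cluster class, means that the payload can evolve without touching either of the classes being associated.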

Another use case for which an art::Assns would be a good solution is the following: form a simulated event, creating many data products; reconstruct the event as if it were real data, creating many data products; in a final module, or modules, determine which RecoTracks and RecoCalClusters match to which SimParticles. These results are naturally stored as data products of type:

art::Assns<RecoTrack,SimParticle,MatchInfoSimTrack>
art::Assns<RecoCalCluster,SimParticle,MatchInfoSimCluster>
where the two MatchInfo classes are arbitrary user defined classes that hold some information about the quality of the match. As with the track-cluster use case, the code that finds the relationships between the simulated and reconstructed particles is done in a separate module that operates on data products already present in the event.

Mu2e is not yet using art::Assns but we expect to soon. As we get experience, we will write additional documentation, including examples of creating and using art::Assns. In the meantime, if you would like to learn more about art::Assns, consult the art documentation on Inter-Product References.

Comments on Some Rejected Ideas

A later section, on choosing between art::Ptr and art::Assns, explains why it is illegal to create a data product with an empty art::Ptr that is to be filled in later by a different module. This section comments on some other ideas for the track-cluster match use case and explains why the art::Assns solution is preferred.

Rejected Option 1

One could expand the MatchInfo class to include an art::Ptr<RecoTrack> and an art::Ptr<RecoCalCluster>; with this change the matching code could add a std::vector<MatchInfo> to the event. This would have worked and, provided one only wanted to loop over associations, it would be very close to the features provided by art::Assns. The big difference is that art::Assns provides code to simplify the other two looping models: an outer loop over RecoTracks with an inner loop over matched RecoCalClusters, and vice versa. Experience in other experiments has shown that writing such loops from first principles is a common source of errors that produce incomplete but otherwise correct output; such errors are notoriously hard to recognize. With art::Assns, these looping constructs are written correctly, in one place, for all types of associations.

Rejected Option 2

One could expand the RecoCalCluster class by adding an art::PtrVector<RecoTrack> and a std::vector<MatchInfo>, running the track reconstruction first and then integrating the track-cluster matching into the cluster finding algorithm. The main problem with this approach is that cluster finding and track-cluster matching are logically separate operations that should not be coupled through accidental constraints of the event-data model. The recommended approach uses three modules to implement three logically separate steps in the data reconstruction chain and each step puts its own output into the event. With the rejected approach, as one evolves the MatchInfo class, all code that knows about RecoCalCluster objects must be recompiled; this always complicates the code development cycle.

The rejected approach also introduces an artificial asymmetry in looping over matches. The code to implement an outer loop over RecoCalClusters with an inner loop over matched RecoTracks will look completely different than the code to implement an outer loop over RecoTracks with an inner loop over matched RecoCalClusters. Experience with other experiments has shown that artificial asymmetries that exist only because of accidental code constraints are a common source of errors.

Another feature of the recommended approach is that it simplifies test driving alternate track-cluster matching algorithms; one may run several such modules in one job, with each algorithm operating on exactly the same input and each algorithm having a well defined place to put its output. In the rejected alternative, there is just one well defined place to write the output of the matching algorithm: as part of the RecoCalCluster object. This can be made to work but the symmetries present in the ideas are not reflected in the code.


Should I Use an art::Ptr or an art::Assns?

There is conflicting advice on this.

Rob Kutschke's advice:

When possible, and when the reference is really a one-directional thing, prefer a Ptr over an Assns. This choice makes the end user code much simpler: the end user just follows a Ptr as if it were a bare pointer; therefore it's very easy to teach. I think that hitting a new user with an art Assns early in the teaching process will be very difficult but, to be fair, I have not yet tried.

One downside of this approach is that it complicates event mixing. If a data product class contains embedded art::Ptr objects, then event mixing code needs to know where the Ptr's are located and, at mix-in time, update them by hand.

So this really boils down to a choice of where to take the pain: at the end user code or in code written by experts. My advice is to let the experts take the pain.

When is it possible or not possible to use a Ptr? One of the fundamental design rules of art is that, once a data product is put into an event, that data product may never be modified. This rule is present to help ensure a robust audit trail of how data products were created. Therefore, it is illegal to create a data product that includes an empty art::Ptr and, in a later module, to fill that art::Ptr with real information. It follows that, if one wishes to associate objects from two data products that are already in the event, the only choice is art::Assns.

Otherwise, if an art::Ptr or an art::PtrVector will completely solve the problem at hand, both now and in the future, then it should be preferred over art::Assns. The reason is that an art::Ptr is simpler both to create and to read; presumably this makes it less error prone. The only difficult part in making the choice is looking into possible future uses for your code; provided each piece of code does a small, well defined thing, the choice should usually be clear.

Marc Paterno's advice:

Always prefer an Assns over a Ptr. Similar questions are well researched in the database world. The unanimous conclusion of the database experts is that an Assns is the right answer.


Technical Appendix

Inside an art::Ptr

Few Mu2e physicists will need to understand the insides of an art::Ptr; this section is provided for reference. An art::Ptr has two parts, a persistent part, which behaves exactly like any other persistable event-data object, and a transient part that is divorced from the persistency mechanism. The persistent part consists of an art::ProductID and a key; the art::ProductID uniquely identifies the data product in which the pointee lives; the key uniquely identifies the pointee within the data product. Under the covers, the product ID and the key are simply a tuple of integral types and are persisted as are any other integral data.

The transient part of an art::Ptr also has two parts, a bare pointer to const that points to the pointee and a pointer to an object that can compute the bare pointer given the persistent information. When an art::Ptr is read back from an event-data file, the bare pointer is set to zero and the function pointer is properly initialized. When an art::Ptr is used, the Ptr code first checks to see if the bare pointer is non-null; if it is non-null, the Ptr simply returns it; if it is null, the Ptr calls the function to initialize the pointer and then returns the pointer; it is this function that will throw if the pointee cannot be found.
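The resolve-on-first-use behavior described above can be sketched as follows. This is a minimal toy model, not the art implementation: ToyPtr and its Getter type are hypothetical names, an int stands in for art::ProductID, and the getter is modeled as a std::function that returns the collection for a product ID or a null pointer if the product is missing.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

template <typename T>
class ToyPtr {
public:
  // Returns the collection for a product ID, or nullptr if it is missing.
  using Getter = std::function<std::vector<T> const*(int)>;

  ToyPtr(int productId, std::size_t key, Getter getter)
    : productId_(productId), key_(key), getter_(std::move(getter)) {}

  // Use the cached address if present; otherwise resolve it, cache it,
  // and throw if the pointee cannot be found.
  T const& operator*() const {
    if (cache_ == nullptr) {
      std::vector<T> const* product = getter_ ? getter_(productId_) : nullptr;
      if (product == nullptr || key_ >= product->size()) {
        throw std::runtime_error("pointee not found");
      }
      cache_ = &(*product)[key_];
    }
    return *cache_;
  }
  T const* operator->() const { return &**this; }

  int id() const { return productId_; }
  std::size_t key() const { return key_; }

private:
  // Persistent part: enough to identify the pointee in any job.
  int productId_;
  std::size_t key_;
  // Transient part: cached address plus the means to compute it.
  mutable T const* cache_ = nullptr;
  Getter getter_;
};
```

In this model the getter runs at most once per successfully resolved pointer; later dereferences return the cached address. It also shows why a pointer built from an ID, a key and a getter cannot be dereferenced until the product it names actually exists.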

Requirements on Container Types

Once the art team provides appropriate documentation, this section should be changed to point there.

The section describing art::Ptr<T> states that an art::Ptr may only point at a second tier object within an art::Event and that it may only do so if the first tier object (the data product) is of a container type that satisfies certain requirements. The requirements on the collection type are:

  1. It must have a begin method that returns an appropriate iterator type.
  2. The iterator must be a normal iterator in the sense of a call to std::advance(iterator,n); the iterator must have traits that describe it as an input iterator, or better, such as a random access iterator.
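These two requirements are what a generic lookup by position needs. The sketch below uses only the standard library; elementAt is a hypothetical helper written for this page, and whether art performs its lookup in exactly this way is an assumption.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Return a pointer to the n-th element of a container using only begin()
// and std::advance. The caller must ensure that n is less than the
// number of elements in the container.
template <typename Container>
typename Container::value_type const* elementAt(Container const& c,
                                                std::size_t n) {
  using Iter = decltype(c.begin());
  using Traits = std::iterator_traits<Iter>;
  static_assert(
    std::is_base_of<std::input_iterator_tag,
                    typename Traits::iterator_category>::value,
    "iterator must be an input iterator or better");
  Iter it = c.begin();
  std::advance(it, static_cast<typename Traits::difference_type>(n));
  return &*it;
}
```

The same function template works for a std::vector, where std::advance is a constant-time jump, and for a std::list, where it steps one element at a time; this is why the requirement is phrased in terms of iterator traits rather than a particular container.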



For web related questions: Mu2eWebMaster@fnal.gov.
For content related questions: kutschke@fnal.gov
This file last modified Monday, 07-Jan-2013 15:01:04 CST