Persisting Pointers

This material is obsolete. We have adopted art::Ptr as the solution.


Introduction

Ever since HEP graduated from Fortran to languages with pointers, there has been a problem restoring pointers when reading back data that had been written to a file. The various experiments have tried a great variety of solutions; all have been found lacking in some way. The ideal solution:
  1. places no additional burden on the physicist programmer; that is, the pointers are "just there", just like they were before the objects were written to disk.
  2. is efficient, in the sense that the work required to recover the pointers is done only if and when needed.
  3. is affordable and robust: we can afford to pay the software engineers to build and maintain the solution; we do not have long delays when we discover that we need new features.
  4. does not depend on global state; for doing things like event mixing and overlays of signal MC with data from random triggers, the concept of "the event" might be poorly defined.
If you give up any one of these requirements, the solution becomes much easier.

For Mu2e I propose a solution in which:

The logic is that, at least for the time being, person time is much, much more expensive than CPU time. If we discover later on that it is important to recover efficiency, the tools to do so have been designed in.

At present, the solution is not quite complete and it does require two extra steps on the part of the physicist-programmer. One extra step occurs when first creating the object; that will remain. The other extra step occurs when reading data from an event; this extra step will go away in a later version of the framework.


The Idea Behind the Proposed Solution

It is important to distinguish the transient (in-memory) representation of an object from its persistent (on-disk) representation. Mu2e will require that the persistent representation of all objects be "plain old data" (POD). The transient representation of an object can be anything that can be computed given the persistent data plus a pointer or reference to the current event. Therefore pointers are not permitted in the persistent data but they are permitted in the transient data.

Consider an object representing a digitized waveform produced in MC processing; in general this will be produced from one or more StepPointMC objects. It is useful to record which StepPointMC objects contributed to each digitized waveform object. Each of the contributing StepPointMC objects can be described uniquely by specifying the data product in which it is found and the offset, or index, within that data product. So a single precursor can be described by a class like,

struct DPIndex{
    ProductID    id;
    std::size_t index;
};
where ProductID is provided by the framework and is just a few ints. So a DPIndex is a POD and may be persisted. A general set of precursors, possibly coming from different data products, can be described by:
     std::vector<DPIndex> precursors;
In the general case, the precursor objects might come from different data products; this could happen if we merge StepPointMC objects from two files, perhaps some from a signal file and some from a cosmic ray file. In such a case, a given digi might result from the overlap of a signal track and a background track.
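
As a rough illustration of how a DPIndex could be derived from a pointer, the sketch below converts a pointer to an element of a collection into a (product, index) pair. It is only a sketch under stated assumptions: ProductID and StepPointMC here are simplified stand-ins for the framework types, and makeDPIndex is a hypothetical helper name that does not come from the Mu2e code.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Simplified stand-ins (assumptions); the real framework types differ.
struct ProductID { int processIndex; int productIndex; };
struct StepPointMC { double eDep; };

struct DPIndex {
    ProductID   id;
    std::size_t index;
};

// Hypothetical helper: describe the element that p points to as a DPIndex.
// The collection is a std::vector, so the index is simply the distance
// of the element from the first element of the collection.
inline DPIndex makeDPIndex(ProductID id,
                           std::vector<StepPointMC> const& coll,
                           StepPointMC const* p) {
    std::less<StepPointMC const*> before;  // total order, safe for unrelated pointers
    StepPointMC const* first = coll.data();
    StepPointMC const* last  = first + coll.size();
    if (before(p, first) || !before(p, last)) {
        throw std::out_of_range("pointer does not belong to this collection");
    }
    return DPIndex{id, static_cast<std::size_t>(p - first)};
}
```

Calling makeDPIndex once per precursor, each time with the ProductID and collection it came from, would give a std::vector<DPIndex> that mixes entries from a signal product and a cosmic ray product.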

The following shows a fragment of what a digitized waveform class might look like if we did not have to worry about persisting pointers:

struct DigiWaveform{

  StrawIndex       strawIndex;                   // Straw identifier
  unsigned short   timestamp;                    // Timestamp at the start of the first bin.
  std::vector<unsigned short> adc;               // Pulseheights in 10 ns bins.
  std::vector<StepPointMC const*> stepPointMCs;  // Pointers to precursors.

  DigiWaveform( StrawIndex s, 
                unsigned short t, 
                std::vector<unsigned short> const& a,
                std::vector<StepPointMC const*> const& p):
        strawIndex(s),
        timestamp(t),
        adc(a),
        stepPointMCs(p)
 {
 }
};
But we do have to persist pointers. The following shows a fragment of a digitized waveform class that uses DPIndex to persist pointers.
struct DigiWaveform{

  // Persistent data.
  StrawIndex       strawIndex;           // Straw identifier
  unsigned short   timestamp;            // Timestamp at the start of the first bin.
  std::vector<unsigned short> adc;       // Pulseheights in 10 ns bins.
  std::vector<DPIndex> stepPointIndices; // The StepPointMC objects that contributed to this.

  // Need some accessors for the private data.

private:

  // This data is not persisted but can be recovered as needed.
  // On readback, the bool is initialized to false and the vector to empty.
  mutable bool pointersValid;
  mutable std::vector<StepPointMC const*> stepPointMCs;

};
So far it's pretty simple. Why are the private data members mutable? User code has only const access to the data products, so non-mutable data can never be changed. We want to be able to rebuild the pointers as needed, so they must be mutable; similarly, the validity flag needs to be able to change. We should never make public, persistable data mutable because that breaks the audit trail provided by the provenance.
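
The role of mutable can be seen in isolation with a small, framework-free sketch. The class and member names below (LazyProduct, indices_, cache_, valid_) are illustrative assumptions and not part of the Mu2e code; the point is only that a const accessor can fill a transient cache on first use while the persistent part stays untouched.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative only: a product with a persistent index list and a transient
// pointer cache that is rebuilt on demand through a const accessor.
class LazyProduct {
public:
    explicit LazyProduct(std::vector<std::size_t> idx) : indices_(std::move(idx)) {}

    // User code holds only a const reference, yet the cache can still be filled
    // because the cache and its validity flag are declared mutable.
    std::vector<int const*> const& pointers(std::vector<int> const& source) const {
        if (!valid_) {
            cache_.clear();
            for (std::size_t i : indices_) {
                cache_.push_back(&source.at(i));  // at() throws on a bad index
            }
            valid_ = true;
        }
        return cache_;
    }

    bool pointersOK() const { return valid_; }

private:
    std::vector<std::size_t> indices_;       // persistent part, never modified
    mutable bool valid_ = false;             // transient
    mutable std::vector<int const*> cache_;  // transient
};
```

The persistent member is deliberately not mutable, so it cannot be altered through const access.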

Adding some constructors and accessors makes the class much, much busier. But the extra code is mostly boilerplate that will not change much from one persistent class to the next. A more complete fragment is below:

struct DigiWaveform{

  // Persistent data.
  StrawIndex       strawIndex;           // Straw identifier
  unsigned short   timestamp;            // Timestamp at the start of the first bin.
  std::vector<unsigned short> adc;       // Pulseheights in 10 ns bins.
  std::vector<DPIndex> stepPointIndices; // The StepPointMC objects that contributed to this.

  // One constructor.
  DigiWaveform( StrawIndex s, 
                unsigned short t, 
                std::vector<unsigned short> const& a,
                std::vector<StepPointMC const*> const& p,
                edm::Handle<StepPointMCCollection>& c):
        strawIndex(s),
        timestamp(t),
        adc(a),
        stepPointMCs(p),
        pointersValid(true)
 {
  // construct stepPointIndices from p and c.
 }

  // A second constructor.
  DigiWaveform( StrawIndex s, 
                unsigned short t, 
                std::vector<unsigned short> const& a,
                std::vector<DPIndex> const& idx,
                edm::Event const  *event = 0 ):
        strawIndex(s),
        timestamp(t),
        adc(a),
        stepPointIndices(idx),
        pointersValid(false),
        stepPointMCs()
 {
    if ( event ){
         // construct stepPointMCs from stepPointIndices and the event.
    }
 }


  // Accessor for the pointers.
  std::vector<StepPointMC const*> const& getStepPointMCs( edm::Event const& event) const{

     if ( pointersValid ) return stepPointMCs;

     // Fill the pointers.
     resolveDPIndices<StepPointMCCollection>( event, stepPointIndices, stepPointMCs);
     pointersValid = true;  // Remember that the pointers are now filled.

     return stepPointMCs;
  }

  // A second accessor that does not need an event but might throw.
  std::vector<StepPointMC const*> const& getStepPointMCs() const{
     if ( pointersValid ) return stepPointMCs;
     throw ...;
  }

  // A safety checker, should be rarely needed.
  bool pointersOK() const { return pointersValid; }

private:

  // This data is not persisted but can be recovered as needed.
  // On readback, the bool is initialized to false and the vector to empty.
  mutable bool pointersValid;
  mutable std::vector<StepPointMC const*> stepPointMCs;

};
Some comments on this:
  1. The templated function resolveDPIndices knows how to fill out the pointer array given the event and the vector of DPIndex objects. The template argument is the data type of the data product (not of the element within the data product).
  2. In general, the accessors to the pointers will return valid data or will throw.
    1. resolveDPIndices will throw if the data type of the ProductID does not match the template argument.
    2. resolveDPIndices will throw if the data type of the pointers in stepPointMCs is not the data type of an element of the template argument.
    3. resolveDPIndices will throw if it cannot find the data product.
    4. resolveDPIndices will throw if the offset is out of range, that is, if it is not smaller than the size of the data product.
  3. If an above-threshold ADC count is pure digitizer noise, it will not point back to any StepPointMC. So this class allows stepPointIndices to be empty and a valid stepPointMCs to be empty. In your class it might be logical to decide that empty arrays are errors and to throw when that occurs.
  4. This pattern has two inconveniences:
    1. There are extra arguments in the constructor. I don't think that this is an onerous requirement and I think that the alternatives are worse. This won't change.
    2. The accessor to the pointers requires an event as an input argument. If this is deemed too complicated, there are several solutions. These solutions reduce efficiency since they needlessly fill in pointers that are not needed. However it is probably the right short term solution because it should save person-time.
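
To make the resolution step more concrete, here is a minimal sketch of what a function like resolveDPIndices might do. It is written against a toy event, not the real framework: ToyEvent, the single-int ProductID and the StepPointMC stand-in are all assumptions made for illustration, and the sketch covers only the "product not found" and "index out of range" checks, not the type checks listed above.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Simplified stand-ins for the framework types (assumptions).
struct ProductID {
    int value;
    bool operator<(ProductID const& rhs) const { return value < rhs.value; }
};
struct StepPointMC { double eDep; };
typedef std::vector<StepPointMC> StepPointMCCollection;

struct DPIndex {
    ProductID   id;
    std::size_t index;
};

// A toy event that holds only StepPointMC collections, keyed by ProductID.
struct ToyEvent {
    std::map<ProductID, StepPointMCCollection> products;
};

// Sketch of the resolution step. The template argument is the type of the
// data product, as in the text; the output holds pointers to its elements.
template <typename Collection>
void resolveDPIndices(ToyEvent const& event,
                      std::vector<DPIndex> const& indices,
                      std::vector<typename Collection::value_type const*>& out) {
    std::vector<typename Collection::value_type const*> tmp;
    tmp.reserve(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        auto it = event.products.find(indices[i].id);
        if (it == event.products.end()) {
            throw std::runtime_error("data product not found");
        }
        Collection const& coll = it->second;
        if (indices[i].index >= coll.size()) {
            throw std::out_of_range("index is outside the data product");
        }
        tmp.push_back(&coll[indices[i].index]);
    }
    out.swap(tmp);  // commit only after every index has been resolved
}
```

Filling a temporary vector and swapping at the end means that a throw leaves the caller's pointer vector unchanged, which keeps the validity flag in the owning class easy to reason about.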


An Example

The solution as it exists today is illustrated in the class CrudeStrawHits. It uses two helper function templates, and its use is illustrated in the HitMaker module in CVS.

The class CrudeStrawHits contains several public data members that describe a crude hit in a straw. Since the main purpose of this class is to serve this data, they are public, not private. A crude straw hit can come from zero or more precursors. If a hit is just salt and pepper noise, it will not have any precursors. Right now we are making hits directly from StepPointMC objects and each CrudeStrawHit can come from one or more StepPointMC objects. In the future we will make CrudeStrawHit objects from unpacked digis and each CrudeStrawHit will come from exactly one unpacked digi object.

To allow a CrudeStrawHit to describe its precursors, the class contains two data members,

    enum precursor_type { undefined, unpackedDigi, stepPointMC};
    precursor_type precursorType;
    std::vector<DPIndex> precursorIndices;
A DPIndex is a class that contains two members, a ProductID and an int. The ProductID uniquely identifies a data product in an event and the int is an index into the data product. The ProductID



For web related questions: Mu2eWebMaster@fnal.gov.
For content related questions: kutschke@fnal.gov
This file last modified Tuesday, 16-Aug-2011 14:03:30 CDT