If there is an SEU event that upsets some FPGA configuration
memory, then there is a loss of data from that region while that device is
reloaded and re-enters the data stream. Do you have enough information about
that data loss to properly analyze data during that period or is there a
detector-wide dead time associated with any reload?
The Readout Controller is implemented in flash-based FPGAs from Microsemi®.
Tests performed at LANSCE by Microsemi® and at several places in Europe by
LHC experiments have shown no configuration memory SEUs in those devices, at
energies and fluences much higher than ours, so I consider this device immune
to SEUs (just the way it is being marketed).
Cyclone® devices used for the digitizers automatically calculate a CRC on each
data frame loaded into the configuration memory. Upon detecting an error, the
memory is automatically scrubbed and the affected CRC registers and CRAM arrays
are corrected. Scrubbing is fast (microseconds), but the CRC check on the entire
device can take much longer. That time is set by the user and can be as short
as 4 ms, but that requires some complicated FPGA routing. For lax routing, the
CRC check takes ~120 ms.
The data
will be flagged and the DAQ made aware of that. In principle, to reduce the
impact of an error, algorithms could identify finer subsets of data as
corrupted. Here I consider 120 ms of data
as lost each time an FPGA fails the CRC check.
Are such data losses consistent with your physics requirements
(this is, I suppose, equivalent to the question of how sensitive the physics
analysis is to permanent loss of some group of straws)?
The measured CRAM SEU rate for 28 nm technology is 7.9E-15 cm²/bit/n.
The Mu2e flux at the electronics is 4400 n/cm²/s. The typical
configuration memory for a Cyclone® device is 30 Mbits. We did a measurement
at UC Berkeley and saw no configuration issues at a rate a factor of 5 higher.
Neutrons in that test were 2.45 MeV, so I assume we are only susceptible to
the 25% of our neutrons above 2.45 MeV. Note this is an upper bound: the
threshold for SEUs on configuration memory for 28 nm devices is thought to be
more like 10 MeV. The CRAM SEU rate per digitizer with these conservative
assumptions is
7.95·10⁻¹⁵ cm²/bit/n × 30 Mbit × 25% × 4400 n/cm²/s = 2.62·10⁻⁴/s
With 1296 Cyclones® in the system and 120 ms of data loss on each failure,
on average <0.05 digitizers are off at any instant. Physics, by contrast, is
not seriously impacted until a whole station's worth of straws is out:
1152 straws or 72 digitizers.
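The arithmetic above can be reproduced with a short script. This is only a sketch of the estimate as stated in the text: the constant and function names are illustrative, the inputs are the numbers quoted above, and the average number of digitizers off is taken as (system failure rate) × (data loss per failure), which assumes failures are independent and rare.

```python
# Inputs quoted in the text; all names here are illustrative, not from any DAQ code.
SEU_CROSS_SECTION_CM2_PER_BIT = 7.95e-15  # measured CRAM SEU value for 28 nm
CONFIG_MEMORY_BITS = 30e6                 # typical Cyclone configuration memory
SUSCEPTIBLE_FRACTION = 0.25               # fraction of neutrons above 2.45 MeV
NEUTRON_FLUX_PER_CM2_S = 4400.0           # flux at the electronics
N_DIGITIZERS = 1296                       # Cyclone devices in the system
DATA_LOSS_PER_FAILURE_S = 0.120           # CRC check time with lax routing
DIGITIZERS_PER_STATION = 72               # one station = 1152 straws


def seu_rate_per_digitizer(
    cross_section=SEU_CROSS_SECTION_CM2_PER_BIT,
    bits=CONFIG_MEMORY_BITS,
    fraction=SUSCEPTIBLE_FRACTION,
    flux=NEUTRON_FLUX_PER_CM2_S,
):
    """Configuration-memory upsets per second for a single digitizer FPGA."""
    return cross_section * bits * fraction * flux


def mean_digitizers_off(
    n_digitizers=N_DIGITIZERS,
    loss_per_failure_s=DATA_LOSS_PER_FAILURE_S,
):
    """Average number of digitizers losing data at any instant."""
    system_rate = n_digitizers * seu_rate_per_digitizer()
    return system_rate * loss_per_failure_s


if __name__ == "__main__":
    rate = seu_rate_per_digitizer()
    print(f"SEU rate per digitizer: {rate:.3e} /s")
    print(f"System-wide SEU rate:   {N_DIGITIZERS * rate:.3f} /s")
    print(f"Mean digitizers off:    {mean_digitizers_off():.4f}")
    print(f"Station-loss level:     {DIGITIZERS_PER_STATION} digitizers")
```

With the quoted inputs the mean comes out at roughly 0.04 digitizers, consistent with the <0.05 figure above and far below the 72-digitizer level at which a full station is lost.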
I am a bit worried by setting your threshold so high (100 Hz). That has two
costs that I can see: more jitter in the time measurement (vs. a much lower
threshold) and the possibility that your design may assume a much lower data
rate than you eventually have to handle at a much lower threshold (assuming
you discover or decide that you need to run at a lower threshold). So:
a) Do you have test results that convince you that you can do
well enough in position and time division with such a high threshold?
We do not have convincing resolution results yet. But we have
achieved noise that, according to simulations, will give us the performance we
need well within our available bandwidth.
The 100 Hz noise rate was chosen for our early prototypes, where we did not
have the full readout chain and were limited in bandwidth. We do plan to run
at a lower threshold in the final configuration. There are two additional
points to make here:
1) More than likely, in the final system we will require the two ends of the
straw to run in coincidence. Even a lax 100 ns time window will drastically
reduce the noise, and each side can run at hundreds of kHz.
2) We can eliminate noise hits by more sophisticated noise suppression in the
FPGA (for instance, looking at time above threshold).
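To illustrate point 1, the size of the reduction can be estimated with the standard accidental-coincidence formula for two uncorrelated noise sources, R_acc ≈ R_a × R_b × w, valid when R × w is much less than 1. This is a sketch under stated assumptions, not a simulation result: the noise on the two straw ends is taken to be independent and Poisson-distributed, the 100 ns is treated as the full width w of the coincidence window, and the 100 kHz per-end rate is just an example value.

```python
def accidental_coincidence_rate(rate_a_hz, rate_b_hz, window_s):
    """Chance-coincidence rate (Hz) of two independent Poisson noise sources.

    window_s is the full width of the coincidence window. The approximation
    rate_a * rate_b * window holds only when rate * window << 1.
    """
    if rate_a_hz * window_s >= 0.1 or rate_b_hz * window_s >= 0.1:
        raise ValueError("approximation requires rate * window << 1")
    return rate_a_hz * rate_b_hz * window_s


def noise_reduction_factor(single_end_rate_hz, window_s):
    """Single-end noise rate divided by the accidental coincidence rate."""
    accidental = accidental_coincidence_rate(
        single_end_rate_hz, single_end_rate_hz, window_s
    )
    return single_end_rate_hz / accidental


if __name__ == "__main__":
    single_end_rate_hz = 100e3  # example value only: 100 kHz per straw end
    window_s = 100e-9           # 100 ns, assumed to be the full window width
    accidental = accidental_coincidence_rate(
        single_end_rate_hz, single_end_rate_hz, window_s
    )
    print(f"Accidental coincidence rate: {accidental:.0f} Hz")
    print(f"Reduction factor: {noise_reduction_factor(single_end_rate_hz, window_s):.0f}")
```

Under these assumptions a 100 kHz noise rate on each end gives about 1 kHz of accidental coincidences, a reduction of roughly a factor of 100. Correlated noise (for example pickup common to both ends) would not be suppressed this way.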
b) Do you have enough bandwidth in the
system to handle a much higher hit rate (e.g. 100 kHz)?
Yes. We are designing for an assumed straw rate of 300 kHz, and the
limitation is the ROC to DTCM optical link. The current plan is to run this
link at 2.5 Gbps, even though the link is capable of 5 Gbps. We have not
pushed the rate yet, but more than likely we will be able to run at 5 Gbps,
in which case the 300 kHz becomes 600 kHz.