Upload Examples
For example, going through the decisions for the fields in the name:
data_tier.owner.description.configuration.sequencer.file_format
Your "rename" string will look like:
mcs.batman.trgt_geo_stopped_v0.geom0..art
The ".." is intentional: it tells jsonMaker to generate the missing sequencer. Note that by changing the ".." to "." you get the string that is the name of your dataset:
mcs.batman.trgt_geo_stopped_v0.geom0.art
This will be put in the dh.dataset field and is the most common way you will refer to this dataset.
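The relation between the rename string and the dataset name can be sketched in a few lines of shell. This is only an illustration: the variable names are made up here, and the field values are the ones from the example above.

```shell
# Fields of the name, in order (values from the example above).
data_tier="mcs"
owner="batman"
description="trgt_geo_stopped_v0"
configuration="geom0"
file_format="art"

# Rename string: the sequencer field is left empty, which produces the "..".
rename="${data_tier}.${owner}.${description}.${configuration}..${file_format}"

# Dataset name: the same string with the ".." collapsed to a single ".".
dataset=$(printf '%s\n' "$rename" | sed 's/\.\././')

echo "rename string: $rename"
echo "dataset name:  $dataset"
```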
Next you need to pick the file family. In this case the files were not generated and documented by the collaboration, so the first part should be "usr", and the files are Monte Carlo art files, so they go in "sim". Therefore the file family is "usr-sim".
The next step is to write a little generic json file to provide the other required fields that the jsonMaker cannot supply. Call it temp.json:
{ "mc.generator_type" : "stopped_particle", "mc.simulation_stage" : 3, "mc.primary_particle" : "muon" }
Note that there are commas between the field-value pairs and that strings are quoted, but numbers are not. This information can also be provided directly on the command line with the "-i" switch.
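One way to create temp.json without an editor is a shell here-document, as in the sketch below. The optional check at the end assumes python3 happens to be installed; it is not part of jsonMaker and only confirms that the file parses as JSON.

```shell
# Write the generic json file into the current directory.
cat > temp.json <<'EOF'
{
  "mc.generator_type" : "stopped_particle",
  "mc.simulation_stage" : 3,
  "mc.primary_particle" : "muon"
}
EOF

# Optional sanity check (assumes python3 is available; skipped otherwise).
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool temp.json >/dev/null && echo "temp.json parses as JSON"
fi
```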
Then run the jsonMaker.
setup mu2e
source setup.sh [setup a mu2e Offline release]
setup dhtools [add jsonMaker to the path, must be after setup.sh]
kinit [in case copying to dcache]
Run a test (no -x switch) on one file to make sure the final command will work:
jsonMaker -f usr-sim -j temp.json -v 5 \
  -r mcs.batman.trgt_geo_stopped_v0.geom0..art \
  one_of_your_data_files
If there are any errors, they will be printed at the end. They will need to be fixed.
If OK, then commit to the full run. The switch "-c" asks for the data and the json file to be copied to the FTS area, under the appropriate subdirectory according to the file family.
jsonMaker -f usr-sim -x -c -j temp.json \
  -r mcs.batman.trgt_geo_stopped_v0.geom0..art \
  *all_your_data_files*
There are other options for how to run jsonMaker; please run "jsonMaker -h" or see the reference here. For example, if your files are already in scratch dCache (/pnfs/mu2e/scratch/..), then you can "mv" them inside scratch dCache to the FTS area, which is also in scratch dCache; this is more efficient than copying them. You can ask jsonMaker to just write out the json files (-x -d with no -c or -m). It can generate a file containing a list of move commands that can be given to ifdh, so they can be run with one lock. With -g, jsonMaker will also execute this command. You can always consult with the offline group if you have questions or a special case. Uploading errors can be fixed, but that can be complex, so it is far better to ask questions before rather than after.
For non-art files, jsonMaker will run very quickly. For art files, it has to run a mu2e executable to extract the run numbers. This takes 2s per file, and can take up to 60s if the file is large. In general, we recommend limiting single runs of jsonMaker to 10K files. Larger datasets can be broken into smaller subsets which can be run separately. It may be easiest to do this with the file list input style (-s) instead of command line wildcards.
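Breaking a long file list into smaller lists can be done with the standard "split" utility. The sketch below uses 25 made-up file names and a chunk size of 10 so it runs quickly; for a real upload the chunk size would be 10000 and filelist.txt would hold your actual files. How each chunk is then handed to jsonMaker with -s is not shown here; check "jsonMaker -h" for the exact form.

```shell
workdir=$(mktemp -d)

# Stand-in for a real file list: 25 hypothetical names, one per line.
i=0
while [ "$i" -lt 25 ]; do
    echo "/pnfs/mu2e/scratch/users/batman/outdir/file_$i.root"
    i=$((i + 1))
done > "$workdir/filelist.txt"

# Split into lists of at most chunk_size lines (use 10000 for a real upload).
chunk_size=10
split -l "$chunk_size" "$workdir/filelist.txt" "$workdir/chunk_"

# Report each chunk and count them.
n_chunks=0
for chunk in "$workdir"/chunk_*; do
    echo "$(basename "$chunk"): $(wc -l < "$chunk") files"
    n_chunks=$((n_chunks + 1))
done
echo "$n_chunks chunk lists written to $workdir"
```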
{ "parents" : [ "mcs.batman.trgt_geo_stopped_v0.geom0.12345678_123456.art" ] }
The process in this case is the same as in example 1, with one item added. You need to tell jsonMaker how to determine which json file belongs with which data file. There are two methods. The first pairs by name: if the data file is foo, then the json file is foo.json. The second pairs the json file with whatever data file is in the same directory. In this second case, there can be only one data file and one json file in each directory.
The command is the same as in example 1, but with a pairing directive in "-p" and the json files added to the input on the command line.
jsonMaker -f usr-sim -x -c -j temp.json -p dir \
  -r mcs.batman.trgt_geo_stopped_v0.geom0..art \
  *all_your_data_files* *all_your_json_files*
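If you pair by name (data file foo, json file foo.json), it can be worth checking that every data file actually has its json before you run. The loop below is a minimal sketch of such a check; the directory and file names are made up for the demonstration and it does not call jsonMaker.

```shell
workdir=$(mktemp -d)

# Hypothetical files: two data files, only one of which has its json.
touch "$workdir/a.art" "$workdir/a.art.json" "$workdir/b.art"

# For each data file foo, look for foo.json next to it.
missing=0
for data in "$workdir"/*.art; do
    if [ ! -f "$data.json" ]; then
        echo "no json for $(basename "$data")"
        missing=$((missing + 1))
    fi
done
echo "$missing data file(s) without a matching json"
```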
Your "rename" string will look like:
nts.batman.trgt_geo_stopped_v0.geom0..root
Next you need to pick the file family. In this case the files were not generated and documented by the collaboration, so the first part should be "usr" and the files are Monte Carlo root ntuple files, so they go in "nts", therefore the file family is "usr-nts".
The next step is to write a little generic json file to provide the other required fields that the jsonMaker cannot supply. jsonMaker will sense this is MC by the data_tier and require that you supply these fields. Call it temp.json:
{ "generator_type" : "stopped_particle", "simulation_stage" : 3, "primary_particle" : "muon" }
Then run the jsonMaker.
jsonMaker -f usr-nts -x -c -j temp.json \
  -r nts.batman.trgt_geo_stopped_v0.geom0..root \
  *all_your_data_files*
Please see the other examples for details of how to run jsonMaker for your particular case, but in general there are a couple of options to point out here. One is "-e", which allows renaming of the data file in place. Another is "-d", which defaults to writing the json file in the local directory.
setup mu2e
source setup.sh [setup a mu2e Offline release]
setup dhtools [add jsonMaker to the path, must be after setup.sh]
jsonMaker -f usr-sim -x -e -j generic.json \
  -r mcs.batman.trgt_geo_stopped_v0.geom0..art \
  your_data_file
ifdh cp mcs* /pnfs/mu2e/scratch/users/batman/outdir
In this case, after all processes are done and you've checked the output in dCache, you can move the data files and their json to the FTS directory. To avoid putting too many files in one subdirectory, we have subdirectories below /pnfs/mu2e/scratch/fts/usr-sim. Please spread out the files among those directories. The data file and its json need to go into the same directory.
If you believe things are running smoothly, you can move the data and json directly into the uploader.
jsonMaker -f usr-sim -x -m -j generic.json \
  -r mcs.batman.trgt_geo_stopped_v0.geom0..art \
  your_data_file
If you are generating files that are not art files, then jsonMaker will not have the run and subrun to give the files a unique sequencer. One way to handle this is through the "-t" switch. You could add -t "${CLUSTER}_${PROCESS}" or a tag based on the first run and event in the ntuple. You could also rename the file and its json according to the rename scheme (then do not use -r or -e) and include your own sequencer. Finally, it might be easiest to write the ntuples to scratch dCache and then run jsonMaker on the full set of files interactively, so it can assign sequence numbers logically.
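As a rough sketch of the first two options, the lines below build a tag from the batch cluster and process numbers and, alternatively, a complete file name with your own sequencer. The default values are illustrative only, and the zero-padded sequencer layout is an assumption modeled on the run_subrun form of art file names; it is not something jsonMaker requires.

```shell
# CLUSTER and PROCESS are normally set by the batch system;
# the defaults here are illustrative only.
CLUSTER=${CLUSTER:-1234567}
PROCESS=${PROCESS:-42}

# Option 1: a tag that could be passed to jsonMaker with -t.
tag="${CLUSTER}_${PROCESS}"

# Option 2: name the file yourself, with your own sequencer.
# The 8-digit_6-digit padding is an assumption, not a jsonMaker rule.
sequencer=$(printf '%08d_%06d' "$CLUSTER" "$PROCESS")
newname="nts.batman.trgt_geo_stopped_v0.geom0.${sequencer}.root"

echo "tag for -t:    $tag"
echo "own file name: $newname"
```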
jsonMaker -f usr-etc -x -c -j temp.json -v 5 \
  -r bck.batman.trgt_geo_stopped_v0.geom0..tgz \
  your_mc_tar_files*.tgz
The sequencer field is left blank in the rename string, which will cause jsonMaker to fill that in with a counter.
In the examples, the simulated data, the ntuples, and the tarballs of the log files were uploaded with coordinated dataset names, using the same description and configuration fields. This can run into a small conflict when backing up tarballs. For example, suppose there are multiple steps in making the ntuple, each with its own set of log files. A reasonable solution is to keep adding to your backup dataset, keeping the same description and configuration fields, but modifying the sequencer with "-t".
jsonMaker -f usr-etc -x -c -j temp.json -v 5 -t "step2" \
  -r bck.batman.trgt_geo_stopped_v0.geom0..tgz \
  your_other_mc_tar_files*.tgz
Listing the bck.batman.trgt_geo_stopped_v0.geom0.tgz dataset will look like:
bck.batman.trgt_geo_stopped_v0.geom0.000.tgz
bck.batman.trgt_geo_stopped_v0.geom0.001.tgz
bck.batman.trgt_geo_stopped_v0.geom0.step2-000.tgz
bck.batman.trgt_geo_stopped_v0.geom0.step2-001.tgz
Your logically coordinated datasets are then all
*.batman.trgt_geo_stopped_v0.geom0.*
This example shows how to upload a set of MC fcl files. The file family is "usr-etc" since these are not art or root data files. The data_tier has changed to "cnf" (for config) and the file_format has changed to "fcl". Since these are part of a MC production chain, the MC parameters defined in the generic json file can be supplied and will be required. The command is:
jsonMaker -f usr-etc -x -c -j temp.json -v 5 \
  -r cnf.batman.trgt_geo_stopped_v0.geom0..fcl \
  your_fcl_files*.fcl
It is a backup, so the data_tier is "bck". Since the dataset will include your user name, your description and configuration only have to be unique to you, so pick anything logical, say "target_analysis" for the description and "09_2014" for the configuration.
In this case you don't have to supply the generator info, so you don't need a generic json file at all. The command becomes:
jsonMaker -f usr-etc -x -c -v 5 \
  -r bck.batman.target_analysis.09_2014..tgz \
  your_dir_analysis_tar_files*.tgz
setup mu2e
setup sam_web_client
samweb count-files "dh.dataset=sim.mu2e.example-beam-g4s1.1812a.art"
You can see how many of your files in SAM have actually gone to tape:
setup mu2e
setup dhtools
samOnTape sim.mu2e.example-beam-g4s1.1812a.art
You can make a list of files in their permanent locations, suitable for feeding to mu2egrid:
setup mu2e
setup dhtools
samToPnfs sim.mu2e.example-beam-g4s1.1812a.art > filelist.txt
The transfers occur after all the metadata has been gathered. If the files are not art format, then this should run very quickly, less than 1s per file. If jsonMaker is running on art files, it will run the executable to extract run and event ranges, which can take up to 1 min for multi-GB files. You can see the rate by running jsonMaker without "-x" as a non-destructive dry run.
If your datasets are larger than the above limits, you probably want to split the upload into pieces and run them as separate jsonMaker commands. If you have named your files by their final dataset name, or if jsonMaker is renaming the files and the files are art format, then the following is not an issue. If jsonMaker is renaming the files and can't name them according to run and run section, as it does with art files, then it has to rename them by a sequencer which is just a counter. If you break your datasets into 1000-file sections, jsonMaker will want to name the first 1000 by the sequencer 0000-0999 and the second also by 0000-0999, and these names will be duplicates. In this case, you can rename the files with your own sequencer before giving them to jsonMaker, so it won't generate the sequencer, or you can add a digit to the sequencer with "-t 0" for the first set and "-t 1" for the second, etc.
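The "-t 0", "-t 1", ... scheme can be sketched as a loop over per-section file lists. The commands are only printed here, not executed. The list names (chunk_aa, ...) are hypothetical, and passing a list file directly after -s is an assumption about how that switch is used; please check "jsonMaker -h" before running anything like this.

```shell
outfile=$(mktemp)

# One jsonMaker command per section, each with its own -t digit.
# chunk_aa etc. are hypothetical list files; "-s <list>" is an assumed form.
n=0
for chunk in chunk_aa chunk_ab chunk_ac; do
    echo "jsonMaker -f usr-nts -x -c -j temp.json -t $n -r nts.batman.trgt_geo_stopped_v0.geom0..root -s $chunk" >> "$outfile"
    n=$((n + 1))
done

# Review the commands before running any of them by hand.
cat "$outfile"
```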