For High Energy Physics, the go-to framework for big data analysis has been CERN’s ROOT framework. ROOT is a massive C++ library that even predates the STL in some areas. It is also a JIT C++ interpreter called Cling, probably the best in the business. If you have heard of the Xeus C++ Kernel for Jupyter, that is built on top of Cling. ROOT has everything a HEP physicist could want: math, plotting, histograms, tuple and tree structures, a very powerful file format for IO, machine learning, Python bindings, and more. It also does things like dictionary generation and arbitrary class serialization (other large frameworks like Qt have similar generation tools).
You may already be guessing one of the most common problems with ROOT: it is huge and difficult to install. If you build from source, that’s a several-hour task on a single core. It has gotten much better in the last 6 years, and there are several places you can find ROOT, but there are still areas where it is challenging. This is especially true for Python. ROOT is linked to your distro’s Python (both python2 and python3 if your distro supports it, as of ROOT 6.22), but the common rule for using Python is “don’t touch your system Python”. Modern Python users should therefore be in a virtual environment, and for that ROOT requires the system site-packages option to be enabled, which is not always ideal. And if you use the Anaconda Python distribution, which is the most popular scientific distribution of Python and massively successful for ML frameworks, the general rule, even for people who build ROOT themselves, has been: don’t.
But now, you can get a fully featured ROOT binary package for macOS or Linux, for Python 2.7, 3.6, 3.7, or 3.8, from Conda-Forge, the most popular Anaconda community channel! Many more HEP recipes have now been added to Conda-Forge as well, and ROOT now also provides a Conda Docker image!
Intro to Conda
If you don’t already have Anaconda or Conda, you can go to anaconda.com and download Anaconda, or you can install miniconda, which is just the Conda package manager without Anaconda installed in the base environment. If you manage your system, there are also yum and apt packages. There are also Docker images.
If you want to do this in an entirely automated way, for example on a new system or on a continuous integration (CI) system, the following commands will set up miniconda:
# Download the Linux installer
wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# Or download the macOS installer
wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
# Install Conda (same for macOS and Linux)
bash miniconda.sh -b -p $HOME/miniconda
source $HOME/miniconda/etc/profile.d/conda.sh # Add to bashrc - similar files available for fish and csh
If you use Binder, it already uses Conda when your repository has an environment.yml file. See an example that uses ROOT here.
Proper usage
For Conda, you really should be working in an environment. Here is how you prepare a new environment with ROOT preinstalled:
conda create -n my_root_env root -c conda-forge
This will make a new environment called my_root_env and install ROOT into it. You can specify other packages too, like a version of Python (2.7, 3.6, 3.7, or 3.8), anaconda (which will install 100 or so scientific Python packages), or individual packages. ROOT will automatically add all of its dependencies, like Pythia8 and NumPy.
Advanced: How to use an environment file instead
It is even better to use an environment.yml file. This is a list of channels and dependencies that you can distribute with your project. This is an example of a simple environment.yml file:
name: my_root_env
channels:
- conda-forge
dependencies:
- root
Then, you run Conda like this:
conda env create
You can use -f <file> to select a different file than the environment.yml in the current directory. You can use -n <name> to select a different name, or -p <path> to use a specified path instead of a named environment in the default path.
If you want to capture your exact environment in a reproducible manner, with all package versions, run this:
conda env export > environment.yml
To enter the environment:
conda activate my_root_env
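After activating, it can be handy to confirm that the python you are running really belongs to the active environment. The following is only a minimal sketch, not part of Conda or ROOT: the helper name running_in_conda_env is made up for illustration, and it assumes the usual behavior that activation sets the CONDA_PREFIX variable to the environment's directory.

```python
import os
import sys


def running_in_conda_env(environ=None, prefix=None):
    """Return True if this interpreter lives inside the active Conda environment.

    Assumption: `conda activate` sets CONDA_PREFIX to the environment directory,
    and a Python started from that environment has sys.prefix pointing there too.
    The two arguments exist only so the check can be exercised with fake values.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    conda_prefix = environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no Conda environment is active
    # Compare resolved paths so symlinks and trailing slashes do not matter.
    return os.path.realpath(prefix) == os.path.realpath(conda_prefix)


if __name__ == "__main__":
    print("CONDA_DEFAULT_ENV:", os.environ.get("CONDA_DEFAULT_ENV", "<not set>"))
    print("Interpreter matches active environment:", running_in_conda_env())
```

If the second line prints False while an environment is active, the python on your PATH is probably coming from somewhere else, such as the system Python.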
The first time you enter the environment, you should add the conda-forge channel to the search list (otherwise, you will have to add -c conda-forge every time you install or update something):
conda config --env --add channels conda-forge
To leave the environment:
conda deactivate
Installing into the current environment
If you are already in an environment (even the base environment; that is generally not a good idea, but it is supported by Conda-ROOT), then you will want to do something like this:
conda install root -c conda-forge
If you want to enable conda-forge as a searched channel for the current environment so that you don’t have to add this flag every time you do anything, run:
conda config --env --add channels conda-forge
This really just adds a line to the current environment’s condarc, or to ~/.condarc (which applies globally) if you omit --env.
Things to try
Almost everything in ROOT should be supported; this was built with lots of options turned on. Here are a few things to try:
root: you can start up a session and see the splash screen; Control-D to exit.
python followed by import ROOT will load PyROOT.
root --notebook will start a notebook server with a ROOT kernel choice.
rootbrowse will open a TBrowser session so you can look through files.
root -l -q $ROOTSYS/tutorials/dataframe/df013_InspectAnalysis.C will run a DataFrame example with an animated plot.
root -b -q -l -n -e "std::cout << TROOT::GetTutorialDir() << std::endl;" will print the tutorial dir.
root -b -l -q -e 'std::cout << (float) TPython::Eval("1+1") << endl;' will run Python from C++ ROOT.
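If you would rather check PyROOT from a script than from an interactive session, something like the following could work. It is a small sketch rather than an official ROOT recipe: the function name root_version is invented here, and it simply returns None when the ROOT module cannot be found, so it is safe to run outside the environment too.

```python
import importlib.util


def root_version():
    """Return the ROOT version string, or None if PyROOT is not importable."""
    if importlib.util.find_spec("ROOT") is None:
        return None
    import ROOT  # imported lazily; loading PyROOT can take a moment

    return str(ROOT.gROOT.GetVersion())


if __name__ == "__main__":
    version = root_version()
    if version is None:
        print("PyROOT not found; is the environment with ROOT active?")
    else:
        print("PyROOT loaded, ROOT version", version)
```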
You can find tutorials in the ROOT documentation, such as for RDataFrame, here. Also check RooFit.
Caveats
General
The ROOT package will prepare the required compilers (see below). Everything in Conda is symlinked into $CONDA_PREFIX if you build things by hand; tools like CMake should find it automatically. While thisroot.* scripts exist, they should not be used. Graphics, rootbrowse, etc. all should work. Any Conda-ROOT issues can be reported to the root-feedstock.
ROOT was built with and will report -std=c++17 from root-config.
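To see which standard your own installation reports, you can ask root-config for its compile flags and look for the -std flag. The sketch below is one way to do that from Python; the helper names extract_std and root_cxx_standard are hypothetical, and the only ROOT tool it relies on is root-config --cflags.

```python
import shutil
import subprocess


def extract_std(cflags):
    """Return the C++ standard named by a -std=... flag, or None if absent."""
    for flag in cflags.split():
        if flag.startswith("-std="):
            return flag[len("-std="):]
    return None


def root_cxx_standard():
    """Ask root-config for its compile flags and pull out the C++ standard.

    Returns None when root-config is not on PATH (for example, when the
    environment containing ROOT is not active).
    """
    exe = shutil.which("root-config")
    if exe is None:
        return None
    cflags = subprocess.check_output([exe, "--cflags"], universal_newlines=True)
    return extract_std(cflags)


if __name__ == "__main__":
    print("C++ standard reported by root-config:", root_cxx_standard())
```

Code you compile against ROOT should generally use the same standard that root-config reports, since mixing standards can lead to confusing build errors.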
Linux
On Linux, there really aren’t any special caveats, just a few that are general to Conda itself and the compilers package. When ROOT is in the active environment, g++ and $CXX are the Conda compilers, GCC 7.3.
macOS
The caveats on macOS were removed on 9-25-2019; you no longer need a special 10.9 SDK. You simply need to have an SDK available (so install Xcode), and you should be good to go. As on Linux, new compilers are added (Clang 8).
Feel free to refer to the conda build documentation if you want to build anything.
Building a library that uses ROOT
If you want to provide a package that uses ROOT, you probably do not want to replace the system compilers on the command line. To support this, ROOT was broken into several packages. You can install the root_base package to just get ROOT. The root-dependencies package stores all the dependencies (note: the full list includes things like Qt and is larger than ROOT itself). The root-binaries package stores the ROOT executables. And finally, the full root package includes compilers, jupyter, and a few other things. See the recipe for definitions.
How it was made possible
This was a monumental feat, but it was enabled by the new technologies from Conda and Conda-Forge. The Conda 4.6 release provides much better support for environments, and the unified activation allows packages to rely on environment changes. While ROOT is very careful to respect your environment (the only variable it directly sets is ROOTSYS, to be nice), unified activation helps compiler packages and more work together. Anaconda in version 5.0 changed to a unified and modern compiler stack, and Conda-Forge spent months converting all of the packages from the old, diverse compilers to the single compiler stack. ROOT’s Linux packages were available on the day this project was completed.
For more technical details, see the talk here by Chris to the packaging working group of the HSF.
Future, history, and thanks
This was done in collaboration with the ROOT team. Many fixes were pushed to ROOT to make this possible, and are in the 6.16.00 release and upcoming 6.16.02 release. There is an ongoing effort to integrate the Conda machinery into the ROOT nightly testing, so that we won’t get caught by surprise in an update and so that nightly builds of master will be available (probably in the hep channel).
This project was made possible by Chris Burr, with the help of Henry Schreiner, Enrico Guiraud, and Patrick Bos. Other members of the ROOT team that helped contribute are Guilherme Amadio, Axel Naumann, and Danilo Piparo.
Special thanks to the previous way to get ROOT on Conda, the NLeSC formula by Daniela Remenska. That was a great effort, but since it did not run with CI and required a massive number of custom dependencies instead of relying on Conda-Forge to package them, it was impossible to maintain, and it remained stuck on older versions of Python and ROOT. It predated Conda-Forge and conda-build 3, which made the current project possible. It was one of the inspirations for this project, though, and deserves a special place of honor.
Support for this work in part was provided by the National Science Foundation cooperative agreement OAC-1836650 (IRIS-HEP) and OAC-1450377 (DIANA/HEP). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.