https://doi.org/10.1140/epjp/s13360-023-03861-1
Regular Article
HEP-Frame: an efficient tool for big data applications at the LHC
1 High-Assurance Software Laboratory, INESC TEC, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
2 Department of Physics, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
3 Algoritmi Center, Department of Informatics, University of Minho, Campus de Gualtar, 4710-057, Braga, Portugal
a e-mail: andre.martins.pereira@inesctec.pt
Received: 19 September 2022
Accepted: 5 March 2023
Published online: 25 March 2023
HEP-Frame is a new C++ package designed to efficiently analyse datasets with a very large number of events, such as those available at the Large Hadron Collider (LHC) at CERN, Geneva. It mainly targets high-performance servers and mini-clusters, and it was designed for natural science researchers, offering a user-friendly interface to access structured databases. When a data analysis application is created, HEP-Frame automatically evaluates the underlying computing resources and builds an adequate code skeleton. At run-time, HEP-Frame analyses a sequence of datasets, exploiting the parallelism available in the code and in the hardware resources: it concurrently reads inputs from a user-defined data structure and processes them, following the user-specified sequence of requirements to select relevant data; it manages the efficient execution of that sequence; and it outputs results in user-defined objects (e.g., ROOT structures), stored together with the input dataset used. This paper shows how software development by domain experts can benefit from HEP-Frame, and how it significantly improved the performance of analyses of large datasets produced in proton-proton collisions at the LHC. Two case studies are discussed: the associated production of top quarks together with a Higgs boson ($t\bar{t}H$) at the LHC, and double- and single-top quark production at the high-luminosity phase of the LHC (HL-LHC). Results show that HEP-Frame's awareness of the behaviour and structure of the analysis code, and of the underlying hardware system, provides powerful and transparent parallelization mechanisms that substantially reduce the execution time of data analysis applications.
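To make the run-time behaviour described in the abstract more concrete, the following is a minimal sketch of overlapping input reading with event selection. It is not the HEP-Frame interface: the names `Event`, `Cut`, `EventQueue` and `run_analysis`, as well as the two event fields, are hypothetical and chosen only for illustration. The sketch assumes that one thread stands in for dataset reading while a pool of worker threads applies a user-supplied sequence of requirements (cuts) to each event, stopping at the first cut that fails.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical event record; a real analysis would hold many more fields.
struct Event {
    double lepton_pt;  // transverse momentum in GeV
    int n_jets;        // number of reconstructed jets
};

// A cut returns true when the event satisfies the requirement.
using Cut = std::function<bool(const Event&)>;

// Minimal thread-safe queue connecting the reader to the workers.
class EventQueue {
public:
    void push(const Event& e) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(e); }
        cv_.notify_one();
    }
    // Signals that no further events will be pushed.
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an event is available; returns false once closed and drained.
    bool pop(Event& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Event> q_;
    bool closed_ = false;
};

// Feeds `input` through the queue on one thread while `n_workers` threads
// apply the cuts in order. Returns the number of events passing every cut.
std::size_t run_analysis(const std::vector<Event>& input,
                         const std::vector<Cut>& cuts,
                         unsigned n_workers) {
    assert(n_workers > 0);
    EventQueue queue;
    // One counter per worker, so counting needs no locking.
    std::vector<std::size_t> passed(n_workers, 0);

    std::thread reader([&] {
        for (const Event& e : input) queue.push(e);  // stands in for file I/O
        queue.close();
    });

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_workers; ++w) {
        workers.emplace_back([&, w] {
            Event e{};
            while (queue.pop(e)) {
                bool ok = true;
                for (const Cut& cut : cuts) {
                    if (!cut(e)) { ok = false; break; }  // stop at first failed cut
                }
                if (ok) ++passed[w];
            }
        });
    }

    reader.join();
    for (std::thread& t : workers) t.join();

    std::size_t total = 0;
    for (std::size_t n : passed) total += n;
    return total;
}
```

A caller would build a `std::vector<Cut>` from lambdas, for example one requiring `lepton_pt > 25.0` and one requiring `n_jets >= 4`, and pass it to `run_analysis` together with the events and a worker count. The per-worker counters are a deliberate simplification; a framework that knows how the cuts behave could additionally reorder or schedule them, which this sketch does not attempt.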
© The Author(s) 2023
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.