https://doi.org/10.1140/epjp/s13360-023-03674-2
Regular Article
Towards an automated data cleaning with deep learning in CRESST
1 Max-Planck-Institut für Physik, D-80805, München, Germany
2 Institut für Hochenergiephysik der Österreichischen Akademie der Wissenschaften, A-1050, Wien, Austria
3 Atominstitut, Technische Universität Wien, A-1020, Wien, Austria
4 INFN, Laboratori Nazionali del Gran Sasso, I-67100, Assergi, Italy
5 Faculty of Mathematics, Physics and Informatics, Comenius University, 84248, Bratislava, Slovakia
6 Physik-Department, Technische Universität München, D-85747, Garching, Germany
7 Eberhard-Karls-Universität Tübingen, D-72076, Tübingen, Germany
8 Department of Physics, University of Oxford, OX1 3RH, Oxford, UK
9 LIBPhys-UC, Departamento de Fisica, Universidade de Coimbra, P3004 516, Coimbra, Portugal
10 Walther-Meißner-Institut für Tieftemperaturforschung, D-85748, Garching, Germany
11 GSSI-Gran Sasso Science Institute, I-67100, L’Aquila, Italy
12 Dipartimento di Ingegneria Civile e Meccanica, Università degli Studi di Cassino e del Lazio Meridionale, I-03043, Cassino, Italy
Received: 1 November 2022
Accepted: 2 January 2023
Published online: 30 January 2023
The CRESST experiment employs cryogenic calorimeters for the sensitive measurement of nuclear recoils induced by dark matter particles. The recorded signals need to undergo a careful cleaning process to avoid wrongly reconstructed recoil energies caused by pile-up and read-out artefacts. We frame this process as a time series classification task and propose to automate it with neural networks. With a data set of over one million labeled records from 68 detectors, recorded between 2013 and 2019 by CRESST, we test the capability of four commonly used neural network architectures to learn the data cleaning task. Our best performing model achieves a balanced accuracy of 0.932 on our test set. We show on an exemplary detector that about half of the wrongly predicted events are in fact wrongly labeled events, and a large share of the remaining ones have a context-dependent ground truth. We furthermore evaluate the recall and selectivity of our classifiers with simulated data. The results confirm that the trained classifiers are well suited for the data cleaning task.
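The metrics named in the abstract (recall, selectivity and balanced accuracy) can be illustrated with a minimal sketch for a binary classifier. The label convention (1 for a record kept as clean, 0 for a rejected record), the function names and the toy labels are assumptions made for illustration only; they are not taken from the CRESST analysis code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly positive records that are predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def selectivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly negative records that are predicted negative."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of recall and selectivity; insensitive to class imbalance."""
    return 0.5 * (recall(y_true, y_pred) + selectivity(y_true, y_pred))


# Toy labels (hypothetical): 4 positive and 6 negative records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(f"recall            = {recall(y_true, y_pred):.3f}")
print(f"selectivity       = {selectivity(y_true, y_pred):.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

Unlike plain accuracy, the balanced accuracy weights both classes equally, which matters when accepted and rejected records occur with very different frequencies.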
© The Author(s) 2023
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.