https://doi.org/10.1140/epjp/s13360-026-07726-1
Regular Article
HybridBigDataClassifier: a hybrid deep learning approach for scalable big data classification with imbalanced datasets
1 Research Scholar, Computer Science and Engineering, Jawaharlal Nehru Technological University Kakinada (JNTUK), Kakinada, AP, India
2 Supervisor, Computer Science and Engineering Department, JNTUK, Kakinada; Professor and Head of Department (CSE), Sheshadri Rao Gudlavalleru Engineering College, Gudlavalleru, AP, India
Received: 16 September 2025
Accepted: 20 April 2026
Published online: 4 May 2026
Abstract
Automated classification systems generally thrive on copious high-dimensional data, which is now common in fields ranging from healthcare and finance to cybersecurity, yet the availability of massive datasets is a double-edged sword. Traditional machine learning algorithms struggle to scale and to generalise under class imbalance. Although deep learning models have shown great promise, many current architectures are either not designed to operate in a distributed fashion or handle class imbalance poorly, favouring the majority class in skewed distributions. This results in biased predictions and suboptimal performance in critical real-world applications. To overcome these limitations, we introduce HybridBigDataClassifier, a scalable deep learning framework built on Apache Spark, together with a new architecture, HybridDeepNet. HybridDeepNet combines residual learning blocks with DQN-like layers to improve feature extraction and decision robustness while accelerating training convergence. A dynamic class-weighting mechanism is integrated to mitigate class imbalance during model training, avoiding reliance on external resampling strategies. The framework is distributed, making it suitable for big data pipelines. HybridBigDataClassifier was evaluated on five heterogeneous, imbalance-prone benchmark datasets (healthcare, fraud detection, census income, network intrusion, and e-commerce) and consistently achieved higher accuracy (up to 0.987), F1-score (up to 0.93), and ROC-AUC (up to 0.98) than standard deep models, confirming its suitability for large-scale, imbalanced classification. Across all primary metrics, HybridDeepNet significantly outperforms state-of-the-art deep learning baselines, and ablation studies confirm the contribution of each component to the performance improvements.
In conclusion, HybridBigDataClassifier is a novel, robust, generalisable, and deployment-ready classifier for real-world big-data classification problems with over- and under-represented classes, where class imbalance and computational scalability are the dominant pain points. The results also point to its possible applicability in broader areas of intelligent machinery.
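The class-weighting idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the inverse-frequency rule used here, the per-batch recomputation, and the function names `class_weights` and `weighted_cross_entropy` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights (assumed rule): n_samples / (n_classes * count_c).

    Recomputing this on every mini-batch is one simple way to make the
    weighting "dynamic"; the paper's actual mechanism may differ.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(y), normalised by the sum of the sample weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical mini-batch with an 8:2 class imbalance.
batch_labels = [0] * 8 + [1] * 2
w = class_weights(batch_labels)

# Uniform predictions: every sample contributes log(2) before weighting.
uniform_probs = [[0.5, 0.5] for _ in batch_labels]
loss = weighted_cross_entropy(uniform_probs, batch_labels, w)
```

With these weights, each minority-class sample counts four times as much as a majority-class sample in the loss, so the two classes contribute equally in aggregate without any resampling of the data.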
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
© The Author(s), under exclusive licence to Società Italiana di Fisica and Springer-Verlag GmbH Germany, part of Springer Nature 2026

