Efficient Big Data Machine Learning with Apache Spark: Comparative Evaluation of Regression and Classification Models

Authors

  • A Ganesh
  • R R Shantha Spandana
  • E Pavithra

DOI:

https://doi.org/10.64751/11tw3352

Keywords:

Big Data, Biological system modeling, Data models, Cluster computing, Computational modeling, Sparks, Numerical models, Scalability, Predictive models, Public transportation

Abstract

The scale of big data requires machine learning (ML) systems capable of processing large volumes of data quickly. This work uses the distributed computation and in-memory processing of Apache Spark to evaluate a range of machine learning methods on regression and classification tasks. The experiments were conducted on three datasets obtained from Kaggle: NYC Taxi Trip Duration, Netflix Prize, and Higgs boson. For regression, Linear Regression, Random Forest (RF), Gradient-Boosted Trees (GBT), Support Vector Regressor (SVR), and K-Nearest Neighbors (KNN) were compared on Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), training time, and testing time. For classification, LR, RF, GBT, Support Vector Machine (SVM), and KNN were compared on Accuracy, Precision, Recall, F1-Score, AUC, and the Confusion Matrix. The findings showed that GBT and RF achieved higher accuracy and lower error rates, at the cost of slightly longer processing time. Applying PCA for feature selection improved model performance, and GBT combined with PCA reached 99% classification accuracy while reducing training time. In the regression tasks, GBT achieved the lowest RMSE and MAE and obtained full prediction accuracy on the test data. A Flask-based web interface was also developed to serve real-time predictions, enabling effective and accurate big data analytics.
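The evaluation metrics named in the abstract can be made concrete with a small sketch. This is plain standard-library Python, not the authors' code and not Spark; in Spark ML the same quantities are normally computed with `RegressionEvaluator` (`metricName="rmse"` or `"mae"`), `BinaryClassificationEvaluator` (`"areaUnderROC"`) and `MulticlassClassificationEvaluator`. The function names below are hypothetical, and the classification helpers assume binary labels with 1 as the positive class.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, assuming 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P*N) pairwise form is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy regression predictions (illustrative values, not results from the paper).
    print(rmse([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0]))  # 0.75
    print(mae([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0]))   # 0.625

    # Toy binary classification labels, hard predictions and scores.
    labels = [1, 0, 1, 1, 0, 0]
    preds = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
    print(confusion_counts(labels, preds))  # (2, 1, 1, 2)
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

The pairwise AUC definition is chosen because it makes the meaning of the metric explicit; on datasets the size of Higgs boson, a sort-based computation (or Spark's built-in evaluator) would be used instead.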

Published

2026-04-08

How to Cite

Efficient Big Data Machine Learning with Apache Spark: Comparative Evaluation of Regression and Classification Models. (2026). International Journal of AI Electronics and Nexus Energy, 2(2), 19-26. https://doi.org/10.64751/11tw3352