A Comparative Study of Statistical and Machine Learning Techniques
for Predicting Customers' Shopping Behavior
Alaa A Elnazer1*, Fawzia Abdu Alsalam Al Tboli2, Gehad Elgebaly3 and
Mahjoub A Elamin4
1Department of Marketing, College of Business, Imam Mohammad Ibn Saud Islamic
University (IMSIU), Riyadh 11432, Saudi Arabia
2Department of Statistics, Faculty of Science, University of Benghazi, Benghazi,
Libya
3Department of Economics, Faculty of Business Administration, Delta University for
Science and Technology, Gamasa, Egypt
4Department of Mathematics, University College of Umluj, University of Tabuk,
Saudi Arabia
*Corresponding Author: Alaa A Elnazer, Department of Marketing, College of
Business, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432,
Saudi Arabia.
Received: January 27, 2026; Published: May 08, 2026
Abstract
This study develops a comprehensive predictive framework by systematically comparing five classification models—Logistic
Regression (LR), K-Nearest Neighbors (KNN), Random Forest (RF), Artificial Neural Networks (ANN), and Extreme Gradient Boosting
(XGBoost)—using the Online Shoppers’ Purchasing Intention dataset. A diverse set of performance metrics, including Accuracy, Root
Mean Squared Error (RMSE), Mean Absolute Error (MAE), R², Correlation Coefficient (CC), Coefficient of Variation (COV), and Error
Coefficient (EC), was employed to evaluate and benchmark the models. Descriptive statistics and correlation analysis provided a
foundational understanding of the behavioral attributes shaping purchasing outcomes, while inferential analyses, including ANOVA
and the Wilcoxon Signed-Rank Test, confirmed statistically significant differences among models and validated the robustness of
the comparative framework. The findings indicate that Random Forest performed best on most evaluation measures, achieving the
lowest RMSE, the highest correlation with the actual outcomes, and the most stable performance. Although the Artificial Neural
Network reached comparable accuracy, Random Forest was more consistent and produced fewer predictive errors, underscoring its
suitability for modeling customer behavior in complex, nonlinear settings. The results demonstrate the value of ensemble techniques
for e-commerce prediction and suggest that hybrid methods could further improve accuracy and generalization. The study makes both
methodological and practical contributions: it provides a rigorous benchmark of the classification algorithms and offers actionable
insights for online retailers seeking to optimize decision making, customer satisfaction, and long-term customer loyalty.
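For readers who want to reproduce the comparison, the sketch below shows one way to implement the benchmarking workflow described above in Python with scikit-learn, XGBoost, and SciPy: the five classifiers are trained under 10-fold stratified cross-validation on the Online Shoppers Purchasing Intention dataset (available from the UCI Machine Learning Repository), the reported metrics are computed per fold, and fold-wise accuracies feed the ANOVA and Wilcoxon Signed-Rank tests. It is a minimal illustration rather than the authors' code; the file name `online_shoppers_intention.csv`, all hyperparameters, and the COV computation are assumptions (the paper's EC metric is omitted here because its definition is not stated in the abstract).

```python
# Minimal sketch (not the authors' code) of the benchmarking workflow described in
# the abstract; file name, hyperparameters, and the COV definition are assumptions.
import numpy as np
import pandas as pd
from scipy.stats import f_oneway, pearsonr, wilcoxon
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

# Online Shoppers Purchasing Intention data; "Revenue" marks a completed purchase.
df = pd.read_csv("online_shoppers_intention.csv")          # assumed local file name
y = df.pop("Revenue").astype(int)
X = pd.get_dummies(df, drop_first=True).astype(float).to_numpy()

models = {
    "LR":  make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "RF":  RandomForestClassifier(n_estimators=300, random_state=0),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                                       random_state=0)),
    "XGB": XGBClassifier(n_estimators=300, learning_rate=0.1,
                         eval_metric="logloss", random_state=0),
}

def evaluate(y_true, y_pred):
    """Accuracy plus the error/agreement metrics named in the abstract.
    COV is taken as RMSE over the mean observed value (an assumed definition)."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return {
        "Accuracy": float(np.mean(y_true == y_pred)),
        "RMSE": rmse,
        "MAE": float(np.mean(np.abs(err))),
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        "CC": pearsonr(y_true, y_pred)[0],
        "COV": rmse / y_true.mean(),
    }

# 10-fold stratified cross-validation; metrics are collected per fold so the ANOVA
# and Wilcoxon Signed-Rank tests can compare models on paired fold scores.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_metrics = {name: [] for name in models}
for train_idx, test_idx in cv.split(X, y):
    for name, model in models.items():
        model.fit(X[train_idx], y.iloc[train_idx])
        pred = model.predict(X[test_idx])
        fold_metrics[name].append(evaluate(y.iloc[test_idx].to_numpy(), pred))

acc = {name: [m["Accuracy"] for m in folds] for name, folds in fold_metrics.items()}
for name, folds in fold_metrics.items():
    print(f"{name}: accuracy {np.mean(acc[name]):.4f}, "
          f"RMSE {np.mean([m['RMSE'] for m in folds]):.4f}")

print("One-way ANOVA over fold accuracies:", f_oneway(*acc.values()))
print("Wilcoxon Signed-Rank, RF vs ANN:", wilcoxon(acc["RF"], acc["ANN"]))
```

In this setup the pairwise Wilcoxon test is shown only for the two strongest models in the paper's comparison (Random Forest and the neural network); extending it to every model pair, with a multiple-comparison correction, would follow the same pattern.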