The exponential growth in the number of operators in the market, driven by globalization and advances in the telecommunications industry, is intensifying competition. In this competitive era, maximizing profits has become imperative, and several approaches have been proposed to this end: acquiring new customers, up-selling to existing customers, and extending the retention period of current customers. Retaining existing customers is the simplest and least expensive of these strategies. To adopt it, companies must reduce customer churn, whose main causes are dissatisfaction with the services provided and with the support mechanism. The solution to this problem is to predict which customers are likely to churn. Churn prediction is a crucial objective that supports the design of customer retention and loyalty strategies. As competition in service delivery markets grows, the risk of customer churn also increases sharply, so it has become imperative to implement strategies to keep track of loyal customers (non-churners). Churn models aim to identify early signals of churn and to predict which customers will leave voluntarily. Many companies have therefore realized that their existing database is one of their most valuable assets, and according to Abbasdimehr, churn prediction is a very useful tool for identifying at-risk customers.
This article is organized as follows: Section 2 describes the problem. Section 3 summarizes related work. Section 4 gives a brief review of the classification techniques selected for this study. Section 5 presents the steps of our methodology and the data set used. Section 6 presents the results and analyzes the performance of each model. Finally, Section 7 concludes the article.
2- Problem Description
To overcome the above problem, the company must correctly predict its customers' behavior. Churn management can be done in two ways: reactive and proactive.
The reactive strategy waits for a cancellation request from the customer and then offers attractive plans to retain that customer. The proactive strategy, on the other hand, tries to prevent the customer from unsubscribing: the likelihood of unsubscribing is anticipated, and plans are offered to customers accordingly. This results in a binary classification problem in which churners are distinguished from non-churners.
To deal with this problem, machine learning algorithms have emerged as powerful techniques for extracting predictive information from a previously captured database, including linear regression, support vector machines, naive Bayes, decision trees, random forests, etc. In machine learning models, this information is then used to predict churn.
In machine learning models, after preprocessing, feature selection plays a major role in increasing classification accuracy. Researchers have developed a large number of feature selection approaches that reduce dimensionality, overfitting, and computational complexity. The selected features are taken from the given input vector and used for churn prediction. In this paper, we address the problem with the following machine learning algorithms: support vector machine, k-nearest neighbors, and random forest.
Support vector machines: This algorithm can be used in cases where there are two classes of objects (e.g., churners and non-churners). It can also be used when there are more than two classes of objects (e.g., churners, potential churners, and non-churners).
K-nearest neighbor: This algorithm is suitable for time series data, categorical data, and sparse datasets.
Random forest: This is a supervised machine learning algorithm which is based on classification and regression trees. It is designed to handle both categorical and continuous data types.
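As a minimal sketch of how these three classifiers can be set up, the snippet below instantiates each one with scikit-learn. The synthetic data and hyperparameters are illustrative stand-ins, not the study's actual configuration:

```python
# Illustrative setup of the three classifiers used in this study.
# make_classification stands in for real churn features (1 = churner).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 2))
```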
3- Related Work
This section briefly summarizes some related work proposed by leading researchers for the prediction of churn in the telecommunications sector.
The authors in  analyzed the variables that impact customer churn. They also conducted a comparative study of three machine learning models, namely regression, neural networks, and regression trees. The results showed that the decision tree performs better than the others because of its rule-based architecture. However, the accuracy obtained could be further improved by using one of the existing feature selection methods.
The authors in  adopted three machine learning approaches, namely support vector machines, neural networks, and Bayesian networks, for attrition rate prediction. Principal component analysis was used in the feature selection process to reduce the dimensionality of the data. However, the feature selection process could be improved, and classification accuracy increased, by applying an optimization algorithm. The gain measure and the ROC curve were used to evaluate performance.
In another study , the authors tried to solve the customer loss prediction problem using a support vector machine, a random forest and logistic regression. The performance of the SVM was approximately equal to that of the logistic regression and the random forest, but once optimal parameter selection was considered, the SVM outperformed the logistic regression and the random forest in terms of PCC and AUC.
Paper  presents an overview of all the machine learning models considered, as well as a detailed analysis of the feature selection techniques in use. After a comparative analysis of existing methods, the authors found that, among the prediction models, the decision tree had higher efficiency than the others, and that in feature selection, optimization techniques play an essential role in improving prediction.
In , the authors adopted two machine learning models, decision tree and logistic regression on a churn prediction dataset. They used the WEKA tool for experimentation. But the customer churn problem can be solved more effectively by using other machine learning methods.
4- A Brief Review of the Machine Learning Classification Algorithms Used
4-1- Bagging Support Vector Machine
Support Vector Machine or SVM is a supervised learning technique that aims to analyze data to detect patterns. There are two types of support vector machines: linear and nonlinear . If the data domain can be divided linearly (e.g., straight line or hyperplane) to separate the classes in the original domain, it is referred to as a linear support vector machine. Nonlinear support vector machine is used when the data domain cannot be split linearly and can be translated to a space called the feature space where the data domain can be divided linearly to distinguish the classes . On the basis of a set of training data, SVM attempts to determine the optimal separating hyperplanes between examples of distinct classes by representing observations as points in a high dimensional space. New instances are represented in the same space and assigned to a class depending on their closeness to the dividing gap .
Bagging, also called bootstrap aggregating, is an ensemble learning approach that helps improve the performance and accuracy of a machine learning algorithm. It is mainly used to minimize a prediction model's variance and to deal with bias-variance tradeoffs. In , various simulation results for iris data categorization and handwritten digit identification demonstrate that SVM ensembles with bagging significantly outperform a single SVM in terms of classification accuracy. When it comes to the customer churn problem, ensemble learning techniques have been used as shown in , , .
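The bagging-SVM idea can be sketched as follows: several SVMs are trained on bootstrap samples and their predictions are aggregated. The data set and hyperparameters below are illustrative, not those of the cited studies:

```python
# Bagged SVM vs. a single SVM on a synthetic binary problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=12, random_state=1)

single_svm = SVC(kernel="rbf")
# 10 SVMs, each trained on a bootstrap sample; predictions are aggregated
bagged_svm = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10,
                               random_state=1)

acc_single = cross_val_score(single_svm, X, y, cv=5).mean()
acc_bagged = cross_val_score(bagged_svm, X, y, cv=5).mean()
print(round(acc_single, 3), round(acc_bagged, 3))
```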
4-2- K-Nearest Neighbors
When there is little or no prior knowledge about the distribution of the data, K-Nearest Neighbors (k-NN) classification is one of the most fundamental and straightforward classification procedures and should be one of the initial options for classification research . The k-NN classification arose from the necessity to do discriminant analysis when valid parametric estimates of probability densities are unknown or impossible to calculate.
The k-NN method predicts the values of new data points based on "feature similarity", which implies that a new data point will be assigned a value depending on how closely it resembles the points in the training set. k-NN does not attempt to build an internal model, and no calculations are done until classification time. k-NN merely stores instances of the training data in the feature space, and an instance's class is chosen by the majority vote of its neighbors: the class most prevalent among its neighbors is assigned to the instance. k-NN finds neighbors based on distance, using Euclidean, Manhattan, or Minkowski distance measures for continuous variables and the Hamming distance measure for categorical data .
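The majority-vote mechanism described above can be illustrated on a toy two-cluster problem (the data is a stand-in for real churn features):

```python
# k-NN majority vote: a new point takes the class of the majority of its
# k nearest training points (Euclidean distance here).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two small clusters: class 0 near the origin, class 1 near (5, 5)
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# A point near the second cluster is assigned class 1 by majority vote
print(knn.predict([[5.2, 4.8]]))  # -> [1]
```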
4-3- Random Forest
Random forests, also known as random decision forests, are an ensemble learning approach for classification, regression, and other tasks that works by building a large number of decision trees during training. The forest contains multiple decision trees, each producing its own classification of the data input. The random forest approach examines each tree's vote independently, selecting the class with the most votes as the prediction. The classification findings in  suggest that random forest outperforms decision tree (J48) for the same number of attributes on large data sets (i.e., with a higher number of instances), while decision tree (J48) is useful for small data sets (fewer instances). In addition, the study in  shows that, among naïve Bayes, decision tree, and random forest, the best classification model is random forest, with an accuracy of 97.5% compared to 88.7% for the decision tree.
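The per-tree voting described above can be inspected directly in scikit-learn (note that scikit-learn's forest actually averages per-tree class probabilities, a soft refinement of majority voting). The data below is synthetic and purely illustrative:

```python
# Inspect individual tree votes inside a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=2)
rf = RandomForestClassifier(n_estimators=50, random_state=2).fit(X, y)

# Hard vote of each tree for the first sample, then the forest's prediction
votes = [tree.predict(X[:1])[0] for tree in rf.estimators_]
print(sum(votes), "of", len(votes), "trees vote for class 1")
print("forest prediction:", rf.predict(X[:1])[0])
```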
The steps and advantages of the proposed technique are as follows (Fig. 1):
The gravitational search algorithm was used to select the features and reduce the dimensions of the data set, unlike the above existing approaches where the prediction accuracy is low due to inadequate feature selection.
After data preprocessing, some of the most important machine learning techniques used for predictions, including SVM, were applied. To avoid overfitting, cross-validation was performed, unlike other techniques where the overfitting prevention mechanism is not considered.
The power of ensemble learning was then utilized to optimize the algorithms and obtain better results, unlike the previously mentioned techniques where the performance of ensemble learning is not taken into account, which explains the low accuracies obtained.
The algorithms were then evaluated on the test set using the confusion matrix and the AUC curve to compare the best performing algorithm for the given data set.
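The steps above can be sketched as a single pipeline. The gravitational search algorithm is not available in scikit-learn, so a univariate filter (`SelectKBest`) stands in for the feature-selection stage here; the data is synthetic and all hyperparameters are illustrative:

```python
# Simplified sketch of the proposed pipeline: feature selection ->
# bagged SVM -> cross-validation -> confusion matrix and AUC on a test set.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

pipe = Pipeline([
    # Stand-in for the gravitational-search feature selection
    ("select", SelectKBest(f_classif, k=8)),
    # Bagged SVM ensemble; probability=True enables AUC computation
    ("clf", BaggingClassifier(SVC(probability=True), n_estimators=10,
                              random_state=3)),
])

# Cross-validation on the training set to guard against overfitting
cv_acc = cross_val_score(pipe, X_train, y_train, cv=5).mean()

# Final evaluation on the held-out test set
pipe.fit(X_train, y_train)
cm = confusion_matrix(y_test, pipe.predict(X_test))
auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])
print(cm)
print(round(cv_acc, 3), round(auc, 3))
```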
Fig. 1 Proposed system architecture
5-1- Presentation of the Data Set Used
The data set used in our experiments is "Telco Customer Churn", available on the Kaggle site, which contains records for 7043 customers. The data set includes information about:
• Customers who have left the company in the last month - the column is called Churn.
• The services to which each customer has subscribed: telephone, multiple lines, internet, online security, online backup, device protection, technical support and streaming TV and movies.
• Customer account information: how long they have been a customer, type of contract, method of payment, electronic billing, monthly charges, and total charges.
• Demographic information about customers: gender, age range, and whether they have partners and dependents.
The database consists of 21 attributes, including a target variable called Churn. The customer attributes, along with their descriptions, are presented in Table 1; Table 2 presents the types of these attributes.
Table 1: Customer attributes and their descriptions
customerID: a code specific to each customer
gender: the gender of the customer
SeniorCitizen: whether the customer is young or a senior
Partner: whether the customer has a partner or not
Dependents: whether the customer has dependents
tenure: the number of months the customer has stayed with the company
PhoneService: whether the customer has a telephone service or not
MultipleLines: whether the customer has multiple lines or not
InternetService: whether the customer has an internet service
OnlineSecurity: whether the customer has online security
OnlineBackup: whether the customer has an online backup
DeviceProtection: whether the customer has device protection
TechSupport: whether the customer has technical support
StreamingTV: whether the customer has streaming TV
StreamingMovies: whether the customer has streaming movies
Contract: the contract renewal period
PaperlessBilling: whether the customer has paperless billing or not
PaymentMethod: the payment method of the customer
MonthlyCharges: the monthly charge of the customer
TotalCharges: the total charge of the customer
Churn: whether the customer cancelled the contract or not
Table 2: Type of attributes
5-2- Data Analysis
After importing data from the “.csv” file, the df.info() command was executed to display information about the database.
It has been noticed that there is a problem with the types of the attributes "SeniorCitizen" and "TotalCharges": "SeniorCitizen" needs to be converted to a string type, and "TotalCharges" needs to be converted to a numeric (floating-point) type. The conversion of these attributes to their appropriate types is carried out as the first step.
After the conversion, it was observed that the "TotalCharges" column has 11 missing values. The "TotalCharges" variable can normally be computed from the two variables "tenure" and "MonthlyCharges". However, for all 11 of these entries, the corresponding "tenure" value is 0, indicating that these customers are in their first month. Therefore, the value of "MonthlyCharges" is directly assigned to them as their "TotalCharges" value.
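Assuming the standard column names of the Kaggle data set, these cleaning steps can be sketched in pandas as follows; a small inline frame stands in for the real CSV:

```python
# Sketch of the type conversions and missing-value handling described above.
import pandas as pd

# Tiny stand-in for the real Telco CSV loaded with pd.read_csv(...)
df = pd.DataFrame({
    "SeniorCitizen": [0, 1, 0],
    "tenure": [12, 5, 0],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["358.2", "284.75", " "],  # blank string = missing
})

# SeniorCitizen is encoded 0/1 but is categorical, so cast it to string;
# TotalCharges was read as text, so coerce it to a numeric type
df["SeniorCitizen"] = df["SeniorCitizen"].astype(str)
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Rows with missing TotalCharges have tenure == 0 (first-month customers),
# so assign them their MonthlyCharges value
mask = df["TotalCharges"].isna()
df.loc[mask, "TotalCharges"] = df.loc[mask, "MonthlyCharges"]
print(df["TotalCharges"].tolist())  # -> [358.2, 284.75, 53.85]
```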
The database has been cleaned and is now prepared for visualization.
5-3- Data Visualization
a) Qualitative Variables
The qualitative variables were visualized using Python, as depicted in Fig. 2 to Fig. 18 below:
Fig. 2 Male/female distribution
Fig. 3 Young/old people distribution
Fig. 4 Single/engaged people distribution
Fig. 5 Independent/dependent people distribution
Fig. 6 Distribution of the customers having a telephone line at disposition
Fig. 7 Distribution of customers with several telephone lines available
Fig. 8 Type of the customer's internet service provider
Fig. 9 Distribution of customers with online security
Fig. 10 Distribution of customers with an online backup available
Fig. 11 Distribution of customers with device protection
Fig. 12 Distribution of customers with technical support