The exponential growth in the number of operators in the market, driven by globalization and advances in the telecommunications industry, is intensifying competition. In this competitive era, maximizing profits has become imperative, and several approaches have been proposed to this end: acquiring new customers, up-selling to existing customers, and extending the retention period of current customers. Retaining existing customers is the simplest and least expensive of these strategies. To adopt it, companies must reduce customer churn, whose main causes are dissatisfaction with the services provided and with the support mechanism. The solution to this problem is to predict which customers are likely to churn. Churn prediction is a crucial objective that supports the design of customer retention and loyalty strategies. As competition in service delivery markets grows, the risk of customer churn also increases sharply, so it has become imperative to implement strategies to keep track of loyal customers (non-churners). Churn models aim to identify early signals of churn and to predict which customers will leave voluntarily. Many companies have therefore realized that their existing database is one of their most valuable assets, and according to Abbasdimehr, churn prediction is a very useful tool for identifying at-risk customers.
This article is organized as follows: Section 2 describes the problem. Section 3 summarizes related work. Section 4 gives a brief review of the classification techniques selected for this study. Section 5 presents the steps of our methodology and the data set used. Section 6 presents the results and analyzes the performance of each model. Finally, Section 7 concludes the article.
2- Problem Description
To overcome the above problem, the company must correctly predict its customers' behavior. Churn management can be done in two ways: reactive and proactive.
The reactive strategy waits for a cancellation request from the customer and then offers attractive plans to retain that customer. The proactive strategy, on the other hand, tries to prevent the customer from unsubscribing: the likelihood of unsubscribing is anticipated, and plans are offered to customers accordingly. This results in a binary classification problem in which churners are distinguished from non-churners.
To deal with this problem, machine learning algorithms have emerged as powerful techniques for extracting predictive information from a previously captured database, including linear regression, support vector machines, naive Bayes, decision trees, random forests, etc. In machine learning models, this information is then used to predict churn.
In machine learning models, after preprocessing, feature selection plays a major role in increasing classification accuracy. Researchers have developed a large number of feature selection approaches that reduce dimensionality, overfitting, and computational complexity. The selected features are taken from the given input vector and used for churn prediction. In this paper, we address the problem with the following machine learning algorithms: support vector machine, k-nearest neighbors, and random forest.
Support vector machines: This algorithm can be used in cases where there are two classes of objects (e.g., churners and non-churners). It can also be used when there are more than two classes of objects (e.g., churners, potential churners, and non-churners).
K-nearest neighbor: This algorithm is suitable for time series data, categorical data, and sparse datasets.
Random forest: This is a supervised machine learning algorithm which is based on classification and regression trees. It is designed to handle both categorical and continuous data types.
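As a minimal sketch of how these three classifiers can be set up, the snippet below instantiates each one with scikit-learn. The synthetic data and hyperparameters are illustrative stand-ins, not the study's actual configuration:

```python
# Illustrative setup of the three classifiers used in this study.
# make_classification stands in for real churn features (1 = churner).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 2))
```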
3- Related Work
This section briefly summarizes some related work proposed by leading researchers for the prediction of churn in the telecommunications sector.
The authors in  analyzed the variables that impact customer churn. They also conducted a comparative study of three machine learning models, namely regression, neural networks, and regression trees. The results showed that the decision tree performs better than the others because of its rule-based architecture. However, the accuracy obtained could be further improved by using one of the existing feature selection methods.
The authors in  adopted three machine learning approaches, namely support vector machines, neural networks, and Bayesian networks, for attrition rate prediction. Principal component analysis was used in the feature selection process to reduce the dimensionality of the data. However, the feature selection process could be improved, and classification accuracy increased, by applying an optimization algorithm. The gain measure and the ROC curve were used to evaluate performance.
In another study , the authors tried to solve the customer loss prediction problem using a support vector machine, a random forest and logistic regression. The performance of the SVM was approximately equal to that of the logistic regression and the random forest, but once optimal parameter selection was considered, the SVM outperformed the logistic regression and the random forest in terms of PCC and AUC.
Paper  presents an overview of all the machine learning models considered, as well as a detailed analysis of the feature selection techniques in use. After a comparative analysis of existing methods, the authors found that, among the prediction models, the decision tree had higher efficiency than the others, and that in feature selection, optimization techniques play an essential role in improving prediction.
In , the authors adopted two machine learning models, decision tree and logistic regression on a churn prediction dataset. They used the WEKA tool for experimentation. But the customer churn problem can be solved more effectively by using other machine learning methods.
4- A Brief Review of the Machine Learning Classification Algorithms Used
4-1- Bagging Support Vector Machine
Support Vector Machine or SVM is a supervised learning technique that aims to analyze data to detect patterns. There are two types of support vector machines: linear and nonlinear . If the data domain can be divided linearly (e.g., straight line or hyperplane) to separate the classes in the original domain, it is referred to as a linear support vector machine. Nonlinear support vector machine is used when the data domain cannot be split linearly and can be translated to a space called the feature space where the data domain can be divided linearly to distinguish the classes . On the basis of a set of training data, SVM attempts to determine the optimal separating hyperplanes between examples of distinct classes by representing observations as points in a high dimensional space. New instances are represented in the same space and assigned to a class depending on their closeness to the dividing gap .
Bagging, also called bootstrap aggregating, is an ensemble learning approach that helps improve the performance and accuracy of a machine learning algorithm. It is mainly used to minimize a prediction model's variance and to deal with bias-variance tradeoffs. In , various simulation results for iris data categorization and handwritten digit identification demonstrate that SVM ensembles with bagging significantly outperform a single SVM in terms of classification accuracy. When it comes to the customer churn problem, ensemble learning techniques have been used as shown in , , .
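The bagging-SVM idea can be sketched as follows: several SVMs are trained on bootstrap samples and their predictions are aggregated. The data set and hyperparameters below are illustrative, not those of the cited studies:

```python
# Bagged SVM vs. a single SVM on a synthetic binary problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=12, random_state=1)

single_svm = SVC(kernel="rbf")
# 10 SVMs, each trained on a bootstrap sample; predictions are aggregated
bagged_svm = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10,
                               random_state=1)

acc_single = cross_val_score(single_svm, X, y, cv=5).mean()
acc_bagged = cross_val_score(bagged_svm, X, y, cv=5).mean()
print(round(acc_single, 3), round(acc_bagged, 3))
```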
4-2- K-Nearest Neighbors
When there is little or no prior knowledge about the distribution of the data, K-Nearest Neighbors (k-NN) classification is one of the most fundamental and straightforward classification procedures and should be one of the initial options for classification research . The k-NN classification arose from the necessity to do discriminant analysis when valid parametric estimates of probability densities are unknown or impossible to calculate.
The k-NN method predicts the values of new data points based on "feature similarity", which implies that a new data point will be assigned a value depending on how closely it resembles the points in the training set. k-NN does not attempt to build an internal model, and no calculations are done until classification time. k-NN merely stores instances of the training data in the feature space, and an instance's class is chosen by the majority vote of its neighbors: the class most prevalent among its neighbors is assigned to the instance. k-NN finds neighbors based on distance, using Euclidean, Manhattan, or Minkowski distance measures for continuous variables and the Hamming distance measure for categorical data .
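The majority-vote mechanism described above can be illustrated on a toy two-cluster problem (the data is a stand-in for real churn features):

```python
# k-NN majority vote: a new point takes the class of the majority of its
# k nearest training points (Euclidean distance here).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two small clusters: class 0 near the origin, class 1 near (5, 5)
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# A point near the second cluster is assigned class 1 by majority vote
print(knn.predict([[5.2, 4.8]]))  # -> [1]
```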
4-3- Random Forest
Random forests, also known as random decision forests, are an ensemble learning approach for classification, regression, and other tasks that works by building a large number of decision trees during training. The forest contains multiple decision trees, each producing its own classification of the data input. The random forest approach examines each tree's vote independently, selecting the class with the most votes as the prediction. The classification findings in  suggest that random forest outperforms decision tree (J48) for the same number of attributes on large data sets (i.e., with a higher number of instances), while decision tree (J48) is useful for small data sets (fewer instances). In addition, the study in  shows that, among naïve Bayes, decision tree, and random forest, the best classification model is random forest, with an accuracy of 97.5% compared to 88.7% for the decision tree.
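The per-tree voting described above can be inspected directly in scikit-learn (note that scikit-learn's forest actually averages per-tree class probabilities, a soft refinement of majority voting). The data below is synthetic and purely illustrative:

```python
# Inspect individual tree votes inside a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=2)
rf = RandomForestClassifier(n_estimators=50, random_state=2).fit(X, y)

# Hard vote of each tree for the first sample, then the forest's prediction
votes = [tree.predict(X[:1])[0] for tree in rf.estimators_]
print(sum(votes), "of", len(votes), "trees vote for class 1")
print("forest prediction:", rf.predict(X[:1])[0])
```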
The steps and advantages of the proposed technique are as follows (Fig. 1):
The gravitational search algorithm was used to select the features and reduce the dimensions of the data set, unlike the above existing approaches where the prediction accuracy is low due to inadequate feature selection.
After data preprocessing, some of the most important machine learning techniques used for predictions, including SVM, were applied. To avoid overfitting, cross-validation was performed, unlike other techniques where the overfitting prevention mechanism is not considered.
The power of ensemble learning was then utilized to optimize the algorithms and obtain better results, unlike the previously mentioned techniques where the performance of ensemble learning is not taken into account, which explains the low accuracies obtained.
The algorithms were then evaluated on the test set using the confusion matrix and the AUC curve to compare the best performing algorithm for the given data set.
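The steps above can be sketched as a single pipeline. The gravitational search algorithm is not available in scikit-learn, so a univariate filter (`SelectKBest`) stands in for the feature-selection stage here; the data is synthetic and all hyperparameters are illustrative:

```python
# Simplified sketch of the proposed pipeline: feature selection ->
# bagged SVM -> cross-validation -> confusion matrix and AUC on a test set.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix, roc_auc_score

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

pipe = Pipeline([
    # Stand-in for the gravitational-search feature selection
    ("select", SelectKBest(f_classif, k=8)),
    # Bagged SVM ensemble; probability=True enables AUC computation
    ("clf", BaggingClassifier(SVC(probability=True), n_estimators=10,
                              random_state=3)),
])

# Cross-validation on the training set to guard against overfitting
cv_acc = cross_val_score(pipe, X_train, y_train, cv=5).mean()

# Final evaluation on the held-out test set
pipe.fit(X_train, y_train)
cm = confusion_matrix(y_test, pipe.predict(X_test))
auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])
print(cm)
print(round(cv_acc, 3), round(auc, 3))
```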
Fig. 1 Proposed system architecture
5-1- Presentation of the Data Set Used
The data set used in our experiments is "Telco Customer Churn", available on the Kaggle site, which contains records for 7043 customers. The data set includes information about:
• Customers who have left the company in the last month - the column is called Churn.
• The services to which each customer has subscribed: telephone, multiple lines, internet, online security, online backup, device protection, technical support and streaming TV and movies.
• Customer account information: how long they have been a customer, type of contract, method of payment, electronic billing, monthly charges, and total charges.
• Demographic information about customers: gender, age range, and whether they have partners and dependents.
The database consists of 21 attributes, including a target variable called Churn. The customer attributes, along with their descriptions, are presented in Table 1; Table 2 presents the types of these attributes.
Table 1: Customer attributes and their descriptions
customerID: a code specific to each customer
gender: the gender of the customer
SeniorCitizen: whether the customer is young or a senior
Partner: whether the customer has a partner or not
Dependents: whether the customer has dependents
tenure: the number of months the customer has stayed with the company
PhoneService: whether the customer has a telephone service or not
MultipleLines: whether the customer has multiple lines or not
InternetService: whether the customer has an internet service
OnlineSecurity: whether the customer has online security
OnlineBackup: whether the customer has an online backup
DeviceProtection: whether the customer has device protection
TechSupport: whether the customer has technical support
StreamingTV: whether the customer has streaming TV
StreamingMovies: whether the customer has streaming movies
Contract: the contract renewal period
PaperlessBilling: whether the customer has paperless billing or not
PaymentMethod: the payment method of the customer
MonthlyCharges: the monthly charge of the customer
TotalCharges: the total charge of the customer
Churn: whether the customer cancelled the contract or not
Table 2: Type of attributes
5-2- Data Analysis
After importing data from the “.csv” file, the df.info() command was executed to display information about the database.
It has been noticed that there is a problem with the types of the attributes "SeniorCitizen" and "TotalCharges": "SeniorCitizen" needs to be converted to a string type, and "TotalCharges" needs to be converted to a numeric (floating-point) type. The conversion of these attributes to their appropriate types is carried out as the first step.
After the conversion, it was observed that the "TotalCharges" column has 11 missing values. The "TotalCharges" variable can normally be computed from the two variables "tenure" and "MonthlyCharges". However, for all 11 of these entries, the corresponding "tenure" value is 0, indicating that these customers are in their first month. Therefore, the value of "MonthlyCharges" is directly assigned to them as their "TotalCharges" value.
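Assuming the standard column names of the Kaggle data set, these cleaning steps can be sketched in pandas as follows; a small inline frame stands in for the real CSV:

```python
# Sketch of the type conversions and missing-value handling described above.
import pandas as pd

# Tiny stand-in for the real Telco CSV loaded with pd.read_csv(...)
df = pd.DataFrame({
    "SeniorCitizen": [0, 1, 0],
    "tenure": [12, 5, 0],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["358.2", "284.75", " "],  # blank string = missing
})

# SeniorCitizen is encoded 0/1 but is categorical, so cast it to string;
# TotalCharges was read as text, so coerce it to a numeric type
df["SeniorCitizen"] = df["SeniorCitizen"].astype(str)
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")

# Rows with missing TotalCharges have tenure == 0 (first-month customers),
# so assign them their MonthlyCharges value
mask = df["TotalCharges"].isna()
df.loc[mask, "TotalCharges"] = df.loc[mask, "MonthlyCharges"]
print(df["TotalCharges"].tolist())  # -> [358.2, 284.75, 53.85]
```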
The database has been cleaned and is now prepared for visualization.
5-3- Data Visualization
a) Qualitative Variables
The qualitative variables were visualized using Python, as depicted in Fig. 2 to Fig. 18 below:
Fig. 2 Male/female distribution
Fig. 3 Young/old people distribution
Fig. 4 Single/engaged people distribution
Fig. 5 Independent/dependent people distribution
Fig. 6 Distribution of the customers having a telephone line at disposition
Fig. 7 Distribution of customers with several telephone lines available
Fig. 8 Type of the customer's internet service provider
Fig. 9 Distribution of customers with online security
Fig. 10 Distribution of customers with an online backup available
Fig. 11 Distribution of customers with device protection
Fig. 12 Distribution of customers with technical support