This paper aims to predict churn among telecom customers so that the operator can react in time and try to retain existing users who intend to switch to a different network. We use three machine learning techniques for classification, Support Vector Machines, K-Nearest Neighbour and Random Forest, and determine which model classifies best.

The data consists of information about almost six thousand users, including the services they use, their demographic characteristics, how long they have used the operator's services, the amount they pay and the payment method. The dataset has 20 variables; some are numerical and most are categorical. It also contains some missing values, so the data must be pre-processed before any model is fitted.

Data Pre-Processing

We first deal with the null values in the dataset. Only 10 values are missing, all of them in the total charge variable. The customers with NA values all have a tenure of 0: they are new clients who have yet to pay a bill, so their total charge is set to zero. We also drop unwanted columns such as 'gender', 'MultipleLines', 'PhoneServices' and 'differences'.

Exploratory Data Analysis

This step asks why clients are inclined to leave the company and which factors that depends on. 'Phone services' were available in 91% of cases. 88% had a 'month-to-month' contract, 82% had no 'dependents', 78% had no 'online security', 77% had no 'tech support', 75% had 'paperless billing' and 75% are 'older citizens'. 68% had fibre optic internet, 65% had no 'online backup' or 'device protection', and 64% had no partner. 57% paid with an electronic check, 50% did not have 'streaming TV', were male, and did not have 'streaming movies', and 45% had 'MultipleLines'.
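The pre-processing rule described above (missing total charge becomes zero, unwanted columns are dropped) can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the sample rows are invented, the column name `TotalCharges` and the helper `preprocess` are assumptions, and only the dropped column names come from the text.

```python
# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"tenure": 0, "MonthlyCharges": 52.55, "TotalCharges": "",
     "gender": "Female", "MultipleLines": "No", "Churn": "No"},
    {"tenure": 12, "MonthlyCharges": 70.0, "TotalCharges": "845.5",
     "gender": "Male", "MultipleLines": "Yes", "Churn": "Yes"},
]

# Columns named in the text as unwanted.
UNWANTED = ("gender", "MultipleLines", "PhoneServices", "differences")


def preprocess(rows):
    """Return new rows with missing TotalCharges set to 0.0 and unwanted columns removed."""
    cleaned = []
    for row in rows:
        raw = row.get("TotalCharges")
        text = "" if raw is None else str(raw).strip()
        # Per the text, the missing values belong to tenure-0 clients who
        # have not paid a bill yet, so a missing value is replaced by zero.
        total = 0.0 if text in ("", "NA") else float(text)
        new_row = {k: v for k, v in row.items() if k not in UNWANTED}
        new_row["TotalCharges"] = total
        cleaned.append(new_row)
    return cleaned


cleaned = preprocess(rows)
```

With pandas, the same two steps would typically be written with `DataFrame.fillna` and `DataFrame.drop`.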
Hypothesis Formulation

Based on our observations, we hypothesise that a client is more likely to leave when MonthlyCharges is high, especially if the client is new (tenure under 15 months). Churn is also more likely when the client lacks particular services such as online security, tech support, online backup and/or device protection, and when the decision to quit is simple: there is no firm commitment (a month-to-month contract), no other person is involved in the decision (no dependents and/or partner), and everything can be done over the internet or by phone (paperless billing and phone services are available).

Class Imbalance

There is a large difference in size between the two classes: customers who stayed form the majority class and customers who left form the minority class. The difficulty with such imbalanced data is that most classification techniques tend to neglect the minority class (customers who left) and therefore predict it poorly. We address the problem with SMOTE (Synthetic Minority Oversampling Technique), an oversampling technique that creates synthetic samples for the minority class instead of duplicating existing ones. We use the SMOTE class from the imblearn.over_sampling module of the imbalanced-learn Python library. The method selects two or more similar examples (according to a distance measure) and perturbs one feature at a time by a random amount within the difference to the neighbouring examples.

The last step is to split and scale the dataset. For splitting, the data is divided at random into training and test samples. For scaling, we normalise only the continuous variables and leave the dummy variables untouched. We apply a min-max scaler to the continuous variables, which gives each of them a minimum of zero, a maximum of one and therefore a range of one.
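To make the oversampling and scaling steps concrete, here is a small standard-library sketch of the idea. It is not the imbalanced-learn implementation that the project uses: the functions `smote_sketch` and `min_max_scale`, the neighbour count `k` and the sample points are all assumptions made for illustration. As in the description above, each feature of a synthetic point is moved by its own random fraction of the difference to a neighbouring minority example.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by moving towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Euclidean distance),
        # excluding the base point itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Each feature gets its own random fraction of the difference.
        synthetic.append(
            [b + rng.random() * (n - b) for b, n in zip(base, neighbour)]
        )
    return synthetic


def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Invented minority-class points: (tenure in months, monthly charge).
minority = [[1.0, 70.0], [3.0, 80.0], [5.0, 90.0], [2.0, 75.0]]
new_points = smote_sketch(minority, n_new=3)

# Invented tenure values; only continuous columns would be scaled this way.
scaled = min_max_scale([0, 12, 72])
```

Every synthetic point lies between a minority example and one of its neighbours, so it stays inside the region the minority class already occupies; the scaled column has minimum 0 and maximum 1.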
Correlation Heatmap

The correlation heatmap in the figure below depicts the relationships between the variables. We also plotted histograms and scatter plots of the variables and their relationship to the target variable. In general, clients who want to quit (Churn = 'Yes') are new clients (low tenure, under 15 months, and hence low TotalCharges) with high MonthlyCharges (above $65/month). Because TotalCharges does not grow linearly with tenure, additional fees beyond the monthly charge appear to be involved.

Machine Learning Classification Techniques

Three classification techniques are used: Support Vector Machine, K-Nearest Neighbour and Random Forest. Random Forest achieved the highest prediction accuracy, 83.2%, so the random forest classifier gives the best prediction of which customers will leave the telecom company.
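As an illustration of how one of these classifiers is scored, the sketch below implements a plain k-nearest-neighbour vote and an accuracy measure using only the standard library. It is an assumption-laden toy, not the project's code: `knn_predict`, `accuracy`, the choice `k=3` and the tiny scaled dataset are all invented here, and it does not reproduce the 83.2% figure reported for Random Forest.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, point, k=3):
    """Predict the label of point by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_X)), key=lambda i: math.dist(train_X[i], point)
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented, already min-max-scaled features: (tenure, MonthlyCharges).
train_X = [[0.05, 0.90], [0.10, 0.80], [0.15, 0.85],
           [0.80, 0.30], [0.90, 0.20], [0.70, 0.40]]
train_y = ["Yes", "Yes", "Yes", "No", "No", "No"]

test_X = [[0.10, 0.90], [0.85, 0.25]]
test_y = ["Yes", "No"]

preds = [knn_predict(train_X, train_y, x) for x in test_X]
score = accuracy(test_y, preds)
```

In a real comparison the same accuracy measure would be computed for each of the three models on the held-out test samples, for example with scikit-learn's `SVC`, `KNeighborsClassifier` and `RandomForestClassifier`.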