Predictive analytics 101 - 5: Logistic regression with R
WHAT CAN YOU LEARN IN THIS LECTURE?
In this lecture, I would like to focus on how to obtain parameters, also called weights. Once the parameters are obtained, we can predict the value of the target, which helps us make better business decisions. Here, the target is the probability of default for each customer. The statistical computing language R is used to calculate the values of the parameters. Let us start now!
PARAMETERS MUST BE OBTAINED TO PREDICT THE PROBABILITY OF DEFAULT, BUT HOW?
1. Recall what parameters are
In the last lecture, I explained parameters. Parameters are the weights of the corresponding features. Please look at this chart again. θ is a parameter, x is a feature, and y is the target. In this lecture, the target is the probability of default (PD).
Formulas of the logistic regression model
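For reference, here is the textbook form of the logistic regression model written with this lecture's symbols. It assumes the chart uses the usual linear combination of n features x with parameters θ, where θ_0 is the intercept. The output y always lies between 0 and 1, which is why it can be read as a probability of default.

$$
z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n,
\qquad
y = \frac{1}{1 + e^{-z}}
$$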
2. Collecting data
As I explained in the last lecture, data is very important in practice. Here is the data used in this analysis; it is the same data I explained in the last lecture. Once the data is collected, I input it into R so that the computer can learn from it. I name the data set "datapd".
Data collected to obtain the parameters
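The real table is not reproduced here, so the following is only a sketch of what inputting data into R might look like. The values and the column names `customer`, `income`, `debt` and `default` are hypothetical stand-ins, not the lecture's data. The target `default` is coded 1 for customers who defaulted and 0 otherwise.

```r
# Hypothetical toy data standing in for the lecture's "datapd" table.
# Column names and values are assumptions made for illustration only.
datapd <- data.frame(
  customer = LETTERS[1:10],                             # hypothetical customer IDs
  income   = c(20, 25, 35, 40, 55, 30, 45, 50, 60, 65), # hypothetical feature
  debt     = c(5, 2, 8, 3, 6, 4, 7, 2, 5, 3),           # hypothetical feature
  default  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)            # target: 1 = defaulted
)

# Check the structure: one row per customer, one column per variable
str(datapd)
```

With real data, a function such as `read.csv()` would typically replace the hand-typed `data.frame()`.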
3. Learning data in the model
Let us make the computer learn from the data so that the parameters can be obtained. This time, I use the logistic regression model, one of the most widely used statistical models. To do that, I write only one line in R!
R Script to obtain the parameters
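The lecture's exact script is shown only as an image, so here is a minimal sketch of such a one-line fit. It assumes the hypothetical columns `income`, `debt` and `default` from a toy `datapd`. In R, logistic regression is a generalized linear model fitted with `glm()` and `family = binomial`.

```r
# Hypothetical toy data (assumed columns, not the lecture's real data)
datapd <- data.frame(
  customer = LETTERS[1:10],
  income   = c(20, 25, 35, 40, 55, 30, 45, 50, 60, 65),
  debt     = c(5, 2, 8, 3, 6, 4, 7, 2, 5, 3),
  default  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)

# The one line: fit a logistic regression of default on the two features
model <- glm(default ~ income + debt, family = binomial, data = datapd)
```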
"
4. Obtain the parameters
To predict the probability of default, we must obtain the parameters first. There is no need to worry: R calculates them automatically! The result is as follows. This is the output from R, in which "Coefficients" are the parameters.
The parameters and other metrics obtained by R
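As a sketch, assuming the same hypothetical toy data and model as before, the parameters can be read from R's output in two ways. `summary()` prints the full table of coefficients with standard errors, z values and p-values, while `coef()` returns just the parameters θ as a named vector.

```r
# Hypothetical toy data (assumed columns, not the lecture's real data)
datapd <- data.frame(
  customer = LETTERS[1:10],
  income   = c(20, 25, 35, 40, 55, 30, 45, 50, 60, 65),
  debt     = c(5, 2, 8, 3, 6, 4, 7, 2, 5, 3),
  default  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)
model <- glm(default ~ income + debt, family = binomial, data = datapd)

# Full output: the "Coefficients" table holds the parameters
print(summary(model))

# Just the parameters: intercept (theta_0) and one weight per feature
theta <- coef(model)
print(theta)
```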
PREDICT PROBABILITY OF DEFAULT
Finally, we can predict the probability of default for each customer.
R Script to obtain predictions of probability of default and the result
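A minimal sketch of the prediction step, again on hypothetical toy data rather than the lecture's table. `predict()` with `type = "response"` returns probabilities instead of log-odds. Passing `newdata` scores a customer who was not in the training data. The new customer's income of 42 and debt of 4 are made-up values.

```r
# Hypothetical toy data (assumed columns, not the lecture's real data)
datapd <- data.frame(
  customer = LETTERS[1:10],
  income   = c(20, 25, 35, 40, 55, 30, 45, 50, 60, 65),
  debt     = c(5, 2, 8, 3, 6, 4, 7, 2, 5, 3),
  default  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)
model <- glm(default ~ income + debt, family = binomial, data = datapd)

# Probability of default for every customer in the data
pd <- predict(model, type = "response")
print(data.frame(customer = datapd$customer, pd = round(pd, 2)))

# Probability of default for a hypothetical new customer
new_customer <- data.frame(income = 42, debt = 4)
print(predict(model, newdata = new_customer, type = "response"))
```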
The numbers above are the probabilities of default predicted automatically by R. Let us compare them with the probabilities calculated by hand from the parameters.
Calculations for "probability of default" of each customer
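The hand calculation can also be sketched in R on hypothetical toy data. It plugs the parameters into the logistic formula z = θ_0 + θ_1 x_1 + θ_2 x_2 and then 1 / (1 + e^(-z)), and checks that the results match `predict()`. The column names are assumptions, as before.

```r
# Hypothetical toy data (assumed columns, not the lecture's real data)
datapd <- data.frame(
  customer = LETTERS[1:10],
  income   = c(20, 25, 35, 40, 55, 30, 45, 50, 60, 65),
  debt     = c(5, 2, 8, 3, 6, 4, 7, 2, 5, 3),
  default  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
)
model <- glm(default ~ income + debt, family = binomial, data = datapd)
theta <- unname(coef(model))  # theta[1] = intercept, theta[2] = income, theta[3] = debt

# Linear combination of features and parameters, then the logistic function
z <- theta[1] + theta[2] * datapd$income + theta[3] * datapd$debt
pd_by_hand <- 1 / (1 + exp(-z))

# Compare with R's own predictions
pd_by_r <- unname(predict(model, type = "response"))
print(all.equal(pd_by_hand, pd_by_r))
```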
Can you see the probability of default for "Steeve"? It is 0.00 in the table above. This agrees with the computer's prediction of 4.536137e-11, because 4.536137e-11 (about 0.00000000005) rounds to 0.00 at two decimal places. Likewise, the prediction for every other customer agrees with the corresponding calculation. Could you confirm them yourself?
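The rounding argument for "Steeve" can be checked directly in R. The value 4.536137e-11 comes from the output above. Rounding it to two decimal places, as in the table, gives zero.

```r
p_steeve <- 4.536137e-11  # R's predicted probability of default for "Steeve"

round(p_steeve, 2)         # 0
sprintf("%.2f", p_steeve)  # "0.00", as shown in the table
```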
If you are interested in the R scripts in more detail, you can refer to this awesome site!
I hope you enjoyed predicting the probability of default in this lecture. See you again!
October 18, 2015
Notice: TOSHI STATS SDN. BHD.