Predictive analytics 101-5: Logistic regression with R
WHAT YOU CAN LEARN IN THIS LECTURE
In this lecture, I would like to focus on how to obtain parameters, or weights. Once the parameters are obtained, we can predict the value of the target in order to make better business decisions. Here, the target is the probability of default for each customer. The statistical computing language R is used to calculate the values of the parameters. Let us start now!
PARAMETERS SHOULD BE OBTAINED TO PREDICT THE PROBABILITY OF DEFAULT, BUT HOW?
1. Recall what parameters are
In the last lecture, I explained parameters. Parameters are the weights of the corresponding features. Please look at this chart again. θ is a parameter, x is a feature and y is the target. In this lecture, the target is the probability of default (PD).
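To make the roles of θ, x and y concrete, here is a minimal sketch in R. It assumes the usual logistic regression form, in which the weighted sum of the features is passed through the logistic (sigmoid) function to give a probability. The function name `sigmoid` and all the numbers are hypothetical and only for illustration; they are not taken from the lecture's data.

```r
# Logistic (sigmoid) function: maps any real number into the interval (0, 1).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical parameters and features, for illustration only.
theta <- c(-1.0, 0.8, -0.5)  # theta0 (intercept), theta1, theta2
x     <- c(1, 2.0, 1.0)      # the leading 1 multiplies the intercept

z  <- sum(theta * x)         # weighted sum of the features
pd <- sigmoid(z)             # y: the predicted probability of default
print(pd)
```

Whatever values θ and x take, the result always lies between 0 and 1, which is why this form suits a probability such as PD.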
2. Collecting data
As I explained in the last lecture, data is very important in practice. Here is the data used in this analysis. It is the same data that I explained in the last lecture. Once the data is collected, I input it into R so that the computer can learn from it. I name it "datapd".
3. Learning data in the model
Let us make the computer learn from the data so that the parameters can be obtained. This time, I use a logistic regression model, which is one of the most widely used statistical models. To do that, I write only one line in R!
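A sketch of what such a one-line fit can look like is given here. The data frame is a hypothetical stand-in for "datapd": the column names `default`, `income` and `debt` and all the values are my assumptions, not the lecture's data. The fitting line uses R's standard `glm()` function with `family = binomial`, which is the usual way to fit a logistic regression in R.

```r
# Hypothetical stand-in for the lecture's data.
# Column names and values are assumptions for illustration.
datapd <- data.frame(
  default = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 1),  # 1 = defaulted, 0 = did not
  income  = c(5.2, 4.8, 2.1, 3.9, 2.5, 3.6, 4.1, 1.9, 3.0, 4.0),
  debt    = c(1.0, 1.5, 3.2, 2.9, 2.8, 2.0, 1.2, 3.5, 2.6, 2.4)
)

# The single line that fits the logistic regression model.
model <- glm(default ~ income + debt, data = datapd, family = binomial)
print(model)
```

The formula `default ~ income + debt` reads as "explain default by income and debt"; R adds the intercept automatically.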
4. Obtain the parameters
In order to predict the probability of default, the parameters must be obtained first. There is no need to worry. R calculates them automatically! The result is as follows. This is the output from R. "Coefficients" means parameters.
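If you would like to pull the parameters out of a fitted model yourself, the following sketch shows one way, using the standard `coef()` and `summary()` functions. The data are again a hypothetical stand-in (the column names and values are assumptions), so the numbers it prints will differ from the output in this lecture.

```r
# Hypothetical stand-in data; column names and values are assumptions.
datapd <- data.frame(
  default = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 1),
  income  = c(5.2, 4.8, 2.1, 3.9, 2.5, 3.6, 4.1, 1.9, 3.0, 4.0),
  debt    = c(1.0, 1.5, 3.2, 2.9, 2.8, 2.0, 1.2, 3.5, 2.6, 2.4)
)
model <- glm(default ~ income + debt, data = datapd, family = binomial)

# coef() returns the estimated parameters (theta) as a named vector.
theta <- coef(model)
print(theta)

# summary() adds standard errors, z values and p-values for each parameter.
print(summary(model)$coefficients)
```

The first entry, `(Intercept)`, is the parameter that is not attached to any feature; the others are the weights of `income` and `debt`.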
PREDICT PROBABILITY OF DEFAULT
Finally, we can predict the probability of default.
The numbers above are the probabilities of default predicted by R automatically. Let us compare them with the data in the table.
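A comparison of this kind can be sketched in R with `predict()`. With `type = "response"`, it returns probabilities rather than values on the log-odds scale. The data and the column names here are hypothetical assumptions, so the predicted values are not the ones discussed in this lecture.

```r
# Hypothetical stand-in data; column names and values are assumptions.
datapd <- data.frame(
  default = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 1),
  income  = c(5.2, 4.8, 2.1, 3.9, 2.5, 3.6, 4.1, 1.9, 3.0, 4.0),
  debt    = c(1.0, 1.5, 3.2, 2.9, 2.8, 2.0, 1.2, 3.5, 2.6, 2.4)
)
model <- glm(default ~ income + debt, data = datapd, family = binomial)

# Predicted probability of default for each customer in the data.
pd_hat <- predict(model, type = "response")

# Put the observed outcome and the prediction side by side.
comparison <- data.frame(
  customer  = seq_len(nrow(datapd)),
  actual    = datapd$default,
  predicted = round(pd_hat, 2)
)
print(comparison)
```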
Can you see the probability of default for "Steeve"? It is "0.00" in the table above. This is in line with the prediction by the computer, which is 4.536137e-11, because 4.536137e-11 is very close to 0. The value in the table for each customer is in line with the corresponding prediction by the computer. Could you confirm them by yourself?
If you are interested in the details of the R scripts, you can refer to this awesome site!
I hope you enjoyed predicting the probability of default in this lecture. See you again!
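The notation 4.536137e-11 means 4.536137 × 10^-11, which is a very small number. A short check in R, using only the value quoted above, shows why it appears as "0.00" when rounded to two decimal places. The variable name `p` is just for illustration.

```r
p <- 4.536137e-11                     # the prediction quoted in the text

print(round(p, 2))                    # rounded to two decimals
print(sprintf("%.2f", p))             # formatted as in the table: "0.00"
print(format(p, scientific = FALSE))  # the same number written out in full
```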
October 18, 2015
Notice: TOSHI STATS SDN. BHD.