Predictive analytics 101-5: Logistic regression with R

WHAT YOU CAN LEARN IN THIS LECTURE

In this lecture, I would like to focus on how to obtain the parameters, or weights. Once the parameters are obtained, we can predict the value of the target, which helps us make better business decisions. Here, the target is the probability of default for each customer. The statistical computing language R is used to calculate the values of the parameters. Let us start now!

 

PARAMETERS MUST BE OBTAINED TO PREDICT THE PROBABILITY OF DEFAULT, BUT HOW?

1. Recall what parameters are

In the last lecture, I explained parameters. Parameters are the weights of the corresponding features. Please look at this chart again. θ is a parameter, x is a feature and y is the target. In this lecture, the target is the probability of default (PD).

Figure: Formulas of the "Logistic regression model"
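As a reminder, the logistic regression model is usually written in the standard form below. This is a sketch that assumes n features x_1, ..., x_n, one parameter θ_j per feature, and an intercept θ_0; the exact notation in the chart may differ.

```latex
z = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n,
\qquad
y = \frac{1}{1 + e^{-z}}
```

Because the right-hand side always lies between 0 and 1, y can be read as a probability, here the probability of default.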

2. Collecting data

As I explained in the last lecture, data is very important in practice. Here is the data used in this analysis. It is the same data that I explained in the last lecture. Once the data is collected, I input it into R so that computers can learn from it. I name it "datapd".

Figure: Data collected to obtain the parameters
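As a minimal sketch of this step, the data can be entered into R as a data frame. Only the name "datapd" comes from the lecture. The customer labels, the feature columns x1 and x2, the target column y, and every value below are hypothetical stand-ins, not the data shown in the figure.

```r
# Hypothetical stand-in data: labels, columns and values are invented for
# illustration only and are NOT the data used in the lecture.
datapd <- data.frame(
  name = c("A", "B", "C", "D", "E", "F", "G", "H"),  # customer labels
  x1   = c(4, 8, 6, 9, 1, 2, 3, 6),                  # hypothetical feature 1
  x2   = c(1, 2, 5, 4, 3, 1, 4, 3),                  # hypothetical feature 2
  y    = c(1, 1, 1, 1, 0, 0, 0, 0)                   # 1 = default, 0 = no default
)

# Check the structure before fitting anything
str(datapd)
```

In practice, the same data frame would more likely be loaded from a file, for example with read.csv().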

 

3. Learning the data in the model

Let us make computers learn from the data so that the parameters can be obtained. This time, I use the logistic regression model, which is one of the most widely used statistical models. To do that, I write only one line in R!

Figure: R script to obtain the parameters

"glm" stands for "generalized linear model", a family of models that includes the logistic regression model. So all you have to do is type the one-line program above!

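For readers who want to try this step, here is a sketch of what such a one-line glm() call can look like. It is not necessarily the script from the lecture: the data values and the column names x1, x2 and y are hypothetical, and the object name "model" is my own choice.

```r
# Hypothetical stand-in data (NOT the lecture's data)
datapd <- data.frame(
  x1 = c(4, 8, 6, 9, 1, 2, 3, 6),
  x2 = c(1, 2, 5, 4, 3, 1, 4, 3),
  y  = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# The one-line fit: family = binomial is what turns the generalized
# linear model into a logistic regression.
model <- glm(y ~ x1 + x2, data = datapd, family = binomial)

# Print the fitted model
print(model)
```

The formula y ~ x1 + x2 tells R that y is the target and x1 and x2 are the features. An intercept is added automatically.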
 

 

4. Obtain the parameters

To predict the probability of default, the parameters must be obtained first. There is no need to worry. R calculates them automatically! The result is as follows. This is the output from R. "Coefficients" are the parameters.

Figure: The parameters and other metrics obtained by R
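As a sketch of how the parameters can be read from a fitted model, coef() returns the estimates and summary() adds standard errors and related metrics. The data, column names and object names below are hypothetical, so the numbers printed will not match the output in the figure.

```r
# Hypothetical stand-in data and fit (NOT the lecture's data)
datapd <- data.frame(
  x1 = c(4, 8, 6, 9, 1, 2, 3, 6),
  x2 = c(1, 2, 5, 4, 3, 1, 4, 3),
  y  = c(1, 1, 1, 1, 0, 0, 0, 0)
)
model <- glm(y ~ x1 + x2, data = datapd, family = binomial)

# The parameters: one intercept plus one weight per feature
theta <- coef(model)
print(theta)

# Estimates together with standard errors, z values and p-values
print(summary(model)$coefficients)
```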

 

 

PREDICT PROBABILITY OF DEFAULT

Finally, we can predict the probability of default. All you have to do is type the two lines below. The calculation is done by the computer automatically.

Figure: R script to obtain predictions of probability of default and the result
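One common way to obtain such predictions in R is predict() with type = "response", which returns probabilities rather than values on the linear scale. The sketch below assumes the same hypothetical data and object names as a stand-in, and it may differ from the two lines used in the lecture.

```r
# Hypothetical stand-in data and fit (NOT the lecture's data)
datapd <- data.frame(
  x1 = c(4, 8, 6, 9, 1, 2, 3, 6),
  x2 = c(1, 2, 5, 4, 3, 1, 4, 3),
  y  = c(1, 1, 1, 1, 0, 0, 0, 0)
)
model <- glm(y ~ x1 + x2, data = datapd, family = binomial)

# type = "response" gives predicted probabilities between 0 and 1
pd <- predict(model, newdata = datapd, type = "response")
print(round(pd, 4))
```

Passing a different data frame as newdata, with the same feature columns, gives predictions for new customers.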

The numbers above are the probabilities of default predicted by R automatically. Let us compare them to the predictions that were calculated manually in the last lecture. These "manual" predictions are presented in the last column, "y:predicton", of the table below.

Figure: Calculations for "probability of default" of each customer

Can you see the probability of default for "Steeve"? It is "0.00" in the table above. This is in line with the prediction by the computer, 4.536137e-11, because 4.536137e-11 is very close to 0. The manual prediction for each customer is in line with the corresponding prediction by the computer. Could you confirm this yourself?
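The same comparison can be sketched in R: compute the probability by hand from the parameters with the logistic formula, then check it against predict(). The data and the names x1, x2, y, manual and auto are hypothetical; only the value 4.536137e-11 comes from the lecture.

```r
# Hypothetical stand-in data and fit (NOT the lecture's data)
datapd <- data.frame(
  x1 = c(4, 8, 6, 9, 1, 2, 3, 6),
  x2 = c(1, 2, 5, 4, 3, 1, 4, 3),
  y  = c(1, 1, 1, 1, 0, 0, 0, 0)
)
model <- glm(y ~ x1 + x2, data = datapd, family = binomial)
theta <- coef(model)

# "Manual" prediction: linear score z, then the logistic function
z <- theta[["(Intercept)"]] + theta[["x1"]] * datapd$x1 + theta[["x2"]] * datapd$x2
manual <- 1 / (1 + exp(-z))

# Automatic prediction by R
auto <- unname(predict(model, type = "response"))

# The two agree up to numerical tolerance
print(all.equal(manual, auto))

# Rounded to two decimals, the lecture's value 4.536137e-11 is 0
print(round(4.536137e-11, 2))
```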

If you are interested in the details of R scripts, you can refer to this awesome site!

 

I hope you enjoyed predicting the probability of default in this lecture. See you again!

 

October 18, 2015: The lecture was released

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.