Predictive analytics 101-3: Linear regression with R
WHAT CAN YOU LEARN IN THIS LECTURE
In this lecture, I would like to focus on how to obtain parameters, or weights. Once the parameters are obtained, we can predict the value of the target in order to make better business decisions. Here, the target is the price of wine. The statistical computing language R is used to calculate the values of the parameters. Let us start now!
PARAMETERS SHOULD BE OBTAINED TO PREDICT THE VALUE OF THE TARGET, BUT HOW?
1. Recall what parameters are
In the last lecture, I explained parameters. A parameter is the weight of its corresponding feature. Please look at the chart below again: θ denotes the parameters, x the features and y the target. In this lecture, the target is the price of wine.
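As a minimal sketch of the general form the chart describes, assuming an intercept θ₀ and n features x₁, …, xₙ (the subscript notation is an assumption, not necessarily the chart's), the linear model is:

```latex
$$
y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n
$$
```

In this lecture there are four features (WRAIN, DEGREES, HRAIN and TIME_SV), so n = 4.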
2. Collect data
As I explained in the last lecture, data is very important in practice. Here is the data used in this analysis; let me explain what it contains. Please look at the data below. There are 38 observations. According to the web site,
3. Let the model learn from the data
Once the data is prepared, I input it into my model in R so that the computer can learn from the data. This time, I use a linear regression model, one of the simplest statistical models. To do that, I write only one line in R!
lm(LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = wine)
This image shows RStudio, one of the most popular integrated development environments (IDEs) for R. The upper left side of
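Because the wine data itself is not reproduced here, the sketch below uses synthetic data generated from parameters I made up (1.5, 0.8 and -0.3, purely illustrative) to show that the same one-line lm() call pattern recovers the weights from data. The names sim, x1, x2 and true_theta are hypothetical.

```r
# Synthetic data: the "true" parameters below are made up for illustration only.
set.seed(42)
n <- 200
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
true_theta <- c(1.5, 0.8, -0.3)  # intercept, weight of x1, weight of x2
y <- true_theta[1] + true_theta[2] * x1 + true_theta[3] * x2 + rnorm(n, sd = 0.1)
sim <- data.frame(y = y, x1 = x1, x2 = x2)

# Same pattern as the lecture's call: target ~ feature + feature, data = ...
fit <- lm(y ~ x1 + x2, data = sim)

# coef() returns the estimated parameters, the numbers printed under "Coefficients:"
print(coef(fit))
```

The estimates land close to 1.5, 0.8 and -0.3 because the noise is small; with real data such as the wine prices, the true parameters are unknown and the estimates are all we get.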
4. Obtain the parameters
In order to predict the price of wine, the parameters must be obtained first. There is no need to worry: R calculates them automatically! The result is as follows. In R's output, "Coefficients" means the parameters. You can see this result in the lower left side of RStudio.
(Intercept)       WRAIN     DEGREES       HRAIN     TIME_SV
 -12.145007    0.001167    0.616365   -0.003861    0.023850
This means that the (log) price of wine can be represented by the following model:
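Substituting the printed coefficients into the linear form gives the fitted model, where LPRICE2 is the logarithm of the wine price relative to 1961:

```latex
$$
\mathrm{LPRICE2} = -12.145007 + 0.001167\,\mathrm{WRAIN} + 0.616365\,\mathrm{DEGREES} - 0.003861\,\mathrm{HRAIN} + 0.023850\,\mathrm{TIME\_SV}
$$
```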
When new data are obtained, we can predict the price of wine simply by inputting the new data into the model above. It is that easy!
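A minimal sketch of "inputting new data into the model" by hand, with the coefficients copied from the output above. The helper predict_lprice2 and the input values in the example call are hypothetical, not taken from the wine data.

```r
# Coefficients as printed by lm() in this lecture
theta <- c(intercept = -12.145007, WRAIN = 0.001167, DEGREES = 0.616365,
           HRAIN = -0.003861, TIME_SV = 0.023850)

# Hypothetical helper: predicted LPRICE2 (log price relative to 1961)
predict_lprice2 <- function(wrain, degrees, hrain, time_sv) {
  unname(theta["intercept"] +
         theta["WRAIN"] * wrain +
         theta["DEGREES"] * degrees +
         theta["HRAIN"] * hrain +
         theta["TIME_SV"] * time_sv)
}

# Hypothetical inputs for illustration only
predict_lprice2(wrain = 600, degrees = 17, hrain = 100, time_sv = 10)  # about -1.114
```

Each feature value is simply multiplied by its weight and the products are added to the intercept; that is all "the model" does at prediction time.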
5. Predict the price of wine
Finally, we can predict the price of wine. As I said before, we do not know the prices of wine after 1981, so let us predict them! We have already obtained the model here.
Let us predict the price of 1986 wine using observation 35 of the data.
Then just input each of its feature values into the model.
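Observation 35 (1986) has no known price, yet it can still be scored. A minimal sketch with synthetic data (the demo data frame and its values are made up) shows why: lm() drops rows whose target is missing when fitting, and predict() then scores those rows from their features alone. With the real data, the analogous call would be predict(fit, newdata = wine[35, ]), assuming the fitted model is stored in a variable named fit.

```r
# Synthetic stand-in for the wine data (values made up for illustration):
# the last three rows play the role of vintages whose price is not yet known.
set.seed(1)
demo <- data.frame(x = 1:10)
demo$y <- 2 + 0.5 * demo$x + rnorm(10, sd = 0.01)
demo$y[8:10] <- NA

# With the default na.action setting (na.omit), lm() fits on the 7 complete rows
fit <- lm(y ~ x, data = demo)

# predict() needs only the feature columns, so rows with unknown y can be scored
pred_future <- predict(fit, newdata = demo[8:10, ])
print(pred_future)
```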
The resulting value is on a log scale, because the target LPRICE2 is the logarithm of the price relative to 1961.
It means that the price of 1986 wine may be only about 11% of the price of 1961 wine, which is the base price.
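The step from a log value to "about 11%" is just the exponential function. A minimal sketch; the helper name to_relative_price is hypothetical, and log(0.11) is used only to show the round trip, not as the model's exact output for 1986.

```r
# LPRICE2 is the log of the price relative to 1961, so exp() undoes the log
to_relative_price <- function(lprice2) exp(lprice2)

# 1961 is the base year: a log value of 0 maps to a relative price of 1
to_relative_price(0)  # 1

# A log value of log(0.11), about -2.21, maps back to 11% of the 1961 price
sprintf("%.0f%%", 100 * to_relative_price(log(0.11)))  # "11%"
```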
You can see the predicted wine prices for all observations in the chart at the lower right side of RStudio. This calculation is also done automatically with a few lines of code.
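The lecture does not show those few lines, so the following is only a minimal sketch of one way to do it in base R: predict every row, convert back from logs, and draw predictions as red squares and real prices as blue circles. The helper names relative_predictions and plot_pred_vs_actual are hypothetical, as is the vintage-year column name VINT in the commented usage line.

```r
# Hypothetical helper: predictions for every row, on the relative-price scale
relative_predictions <- function(fit, data) {
  exp(predict(fit, newdata = data))  # predict() gives log values; exp() undoes them
}

# Hypothetical helper: red squares = predictions, blue circles = real prices
plot_pred_vs_actual <- function(vintage, predicted, actual) {
  plot(vintage, predicted, pch = 15, col = "red",
       ylim = range(c(predicted, actual), na.rm = TRUE),
       xlab = "Vintage", ylab = "Price relative to 1961")
  points(vintage, actual, pch = 16, col = "blue")  # NA prices are simply skipped
  legend("topright", legend = c("Prediction", "Real price"),
         pch = c(15, 16), col = c("red", "blue"))
}

# Usage with the lecture's data, assuming a fitted model `fit` and a
# vintage-year column named VINT (hypothetical name):
# plot_pred_vs_actual(wine$VINT, relative_predictions(fit, wine), exp(wine$LPRICE2))
```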
This graph compares the predicted prices of wine with the real prices. Red squares show the predictions and blue circles show the real prices. These are prices relative to the average price in 1961, so the real price of 1961 is 1.0 because it is the base price. It seems that the model works well. Of course, it may not work in the current market, as it was built more than 20 years ago. Still, it is a good example for learning how models work and how predictions are made from data. Once you understand the linear regression model, you can understand other, more complex models more easily. I hope you enjoyed predicting the price of wine in this lecture. OK, let us move on to another powerful model in the next lecture!
September 20, 2015
Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.