Predictive analytics 101-3 : Linear regression with R


In this lecture, I would like to focus on how to obtain the parameters, or weights, of a model. Once the parameters are obtained, we can predict the value of the target in order to make better business decisions. Here, the target is the price of wine. The statistical computing language R is used to calculate the values of the parameters. Let us start now!



1. Recall what parameters are

In the last lecture, I explained parameters. Parameters are the weights of the corresponding features. Please look at the chart below again. θ denotes the parameters, x the features and y the target. In this lecture, the target is the price of wine.

(Figure: Linear regression model)
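The relationship between θ, x and y can also be written as a small R function. This is only a sketch: the function name `predict_linear` and the numbers below are made up for illustration and are not taken from the wine data.

```r
# theta: the intercept followed by one weight per feature
# x:     the feature values, in the same order as the weights
predict_linear <- function(theta, x) {
  stopifnot(length(theta) == length(x) + 1)
  theta[1] + sum(theta[-1] * x)
}

# Hypothetical weights and features, chosen only to keep the arithmetic easy
theta <- c(1.0, 2.0, -0.5)
x <- c(3.0, 4.0)

y <- predict_linear(theta, x)  # 1.0 + 2.0 * 3.0 - 0.5 * 4.0
print(y)
```

Each feature is multiplied by its own weight, and the intercept is added once. The rest of this lecture does exactly this with four features.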

2. Collecting data

As I explained in the last lecture, data is very important in practice. Let me explain the data used in this analysis. Please look at the data below. There are 38 observations. According to the web site, y (LPRICE2) is the price of wine (1), x1 (WRAIN) is the amount of rain (October to March, ML), x2 (DEGREES) is the average temperature (April to September, in degrees), x3 (HRAIN) is the amount of rain in the harvest season (August and September, ML), and x4 (TIME_SV) is the time since 1983 in years (years after 1983 have negative values). We do not know the price of wine after 1981, so let us predict it!

(Note (1): LPRICE2 is the logarithm of the average vintage price relative to 1961.)
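To make the layout concrete, here is a sketch of a single row of such a data frame in R. The column names follow the model formula used later in this lecture, and the values are those of observation 35 (the 1986 vintage), which is also used later. The `YEAR` column and the variable names are assumptions added only for readability.

```r
# One row in the layout described above.
# YEAR is a hypothetical helper column; the other names match the model formula.
obs35 <- data.frame(
  YEAR    = 1986,
  LPRICE2 = NA_real_,     # log relative price, unknown for this vintage
  WRAIN   = 563,          # rain, October to March
  DEGREES = 16.2833,      # average temperature, April to September
  HRAIN   = 171,          # rain in August and September
  TIME_SV = 1983 - 1986   # years after 1983 come out negative
)
print(obs35)

# Rows with a missing price are the ones we want to predict
to_predict <- obs35[is.na(obs35$LPRICE2), ]
print(nrow(to_predict))
```

The sign convention of TIME_SV is easy to get wrong, so it helps to compute it as 1983 minus the vintage year, as above.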

3. Learning data in the model

Once the data is prepared, I feed it into my model in R so that the computer can learn from it. This time, I use a linear regression model, which is one of the simplest statistical models. To do that, I write only one line in R!

lm(LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = wine)

This image shows RStudio, a well-known integrated development environment (IDE) for R. In the upper left pane of RStudio, I call the linear regression function “lm” and pass the data from the web site to the model (in the orange box).
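The wine data itself is not reproduced in this text, so the following sketch runs the same call on synthetic stand-in data. The data frame `toy`, the feature ranges and the seed are all assumptions. The response is built without noise from the coefficients reported in the next section, purely to show what `lm` returns and that it recovers the weights used to generate the data.

```r
set.seed(1)
n <- 30

# Synthetic features in arbitrary but plausible-looking ranges (not the real data)
toy <- data.frame(
  WRAIN   = runif(n, 300, 900),
  DEGREES = runif(n, 15, 18),
  HRAIN   = runif(n, 30, 300),
  TIME_SV = seq(31, by = -1, length.out = n)
)

# Noise-free response generated from the coefficients reported in this lecture
toy$LPRICE2 <- -12.145007 + 0.001167 * toy$WRAIN + 0.616365 * toy$DEGREES -
  0.003861 * toy$HRAIN + 0.023850 * toy$TIME_SV

# Same formula as in the lecture, applied to the stand-in data
fit <- lm(LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = toy)
print(coef(fit))
```

With real data the fit is never exact, because the observed prices contain variation that the four features do not explain. By default, `lm` also drops rows in which the target is missing, which matters for a data set with unknown prices.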


4. Obtain the parameters

To predict the wine price, the parameters must be obtained first. There is no need to worry: R calculates them automatically! The result is as follows. “Coefficients” means parameters. You can see this result in the lower left pane of RStudio (in the red box).


(Intercept)       WRAIN     DEGREES       HRAIN     TIME_SV
 -12.145007    0.001167    0.616365   -0.003861    0.023850


This means that the logarithm of the wine price (LPRICE2) can be expressed by the model as follows:

log(wine price) = -12.1450077 + 0.001167 × amount of rain + 0.616365 × average temperature - 0.003861 × amount of rain in harvest season + 0.02385 × years from 1983

When new data are obtained, we can predict the price of wine simply by plugging them into the model above. It is easy to do!



5. Predict the price of wine

Finally, we can predict the wine price. As I said before, we do not know the price of wine after 1981, so let us predict it! We have already obtained the model:

log(wine price) = -12.1450077 + 0.001167 × amount of rain + 0.616365 × average temperature - 0.003861 × amount of rain in harvest season + 0.02385 × years from 1983


Let us predict the wine price of 1986 using observation 35 of the data.

amount of rain                      563
average temperature                 16.2833
amount of rain in harvest season    171
years from 1983                     -3


Then just plug each of them into the model.

log(wine price of 1986) = -12.1450077 + 0.001167 × 563 + 0.616365 × 16.2833 - 0.003861 × 171 + 0.02385 × (-3) = -2.183311496

This value is on a logarithmic scale, so let us undo the logarithm with the exponential function.

wine price of 1986 = exp(-2.183311496) = 0.1126678132

This means that the wine price of 1986 may be only about 11% of the wine price of 1961, which is the base price.
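The same arithmetic can be checked in R. The sketch below copies the coefficients from the model equation in this lecture and the feature values of observation 35. The variable names are assumptions chosen for readability.

```r
# Coefficients as written in the model equation of this lecture
theta <- c(intercept = -12.1450077, WRAIN = 0.001167, DEGREES = 0.616365,
           HRAIN = -0.003861, TIME_SV = 0.02385)

# Observation 35 (the 1986 vintage), with a leading 1 for the intercept
x <- c(1, 563, 16.2833, 171, -3)

log_price <- sum(theta * x)  # prediction on the log scale, about -2.1833
rel_price <- exp(log_price)  # price relative to 1961, about 0.1127

print(log_price)
print(rel_price)
```

Putting a 1 in front of the features lets the intercept be handled like any other weight, so the whole prediction becomes a single sum of products.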


You can see the predictions of the wine price for all observations in the chart in the lower right pane of RStudio. This calculation is also done automatically with a few lines of code.
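Those few lines are not shown in this lecture. As a rough sketch of what the plotting part might look like, the hypothetical helper below draws predictions as red squares and real prices as blue circles. The function name, argument names and axis labels are assumptions.

```r
# Hypothetical helper: year, actual and predicted are numeric vectors of equal
# length holding relative prices (1961 = 1.0); actual may contain NA.
plot_price_comparison <- function(year, actual, predicted) {
  stopifnot(length(year) == length(actual),
            length(year) == length(predicted))
  # Common y-axis range so that both series fit in the plot
  y_range <- range(c(actual, predicted), na.rm = TRUE)
  plot(year, predicted, pch = 15, col = "red", ylim = y_range,
       xlab = "Vintage", ylab = "Price relative to 1961")
  points(year, actual, pch = 16, col = "blue")  # NA values are skipped
  legend("topright", legend = c("Predicted", "Real"),
         pch = c(15, 16), col = c("red", "blue"))
  invisible(y_range)
}
```

Assuming a fitted model stored in `fit` and the full data frame `wine`, the predicted relative prices for all observations could be obtained with `exp(predict(fit, newdata = wine))` and passed in as `predicted`, with `exp(wine$LPRICE2)` as `actual`.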


This graph compares the predicted wine prices with the real prices. The red squares are the predictions and the blue circles are the real prices. These are prices relative to the average price in 1961, so the real price of 1961 is 1.0, as it is the base price. The model seems to work well. Of course, it may not work in the current situation, as it was built more than 20 years ago.

Still, it is good to learn how models work and how predictions are made from data. Once you understand the linear regression model, you can understand more complex models with ease. I hope you enjoyed predicting wine prices in this lecture. OK, let us move on to another powerful model in the next lecture!


September 20, 2015  :  The lecture is released



Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.