Predictive analytics 101-2 : Linear regression : prediction of wine price


I would like to explain the simplest statistical model, the "linear regression model", and how to use it to predict the price of wine. This lecture presents the basic structure of linear regression; the next lecture explains how to program the model in the R language. I use little math so that beginners in data analysis can follow easily. Let us start now!



Do you remember the target, the features and the statistical model that I explained in the last lecture? If not, please go back to the last lecture, the introduction, because they are critically important for us. In this lecture, I would like to expand on them a little. The "linear regression model" can be explained with the chart below. The target is y. The features are x1, x2 and x3. The statistical model in this lecture is the "linear regression model". You might be wondering what exactly a linear regression model is.

Figure: Linear regression model

Parameters should be obtained to predict the target

θ is new for us. Each θ is called the "parameter" or "weight" of a feature. As the formula shows, each θ is multiplied by its corresponding feature x, and the products are added up to obtain the prediction of the target y. In other words, the features x are the inputs, and they are weighted by the parameters θ. This is the "linear regression model".

The values of θ are unknown at first, so we have to find them. Once the value of each θ is obtained, we can compute predictions of the target. The parameters θ are therefore critically important for accurate predictions, and in practice most of the computational effort goes into obtaining them. Some of you might be seeing parameters for the first time. It is worth becoming familiar with how θ works, because more advanced models also have parameters θ. The values of θ are usually estimated from a large amount of data, which is why collecting data is so important in practice. In the next lecture, I will explain in detail how to calculate θ.
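
The weighted sum described above can be sketched in a few lines of Python (Python is used here only for illustration; the course itself uses R in the next lecture). The sketch assumes the model has the form y = θ0 + θ1·x1 + θ2·x2 + θ3·x3. The intercept θ0 is an assumption that the text does not mention, and the function name `predict` and all numeric values are hypothetical placeholders, not estimates from any wine data.

```python
def predict(theta0, thetas, features):
    """Return theta0 + sum of theta_i * x_i (a linear regression prediction)."""
    if len(thetas) != len(features):
        raise ValueError("each feature needs exactly one parameter")
    return theta0 + sum(t * x for t, x in zip(thetas, features))


# Hypothetical parameter values, chosen only to show the arithmetic.
theta0 = 1.0                 # assumed intercept (not mentioned in the lecture)
thetas = [0.5, -2.0, 3.0]    # one weight per feature x1, x2, x3
features = [4.0, 1.0, 2.0]   # made-up feature values x1, x2, x3

y_hat = predict(theta0, thetas, features)
print(y_hat)  # 1.0 + 0.5*4.0 - 2.0*1.0 + 3.0*2.0 = 7.0
```

Changing any single θ changes how strongly its feature moves the prediction, which is why obtaining good values of θ matters so much.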

Let us start predicting wine price

Now I would like to follow the famous story of wine price prediction by a well-known economist, Orley Ashenfelter, a professor of economics at Princeton University. His analysis is titled “BORDEAUX WINE VINTAGE QUALITY AND THE WEATHER”. You can look at the details of the analysis on this site. Let us find out how to predict the price of wine!

Collecting data

The quality and price of wine are closely related to the quality of the grapes, so it is worth considering which factors affect grape quality. Candidates include, for example, temperature, the amount of rain, the skill of the farmers and the quality of the vineyards. This analysis needs historical data on each factor covering more than 40 years. In practice, whether data is available over such long periods is sometimes very important. Here is the data used in this analysis. I will use this data throughout this course, so you can try the analysis yourself if you want to.
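
One way to picture the collected data is as a table with one row per year, holding the feature values and the target. The Python sketch below (for illustration only) checks how many years such a table spans and which years have missing values before it is used. The column names, the helper `check_history` and every number are hypothetical placeholders, not the data from Ashenfelter's analysis.

```python
def check_history(rows, min_years):
    """Summarise the coverage of yearly data.

    rows: list of dicts with a "year" key plus one key per feature or target.
    Returns (span_in_years, years_with_missing_values, long_enough).
    The span is the calendar range from first to last year; gaps between
    years are not detected by this simple check.
    """
    if not rows:
        raise ValueError("no data rows")
    years = [row["year"] for row in rows]
    span = max(years) - min(years) + 1
    missing = sorted(row["year"] for row in rows
                     if any(value is None for value in row.values()))
    return span, missing, span >= min_years


# Made-up placeholder values, NOT the data used in the lecture.
rows = [
    {"year": 2001, "temperature": 17.1, "rain": 160.0, "price": 7.5},
    {"year": 2002, "temperature": 16.7, "rain": None, "price": 8.0},
    {"year": 2003, "temperature": 17.2, "rain": 130.0, "price": 8.5},
]

span, missing, long_enough = check_history(rows, min_years=40)
print(span, missing, long_enough)  # 3 [2002] False
```

With only three placeholder years the check reports that the history is too short, which mirrors the point that long periods of data are needed.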

I hope that in this lecture you have understood what the parameters θ are and how important they are in practice. In the next lecture, I will calculate the values of θ from the collected data and obtain predictions of the price of wine. See you again!


September 6 2015  :  The lecture is released



Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.