Before discussing models in detail, it is worth explaining how models work in general, so that beginners in data analysis can follow along. I have chosen a famous piece of research on wine quality and price by Orley Ashenfelter, a professor of economics at Princeton University. You can look at the details of the analysis on this site, under the title “BORDEAUX WINE VINTAGE QUALITY AND THE WEATHER”. I reran the calculations myself in order to explain how models work in data analysis.
1. Gathering data
The quality and price of wine are closely related to the quality of the grapes, so it is worth considering which factors affect grape quality. Candidate factors include temperature, rainfall, the skill of the farmers, and the quality of the vineyards. This analysis needs historical data on each factor covering more than 40 years. In practice, whether data is available over such a long period is often a very important question. Here is the data used in this analysis, so you can try it yourself if you want to.
2. Put data into models
Once the data is prepared, I input it into my model in R. This time I use a linear regression model, which is one of the simplest models. It expresses the prediction as a constant (the intercept) plus the sum of each explanatory variable multiplied by its parameter. According to the website, the explanatory variables are as follows:
WRAIN: Winter (Oct.-March) Rain ML
DEGREES: Average Temperature (Deg Cent.) April-Sept.
HRAIN: Harvest (August and Sept.) ML
TIME_SV: Time since Vintage (Years)
This is RStudio, a well-known integrated development environment (IDE) for R. In the upper left pane of RStudio, I wrote a function that uses R's linear regression function “lm”, and the data from the website is input into the model.
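To give a feel for what a function like “lm” does when it fits a linear regression, here is a minimal sketch in Python rather than R. It is not the code used in this post: the function name fit_ols and the tiny data set are made up for illustration, and it solves the least-squares problem through the normal equations, whereas R's “lm” uses a more numerically stable QR decomposition.

```python
def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + ... via the normal equations.

    rows: list of observations, each a sequence of explanatory variable values.
    y:    list of observed responses, one per row.
    Returns [b0, b1, b2, ...] with the intercept first.
    """
    # Prepend a constant 1.0 to each row so the first parameter is the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up data (NOT the wine data): y is generated exactly as 1 + 2*x1 - 0.5*x2,
# so the fit should recover those three parameters.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
params = fit_ols(rows, y)
print([round(p, 6) for p in params])
```

With the real wine data, each row would hold the four explanatory variables (WRAIN, DEGREES, HRAIN, TIME_SV) and y would hold the price for each vintage.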
3. Examine the outputs from models
In order to predict the wine price, the parameters must be obtained first. There is no need to worry about this, because R calculates them automatically. The result is shown below. “Coefficients” here means the parameters. You can see this result in the lower left pane of RStudio.
(Intercept) WRAIN DEGREES HRAIN TIME_SV
-12.145007 0.001167 0.616365 -0.003861 0.023850
Finally, we can predict the wine price. You can see the predicted prices in the lower right pane of RStudio.
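To make the prediction step concrete, the sketch below applies the coefficients reported above to a single vintage, written in Python rather than R. The input values for the example vintage are made up for illustration and are not taken from the data set. The final step assumes that the fitted value is the logarithm of the price relative to 1961, as in Ashenfelter's original specification; if the model is fitted directly on relative prices, that conversion is not needed.

```python
import math

# Coefficients as reported in the R output above.
COEF = {
    "(Intercept)": -12.145007,
    "WRAIN": 0.001167,
    "DEGREES": 0.616365,
    "HRAIN": -0.003861,
    "TIME_SV": 0.023850,
}


def predict(vintage):
    """Intercept plus the sum of (explanatory variable * parameter)."""
    return COEF["(Intercept)"] + sum(COEF[name] * value for name, value in vintage.items())


# Hypothetical vintage: these numbers are invented for illustration only.
example = {"WRAIN": 600, "DEGREES": 17.0, "HRAIN": 100, "TIME_SV": 20}

fitted = predict(example)    # -0.875702 for the made-up inputs above
relative = math.exp(fitted)  # assumption: fitted value is log(price relative to 1961)
print(round(fitted, 6), round(relative, 3))
```

Reading the signs of the coefficients: a warmer growing season and more winter rain raise the predicted price, while rain at harvest time lowers it.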
This graph compares the predicted wine prices with the real wine prices. The red squares show the predicted prices and the blue circles show the real prices. These are prices relative to the average price of the 1961 vintage, so the real price for 1961 is 1.0. The model seems to work well. Of course, it may not work today, as it was built more than 20 years ago. Still, this research is a good way to learn how models work and how predictions are made. Once you understand the linear regression model, other, more complex models become much easier to understand. I hope you enjoy predicting wine prices. OK, let us move on to recommender engines again next week!
Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.