Last week I took a business trip to Tokyo and held a seminar on data analysis for business people. There were around ten participants, all of them young business people rather than data scientists. The title was “How to predict the price of wine”. Based on data about temperature and the amount of rain, the price of wine can be predicted using a linear regression model. I explained this analysis in an earlier post on my blog.

The seminar lasted about an hour and a half. During the seminar I felt that every participant was very interested in data analysis, and they asked me a lot of questions about it. I think they deal with large amounts of data on a daily basis. Unfortunately, it is not easy to analyse data in a way that leads to better business decisions. I hope I was able to give the participants some clues for solving their problems.

In my seminar, I focused on the linear regression model. There are three reasons why I chose this model.

1. It is one of the simplest models in data analysis.

It uses the inner product of parameters and explanatory variables. This is simple and easy to understand, yet it appears many times in statistical models and machine learning. Once participants are familiar with the inner product, they can apply it to more complex models.
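To make the inner product concrete, here is a minimal sketch in plain Python. The function name `predict` and all of the numbers are hypothetical and chosen only for illustration; they are not the values from the wine analysis.

```python
def predict(weights, intercept, features):
    """Linear regression prediction: intercept plus the inner product
    of the parameters (weights) and the explanatory variables (features)."""
    return intercept + sum(w * x for w, x in zip(weights, features))


# Hypothetical parameters and inputs, made up for this example.
weights = [0.5, -0.002]    # one parameter per explanatory variable
intercept = 1.0
features = [17.0, 600.0]   # e.g. a temperature and an amount of rain

price = predict(weights, intercept, features)
print(price)
```

Each explanatory variable is multiplied by its own parameter and the results are added up. That is all the inner product does.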

2. The linear regression model can be a basis for learning more complex models.

Although the linear regression model is simple, it can be extended to more complex models. For example, the logistic regression model uses the same inner product as the linear regression model, so the structures of the two models are similar to each other. A neural network can be expressed as layers of logistic regression models. A support vector machine uses the inner product of parameters and explanatory variables, too.
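The similarity between the two models can be sketched as follows. This is only an illustration under my own assumptions: the function names and the numbers are hypothetical, and the sigmoid is written in its simplest form, which is fine for small scores like these.

```python
import math


def linear_score(weights, intercept, features):
    """The same inner product that linear regression uses."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def logistic_predict(weights, intercept, features):
    """Logistic regression: pass the linear score through the sigmoid
    function so that the output lies between 0 and 1."""
    z = linear_score(weights, intercept, features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters and inputs, made up for this example.
weights = [0.8, -0.5]
intercept = 0.1
features = [1.0, 2.0]

probability = logistic_predict(weights, intercept, features)
print(probability)
```

The only difference from linear regression is the last step, where the score is squashed into a probability. A single unit of a neural network has the same shape, which is why the layers can be thought of as stacked logistic regressions.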

3. The method for obtaining parameters can be extended to more complex models.

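Obtaining parameters is the key to using statistical models effectively. In the linear regression model, the least squares method is used for this: the parameters are chosen so that the sum of squared differences between the predictions and the observed values is as small as possible. This method can be extended to maximum likelihood estimation, which is used in the logistic regression model.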
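As a rough sketch of the least squares method, the example below fits a line with a single explanatory variable using the standard closed-form formulas. The data points and function names are hypothetical and made up for illustration. If the errors are assumed to follow a normal distribution, minimizing this sum of squares gives the same parameters as maximum likelihood estimation, which is one way to see the connection between the two methods.

```python
def fit_least_squares(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x
    for a single explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(slope, intercept, xs, ys):
    """The quantity that the least squares method minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, made up for this example.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_least_squares(xs, ys)
best_error = sum_squared_errors(slope, intercept, xs, ys)
print(slope, intercept, best_error)
```

Moving the slope or the intercept away from the fitted values makes `sum_squared_errors` larger, which is a simple way to check the fit by hand.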

If you are a beginner in data analysis, I recommend learning the linear regression model as your first step. Once you understand the linear regression model, it enables you to understand more complex models. Anyway, let us start learning the linear regression model. Good luck!