Big data

Can you beat computers at Atari games? It no longer seems possible.

I recommend watching the interview on YouTube first. Onstage at TED2014, Charlie Rose interviews Google CEO Larry Page about his far-off vision for the company. Page talks through the company’s recent acquisition of DeepMind, an AI that is learning some surprising things. Starting about 2 minutes 30 seconds into the interview, he talks about DeepMind for two minutes.


According to a paper from DeepMind, which was reportedly bought by Google for 650m USD in Jan 2014, humans cannot beat computers at three Atari 2600 games (Breakout, Enduro and Pong) after the computer has spent a couple of hours learning how each game works. The same single program is used for every game, and it is given no input in advance about how to win any specific game. This means that one program has to learn by itself, from scratch, how to obtain a high score. Of the seven games tested, the computer scored higher than a human expert in three. It is amazing.

This challenge uses reinforcement learning, one branch of machine learning. It differs from the kind of machine learning used in image recognition and natural language processing. In reinforcement learning, rewards are used to decide which policy, among many choices, is best in the long run. In short, the question is “how much of today’s lunch should we give up in order to maximize the total sum of lunches tomorrow and later?” We face this kind of problem all the time, but it has been difficult for computers to answer. However, DeepMind showed that reinforcement learning works well on this kind of problem when they presented their demo at the end of 2013.
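To make the lunch analogy concrete, here is a minimal Python sketch of the discounted sum of rewards that a reinforcement learning agent tries to maximize. It is not DeepMind’s algorithm; the lunch numbers and the discount factor `gamma` (how much a reward one day later is worth today) are assumptions chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, where a reward t steps in the future is weighted by gamma**t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical lunch plans over four days (reward = lunches enjoyed each day).
eat_everything_today = [3, 0, 0, 0]
spread_over_four_days = [1, 1, 1, 1]

for gamma in (0.5, 0.9):
    now = discounted_return(eat_everything_today, gamma)
    later = discounted_return(spread_over_four_days, gamma)
    best = "spread over four days" if later > now else "eat everything today"
    print(f"gamma={gamma}: today={now:.3f}, spread={later:.3f} -> {best}")
# gamma=0.5: today=3.000, spread=1.875 -> eat everything today
# gamma=0.9: today=3.000, spread=3.439 -> spread over four days
```

A patient agent (gamma close to 1) gives up part of today’s lunch for later; an impatient one (small gamma) does not.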


If computers can make this kind of decision, it will have a huge impact on intellectual jobs such as lawyers, fund managers, analysts and corporate officers, because they make decisions over a long-term horizon rather than for tomorrow’s outcome. They have a lot of past experience, some successes and some failures, and they draw on it when they plan for the future. If computers can use the same logic as humans and make decisions by themselves, it could be a revolution for knowledge work. For example, at a company board meeting, computers might answer board members’ questions about management strategy based on a massive number of past examples, and use reinforcement learning to tell them how to maximize future cash flow. Future cash flow matters most to board members because shareholders require them to maximize it.


Currently there is a lot of discussion about our future jobs, because many jobs will probably be replaced by computers in the near future. If reinforcement learning improves further, company CEOs might be replaced by computers, and shareholders might even welcome it in the future?!

IBM Watson Analytics works well for business managers

IBM Watson Analytics was released on 4th Dec 2014. It is a new service in which data analysis can be done through conversation, with no programming needed. I am very interested in this service, so I opened an IBM Watson Analytics account and reviewed it for a week. I wanted to see how the service works and whether it suits business managers with no data analysis expertise. Here is my report.


I think IBM Watson Analytics is good for beginners in data analysis because it makes data easy to visualize, and we can do predictive analysis without writing code. I used a dataset that includes the scores for exam 1 and exam 2 and the admission result. This data can be obtained from Exercise 2 of the Machine Learning course on Coursera. Here is the chart drawn by IBM Watson Analytics. To draw this chart, all you have to do is upload the data, type or choose “what is the relationship between Exam1 and Exam2 by result”, and adjust some options in the red box below. In the chart, green points mean ‘admitted’ and blue points mean ‘not admitted’, so it is easy to understand what the data means.


Let us move on to prediction. We can analyze the data in detail here because statistical models run behind the scenes. I chose “result” as the target of this analysis. The target is categorical, as it contains only “1: admitted” and “0: not admitted”, so IBM Watson Analytics automatically chose a logistic regression model, which is one kind of classification analysis. Here are the results. In the red box, an explanation of the analysis is presented automatically. Using the scores of each exam, we can estimate the probability of admission. This is good for business managers, as this kind of analysis usually requires programming in R, MATLAB or Python.
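For readers curious about what happens behind the scenes, here is a minimal Python sketch of fitting a logistic regression by maximizing the log-likelihood with batch gradient ascent. It is not how Watson Analytics implements it (the text does not say), and the ten rows of exam scores are made up, not the Coursera exercise data. Standardizing the scores, the learning rate `lr` and the iteration count are my own choices.

```python
import math

# Made-up (exam1, exam2, admitted) rows for illustration only; this is NOT the
# Coursera exercise data and NOT output from Watson Analytics.
DATA = [
    (34, 78, 0), (30, 43, 0), (35, 72, 0), (45, 56, 0), (40, 60, 0),
    (60, 86, 1), (79, 75, 1), (61, 96, 1), (76, 87, 1), (84, 63, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_and_std(values):
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std


def fit_logistic(data, lr=0.5, iterations=2000):
    """Maximize the mean log-likelihood of a logistic model by batch gradient ascent."""
    stats = [mean_and_std([row[j] for row in data]) for j in (0, 1)]
    # Standardize both exam scores so that gradient ascent converges quickly.
    xs = [[(row[j] - stats[j][0]) / stats[j][1] for j in (0, 1)] for row in data]
    ys = [row[2] for row in data]
    w = [0.0, 0.0, 0.0]  # intercept, weight for exam1, weight for exam2
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for x, y in zip(xs, ys):
            error = y - sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])
            grad[0] += error
            grad[1] += error * x[0]
            grad[2] += error * x[1]
        w = [wi + lr * g / len(data) for wi, g in zip(w, grad)]
    return w, stats


def admission_probability(model, exam1, exam2):
    """Apply the same standardization, then the sigmoid of the inner product."""
    w, stats = model
    z1 = (exam1 - stats[0][0]) / stats[0][1]
    z2 = (exam2 - stats[1][0]) / stats[1][1]
    return sigmoid(w[0] + w[1] * z1 + w[2] * z2)


model = fit_logistic(DATA)
print(round(admission_probability(model, 80, 80), 3))
print(round(admission_probability(model, 30, 40), 3))
```

A probability above 0.5 would be read as “admitted”, below 0.5 as “not admitted”.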


In my view, logistic regression is the first classification model to learn because it is easy to understand and can be applied in many fields. For example, I used this model to analyze how likely counterparties were to default when I worked in the financial industry. In marketing, the target can be interpreted as whether or not a customer buys the product. In machine maintenance, the target can be interpreted as normal or failed. The more data we collect, the more areas we can apply this classification analysis to. I hope many business managers become familiar with logistic regression by using IBM Watson Analytics.


IBM Watson Analytics has only just started, so improvements may be needed to make the service better. However, it is also true that business managers can analyze data without programming by using IBM Watson Analytics. I highly appreciate the efforts made by IBM.



Note: IBM, IBM Watson Analytics and the IBM logo are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide.

What is the best language for data analysis in 2015?


RedMonk publishes a ranking of the popularity of programming languages, and this research has been conducted periodically since 2010. The chart below comes from this research. Although general-purpose languages such as JavaScript occupy the top 10, statistical languages are getting popular: R is ranked 13th and MATLAB 16th. I have used MATLAB since 2001 and R since 2013, and I am currently studying JavaScript. Along the way I noticed differences between R, a statistical language, and general-purpose languages. Let us consider them in detail, along with a good way to learn statistical languages such as R and MATLAB.

1.  R focuses on data

Because R is a statistical language, it focuses on the data to be analyzed. Data are handled in R as vectors and matrices. Unlike JavaScript, there is no need to declare variables before handling data in R. There is no need to distinguish between scalars and vectors, either, because a single number is simply a vector of length one. So it is easy to start analyzing data with R, especially for beginners. Therefore I think the best way to learn R is to become familiar with vectors and matrices, because data is represented as vectors or matrices in R.


2.  R has a lot of functions to analyze data

R has a lot of functions because many professionals contribute statistical models to it. Currently there are more than 7000 add-on packages, called “R packages”, that provide these functions. This is one of the biggest advantages of learning R for data analysis. If you are interested in the linear regression model, the simplest model for predicting the prices of goods and services, all you have to do is call the function “lm”, and R outputs the parameters from which price predictions can be obtained.
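For a single explanatory variable, the least squares line that `lm(y ~ x)` reports in R can also be computed by hand. Here is a minimal Python sketch; the sizes and prices are made-up numbers, not real data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sums: slope = Sxy / Sxx
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: x could be the size of a product, y its price.
sizes = [1, 2, 3, 4]
prices = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_line(sizes, prices)
print(round(intercept, 2), round(slope, 2))  # 1.15 1.94
```

In R, `lm(prices ~ sizes)` on the same two vectors should report the same two coefficients.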


3. R is easy to visualize data

If you would like to draw a graph, all you have to do is write ‘plot’, and a simple graph appears on the screen. When there are many data series and you want to know how each one relates to the others, all you have to do is write ‘pairs’, and a matrix of scatter charts appears so that we can see the relationship between each pair. Please look at the example charts produced by ‘pairs’.

R is open source and free to anyone. MATLAB, however, is proprietary software, which means you need to buy a license if you want to use it. But do not worry: Octave, which is similar to MATLAB, is available as open-source software with no license fee. I recommend R or Octave to beginners in data analysis because there is no fee to pay.

Going forward, R will surely become more popular among programming languages. It is available to everyone at no cost. R has been introduced as a major language for data analysis in my company, and I recommend that all of you learn R as I do. It is fun, isn’t it?


Mobile and machine learning can be good friends in 2015!

The number of mobile devices will increase drastically in emerging markets in 2015. One of the biggest reasons is that good smartphones are affordable thanks to competition among suppliers such as Google, Samsung and Xiaomi. This is good for people in emerging countries because many people can have their own personal devices and enjoy the same internet life as people in developed countries. I hope everyone all over the world will be connected to the internet in the near future.

Not only the number of mobile devices but also the quality of their services will change dramatically in 2015, because machine learning will become available for commercial purposes. Let us consider this change in more detail. The key idea behind it is “a shift from judgement by ourselves to judgement by machines”.


1.  Machine Learning

Machine learning has the power to change every industry. With machine learning, computers can identify objects in images and video, understand conversations with us and read documents written in natural languages. This means that most of the information around us can be interpreted by computers: not only numerical data but other kinds of information as well. This changes the landscape of every industry completely. Computers can make business decisions, and all we have to do is monitor them. This already happened many years ago in banks, in assessing the creditworthiness of customers. The same thing will happen in all industries in the near future.


2. Data

In emerging markets, more and more mobile phones will be sold, so that every person might own a device in the near future. This means that people all over the world will be connected through the internet and more information will be collected in real time. In addition, many automobiles, homes and parts are also connected through the internet and send information in real time, too. Therefore we can know when and where they are and what condition each is in, in real time. So parts can be maintained as soon as it is needed, and the resources people use can be optimized, because we get such information in real time.


3. Output

Output from computers will be sent to the mobile devices of each responsible person in real time. So there is no need to stay in the office during working hours, as we can be notified wherever we are. This raises the productivity of our jobs a lot. There is no need to wait in the office for output from computers anymore.


Yes, my company is highly interested in the progress of machine learning for commercial purposes, and I would like to watch it closely. I would also like to develop new services based on machine learning on mobile devices going forward.

Can we talk to computers without programming language?

IBM announced on 4th Dec 2014 that Watson Analytics provides data analysis and visualization as a service without programming. It said that the “breakthrough natural language-based cognitive service that can provide instant access to powerful predictive and visual analytic tools for businesses, is available in beta”. Let us consider what kind of impact IBM Watson Analytics will have.


Watson Analytics is good at natural language processing. For example, if doctors ask Watson Analytics how to treat a disease, it understands their questions, researches massive amounts of data and answers them. Doctors do not need to write any code. This means we may shift from “we should learn computer programming” to “we should know how to have a conversation with computers”. It may enable many non-programmers to use computers effectively.

In addition, Watson Analytics is also good at handling unstructured data, which includes text, images, voice and video. Therefore Watson Analytics can analyze e-mail, social media content and pictures taken by consumers. So it may become possible to get recommendations for what to eat at a restaurant just by taking pictures of the menu: computers that hold our health data could choose the best meals for our health by analyzing those pictures.

In terms of algorithms, the functionalities above can be achieved by machine learning. So the more people use this service, the more accurate the computers’ answers become, because computers learn from a lot of data and keep getting better.


IBM Watson Analytics may change the landscape of every industry. Traditionally, data analysis has been done by data scientists using numerical data and programming languages. With IBM Watson Analytics, however, data analysis can be done by businessmen and businesswomen using e-mail, pictures, video and natural language. Machine translation from one language to another will also be available, so language barriers will become lower going forward. This is the democratization of data analysis. It will be exciting when it happens in 2015!


Note: IBM and the IBM logo are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide.

Mobile services will be enhanced by machine learning dramatically in 2015, part 2

Happy new year! At the beginning of 2015, it is a good time to consider what will happen in machine learning and mobile services this year. Following last week’s post, we consider recommender systems and the internet of things, as well as investment technology. I hope you enjoy it!


3. Recommender systems

Recommender systems are widely used, from big companies to small and medium-sized companies. Going forward, as image recognition technology progresses rapidly, consumer-generated data such as pictures and videos will be used to analyze consumer behavior and model consumer preferences effectively. This means unstructured data can be collected and analyzed by machine learning to make recommendations more accurate. This creates a virtuous cycle: the more pictures people take with their smartphones and send through the internet, the more accurate the recommendations become. It is a good example of personalization. In 2015 many mobile services will have personalization functions so that everyone can be satisfied with them.


4. Internet of things

This is also one of the big themes of the internet. As sensors become smaller and cheaper, many devices and pieces of equipment, from smartphones to automobiles, contain more sensors. These sensors are connected to the internet and send data in real time. This will completely change the way equipment is maintained. If your car’s fuel efficiency is getting worse, it may be caused by an engine failure, so maintenance will be needed as soon as possible. By using classification algorithms from machine learning, it should be possible to predict fatal failures of automobiles, trains and even homes. All notifications will be sent to smartphones in real time. This leads to a greener society, as efficiency increases in terms of energy consumption and emission control.


5. Investment technology

As far as I know, I rarely heard in 2014 about new technologies being introduced in investment and asset management. However, I imagine that some fin-tech companies might use reinforcement learning, one category of machine learning. Unlike image recognition and machine translation, the right answers are not so clear in investment and asset management, so reinforcement learning might be how machine learning is applied to this field in practice. Of course, the results of the analysis would be sent to smartphones in real time to support investment decisions.


Mobile services will be enhanced dramatically in 2015 because machine learning technologies will be connected to each customer’s mobile phone. Mobile services with machine learning will change the landscape of every industry sooner rather than later. Congratulations!

Mobile services will be enhanced by machine learning dramatically in 2015


Merry Christmas! The end of 2014 is approaching, so it is a good time to consider what will happen in machine learning and mobile services in 2015. This week we consider machine translation and image recognition; next week, recommender systems and the internet of things, as well as investment technology. I hope you enjoy it!


1.  Machine translation / Text mining

Skype is a top innovator in this field. Microsoft has already announced that machine translation between English and Spanish is available on Skype. So in 2015, it should be possible to translate between English and other languages. Text translation among 40 languages is also available in its chat service. So language barriers are getting lower and lower. It is still difficult for computers to answer questions automatically, but this is also improving gradually. Mizuho Bank announced that it will use IBM Watson, one of the best-known artificial intelligence systems, to assist call center operators. These technologies make global services easier to develop, as manuscripts and FAQs can be translated from one language to another automatically. I love that, because my educational programs can be expanded all over the world!


2. Image recognition

Since computers automatically identified images of cats using deep learning, image recognition technology has progressed dramatically. SoftBank announced that Pepper, a new robot for consumers, will be able to read human emotions. In my view, the most important factor in reading emotions must be the image recognition of human facial expressions. Pepper could be very good at this and therefore able to read human emotions. Image recognition technology is very good for us because every smartphone has a good camera, and it is easy for people to take pictures and send them to clouds and social media. Image recognition enables us to analyze the massive number of images sent through the internet. That data must be a treasure for us.


These machine learning technologies will be connected to each customer’s mobile phone in 2015, which means mobile services will be enhanced dramatically by machine learning. All the information around us will be collected through the internet and sent to machine learning systems in real time, and those systems will return the best answer for each individual. This will become the standard model for mobile services as the speed of computation and communication increases rapidly.

Next week we consider recommender systems, the internet of things and investment technology. See you next week!

Financial industry and artificial intelligence

UBS announced that it will deliver personalized advice to the bank’s wealthy clients by using artificial intelligence (AI). UBS plans to roll out a digital service in Asia next April.

I think this is one example of financial institutions moving toward “digital personalized marketing” using artificial intelligence. In the future, personalized services by AI will be one of the key strategic technologies in the financial industry. Let us consider in more detail how artificial intelligence is implemented and used in marketing in the financial industry.


1. data

This is the basis for the analysis that predicts which financial products customers want. According to an article about UBS, the founders of Sqreem said in their presentation that they crawl through a wide range of openly available, unstructured data. Let me explain unstructured data: it is data that is not organized in a database in the way we usually see. So I assume a massive amount of data could be gathered automatically. Data might be gathered in real time, so final outputs such as recommendations might also be provided in real time. It is a dynamic process rather than a static one.


2. algorithm

As far as I know, there is no disclosure of how the calculations are done in detail, so this is my assumption based on the article. It might be a kind of recommender system. As the article says, it focuses on customer behavior. Customer behavior could be identified at a deeper level, and precise recommendations could be provided effectively to individual customers. In my view, this system might also be an online learning system. This means the algorithm could learn new things by itself, be updated in real time from streaming data, and adjust to changes in customers’ preferences.
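The article does not disclose the algorithm, so the following is only a minimal Python sketch of what online learning means in general, not UBS’s or Sqreem’s system. A preference vector θ is scored against item features x with the inner product and nudged after every new observation by one stochastic-gradient step on squared error. The items, ratings, learning rate `lr` and the simulated preference shift are all hypothetical.

```python
import itertools


def predict(theta, x):
    """Inner product of a preference vector and an item feature vector."""
    return sum(t * xi for t, xi in zip(theta, x))


def online_update(theta, x, rating, lr=0.1):
    """One stochastic-gradient step on squared error for a single new observation."""
    error = rating - predict(theta, x)
    return [t + lr * error * xi for t, xi in zip(theta, x)]


# Hypothetical item feature vectors arriving in a repeating stream.
ITEMS = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]


def run_stream(theta, true_theta, steps):
    """Feed `steps` simulated (item, rating) events into the model one at a time."""
    for x in itertools.islice(itertools.cycle(ITEMS), steps):
        rating = predict(true_theta, x)  # simulated, noise-free customer feedback
        theta = online_update(theta, x, rating)
    return theta


theta = [0.0, 0.0, 0.0]
theta = run_stream(theta, [2, 0, 1], 400)  # customer prefers feature 1
theta = run_stream(theta, [0, 2, 1], 400)  # preference shifts to feature 2
print([round(t, 3) for t in theta])
```

Because each event updates θ immediately, the estimate follows the customer when the preference shifts halfway through the stream.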


3. output

This is also my assumption based on the article. The article mentions mobile phones and other digital devices. I think recommendations might be provided to individual customers mainly through their mobile phones. Mobile phones could become customers’ personal interface with banks and financial institutions. One of the biggest advantages of mobile phones is that customers’ preferences can be gathered through interactions between customers and banks, without any formal inquiry to the customers.


This is not the end of the story but the beginning. As technology progresses, many industries will try to introduce this kind of personalized recommender system. This is marketing for the digital era, so that everyone can obtain the best products and services among many choices. How wonderful it is!

What is singular value decomposition?

Last week I introduced the inner product as a simple model for recommender systems. This week I would like to introduce a more advanced model for recommender systems, called singular value decomposition.


According to Mining Massive Datasets on Coursera, one of the best online courses about machine learning and big data, singular value decomposition (SVD) is defined as follows.

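Matrix A = UΣV’ (V’ is the transpose of V)

U : left singular vectors (one column per concept)

Σ : diagonal matrix of singular values (the strength of each concept)

V : right singular vectors (one column per concept)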

The row vectors and column vectors of matrix A can be transformed into a lower-dimensional space, called the “concept” space. In other words, row vectors and column vectors can be mapped to the concept space, which has fewer dimensions than the row and column vectors of matrix A. The strength of each concept is given by the diagonal values of Σ, which are non-negative and sorted from largest to smallest. When SVD is applied to recommender systems, the row vectors of matrix A can represent customers’ preferences and the column vectors item features. For example, movies can be classified as SF movies or romance movies; these are “concepts”. Each customer may like SF movies or romance movies. We can predict unknown ratings for customers and items by using SVD.


SVD is also used for dimensionality reduction, and its advantages are as follows.

1.  find hidden correlations

2.  make visualization of data easier

3.  reduce the amount of data


Therefore SVD can be applied not only to recommender systems but also to other kinds of business applications.


Let us use R to analyze data by singular value decomposition. R has a built-in function for singular value decomposition, svd(), so we can perform the decomposition just by passing the data to svd(). The IDE below is RStudio.

In this case, matrix ss is decomposed into $d, $u and $v.

$u : left singular vectors

$d : singular values (the diagonal of Σ, returned as a vector)

$v : right singular vectors

When we look at $d, the first and second values are large, so we focus on the first and second concepts. In $u, the row vectors of ss are mapped to the concept space. In $v, the column vectors of ss are mapped to the same concept space. The red and blue rectangles show similarity based on “concept”. I recommend you try svd() to analyze data in R, as it is very easy and effective.
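R’s svd() computes the full decomposition in one call (it relies on LAPACK internally). To show where the largest “concepts” come from, here is a minimal Python sketch that finds the top two singular values and vectors by power iteration on A’A followed by deflation. This is a textbook technique swapped in for illustration, not what svd() does, and the small ratings matrix is made up.

```python
import math


def mat_vec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_singular_triple(a, iterations=200):
    """Power iteration on A'A: returns (sigma, u, v) for the largest singular value."""
    at = transpose(a)
    v = normalize([1.0] * len(a[0]))
    for _ in range(iterations):
        v = normalize(mat_vec(at, mat_vec(a, v)))
    av = mat_vec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


def deflate(a, sigma, u, v):
    """Subtract the rank-1 piece sigma * u v' so the next concept can be found."""
    return [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]


# Hypothetical ratings: rows are customers, columns are movies
# (first two columns SF movies, last two romance movies).
A = [
    [5, 5, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 2, 2],
]
s1, u1, v1 = top_singular_triple(A)
s2, u2, v2 = top_singular_triple(deflate(A, s1, u1, v1))
print(round(s1, 3), round(s2, 3))
print([round(x, 3) for x in v1])
print([round(x, 3) for x in v2])
```

Here s1 is about √82 ≈ 9.055 (the SF concept) and s2 about √26 ≈ 5.099 (the romance concept); they play the role of the first two values of $d, while v1 and v2 correspond to the first two columns of $v.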

SVD is a little more complicated than the inner product, but it is very useful when there is a lot of high-dimensional data. Let us become familiar with SVD, because we would like to use this model going forward.



Disclaimer : TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

Recommender engines and inner product

Last week, I introduced the inner product of vectors as an essential tool for statistical models. This week, let us apply the inner product to recommender engines.


Do you remember the utility function? Let me review it briefly. The utility function is expressed as follows.

U = θ · x (the inner product of θ and x)

U: utility of customers, θ: customers’ preferences, x: item features, R: ratings of the items for the customers

As you know, θ (customers’ preferences) and x (item features) are both vectors. Let us take movies as an example. Movie features are expressed as follows.


A1: Science fiction movie

A2: Love romance movie

A3: Historical movie

A4: US movie

A5: Japanese movie

A6: Hong Kong movie



First, let us consider customers’ preferences. If you like a feature of movies, assign 1 to it. If you like it very much, assign 2; if you do not like it, just put 0. I like science fiction movies and US movies very much and I like Japanese and Hong Kong movies, while I do not like love romance movies or historical movies. These preferences can be expressed as a vector. My preference vector θ is [2,0,0,2,1,1] because A1=2, A2=0, A3=0, A4=2, A5=1, A6=1 according to my preferences. I recommend you make your own preference vector the same way.


Then let us move on to item features. StarWars, A Chinese ghost story, Seven samurai and Titanic are our selection of movies. Which movies should be recommended to me?

OK, let us make an item feature vector for each movie. For example, if the movie is a US movie, A4=1, A5=0, A6=0.

StarWars : x=[1,0,0,1,0,0]

A Chinese ghost story : x=[0,1,0,0,0,1]

Seven samurai : x=[0,0,1,0,1,0]

Titanic : x=[0,1,0,1,0,0]


Finally, let us calculate the value of the utility function for each movie. A bigger value means I like the movie more, so it should be recommended to me. The value is obtained by calculating the inner product of θ (customers’ preferences) and x (item features). For StarWars, the value of the utility function is [2,0,0,2,1,1]*[1,0,0,1,0,0]’ = 4.


StarWars : U=4

Chinese ghost story : U=1

Seven samurai : U=1

Titanic : U=2


The highest value goes to StarWars, so it should be recommended to me. The second is Titanic, so it may also be recommended. If you prepare your own preference vector, you can calculate the value of your utility function and find which movie should be recommended to you!
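The calculation above is easy to reproduce. Here is a minimal Python sketch using the same θ and item feature vectors; `inner_product` is a hypothetical helper name, and movies with equal utility keep their original listing order.

```python
def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Preference vector theta and item feature vectors x from the example
# (A1..A6 = SF, romance, historical, US, Japanese, Hong Kong).
theta = [2, 0, 0, 2, 1, 1]
movies = {
    "StarWars": [1, 0, 0, 1, 0, 0],
    "A Chinese ghost story": [0, 1, 0, 0, 0, 1],
    "Seven samurai": [0, 0, 1, 0, 1, 0],
    "Titanic": [0, 1, 0, 1, 0, 0],
}
utilities = {name: inner_product(theta, x) for name, x in movies.items()}
ranking = sorted(utilities, key=utilities.get, reverse=True)
print(utilities)
# {'StarWars': 4, 'A Chinese ghost story': 1, 'Seven samurai': 1, 'Titanic': 2}
print(ranking)
# ['StarWars', 'Titanic', 'A Chinese ghost story', 'Seven samurai']
```

Replacing `theta` with your own preference vector gives your own ranking.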


Anyway, this is one of the simplest models for calculating the utility of each movie. It uses the inner product of vectors, as I said before. In this case only six features are used, but even when there are far more than six features, the inner product transforms a lot of data into a single number, which can be used for better business decisions!


The function of statistical models and inner product

Before we dive into the linear regression model, let us consider the function of statistical models. We are obviously already surrounded by a lot of data: web logs, search engine queries, location data from smartphones and so on. We cannot understand what it means just by looking at it, because there is such a massive amount. So what should we do to understand it and make better business decisions? We tend to lose our way, as massive data gives us too much information. How can we reduce the dimensions of the data so that we can understand what it means?


Here I would like to introduce the inner product, sometimes called the dot product. Here is the definition of the inner product from Wikipedia.

In mathematics, the dot product, or scalar product (or sometimes inner product in the context of Euclidean space), is an algebraic operation that takes two equal-length sequences of numbers (usually coordinate vectors) and returns a single number.

This “single number” is very important for us because we can understand what a single number means. By using the inner product, we can put a lot of data into a single number: whether we have two or three data points or a million or a billion, we can convert them into a single number. Isn’t it wonderful? We can understand data if it is only a single number!
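Written as a formula, the inner product of two vectors θ and x with n elements multiplies matching elements and adds them up. The small numeric example on the second line is my own.

```latex
\theta \cdot x = \sum_{i=1}^{n} \theta_i x_i = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n

(1, 2, 3) \cdot (4, 5, 6) = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32
```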

It is simple, but we can apply it to many statistical models, for example the linear regression model, the logistic regression model, support vector machines and so on.

The inner product captures the essence of what statistical models do: it converts a lot of data into a single number that we can understand. This is what we want, because we are surrounded by so much data now!


So going forward, I would like to focus on the inner product when introducing new statistical models. It enables us to understand how statistical models work! Especially for beginners in data analysis, I strongly recommend getting familiar with the inner product. Then we can go to the next phase and introduce the linear regression model next week!

Last week I held a seminar in Tokyo. Everyone is eager to learn data analysis.

Last week I went on a business trip to Tokyo and held a seminar on data analysis for businessmen and businesswomen. There were around ten participants, all young businesspeople rather than data scientists. The title was “How to predict the price of wine”. Based on data about temperature and the amount of rain, the price of wine may be predicted using a linear regression model. I explained this analysis in my blog before.


The seminar lasted about an hour and a half. During the seminar I felt that every participant was very interested in data analysis, and I got a lot of questions about it. I think they face problems with a lot of data on a daily basis. Unfortunately, it is not easy to analyze data in a way that leads to better business decisions. I hope I provided some clues to help the participants solve their problems.


In my seminar, I focused on the linear regression model. There are three reasons why I chose this model.

1.  It is the simplest model in data analysis.

It uses the inner product of parameters and explanatory variables. This is very simple and easy to understand; however, it appears many times in statistical models and machine learning. Once participants are familiar with the inner product, they can apply it to more complex models.


2.  The linear regression model can be a basis for learning more complex models.

Although the linear regression model is simple, it can be extended to more complex models. For example, the logistic regression model uses the same inner product as the linear regression model, so the structures of the two models are similar. A neural network can be expressed as layers of logistic regression models. Support vector machines also use the inner product of parameters and explanatory variables.


3.  The method to obtain parameters can be extended to more complex models.

Obtaining parameters is a key point in using statistical models effectively. To do that, the least squares method is used in the linear regression model. This method can be extended to maximum likelihood estimation, which is used in the logistic regression model; in fact, least squares is itself the maximum likelihood estimate when the errors are normally distributed.
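To make points 1 and 3 concrete, here is a minimal sketch of the least squares method in plain Python (the seminar itself used R, which is not shown here). The data is hypothetical and generated without noise from y = 1 + 2·x1 + 3·x2, so the fitted parameters should come back as roughly 1, 2 and 3. The helper names (`dot`, `solve`, `least_squares`, `predict`) are illustrative and not from any library; the normal equations are solved with simple Gaussian elimination, which is fine for a tiny example but is not how statistics packages do it.

```python
def dot(a, b):
    """Inner product: multiply matching elements and sum the products."""
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - dot(M[r][r + 1:n], z[r + 1:n])) / M[r][r]
    return z


def least_squares(X, y):
    """Return theta minimizing sum((y_i - theta . x_i)^2) via the normal equations."""
    Xb = [[1.0] + list(row) for row in X]  # leading 1 carries the intercept
    k = len(Xb[0])
    XtX = [[sum(row[i] * row[j] for row in Xb) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(Xb, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(theta, x):
    """Linear regression prediction: the inner product theta . [1, x]."""
    return dot(theta, [1.0] + list(x))


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]
theta = least_squares(X, y)
```

Note that `predict` is nothing more than the inner product of θ and x with a leading 1 for the intercept, the same building block that logistic regression reuses.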


If you are a beginner in data analysis, I recommend learning the linear regression model as your first step. Once you understand the linear regression model, it enables you to understand more complex models. Anyway, let us start learning the linear regression model. Good luck!


Logistic regression model or Matrix factorization?

When I was a risk manager in the financial industry, I often used the logistic regression model. This model is widely used to measure the probability of default of counterparties, so it is very well known in the financial industry. In the field of machine learning, this model is regarded as a classifier, as it enables us to classify data based on the results of calculations. Both numerical data and categorical data can be used in this model. It is simple and flexible, so I want to use this model as our recommender engine.
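As a rough illustration of how numerical and categorical data can both enter a logistic regression, here is a small Python sketch. Everything in it is hypothetical: the counterparties, the `leverage` and `sector` variables, and the helper names. The categorical variable is one-hot encoded, the probability is the sigmoid of the same inner product θx used in linear regression, and the parameters are estimated by maximum likelihood with plain gradient ascent rather than the optimizer a statistics package would use.

```python
import math


def sigmoid(z):
    """Map the inner product theta . x to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


SECTORS = ("retail", "energy", "tech")  # hypothetical categories


def encode(leverage, sector):
    """Intercept, one numerical feature, and a one-hot categorical feature (first level dropped)."""
    return [1.0, leverage] + [1.0 if sector == s else 0.0 for s in SECTORS[1:]]


def fit_logistic(X, y, lr=0.1, steps=20000):
    """Maximum likelihood estimation by batch gradient ascent on the log-likelihood."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, yi in zip(X, y):
            err = yi - sigmoid(sum(t * xj for t, xj in zip(theta, x)))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        theta = [t + lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


def prob_default(theta, leverage, sector):
    """Estimated probability of default for one counterparty."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, encode(leverage, sector))))


# Hypothetical counterparties: (leverage, sector, defaulted?)
data = [(0.2, "retail", 0), (0.6, "retail", 1), (0.9, "retail", 1),
        (0.3, "energy", 1), (0.5, "energy", 0), (0.8, "energy", 1),
        (0.35, "tech", 0), (0.4, "tech", 0), (0.7, "tech", 0), (0.85, "tech", 1)]
X = [encode(lev, sec) for lev, sec, _ in data]
y = [d for _, _, d in data]
theta = fit_logistic(X, y)
```

Dropping the first category avoids a redundant column alongside the intercept; "retail" becomes the baseline that the other sector coefficients are measured against.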

In addition, I found that the matrix factorization model is widely used in industry today. It has been popular since it performed well in the Netflix Prize competition in 2009. Once we obtain the matrix of ratings by users and items, matrix factorization is applied to this matrix and divides it into two matrices: one for users’ preferences and the other for items’ features. By using these two matrices, we can provide recommendations to users even for items they have not rated. It is simple but very powerful in solving problems.
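A minimal sketch of the idea in Python, assuming a small hypothetical rating matrix where `None` marks a missing rating. It learns a user matrix `P` and an item matrix `Q` with stochastic gradient descent on the observed entries only, which is one common way to fit matrix factorization; the learning rate, regularization and number of factors are illustrative choices, not tuned values. The predicted rating for any user and item is the inner product of their rows, and that is how ratings for unrated items get filled in.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: inner product of user u's preferences and item i's features."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(R, k=2, lr=0.01, reg=0.02, epochs=5000, seed=0):
    """Learn P (users x k) and Q (items x k) so that R[u][i] is close to P[u] . Q[i]
    on the observed entries, using stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in R]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in R[0]]
    observed = [(u, i, r) for u, row in enumerate(R)
                for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # move user factor toward the rating
                Q[i][f] += lr * (err * pu - reg * qi)  # move item factor toward the rating
    return P, Q


# Hypothetical ratings (rows = users, columns = items); None = not rated yet
R = [
    [5, 5, None, 1, 1],
    [5, None, 5, 1, None],
    [1, 1, 1, 5, None],
    [None, 1, 1, 5, 5],
]
P, Q = factorize(R)
```

After fitting, `predict(P, Q, 0, 2)` estimates how user 0 would rate item 2, which that user never rated.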


Having two models gives us two advantages, as follows.

1  We can compare the results of the two models with each other.

By using the same data, we can compare how effectively each model provides recommendations. I think this is good because it is very difficult to evaluate how well a model works without comparing it to other models.


2  We can combine two models into one model.

In practice, several models are sometimes combined into one so that the results are more accurate than those of any single model. For example, matrix factorization provides us with features automatically, and these features may be used as inputs to logistic regression models. Taking a linear combination of the models’ outputs is another way of combining models.
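Both advantages can be sketched in a few lines of Python. The predictions below are hypothetical numbers standing in for the outputs of a logistic regression model and a matrix factorization model on the same held-out ratings. RMSE (root mean squared error) is assumed as the comparison metric here (it is the metric the Netflix Prize used), and the combination is a simple weighted average whose weight is chosen by grid search.

```python
import math


def rmse(pred, actual):
    """Root mean squared error between predictions and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def blend(pred_a, pred_b, w):
    """Weighted average of two models' predictions: w * a + (1 - w) * b."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def best_weight(pred_a, pred_b, actual, steps=10):
    """Grid search over w in 0.0, 0.1, ..., 1.0 for the lowest blended RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(blend(pred_a, pred_b, w), actual))


# Hypothetical held-out ratings and predictions from the two models
actual = [4, 2, 5, 3, 1]
logistic_pred = [3.5, 2.5, 4.0, 3.5, 1.5]
mf_pred = [4.5, 1.5, 5.5, 2.0, 0.5]

print("logistic RMSE:", rmse(logistic_pred, actual))
print("matrix factorization RMSE:", rmse(mf_pred, actual))
w = best_weight(logistic_pred, mf_pred, actual)
print("blend weight:", w, "blended RMSE:", rmse(blend(logistic_pred, mf_pred, w), actual))
```

Here the two models make errors in opposite directions, so the blend beats both. In practice the weight should be chosen on one held-out set and the blend evaluated on another, otherwise the comparison is optimistic.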


Yes, we have two major models for our recommendation engine. So let us make them more accurate and effective going forward. The more experience we have in developing models, the more accurate and effective the recommendations from our models become. These models are expected to be implemented in the R language, our primary tool for data analysis. It must be exciting! Why don’t you join us? You will become an expert in recommender systems with this blog!


What is a utility function in recommender systems?

Let us go back to recommender systems, as I did not mention them last week. Last month I found that customers’ preferences and items’ features are key to providing recommendations. Then I started developing the model used in recommender systems. Now I think I should explain the initial problem setting in recommender systems. This week I looked at “Mining Massive Datasets” in Coursera and found that its problem setting of recommender systems is simple and easy to understand, so I decided to follow it. If you are interested in more detail, I recommend this course, an excellent MOOC in Coursera.


Let us introduce a utility function, which tells us how satisfied customers are with items. The term “utility function” comes from microeconomics, so some of you may have learned it before. I think it is good to use a utility function here because we can use the methods of economics when we analyze the impact of recommender systems on our society going forward. I hope more people who are not data scientists get interested in recommender systems.

The utility function is expressed as follows:

$$U: \theta \times x \rightarrow R$$

U: utility of customers, θ: customers’ preferences, x: item features, R: ratings of the items for the customers

This definition is simple and makes it easy to understand what a utility function is. I would like to use this definition going forward. Ratings may be one, two, three…, or they may be continuous numbers, depending on the recommender system.

When we look at simple models, such as the linear regression model and the logistic regression model, the key metrics are explanatory variables (features) and their weights (parameters), represented as x and θ respectively. The product θx shows how much the features affect the variable we want to predict. Therefore I would like to introduce θx as a critical part of my recommender engine. “θx” means that each x is multiplied by its corresponding weight θ and all the products are summed up, that is, θx = θ₁x₁ + θ₂x₂ + … + θₙxₙ. This is critically important for recommender systems. Mathematically, θx is a product of vectors/matrices. It is simple but powerful for providing recommendations effectively. I would like to develop my recommender engine by using θx next week.
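Here is a minimal Python sketch of θx computed for several customers and items at once, using shirts of different colors as the items. The two features and all numbers are hypothetical; each predicted rating is simply the inner product of a customer’s θ with an item’s x, and the item with the highest value is the one to recommend.

```python
def theta_x(theta, x):
    """θx: multiply each feature by its weight and sum the products."""
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical preferences on two dimensions: [likes bright colors, likes dark colors]
theta = {"customer A": [0.9, 0.1], "customer B": [0.2, 0.8]}
# Hypothetical item features on the same two dimensions
x = {"red shirt": [1.0, 0.0], "navy shirt": [0.0, 1.0], "gray shirt": [0.3, 0.6]}

# Predicted rating for every (customer, item) pair
predicted = {c: {item: theta_x(t, f) for item, f in x.items()} for c, t in theta.items()}
# Item with the highest predicted rating for each customer
best = {c: max(scores, key=scores.get) for c, scores in predicted.items()}
print(best)
```

Stacking the θ vectors as rows of one matrix and the x vectors as rows of another turns this double loop into a single matrix product, which is what "products of vectors/matrices" refers to.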


Yes, we should consider, for example, what color of shirt maximizes our utility function. In the future, the utility functions of every person might be stored in computers, and recommendations might be provided automatically to maximize them. Then everyone may be satisfied with everyday life. What a wonderful world it would be!

I have started a Nanodegree at Udacity this week. Yes, I will develop my website by myself!

This week I started the Front-End Web Developer Nanodegree at Udacity. I would like to obtain front-end web development skills for websites and mobile services, because I would like to develop websites and mobile services backed by machine learning. So I am going to set up a prototype website on Microsoft Azure and use Visual Studio Online for writing HTML, CSS and JavaScript. When I learn coding methods in the Nanodegree, I will try to use them to develop the prototype website on Microsoft Azure. I think this is good because I can learn coding methods through the Nanodegree and develop my websites on Microsoft Azure at the same time.

As I said before, Nanodegrees focus on industry practices and applications for jobs. It looks like open on-the-job training. It takes a project-based approach, where participants build several websites by themselves according to instructions. I hope I can develop websites by writing HTML, CSS and JavaScript by the end of this course.

Actually, it is my first paid online course. It costs 200 USD per month. I have taken more than 10 MOOCs (massive open online courses) in Coursera and edX before. Unlike Nanodegrees, these courses are free, so I did not pay any fee at all. Most courses in Coursera and edX are provided by university professors. So Nanodegrees contrast with Coursera and edX, which are major providers of MOOCs. I would like to explain the difference between Nanodegrees and other free courses going forward.

I want to develop websites and mobile services as a kind of parallel processing. When new methods of developing websites and mobile services are taught through the Nanodegree, I will deploy prototype websites on Microsoft Azure at the same time. In addition, the project to develop recommender engines is going on in my company, and the prototype engine is expected to be developed within this year. This engine will be combined with the websites to enhance their services. I think this might be possible, as Microsoft Azure offers machine learning as a service.

This is a scheme to set up a platform for developing websites and mobile services backed by machine learning. The Front-End Web Developer Nanodegree at Udacity might make it possible even for beginners like me. I hope this program keeps a high standard in providing skills and methods, so that every participant thinks it is worth paying the fees. I am sure Sebastian Thrun, CEO and cofounder of Udacity, will make it happen.

How about a recommender system for yourself? Computers know you better than you do!?

Since the beginning of September I have been studying recommender systems intensively. Now I realize that recommender systems may know you better than you do, because they can memorize as much of your behavior, such as shopping, touring and learning, as possible. This is more than the memory in your brain, as human beings forget things as time passes. The more data computers have, the more accurate the recommendations are. More and more people now have their own devices, such as smartphones and tablets, and use them every day. This means that data on our personal behavior is accumulated in computers every second, even though we do not realize it. In the future, personal devices may provide us with recommendations for every choice in our lives. What are the advantages and disadvantages of such recommendations? Let me think about it for a while.



You can easily obtain what you want based on recommendations. It may even be something you cannot imagine, while computers know it for you. When I go shopping in huge department stores, I sometimes get tired of searching for what I want because there are too many goods. In such cases, recommendations are definitely a powerful tool if they are accurate. Information about products and services can be gathered from all over the world, so recommendations may include products from foreign countries. When a new kind of bread appears in a bakery, computers analyze its factors, such as taste, price and appearance, and calculate metrics to provide recommendations for you. I would like to have this kind of recommendation, as I love bread for breakfast.



You may lose opportunities to discover new tastes and preferences of your own, because computers can calculate your preferences very accurately. When computer recommendations never miss your expectations, you may feel no need to go beyond them. This means that there is no challenge to go outside your past behavior. If you like Japanese food, computers may recommend only Japanese food, so you eat only Japanese food and never try other kinds of food. But I think human beings need to go outside their past behavior to create innovations and adventures in their lives. If that is the case, I may input random numbers into my personal device so that recommendations have some noise relative to my past behavior. I need a little challenge against my past behavior, as it makes my life more interesting, even if I do not know whether it works or not.
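The idea of adding random noise to recommendations is close to what is often called an epsilon-greedy strategy. The sketch below is an assumption about how it could look, not something described in the post: with probability `epsilon` the device recommends a random item, and otherwise it recommends the item with the highest score from past behavior. The food items and scores are hypothetical.

```python
import random


def recommend_with_noise(scores, epsilon=0.1, rng=random):
    """Epsilon-greedy: with probability epsilon return a random item, else the best-scored one."""
    if rng.random() < epsilon:
        return rng.choice(list(scores))  # exploration: ignore past behavior
    return max(scores, key=scores.get)  # exploitation: follow past behavior


# Hypothetical scores computed from past behavior
scores = {"sushi": 0.95, "ramen": 0.90, "pasta": 0.40, "tacos": 0.30}
rng = random.Random(42)
picks = [recommend_with_noise(scores, epsilon=0.2, rng=rng) for _ in range(1000)]
print({item: picks.count(item) for item in scores})
```

With `epsilon=0.2`, roughly one recommendation in five ignores past behavior, so pasta and tacos still show up occasionally; `epsilon=0` gives back the purely accurate recommender.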


I imagine that in 2040 each personal device, such as a smartphone or a smart watch, can hold massive data and carry a lot of computing power. So it may calculate your preferences far better than you think. In the morning, you may find your favorite bread, which you want to eat for breakfast, on the dining table before you even think about it. This will be based on recommendations by your smartphone. It may know everything about you. It may be perfect. It is wonderful, isn’t it?

How can we predict the price of wine by data analysis?

Before discussing the models in detail, it is good to explain how models work in general, so that beginners in data analysis can understand them. I selected one of the famous studies of wine quality and price, by Orley Ashenfelter, a professor of economics at Princeton University. You can look at the details of the analysis on this site: “BORDEAUX WINE VINTAGE QUALITY AND THE WEATHER”. I calculated the results by myself in order to explain how models work in data analysis.


1. Gathering data

The quality and price of wine are closely related to the quality of the grapes, so it is worth considering what factors affect grape quality. For example, temperatures, quantities of rain, the skill of the farmers and the quality of the vineyards may be candidate factors. Historical data for each factor covering more than 40 years are needed in this analysis. In practice, whether data is available for long periods is sometimes very important. Here is the data used in this analysis, so you can do it by yourself if you want to.


2.  Put data into models

Once the data is prepared, I input it into my model in R. This time, I use the linear regression model, which is one of the simplest models. This model can be expressed as the inner product of explanatory variables and parameters. According to the website, the explanatory variables are as follows:

WRAIN: Winter (Oct.-March) rain, ML
DEGREES: Average temperature (Deg Cent.), April-Sept.
HRAIN: Harvest (August and Sept.) rain, ML
TIME_SV: Time since vintage (years)

This is RStudio, a well-known integrated development environment for R. In the upper left pane of RStudio, I wrote code that calls R’s linear regression function “lm” and inputs the data from the website into the model.

[Screenshot: RStudio, 2014-09-30]


3. Examine the outputs from models

To predict the wine price, the parameters should be obtained first. There is no need to worry: R calculates them automatically. The result is as follows, where “coefficients” means parameters. You can see this result in the lower left pane of RStudio.


(Intercept)      WRAIN    DEGREES      HRAIN    TIME_SV
 -12.145007   0.001167   0.616365  -0.003861   0.023850

Finally, we can predict the wine price. You can see the predictions in the lower right pane of RStudio.
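The prediction itself is just the intercept plus the inner product of the coefficients above with a vintage’s weather data. A small Python sketch follows; the vintage values are hypothetical, not taken from the data set. It also assumes, following Ashenfelter’s published data, that the dependent variable is the logarithm of the price relative to 1961, so the linear predictor is exponentiated to get a relative price. The post does not state this, so treat it as an assumption.

```python
import math

# Coefficients reported by R's lm() in the output above
COEF = {
    "(Intercept)": -12.145007,
    "WRAIN": 0.001167,
    "DEGREES": 0.616365,
    "HRAIN": -0.003861,
    "TIME_SV": 0.023850,
}


def predict_log_price(wrain, degrees, hrain, time_sv):
    """Intercept plus the inner product of coefficients and explanatory variables."""
    return (COEF["(Intercept)"]
            + COEF["WRAIN"] * wrain
            + COEF["DEGREES"] * degrees
            + COEF["HRAIN"] * hrain
            + COEF["TIME_SV"] * time_sv)


# Hypothetical vintage: 600 ML winter rain, 17.0 deg C summer, 120 ML harvest rain, 20 years old
log_price = predict_log_price(wrain=600, degrees=17.0, hrain=120, time_sv=20)
relative_price = math.exp(log_price)  # assumes the model was fitted on log relative price
print(round(log_price, 4), round(relative_price, 3))
```

Reading the coefficients this way, one extra degree of average summer temperature raises the predicted log price by about 0.62, while more harvest rain lowers it.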

This graph compares the predicted and real prices of wine. Red squares show the predicted prices and blue circles show the real prices. These are prices relative to the average price in 1961, so the real price in 1961 is 1.0. It seems that the model works well. Of course, it may not work now, as it was made more than 20 years ago. But this research is a good way to learn how models work and how predictions are made. Once you understand the linear regression model, it enables you to understand other, more complex models with ease. I hope you enjoy predicting wine prices. OK, let us move on to recommender engines again next week!


Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

I think it is good to learn how models work in recommender systems!

For three weeks I have researched recommender systems using many websites. I am very surprised to see how many documents, sites and videos are available that explain how recommender systems work and what the current topics for improving them are. I especially focus on the video lectures by Xavier Amatriain, Research/Engineering Director at Netflix. He covers everything from the basic methodologies of recommender systems to the latest methods for achieving business objectives. I strongly recommend watching them if you are interested in recommender systems. When technical terms in the video lectures are difficult to understand, I suggest looking at the MOOC by Dr. Andrew Ng of Stanford University in advance. It enables you to understand the technicalities of machine learning with ease, because it provides broad knowledge of machine learning, as I said before.

In my view, there are two major methods for calculating ratings for recommendations: one is matrix factorization and the other is neural networks, even though many other methods can be used in recommender engines. The objective of this project is to develop a recommender engine model so that beginners in data analysis can develop recommender engines by themselves. Therefore neural networks are out of my scope, as they are too complex for beginners. Matrix factorization must be good for learning to develop models for recommender systems, so I decided to focus on matrix factorization in this project.


In my view, the key points of modeling based on matrix factorization are as follows:

1. How are customers’ preferences represented in the models?

Some people like sweet things and some do not. Some like love stories and some do not. Some like rock ‘n’ roll and some do not. So customers’ preferences look like vectors of the level of each preference. This is reasonable and easy to understand, even for beginners. Why don’t you make a vector of your preferences for your favorite items?


2. How are items’ features represented in the models?

It may take time to prepare the features for each item. The features of each item should be in line with customers’ preferences, as we discussed above: is it sweet or not? Is it a love story or not? Is it rock ‘n’ roll or not? Items’ features also look like vectors, just as customers’ preferences do. This is easy to understand, too.


3. How can we match customers’ preferences and items’ features?

It is reasonable to build metrics using products of customers’ preferences and items’ features. If a customer’s preference has a high score and the corresponding feature of an item also has a high score, the product of the preference and the feature is high as well. This high product score lets us recommend the item to the customer, because the item has features that the customer likes. It makes sense!
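The three points can be put together in a short Python sketch. The features follow the examples above (sweet, love story, rock ‘n’ roll), while the customer’s preference levels and the items are hypothetical. Each item’s score is the sum of the products of matching preferences and features, and items are ranked by that score.

```python
FEATURES = ("sweet", "love story", "rock 'n' roll")  # order of vector positions

# Hypothetical customer: strong taste for sweets, weak for love stories
customer = [0.9, 0.2, 0.7]

# Hypothetical items described on the same three features
items = {
    "chocolate cake": [1.0, 0.0, 0.0],
    "romantic film": [0.0, 1.0, 0.0],
    "rock concert": [0.0, 0.0, 1.0],
    "romantic rock musical": [0.0, 0.8, 0.9],
}


def match_score(preferences, features):
    """Sum of products of matching preference and feature levels."""
    return sum(p * f for p, f in zip(preferences, features))


ranking = sorted(items, key=lambda name: match_score(customer, items[name]), reverse=True)
print(ranking)
```

The musical ranks above the plain rock concert because it scores on two features the customer likes, even though each individual feature level is lower.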


If you need more details on matrix factorization, I recommend reading this paper. It explains how the model works in recommender systems. Going forward, I would like to develop a prototype recommender engine model in the R language so that beginners can understand how models work in recommender systems.

For your information, Coursera will offer the course “Mining Massive Datasets” by Jure Leskovec, Anand Rajaraman and Jeff Ullman of Stanford University. It will start on 29th Sep 2014 and will cover recommender systems. Do not miss it!


How can we classify recommender systems now?

I have researched recommender systems for more than two weeks, and I am very surprised to see that there are so many of them. Therefore, I think I should have a proper way to classify these systems in order to explain the characteristics of each. Most research papers and documents focus on systems called “collaborative filtering”. But I think it is a little difficult to explain the differences between systems by using the term “collaborative filtering”. So I would like to focus on the data used in developing models. I am especially interested in content features, such as actors and actresses, time of production, directors, countries of production, and so on, when the items are movies. I hope beginners in recommender systems can understand them easily.


1. Recommender systems with content features

When we have content features in our data set, this type of model is used. It is useful when there is little data about customers’ ratings and interactions, because recommendations for a customer can be produced without other customers’ data. This means that we can avoid “cold start” issues. On the other hand, we need data about content features for each item. It may take time and cost to prepare, although it is worth doing so. I think this type of recommender system is the one used in most businesses now.


2. Recommender systems without content features

When we have no content features in our data set, this type of model should be considered. Without content features, similarities between customers are used to produce recommendations. Similarities between items, computed from ratings rather than content, can also be used. This type of model is often referred to as “collaborative filtering” in documents and research papers. Its advantage is that there is no need to obtain content features. On the other hand, we need customers’ ratings and interactions in advance to develop models. It is not good for startups, as they generally have little data about customers.
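For comparison, here is a minimal sketch of the second type, user-based collaborative filtering, in Python. The ratings are hypothetical, and cosine similarity over commonly rated items is one common choice among several (Pearson correlation is another). A missing rating is predicted as the similarity-weighted average of other customers’ ratings for that item, so no content features are needed anywhere.

```python
import math

# Hypothetical ratings (1-5); no item content features are used anywhere
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def cosine(u, v):
    """Cosine similarity between two customers over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(u[i] ** 2 for i in common)) * math.sqrt(sum(v[i] ** 2 for i in common))
    return num / den


def predict_rating(user, item):
    """Similarity-weighted average of other customers' ratings of the item (None if nobody rated it)."""
    pairs = [(cosine(ratings[user], r), r[item])
             for other, r in ratings.items() if other != user and item in r]
    total = sum(s for s, _ in pairs)
    return sum(s * x for s, x in pairs) / total if total else None


print(predict_rating("ann", "D"))
```

Ann’s ratings resemble Ben’s more than Cat’s, so her predicted rating for item D leans toward Ben’s 5. The `None` case is exactly the cold start problem: with no ratings for an item, this method has nothing to work with.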


In practice, it is good for beginners to classify recommender systems based on whether content features are used or not, because this does not require knowing the mechanisms of recommender systems. There are many methods for developing recommender engines, for example, regression, classification, clustering and collaborative filtering. So I think it is very difficult to classify recommender systems based on “how recommendations are calculated”. When beginners get familiar with the methods above, they can understand how each method works in recommender systems.


While researching documents and papers about recommender systems, I found the Netflix Prize, a kind of programming competition whose winner was awarded one million USD in 2009. It is very interesting to go deeper into it, because the models discussed during the competition were superior to existing models in providing accurate recommendations, and they are easy to learn even for beginners. I would like to discuss these methods in my next blog post. See you next time!

How can we produce recommendations based on customers information?

Most of you know what recommendations from retailers and e-commerce sites are. Few people, however, know how they are produced behind the web screens or e-mails. So I would like to explain how recommendations are produced in a series of blog posts as the project goes on in the company. First, I would like to consider the three points below one by one. I focus on personalized recommendations, which are customized to each customer based on information about them. Unpersonalized recommendations, such as those based just on sales rankings, are outside our interest and scope because they are provided to every customer equally.


1. Customers

Of course, customers are the most important part of our businesses. The problem is how customer information is obtained and how customer preferences can be known from it. Clearly, the best way to know customer preferences is just to ask. But it is almost impossible to ask customers everything about their preferences. Fortunately, websites and smartphones are widely used among customers, which makes it easier for us to obtain customer information through “what they view longer”, “what they put into favorite items” and “what they bought in the past”. We can infer customer preferences from this information.


2. Items

Items mean not only products, but also news, information, services and anything else that customers can choose. Each item can be expressed by some features, which are characteristics of the item. When two items share the same features, one can be recommended to customers who bought the other, because the two items are similar to each other. It may be difficult to choose good features for this, so I would like to continue researching how to choose features effectively.


3. Relationship between customers and items

Once information about customers and items is obtained, the relationship between customers and items should be considered. I imagine it is very important to capture this relationship accurately, so that recommendations are accurate and effective. Statistical models are needed to calculate metrics that express the strength of the relationship between customers and items.
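As one simple illustration of points 2 and 3, the sketch below measures how strongly two items are related by the share of features they have in common (Jaccard similarity), then suggests the items most similar to one a customer bought. The items, features and the choice of Jaccard similarity are all assumptions for illustration; a real engine would also weigh the customer information from point 1.

```python
# Hypothetical items described by sets of features
items = {
    "croissant": {"bread", "butter", "french"},
    "baguette": {"bread", "french"},
    "brioche": {"bread", "butter", "sweet"},
    "green tea": {"drink", "japanese"},
}


def jaccard(a, b):
    """Share of features two items have in common (0 = nothing shared, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(bought, k=2):
    """Top-k items whose features overlap most with the item the customer bought."""
    scores = {name: jaccard(items[bought], feats)
              for name, feats in items.items() if name != bought}
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(similar_items("croissant"))
```

A customer who bought a croissant would be shown the baguette first (two of three combined features shared) and then the brioche, while green tea, sharing nothing, is never suggested.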


These three points are critically important for constructing a recommender engine, because recommendations are a kind of matching between customers and items. I would like to expand this argument to develop algorithms so that recommendations can be calculated correctly. I found that a dozen open-source recommender engine programs are available to us, and I would like to review some of them going forward. I hope you enjoy them, too!