Hi friends, I am Toshi. I have updated my weekly letter. Today I explain 1. how classification ("do" or "do not") can be obtained with probabilities and 2. why computers may replace experts in many fields, from legal services to retail marketing. These two things are closely related to each other. Let us start now.
1. How can classification be obtained with probabilities?
Last week, I explained that the “target” is very important and that the “target” is expressed by “features”. For example, whether a customer will “buy” or “not buy” may be expressed by the customer’s age and the number of overseas trips the customer takes a year. So I can write it this way: “target” ← “features”. This week, I will show you that the value of the “target” can be a probability, which is a number between 0 and 1. If the “target” is closer to 1, the customer is highly likely to buy. If the “target” is closer to 0, the customer is less likely to buy. Here is our example of “target” and “features” in the table below.
I want Susumu’s value of the “target” to be close to 1 when it is calculated from his “features”. How can we do that? Last week we added up the “features”, each multiplied by its “weight”. For example, (-0.2)*30 + 0.3*3 + 6 = 0.9. Here “-0.2” and “0.3” are the weights for the two features (age and the number of overseas trips), respectively, and “6” is a kind of adjustment. Next, let us introduce the curve below. In the case of Susumu, the value from his features is 0.9. So let us put 0.9 on the x-axis. Then what is the value of y? According to this curve, the value of y is around 0.7. It means that Susumu’s probability of buying products is around 0.7. If the probability is over 0.5, the customer is generally considered likely to buy.
In the case of Tom, I want his value of the “target” to be close to 0 when it is calculated from his “features”. Let us add up his features in the same way: (-0.2)*56 + 0.3*1 + 6 = -4.9. The value from his features is -4.9. So let us put -4.9 on the x-axis. Then what is the value of y? According to this curve, Tom’s probability of buying products is almost 0. Unlike Susumu, Tom is less likely to buy.
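The two calculations above can be checked with a short Python sketch. The weights (-0.2 and 0.3), the adjustment (6) and the feature values for Susumu and Tom are taken from the text. The function names `score` and `buy_probability` are my own, and I assume the curve is the standard logistic function, which matches the values read from the curve.

```python
import math

# Weights and adjustment from the example in this letter.
WEIGHT_AGE = -0.2
WEIGHT_TRIPS = 0.3
ADJUSTMENT = 6.0


def logistic(x: float) -> float:
    """Standard logistic function (assumed form of the curve in the letter)."""
    return 1.0 / (1.0 + math.exp(-x))


def score(age: float, trips: float) -> float:
    """Weighted sum of the features plus the adjustment (the x-axis value)."""
    return WEIGHT_AGE * age + WEIGHT_TRIPS * trips + ADJUSTMENT


def buy_probability(age: float, trips: float) -> float:
    """Probability of buying (the y-axis value)."""
    return logistic(score(age, trips))


# (age, overseas trips a year), as read from the calculations in the text.
customers = {"Susumu": (30, 3), "Tom": (56, 1)}

for name, (age, trips) in customers.items():
    p = buy_probability(age, trips)
    label = "likely to buy" if p > 0.5 else "less likely to buy"
    print(f"{name}: x = {score(age, trips):.1f}, probability = {p:.3f} ({label})")
```

Susumu comes out at roughly 0.71 and Tom at well under 0.01, in line with the values read from the curve.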
This curve is called the “logistic curve”. It is interesting that whatever value “x” takes, “y” is always between 0 and 1. By using this curve, every customer can be given a value between 0 and 1, which is considered as the probability of the event. This curve is so simple and useful that it is used in many fields. In short, every customer has a probability of buying products, which is expressed as the value of “y”. It means that we can predict in advance who is likely to buy, as long as the “features” are obtained! The higher the value a customer has, the more likely the customer is to buy the products.
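For readers who like formulas, the logistic curve is usually written as below. This letter does not give the formula itself, so I assume the standard form here; it agrees with the example above, where x = 0.9 gives y of about 0.7.

```latex
y = \frac{1}{1 + e^{-x}}, \qquad 0 < y < 1 \ \text{for every real } x
```

When x is very large, e^{-x} is almost 0, so y is close to 1. When x is very negative, e^{-x} is very large, so y is close to 0. At x = 0, y is exactly 0.5.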
2. Why may computers replace experts in many fields?
Now you understand what “features” are. “Features” are generally set up based on expert opinion. For example, if you want to know who will default in the future, the “features” needed are considered to be “annual income”, “age”, “job”, “past delinquency” and so on. I know them because I used to be a credit risk manager at a consumer finance company in Japan. Experts can introduce the features for their own business and industry. That is why expert opinion has been valuable so far. However, computers are also creating their own features based on data. These are sometimes so complex that no one can understand them. For example, “-age*3-number of jobs in the past”.
In the future, I am sure much more data will be available to us. It means computers have more chance to create better “features” than
Notice: TOSHI STATS SDN. BHD.