Let us do "Deep reinforcement learning" with OpenAI Gym this year!


As the new year starts, I would like to try a new method of analyzing data. So I have decided to introduce "Deep reinforcement learning" at my start-up, TOSHI STATS, and develop new applications with it this year. Reinforcement learning, on which it is based, is defined as follows.

"Reinforcement learning (RL) is the branch of machine learning that is concerned with making sequences of decisions. RL has a rich mathematical theory and has found a variety of practical applications."(1)

The beauty is in "making sequences of decisions". In our lives, we make many decisions sequentially. For example, how should we spend money this month? Go to a good restaurant at the beginning of the month, or go abroad later in the month? Because the budget is limited, unfortunately, we cannot do both in the same month. So we have to treat these choices as sequential decision making.
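The budget example above can be sketched as a tiny sequential-decision problem. Below is a toy illustration in plain Python: one unit of budget, two time steps, and tabular Q-learning (a classic RL algorithm, much simpler than the deep variant) discovering that saving the budget for the later, higher-reward option pays off. All the rewards and states here are made up purely for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ["spend", "save"]

def env_step(state, action):
    """state = (time, budget) -> (next_state, reward, done).
    Spending the single budget unit at t=0 (the restaurant) pays 1;
    saving it and spending at t=1 (the trip abroad) pays 2."""
    t, budget = state
    reward = 0.0
    if action == "spend" and budget > 0:
        reward = 1.0 if t == 0 else 2.0
        budget = 0
    return (t + 1, budget), reward, t + 1 >= 2

def q_learning(episodes=2000, alpha=0.1, gamma=1.0, eps=0.1):
    """Tabular Q-learning over the toy budget problem."""
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state, done = (0, 1), False
        while not done:
            # epsilon-greedy: mostly exploit, sometimes explore
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = env_step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

After training, the value of "save" in the initial state exceeds the value of "spend", i.e. the agent learns that the immediate restaurant reward is worth giving up for the trip later: exactly the kind of trade-off that makes sequences of decisions interesting.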



"Deep reinforcement learning" looks awesome for solving our real-life problems. But we need a framework to perform it, because special environments are needed. I reviewed several frameworks and decided to adopt "OpenAI Gym (2)" as the framework for developing this technology at my start-up. OpenAI Gym was released in June 2016, and since then it has been widely used in the research community, with many open-source projects built on it. OpenAI Gym provides many game-based environments for deep reinforcement learning.
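To give a feel for what these environments look like, here is a minimal sketch of the Gym interaction loop with a completely random (untrained) agent, assuming the gym package is installed (pip install gym). It uses the classic API, where env.step() returns four values; recent Gym/Gymnasium versions return five values and a tuple from reset(), so adjust accordingly.

```python
import gym

env = gym.make("CartPole-v0")
observation = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # pick a random action each step
    observation, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print("episode reward with a random policy:", total_reward)
```

A random policy balances the pole for only a handful of steps; training replaces env.action_space.sample() with a learned policy.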

I tried a simple experiment using "CartPole-v0". Let us see the difference before and after training the model. It is very interesting!

Without training

With training (nb_steps=50000, took 368sec)

You can see that the behaviour of the pole after training is much more stable than before, because the pole is controlled by a policy optimised with deep reinforcement learning. The algorithm is provided by the awesome open-source library "keras-rl (3)". It is easy to use with OpenAI Gym, as it is written with "Keras", one of the most popular frameworks in deep learning. I used my MacBook Air 11 for this experiment. If you are interested in the theory behind the experiment, you can refer to this paper.



Since the AI Go player "AlphaGo" defeated a human professional Go player in 2016, deep reinforcement learning has been getting attention among researchers and developers in many industries. I want to go deeper and develop many applications with deep reinforcement learning going forward. So stay tuned!

Regards Toshi




(1), (2) Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba, "OpenAI Gym", 2016

(3) Matthias Plappert, "keras-rl", 2016




Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithm or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.