Let us think about how to create our own AI with our own hands. It must be exciting!

In my last article, I said that AI chatbots are getting hotter and hotter. Since then, I have been wondering how I can create my own AI to build chatbots, Q&A systems, and my own agents. I find it relatively easy to use API services such as the ChatGPT API, but I would like to create my own AI from scratch with open source models! This is especially useful when we want to analyze confidential data, as we do not need to send it to public models. It must be exciting. Let us start!

 
1. Let us choose a base model to create our own AI

There are many language models that are open source. It is very important for us to choose the right one, as we need to balance the performance of the model against its size. Last week, I found a brand-new model called “Flan-UL2 20B” from Google Brain. This work is led by Yi Tay, a Senior Research Scientist at Google Brain, Singapore. The model is completely open: everyone can download it together with its weights. I am very glad about this, because many LLMs come with restrictions on use, such as a non-commercial license. If you are interested in the technical details, I strongly recommend reading his blog post “A New Open Source Flan 20B with UL2” (1). It is a must-read for everyone interested in LLMs.
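Before running experiments, we need to load the model. Here is a minimal sketch using the Hugging Face Transformers library, assuming the public checkpoint “google/flan-ul2”; the bfloat16 precision and device_map="auto" settings (which require the accelerate package) are my own assumptions for fitting the 20B weights on limited hardware, not steps from the blog post.

```python
# A minimal sketch, assuming the public Hugging Face checkpoint "google/flan-ul2".
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-ul2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory for the 20B weights
    device_map="auto",           # spread layers across available devices (needs `accelerate`)
)
```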

 

2. Perform small experiments and see how it works! 

I would like to use the famous research paper “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (2) as test material. It has a good abstract, which says:

“We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.”


It might be a little difficult to read, as there are many technical terms in it. So I asked the model two questions about this abstract. Here is the first one, together with the answer I got from the model.

Q : "What is the meaning of 'a chain of thought' in this document?

A : a series of intermediate reasoning steps

I ran these questions in a notebook to show how the model behaves during the experiment; a minimal sketch of the code is below.
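This is a sketch of what that notebook does, reusing the tokenizer and model loaded above. The prompt format is my own simple assumption for asking a question about the abstract, not the exact prompt from my notebook.

```python
# A hypothetical sketch of the Q&A experiment, reusing `tokenizer` and `model` from above.
abstract = (
    "We explore how generating a chain of thought -- a series of intermediate "
    "reasoning steps -- significantly improves the ability of large language "
    "models to perform complex reasoning. ..."  # abstract truncated for brevity
)
question = "What is the meaning of 'a chain of thought' in this document?"
prompt = (
    "Answer the question based on the context.\n\n"
    f"Context: {abstract}\n\nQuestion: {question}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```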

The second one is:

Q: What is the meaning of 'chain of thought prompting' in this document?

A: chain of thought prompting is a method for generating a chain of thought

These questions are slightly different, but the model answers both of them accurately without confusion. This is incredible, and the model is completely free and open source! I am convinced that this model is an excellent base for creating our own AI.

 




As we have seen, we have found a great model for creating our own AI. Next, I would like to consider how to implement the model so that it is easy to use. I will explain that in my next article. Stay tuned!







(1) Yi Tay (Senior Research Scientist at Google Brain, Singapore), “A New Open Source Flan 20B with UL2”, 3 March 2023

(2) Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou (Google Research, Brain Team), “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”, 10 January 2023

Copyright © 2023 Toshifumi Kuga. All rights reserved.





Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

AI can be our agent that understands our language. It must be a game changer for our businesses, our lives, and science!

Recently, AI chatbots have been getting hotter and hotter all over the world. It started with ChatGPT, which was released in November 2022 and attracted over 100 million users in just two months. It is amazing! You might want to know why it is so popular and what impact it will have on us. Here is my answer. Let us start!


1. Why can AI understand our language?

The first time I used ChatGPT last year, I felt like it could understand what I said. When we use relatively small NLP (Natural Language Processing) models, they cannot understand our language, because they cannot retain much information with a small number of parameters. Therefore, we need programming languages to instruct small models to solve our tasks. When we move to large language models such as GPT-2 and T5-XXL, which have billions of parameters, they gradually start to acquire the ability to understand our language. We call them LLMs (Large Language Models). The more parameters a model has, the more complex the training it can perform and the more information it can absorb. As a result, it can finally understand what we say in our own language. Although it is not perfect and is still a work in progress, it is already good enough to create AI agents. Let us move on to the next point.


2. What can AI do when we instruct it in our languages?

Once an LLM understands our language, it can do many things, such as answering questions and summarizing text. These tasks are relatively simple, but an LLM can do more than that. Basically, an LLM has a structure in which we input text and it outputs text, so it is called a “sequence to sequence” structure. The output “sequence” can be anything we want. For example, it can be text describing “the steps to buy a ticket for the next concert” or “a detailed route to our destination”, depending on our instructions. When we say “I want something” in our own language, the AI can output the steps to obtain it. It means that AI can be our agent. It must be exciting!
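As a tiny illustration of this text-in, text-out structure, here is a sketch using the Hugging Face pipeline API. The small “google/flan-t5-small” checkpoint is only a stand-in I chose so the example runs quickly; it is an assumption, not the model discussed elsewhere in these articles.

```python
# A minimal sketch of the "sequence to sequence" idea: an instruction goes in, text comes out.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")  # stand-in checkpoint
instruction = "List the steps to buy a ticket for the next concert."
result = generator(instruction, max_new_tokens=64)
print(result[0]["generated_text"])
```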


3. AI agents will appear in front of us soon!

Because an LLM can understand the instructions we give in our own language, AI can react to them effectively. Different instructions produce different reactions. This means AI can act as our agent, working on our behalf. Sounds great! Our instructions might be unclear, but AI agents can be expected to understand the intentions behind them. So they should be good agents for us! These technologies are at a very early stage, and we will see many applications in our businesses going forward.


I will keep you updated on the progress of AI agents. Stay tuned!


Copyright © 2023 Toshifumi Kuga. All rights reserved.


Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.


The images above are licensed under CC BY-NC-SA.

The AI model “Stable Diffusion” is going to lead innovation in computer vision in 2023. It must be exciting!

Hi friends. Happy New Year! I hope you are doing well. Last September, I came across a new computer vision model called “Stable Diffusion”. Since then, many AI researchers, artists, and illustrators have gone crazy about it because it can create high-quality images easily. The image above was also created with “Stable Diffusion”. This is great!

1. I created many kinds of images with “Stable Diffusion”. They are amazing!

The images below were created in my experiments with “Stable Diffusion” last year. I found that it has a great ability to generate many kinds of images, from oil painting to animation. With careful “prompt engineering”, they get much better. In other words, when we input appropriate words and text into the model, it can generate the images we want more effectively.


2. “Prompt engineering” works very well

In order to generate the images we want, we need to input an appropriate “prompt” into the model. We call this “prompt engineering”, as I said before.

If you are a beginner at generating images, you can start with a short prompt such as “an apple on the table”. When you want an image that looks like an oil painting, you can simply add that, as in “oil painting of an apple on the table”.

Let us divide each prompt into three categories:

  • Style

  • Objects

  • The way the objects are displayed (e.g. lighting)

So all we have to do is decide what goes into each category of our prompt and input it into the model, for example “oil painting of an apple on the table, volumetric light”. The results are the images below. Why don't you try it yourself?
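If you want to try this prompt on your own machine, here is a minimal sketch using the Hugging Face diffusers library. The checkpoint name, float16 precision, and output file name are my assumptions; any Stable Diffusion checkpoint can be swapped in.

```python
# A minimal sketch of text-to-image generation with Stable Diffusion via `diffusers`.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; any SD checkpoint works
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU is available

# style + object + the way it is displayed, as described above
prompt = "oil painting of an apple on the table, volumetric light"
image = pipe(prompt).images[0]
image.save("apple_oil_painting.png")
```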



3. More research needed

Some computer vision researchers think “prompt engineering” can be optimized by computers, and they have developed a model to do exactly that. In the research paper (1), they compare hand-made prompts with AI-optimized prompts. Which do you like better? I am not sure optimization always works perfectly, so I think more research with many use cases is needed.



I will keep updating this article as the technology evolves. Stay tuned!





(1) Yaru Hao, Zewen Chi, Li Dong, Furu Wei (Microsoft Research), “Optimizing Prompts for Text-to-Image Generation”, 19 December 2022, https://arxiv.org/abs/2212.09611

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.