"Llama2" is a great LLM as it is Open source and for commercial use. I want to try many applications with this language model.

Hi friends, I would like to introduce a new LLM, released by Meta on July 18, 2023. It is called "Llama2". I have run some experiments with this model. Let us start!

 

1. What is Llama2?

"Llama2" is a language model from Meta AI. Many researchers are very excited because it is open source and available for commercial use. Its specs are summarized in the table below.

 
 

2. Let us extract information from the article in English

I want to perform a small experiment to extract the following information from text:

  • sentiment

  • root cause of the sentiment

  • name of product

  • name of makers of product

I wrote my prompt and a fictional story in the form of an email, then ran Llama2 13B Chat. Here are the results.
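For anyone who wants to reproduce this, here is a minimal sketch of this kind of extraction call, assuming the Hugging Face checkpoint "meta-llama/Llama-2-13b-chat-hf" (gated, so access approval is needed). The email text and the wording of the prompt are placeholders, not my exact ones.

```python
# Minimal extraction sketch with Llama2 13B Chat (placeholder prompt/email).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

email = "..."  # the fictional customer email goes here
prompt = f"""[INST] Read the email below and extract:
- sentiment
- root cause of the sentiment
- name of product
- name of makers of product

Email:
{email} [/INST]"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```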

Wow, this looks good! I can obtain the information I need from the text. Unfortunately, the model cannot output it in Japanese.

 

3. Let us see how it works on Japanese sentences

Next, I would like to apply the same prompt to Japanese sentences.

Wow, this looks good too! Although, again, the model cannot output it in Japanese.

 

4. Llama2 has a great potential for AI applications in the future!

Today I found that Llama2 works very well in English. When we want to minimize running costs for AI applications or keep secret or confidential data within our organization, this model can be a good candidate for our applications. It is great to have many choices of LLMs in addition to proprietary models such as ChatGPT.

 
 

I also want to mention a great repo on GitHub. It makes it easier to compare many open-source LLMs. I strongly recommend it to everyone who is interested in LLMs. Thanks, camenduru!

Thanks for your attention! I would like to follow the progress of Llama2 and share it with you soon. Stay tuned!


Copyright © 2023 Toshifumi Kuga. All rights reserved.


Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

"Tree of Thoughts" can go mainstream in prompt engineering!

Today, I found a very interesting paper called "Tree of Thoughts (ToT)"(1). With ToT, we can solve tasks that we could not solve before. So I want to share it with you and consider together how it works. Let us start now!

1. Chain of Thought (CoT)

This paper presents four kinds of prompting, as the chart below shows. The leftmost is called "IO prompting" and is relatively simple. The rightmost is the most complex, called "Tree of Thoughts (ToT)".

Among the four kinds of prompting, I focus on Chain of Thought (CoT) first because it gives us the fundamental space to explore. The paper says, "The key idea is to introduce a chain of thoughts z1, · · · , zn to bridge x and y, where each zi is a coherent language sequence that serves as a meaningful intermediate step toward problem solving". CoT gives us a prompting method that improves the reasoning abilities of LLMs and lets us solve complex tasks more effectively.
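To make the notation concrete, here is my transcription of the paper's CoT formulation (a sketch, so the notation may be lightly paraphrased):

```latex
% Each thought z_i is sampled in turn, conditioned on the input x and the
% thoughts so far; the answer y is sampled after the full chain.
z_i \sim p_\theta^{\mathrm{CoT}}(z_i \mid x, z_{1 \cdots i-1}), \qquad
y   \sim p_\theta^{\mathrm{CoT}}(y \mid x, z_{1 \cdots n})
```

Once we understand how CoT works, let us move on to ToT.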

 

2. Tree of Thoughts (ToT)

Let us expand CoT with tree search so that we can apply it to more complex tasks effectively. The paper says, "we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving." Sounds great! OK, let us consider how it works.

ToT is implemented in four steps. I would like to explain them one by one; a minimal code sketch follows the list.

  • Decompose the process into thoughts

    • Each thought should be small enough that LLMs can generate promising and diverse samples.

  • Generate states

    • Generate potential thoughts from each state. The paper describes two methods for doing this.

  • Evaluate each state

    • LLMs evaluate each state to decide how the tree should grow.

  • Search for the best state

    • If the current state is not good enough, we should search other branches. There are several search algorithms for that.
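To make these four steps concrete, here is a minimal sketch of ToT with breadth-first search. It assumes a generic llm(prompt) helper that returns text; the function names and prompts are my own illustrations, not the paper's official code.

```python
def generate_thoughts(llm, state, k=5):
    """Propose k candidate next thoughts from the current state."""
    prompt = f"Problem state so far:\n{state}\nPropose one promising next step:"
    return [llm(prompt) for _ in range(k)]

def evaluate_state(llm, state):
    """Ask the LLM to score how promising a partial solution is (0 to 1)."""
    prompt = f"Rate from 0 to 1 how promising this partial solution is:\n{state}\nScore:"
    try:
        return float(llm(prompt).strip())
    except ValueError:
        return 0.0

def tot_bfs(llm, problem, steps=3, breadth=2, k=5):
    """Grow the tree level by level, keeping only the `breadth` best states."""
    frontier = [problem]
    for _ in range(steps):
        candidates = [state + "\n" + thought
                      for state in frontier
                      for thought in generate_thoughts(llm, state, k)]
        # the LLM itself evaluates each state to decide how the tree grows
        candidates.sort(key=lambda s: evaluate_state(llm, s), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]
```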


3. ToT can be solved with MCTS

Although ToT can be solved with relatively simple tree-search algorithms, we can also use more advanced ones, such as Monte Carlo Tree Search (MCTS). MCTS has been famous since AlphaGo defeated a professional human Go player in March 2016. In AlphaGo, MCTS is combined with neural networks. This is sometimes called "model-guided tree search", and we no longer need to search the whole state space. In the picture, Demis Hassabis, Google DeepMind CEO, explains how it works(2).
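For reference, the selection rule that lets the policy network guide the search in AlphaGo looks roughly like this, as I understand it from the AlphaGo papers (a sketch, not a quote):

```latex
% PUCT: choose the action maximizing the value estimate Q plus an
% exploration bonus weighted by the policy prior P and visit counts N.
a^{*} = \arg\max_a \left( Q(s,a) + c_{\mathrm{puct}}\, P(s,a)\,
        \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \right)
```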

It will be exciting when ToT can be searched with MCTS in the near future, as wider and deeper states can be explored, which should give us better results.

 

Thanks for your attention! I would like to follow the progress of ToT and share it with you soon. Stay tuned!

 

1) “Tree of Thoughts: Deliberate Problem Solving with Large Language Models” Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 17 May 2023, https://arxiv.org/abs/2305.10601

2) Using AI to Accelerate Scientific Discovery | Campus Lecture with Demis Hassabis, https://www.youtube.com/watch?v=Ds132TzmLRQ&t=1381s

 



Copyright © 2023 Toshifumi Kuga. All rights reserved.



Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

"Function calling" is a game changer, as GPT can access the outside world and easily be turned into our agent!

Today, I want to create a website with descriptions of a Japanese sweets collection, just like the "Dorayaki" in the picture above. So I ordered my AI agent to create an awesome website. But is it really possible? I am sure it is! As you know, OpenAI created GPT, a very intelligent large language model (LLM). On 13 June 2023, "Function calling" was introduced by OpenAI. It can bridge GPT to other systems, APIs and functions outside. Let me explain step by step!

 

1. What is the advantage of "Function calling"?

Function calling makes it easy for GPT to access functions outside. For example, when you want to create a website where Japanese sweets are explained to customers, you need to connect GPT to a function that can write the website's HTML/CSS. With "Function calling", GPT can call this function and pass parameters, such as "explanations of Japanese sweets", to it. The official documentation says, "The latest models (gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to both detect when a function should be called (depending on the input) and to respond with JSON that adheres to the function signature."

 

2. The list of "functions" is the key to setting "function calling" up

"Function calling" looks great! But how can we implement it in our code? It is quite simple: just prepare a list of functions. Each entry should have

  • "name"

  • "description"

  • "parameters" : "type" , "properties", "required"

In ChatCompletion.create, we add functions=functions because we want the model to be able to call the function. The rest of the code does not change much. The code below shows an example of functions, which comes from the official documentation. Please look at the docs for details if needed.
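Since the original example is not reproduced above, here is a sketch based on the official get_current_weather example from the documentation (openai-python v0.x, as of mid-2023):

```python
import openai

# The functions list: name, description, and JSON-schema parameters.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather like in Boston?"}],
    functions=functions,  # the key addition compared to a plain chat call
)
```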

 

3. Let us see how the generated website looks

OK, it is time to see the result from our agent. I instructed it to "Create a web-site for a pretty Japanese sweets collection". The "title" and "explanation" texts are generated by GPT-3.5-turbo and sent to the function that creates the web page. Here is the result. Everything is written in Japanese. The title means "a pretty Japanese sweets collection". The sentences of the explanation are pretty good! I do not think there is any need to fix or modify them at all.
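For completeness, here is a sketch of how such an agent can dispatch the model's function call once the response comes back, assuming `response` is the return value of a ChatCompletion.create call like the one above. The create_website function and its argument names are hypothetical stand-ins; the real code is in the repo linked below.

```python
import json

def create_website(title: str, explanation: str) -> str:
    """Hypothetical stand-in for the repo's real page-generating function."""
    return f"<html><head><title>{title}</title></head><body><p>{explanation}</p></body></html>"

message = response["choices"][0]["message"]
if message.get("function_call"):
    # the model returns JSON arguments matching the function signature
    args = json.loads(message["function_call"]["arguments"])
    html = create_website(title=args["title"], explanation=args["explanation"])
    with open("wagashi.html", "w") as f:
        f.write(html)
```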

If you want to see more details, including the code, you can find them here.

https://github.com/TOSHISTATS/Wagashi-Collection-Web-Generation-agent-by-GPT3.5#readme

 

I hope you can now see how AI agents work. I think the potential use cases of "Function calling" are limitless. I tried several use cases with "Function calling" and found that it can be a game changer for developing LLM application systems. I would like to update my article about AI agents built on OpenAI GPT soon. Stay tuned!

 
 
 

Copyright © 2023 Toshifumi Kuga. All rights reserved.

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

"Large Language Models as Tool Makers" is a good idea to enhance the accuracy of the model while keeping the cost of computation low!

Since GPT-4, one of the most intelligent large language models (LLM) in the world, was released on 14 March 2023, many people have been surprised by how intelligent it is. This is great. But there is one problem for users: it is not a free service. Users pay a fee for GPT-4 based on how many tokens they use. Therefore, if we use GPT-4 all day long, it gets very expensive. Of course we prefer more intelligence, but we should consider its cost. There is a trade-off between them. What should we do about it? Last week, I found a good research paper called "Large Language Models as Tool Makers"(1). All charts below come from this awesome paper. The idea is simple and looks promising for tackling these problems. So let me explain it in more detail.

 

1. Tool user and Tool maker

The basic idea is as follows. We have two LLMs: one is called the "tool maker" and the other the "tool user". When a new task comes in, the tool maker creates "tools" for the task. Once the "tools" are ready, they are passed to the tool user for inference. These tools are reusable for solving similar tasks in the future. So GPT-4 can be used only as the tool maker, since it is more intelligent, while lighter-weight models, such as GPT-3.5, can be used as the tool user. This way we can reduce the computational cost of inference. It sounds great! The chart below explains how it works.

 

2. How can we create tools for our task?

As we want to keep the results accurate, the tool maker should create good tools. There are three steps to do that.

• Tool Proposing: The tool maker generates a Python function to solve the given task. If the proposed tool produces errors, the tool maker creates another tool.

• Tool Verification: The tool maker generates unit tests using validation samples and then executes these tests. Three validation samples are prepared here. If the tool fails any of these tests, the tool maker attempts to fix the issues. The paper explains it as follows: "This stage fulfills two key roles: 1) it provides examples that demonstrate how to convert natural language questions into function calls, and 2) it verifies the tool's reliability, enabling the entire process to be fully automated."

• Tool Wrapping: If execution or verification passes the preset threshold, the tool maker prepares the wrapped tool for the tool user. This step involves wrapping up the function code and providing demonstrations of how to convert a task into a function call. This final product is then ready for use by the tool user.

The chart below shows us how it works.

Once a tool is ready, it is passed to the tool user, who solves various instances of the task using the tools made by the tool maker. The prompt for this stage is the wrapped tool, which contains the function for solving the task and demonstrations of how to convert a task query into a function call. With the demonstrations, the tool user can generate the required function call in an in-context-learning fashion. The function calls are then executed to solve the task. The chart below shows how the process flows from tool maker to tool user.
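To make the two roles concrete, here is a rough sketch of the LATM flow in Python. It assumes a chat helper ask(model, prompt) that returns text; the prompts and names are my own illustrations, not the paper's exact code.

```python
def make_tool(task_examples: str) -> str:
    """Tool maker (GPT-4): propose a Python function, then verify it with tests."""
    code = ask("gpt-4", f"Write a Python function that solves tasks like:\n{task_examples}")
    tests = ask("gpt-4", f"Write assert-style unit tests for this function:\n{code}")
    namespace = {}
    exec(code + "\n" + tests, namespace)  # verification: failing asserts raise here
    return code                            # the wrapped tool: code plus demonstrations

def use_tool(tool_code: str, instance: str):
    """Tool user (GPT-3.5): convert a task instance into a function call and run it."""
    call = ask("gpt-3.5-turbo",
               f"Given this tool:\n{tool_code}\nWrite the Python call that solves:\n{instance}")
    namespace = {}
    exec(tool_code, namespace)
    return eval(call, namespace)           # execute the generated function call
```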

 

3. Can we confirm whether tools that fit our tasks are available?

Here, we use a third LLM called the "dispatcher". Because the dispatcher maintains a record of existing tools produced by the tool maker, it can check whether a fitting tool is already available when a task is received. If no appropriate tool is found, the dispatcher identifies the instance as a new task and solves it with a powerful model, such as GPT-4. The dispatcher's workflow is shown here.
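A sketch of the dispatcher's routing logic, reusing the make_tool and use_tool helpers sketched above. The matching step is simplified to a substring check, whereas the paper lets the dispatcher LLM itself decide whether an existing tool fits.

```python
def dispatch(task: str, registry: dict):
    """Route a task to an existing tool, or build a new one with GPT-4."""
    for task_type, tool_code in registry.items():
        if task_type in task:              # crude matching, for illustration only
            return use_tool(tool_code, task)
    registry[task] = make_tool(task)       # new task: the tool maker builds a tool
    return use_tool(registry[task], task)
```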

 

That is it! This is the major part of "Large Language Models as Tool Makers", or "LATM" for short. With LATM, we might reduce the computational cost of heavy models such as GPT-4. It is amazing! Hope you enjoyed today's article. I will keep you updated on new technologies around LLMs. Stay tuned!

 

1) “Large Language Models as Tool Makers” Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou, 26 May 2023, https://arxiv.org/abs/2305.17126



Copyright © 2023 Toshifumi Kuga. All rights reserved.



Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

LLM can be "reasoning engine" to create our agents. It must be a game changer!

Recently, Large Language Models (LLM) have been getting more and more attention all over the world. Google released its new LLM, called "PaLM 2", on 10 May 2023. It competes against "ChatGPT", which was released in Nov 2022 and attracted over 100 million users in just two months. LLMs are expected to become much more intelligent in a short period as competition between big IT companies heats up. What does this mean for us? Let us consider it step by step!


1. How can we create our own agent?

In my article in Feb 2023, I said AI can be our agent, one that understands our languages. Let us consider how this is possible, step by step. When I want to eat lunch, I just tell my agent, "I would like to have lunch". The LLM understands what I say and tries to order my favorite hamburger at the restaurant. For the LLM to act on the outside world (such as calling restaurants), it needs some tools, which can be created with libraries such as "LangChain". Then the LLM can order my lunch, and finally I get to eat. It sounds good. Let us go deeper.
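As a concrete illustration, here is a minimal sketch of giving an LLM a tool with LangChain (using its mid-2023 API); the order_hamburger tool is hypothetical.

```python
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI

def order_hamburger(query: str) -> str:
    """Hypothetical stand-in for a restaurant's ordering API."""
    return "One hamburger ordered for lunch."

tools = [Tool(name="OrderLunch",
              func=order_hamburger,
              description="Orders lunch from my favorite restaurant.")]

# the LLM acts as the reasoning engine; the tool lets it act on the world
agent = initialize_agent(tools, OpenAI(temperature=0),
                         agent="zero-shot-react-description", verbose=True)
agent.run("I would like to have lunch")
```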


2. LLM is not just an "interface" to us with natural languages

As I said in Feb this year, the first time I used ChatGPT, I felt like it could understand what I said. But now I do not think it is just an "interface" any more. Because an LLM is trained with massive amounts of text from the web, books and other sources, it absorbs a lot of human knowledge from the past to the present. Since ChatGPT appeared in front of us last year, I have performed many experiments with LLMs and found that they have the ability to make decisions. Although it is not perfect, they sometimes perform at the same level as human beings. It is amazing! On top of that, LLMs are still at an early stage and evolve on a daily basis!



3. LLM will be more and more intelligent as a “reasoning engine”!

Mr. Sam Altman, OpenAI CEO, says on YouTube, "ChatGPT may be a reasoning engine"(1). I completely agree with his opinion. When we create our agents, the LLM works as a "reasoning engine", making decisions to solve complex tasks. Around the LLM, there are many systems to act on the outside world, such as "search the web" or "shop in e-commerce". All we have to do is think about how we can enable the LLM to make the right decisions. Because LLMs are very new for everyone, no one knows the right answer yet. Fortunately, since LLMs understand our languages, we may not need programming anymore. This is very important for us. So let us consider it step by step!


I would like to update the progress of AI agents. Stay tuned!



1) Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367 https://www.youtube.com/watch?v=L_Guz73e6fw&t=867s (around 14:25)


Copyright © 2023 Toshifumi Kuga. All rights reserved.


Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

Let us think about how to create our own AI with our own hands. This must be exciting!

In my last article, I said AI chatbots are getting hotter and hotter. Since then, I have been wondering how I can create my own AI to build chatbots, Q&A systems and my own agents. I find it relatively easy to use API services such as the ChatGPT API. But I would like to create my own AI from scratch with open-source models! This is especially good when we want to analyze confidential data, as we do not need to send it to public models. It must be exciting. Let us start!

 
1. Let us choose a base model to create our own AI

There are many language models that are open source. It is very important to choose the best one, keeping a balance between the performance of the model and its size. Last week, I found a brand-new model called "UL2 20B" from Google Brain. The work is led by Mr. Yi Tay, Senior Research Scientist at Google Brain, Singapore. It is fully open, as everyone can download the model and its weights. I am very glad, because many LLMs come with usage restrictions, such as non-commercial licenses. If you are interested in the technical details, I strongly recommend reading his blog "A New Open Source Flan 20B with UL2"(1). It is a must-read for everyone interested in LLMs.
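For reference, here is a minimal loading sketch, assuming the Hugging Face checkpoint "google/flan-ul2" (about 40 GB of weights, so a large GPU or offloading is needed):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Download and load the open-source Flan 20B with UL2 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = T5ForConditionalGeneration.from_pretrained("google/flan-ul2",
                                                   device_map="auto")
```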

 

2. Perform small experiments and see how it works! 

I would like to use the famous research paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"(2). It has a good abstract. It says:

“We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.”


It might be a little difficult to read, as there are many technical terms in it, haha. So I asked the model two questions about this abstract. Here is the first one, with the answer from the model.

Q : "What is the meaning of 'a chain of thought' in this document?

A : a series of intermediate reasoning steps

I have included my notebook below to show how it worked during the experiment.
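Since the notebook itself appears as an image, here is a sketch of the core call, reusing the tokenizer and model loaded earlier; the abstract string is the text quoted above.

```python
abstract = "We explore how generating a chain of thought ..."  # full abstract quoted above

question = "What is the meaning of 'a chain of thought' in this document?"
prompt = f"{abstract}\n\nQ: {question}\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# printed for me: "a series of intermediate reasoning steps"
```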

The second one is:

Q: "What is the meaning of 'chain of thought prompting' in this document?"

A: chain of thought prompting is a method for generating a chain of thought

These questions are slightly different, but the model answers both of them accurately without confusion. This is incredible! Is the model really free and open source?! I am convinced this model is the best of the best for creating our own AI with our own hands.

 




As we have seen, we have obtained a great model for creating our own AI. Next, I would like to consider how to implement the model so it is easy to use. I will explain that in my next article. Stay tuned!







(1) "A New Open Source Flan 20B with UL2", Yi Tay, Senior Research Scientist at Google Brain, Singapore, 3 March 2023

(2) "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, Google Research, Brain Team, 10 Jan 2023

Copyright © 2023 Toshifumi Kuga. All rights reserved.





Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

AI can be our agent that understands our languages. It must be a game changer in our businesses, lives and science!

Recently, AI chatbots have been getting hotter and hotter all over the world. It started with ChatGPT, which was released in Nov 2022 and attracted over 100 million users in just two months. It is amazing! You might want to know why it is so popular and what impact it has on us. Here is my answer. Let us start!


1. Why can AI understand our languages?

The first time I used ChatGPT last year, I felt like it could understand what I said. When we use relatively small NLP (Natural Language Processing) models, they cannot understand our languages because they cannot retain much information with a small number of parameters. Therefore we need programming languages to instruct small models to solve our tasks. With large language models such as GPT-2 and T5-XXL, which have billions of parameters, the ability to understand our language gradually emerges. We call them LLMs (Large Language Models). Once an LLM can understand our language, it can absorb even more information through complex training methods. The more parameters a model has, the more complex the training it can perform. As a result, it can finally understand what we say in our language. Although this is not perfect and still in progress, it is already good enough to create AI agents. Let us move on.


2. What can AI do when we instruct it in our languages?

Once an LLM understands our language, it can do many things, such as answering questions and summarizing texts. These tasks are relatively simple, but an LLM can do more than that. Basically, an LLM has a structure in which we input text and it outputs text, so it is called a "sequence-to-sequence" structure. A "sequence" can be anything you want. For example, it can be text telling us "the steps to buy tickets for the next concert" or "the detailed path to reach our destination", based on our instructions. When we say "I want something" in our language, the AI can output the steps to obtain it. It means that AI can be our agent. It must be exciting!


3. AI agents will appear in front of us soon!

Because an LLM can understand what we instruct it to do in our language, the AI can react to our instructions effectively. Different instructions mean different reactions by the AI. This means AI can be our agent, acting on our behalf. Sounds great! Our instructions might be unclear, but AI agents can be expected to understand the intentions behind them. So they must be good agents for us! These technologies are at a very early stage; we will see many applications in our businesses going forward.


I would like to update the progress of AI agents. Stay tuned!


Copyright © 2023 Toshifumi Kuga. All rights reserved.


Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.


These images above are under CC BY-NC-SA

The AI model "Stable Diffusion" is going to lead innovations in computer vision in 2023. It must be exciting!

Hi friends. Happy new year! Hope you are doing well. Last September, I found a new computer vision model called "Stable Diffusion". Since then, many AI researchers, artists and illustrators have been crazy about it because it can create high-quality images easily. The image above was also created with "Stable Diffusion". This is great!

1. I created many kinds of images with "Stable Diffusion". They are amazing!

The images below were created in my experiments with "Stable Diffusion" last year. I found that it has a great ability to generate many kinds of images, from oil painting to animation. With fine-tuning through "prompt engineering", they get much better. This means that if we input appropriate words and text into the model, the model can generate the images we want more effectively.


2. “Prompt engineering” works very well

In order to generate the images we want, we need to input an appropriate "prompt" into the model. We call this "prompt engineering", as I said before.

If you are a beginner at generating images, you can start with a short prompt such as "an apple on the table". When you want an image that looks like an oil painting, you can just add that: "oil painting of an apple on the table".

Let us divide each prompt into three categories:

  • Style

  • Objects

  • The way objects are displayed (e.g., lighting)

So all we have to do is decide what each category of our prompt should contain and feed it to the model, for example "oil painting of an apple on the table, volumetric light". The results are the images below. Why don't you try it yourself?
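If you would like to try it in code, here is a minimal sketch using the diffusers library; the checkpoint name is one common public Stable Diffusion release, and any compatible checkpoint works the same way.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# style + object + the way it is displayed, as in the three categories above
prompt = "oil painting of an apple on the table, volumetric light"
image = pipe(prompt).images[0]
image.save("apple.png")
```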



3. More research needed

Some researchers in computer vision think "prompt engineering" can be optimized by computers, and they have developed a model to do just that. In the research paper(1), they compare hand-made prompts with AI-optimized prompts. Which do you like better? I am not sure optimization always works perfectly, so I think more research is needed across many use cases.



I will write follow-up articles as this technology develops. Stay tuned!





1) "Optimizing Prompts for Text-to-Image Generation", Yaru Hao, Zewen Chi, Li Dong, Furu Wei, Microsoft Research, 19 Dec 2022, https://arxiv.org/abs/2212.09611

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.