With all the hype around and progress in models like ChatGPT, AlphaFold, and the upcoming GPT-4, we're getting closer to AGI.
Many people in the AI community believe that scaling is all you need to reach AGI. After all, scaling laws show that as you increase the compute, data size, and number of parameters of a model, its performance increases.
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F78d938c1-bf3e-4be8-a7e1-9aaa7045043e_1442x450.png)
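To make that concrete, scaling laws are usually summarized as a power law: loss falls smoothly as scale grows. Here's a tiny illustrative sketch; the coefficients below are made up for illustration, not taken from any paper:

```python
def scaling_law_loss(n_params: float, coeff: float = 10.0, alpha: float = 0.076) -> float:
    """Toy power-law scaling curve: loss ~ coeff * N^(-alpha).

    N is the parameter count (the same shape of law is often fitted
    against data size or compute); alpha is an empirically fitted exponent.
    The numbers here are illustrative, not real fitted values.
    """
    return coeff * n_params ** -alpha

# Bigger models give lower loss, but with diminishing returns:
small_model_loss = scaling_law_loss(1e8)   # 100M parameters
large_model_loss = scaling_law_loss(1e11)  # 100B parameters
assert large_model_loss < small_model_loss
```

The shape of the curve is the whole argument for "scaling is all you need": performance keeps improving predictably, just more slowly per added parameter.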
On the other hand, despite the performance that scaling can help us reach, other people (including me) believe that scaling will not get us to AGI. Scaling has its limits because fundamental components are still missing. The main three, from my perspective, are:
Common sense
Factuality
Causality
Common Sense
As Yejin Choi puts it, common sense is the dark matter of AI. In our universe, only about 5% is normal matter and the rest (95%) is dark matter/energy. In AI, the normal matter is the visible text, like words and sentences, and the dark matter is the unspoken rules of how the world works.
Why common sense is missing from current systems
We don't even talk about it (it's unspoken). If I asked you how many eyes a horse has, you would say 2 without a doubt. Why would data like that exist if it's so obvious that we don't even think about it?
There are so many exceptions to the rules of thumb that describe our world that it's hard to state them all. If you want to teach a model that all birds can fly, you need to list out the exceptions (and there are many). It's quite analogous to self-driving cars: there are countless edge cases for every scenario that could come up.
If I were to ask you, "If a nail was being pounded into a wall, is the nail placed horizontally or vertically?", I'm sure you would say the former. As Dileep George mentions, humans answer this question through mental simulation: we picture what the nail would look like against the wall as it is being pounded, and we know immediately that it is horizontal.
Beyond this physics/simulation type of question, models also have difficulty with quantitative reasoning (i.e. applying math to real-world problems). That's because this kind of reasoning has many components: understanding the question, recalling formulas, and then applying them step by step in the proper way. A mistake in any of these leads to poor quantitative reasoning and an incorrect output.
Progress in the common sense realm
World models - Yann LeCun
Machines need to be able to learn through observation. Much of the time, that's how we learn: if you see someone touch a light bulb and get electrocuted, there's no way you would do the same (unless you're in it for the adventure; I'm kidding, please don't try that). To replicate learning through observation, machines need to be able to represent the world and learn and predict based on that representation.
Yann LeCun believes that "Common sense can be seen as a collection of models of the world that can tell an agent what is likely, what is plausible, and what is impossible. Using such world models, animals can learn new skills with very few trials. They can predict the consequences of their actions, they can reason, plan, explore, and imagine new solutions to problems." He believes that unsupervised or self-supervised learning is the approach to take for this. (https://openreview.net/pdf?id=BZ5a1r-kVsf)
Neurosymbolic AI
The idea behind this is that humans learn by creating symbolic representations of the world and rules for dealing with it. These rules are basically computer algorithms filled with our knowledge. For example, if you throw a ball and there's no wall, you know it will keep flying through the air until it falls down. If there is a wall, it will bounce back.
Specifically, you have a structured representation of knowledge, which represents symbols at a high level along with their interconnections. This forms a representation of the "world" (symbolic AI). However, the model still doesn't know what these symbols mean, so you use neural networks to link the representations to their meanings (i.e. "translations of raw sensory data").
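Here's a toy sketch of that split, with a stubbed-out function standing in for the neural network; everything below is my own hypothetical illustration, not a real neurosymbolic system:

```python
def perceive(raw_input: str) -> dict:
    """Stub for the neural half: maps raw sensory data to symbols.

    A real system would run a neural network over pixels or audio;
    here we just fake the recognized symbols from a text description.
    """
    return {"object": "ball", "obstacle": "wall" in raw_input}

def predict_motion(symbols: dict) -> str:
    """Symbolic half: a hand-written rule encoding our world knowledge
    about thrown objects (the ball-and-wall example above)."""
    if symbols["obstacle"]:
        return "bounces back"
    return "flies until it falls"

print(predict_motion(perceive("ball thrown toward wall")))    # bounces back
print(predict_motion(perceive("ball thrown in open field")))  # flies until it falls
```

The point of the split is that the rule is interpretable and reusable, while the (real) neural net handles the messy grounding of symbols in raw data.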
Mind's Eye (from Google's Brain Team)
I came across this paper that pushes reasoning-through-simulation forward, targeting exactly the point I mentioned earlier. It shows how you can take a physical reasoning question → simulate the possible outcomes in MuJoCo → use the simulation results as part of the language model's input to perform reasoning.
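Roughly, the pipeline looks like this. The function names below are my own placeholders, not the paper's actual API, and the "simulation" is just a stub standing in for a real MuJoCo run:

```python
def simulate(question: str) -> str:
    """Placeholder for rendering the question as a physics scene and
    running it in a simulator like MuJoCo."""
    if "heavier" in question and "falls faster" in question:
        # What a real simulation of dropping two masses would report:
        return "Simulation: both objects hit the ground at the same time."
    return "Simulation: no result."

def build_prompt(question: str) -> str:
    """Step 3 of the pipeline: prepend the simulation's outcome to the
    question, so the language model reasons over grounded evidence."""
    return f"{simulate(question)}\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Does a heavier ball mean it falls faster?")
print(prompt)
```

The language model then completes `prompt`, and the simulation evidence steers it away from the intuitive-but-wrong answer.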
Minerva
Google's team built a language model, Minerva, to address the lack of quantitative reasoning. Minerva can solve mathematical and scientific questions with step-by-step processing, generating solutions that use numerical calculations and symbolic manipulation.
Minerva combines several techniques: chain of thought (to break a complex problem into multiple intermediate steps), a scratchpad (to help process those steps), and majority voting (to look at the multiple generated outputs and choose the most common result as the final answer).
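Majority voting is the easiest of the three to sketch. Assuming the final answers have already been extracted from the sampled step-by-step solutions as strings (that extraction step is glossed over here), it's just a frequency count:

```python
from collections import Counter

def majority_vote(candidate_answers: list[str]) -> str:
    """Return the most common final answer among the sampled outputs."""
    return Counter(candidate_answers).most_common(1)[0][0]

# Suppose four sampled solutions ended with these final answers:
samples = ["42", "41", "42", "42"]
print(majority_vote(samples))  # 42
```

The intuition: a model can reach a wrong answer many different ways, but correct reasoning paths tend to converge on the same result, so the mode of many samples is more reliable than any single one.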
On the MATH dataset, a benchmark for quantitative reasoning, 540B Minerva had an accuracy of 50.3%, whereas 540B PaLM (another language model) had an accuracy of 8.8%.
Factuality
There's so much data that LLMs don't have, and because they have no world models, it becomes hard for them to infer from the data that is present. Even with that data, these models sometimes say things that are inconsistent with their own training sets. For instance, Galactica stated that "Elon Musk died in a car crash" even though there is evidence right in its training data that he's alive. The model was "hallucinating".
This goes to show that misinformation is a huge problem with LLMs; the paper "TruthfulQA" exemplifies why this is concerning.
The paper "AI safety via debate" proposes a pretty interesting solution: a debate between competing agents, where one agent makes arguments, another pokes holes in those arguments, and so on until there's enough information to decide the truth of a statement. Check out my notes on the paper: https://shizacharania.notion.site/shizacharania/AI-safety-via-debate-b54ca7a9d8464f8e8461569d937cf498. DeepMind also put out a paper applying this concept to natural language. I'll be covering more of my thoughts on misinformation and these papers in an upcoming post :)
Another problem: a model also needs to actually understand what's happening in the present data before it can reason with it. For instance, DeepMind's Gato, a multi-modal, multi-task, multi-embodiment generalist agent, fails to comprehend its data and is neither reliable nor factual. Here are some examples of these:
Causality
Current ML/AI algorithms predict outputs based on correlations. With causal reasoning, models could answer cause-and-consequence questions like "if x happens, what will happen to y?". They would also become more generalizable: even with subtle changes, the model could adjust its reasoning to consider the impact on a new variable.
As humans, we're better at seeing what works and what doesn't. This goes back to world models as well: if you see a big rock and realize it's strong, you most likely won't kick it to try to move it, because you know what will happen next.
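A tiny toy structural causal model shows the kind of question causal reasoning answers. This is my own illustration, not from any system mentioned in this post: here x causes y, so intervening on x shifts y in a predictable way.

```python
import random

def sample(do_x=None):
    """Sample (x, y) from a toy causal model where y = 2x + noise.

    Passing do_x performs an intervention ("if x happens..."),
    which is exactly the question correlations alone can't answer.
    """
    x = do_x if do_x is not None else random.gauss(0, 1)
    y = 2 * x + random.gauss(0, 0.1)  # y is caused by x
    return x, y

random.seed(0)
# Intervene: set x to 3 and see what happens to y (should be around 6).
pairs = [sample(do_x=3.0) for _ in range(100)]
avg_y = sum(y for _, y in pairs) / len(pairs)
print(round(avg_y, 2))
```

A correlation-based model can fit the x-y relationship from observed data, but only a causal model tells you that forcing x to change moves y, while forcing y to change would leave x alone.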
Something really interesting that I came across when looking for ways to help models with causality is generative flow networks, or GFlowNets. This is something Yoshua Bengio is really excited about.
The idea behind GFlowNets is to represent the way humans reason sequentially, adding a new piece of relevant information at each step. They combine RL (the model learns a policy to solve a problem) with generative modelling (the model generates trajectories to reach its end state in the graph network).
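Here's a toy skeleton of that sequential-construction idea. To be clear, this is not a trained GFlowNet, just the trajectory-building loop with made-up names; a real GFlowNet trains the policy so that finished states are sampled with probability proportional to their reward.

```python
import random

def build_trajectory(policy, length=4):
    """Follow a policy step by step, adding one new piece of the
    object (one new bit of information) at each step."""
    state = []
    for _ in range(length):
        state.append(policy(state))  # policy sees everything built so far
    return state

def reward(state):
    """Score the finished object; training would push the policy toward
    sampling high-reward states proportionally often."""
    return sum(state)

random.seed(1)
uniform_policy = lambda state: random.choice([0, 1])  # untrained stand-in
trajectory = build_trajectory(uniform_policy)
print(trajectory, "reward:", reward(trajectory))
```

The generative-modelling half is the trajectory construction; the RL half is learning a policy that does better than the uniform stand-in above.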
To learn more about this, check out my YouTube video explaining the high-level architecture with my brother's Lego pieces :)
That concludes my primary thoughts on these three areas that, in my opinion, need to be worked on to get to AGI. I've just started exploring these different theories, so if you have any arguments against my reasoning, please feel free to reach out on Twitter, LinkedIn, or email me at charania.shiza@gmail.com.
P.S. Many points in this post were based on an AGI debate held by Gary Marcus that I watched a couple of days ago. I'd definitely give it a watch if you're into this sort of stuff :)