With all the hype and progress around models like ChatGPT, AlphaFold, and the upcoming GPT-4, it feels like we're getting closer to AGI.
Many people in the AI community believe that scaling is all you need to reach AGI. After all, scaling laws show that as you increase compute, dataset size, and the number of parameters in a model, performance improves.
On the other hand, despite how far scaling can push performance, other people (including me) believe that scaling alone will not get us to AGI. There is a limit to scaling, because fundamental components are still missing. The main three, from my perspective, are:
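To be concrete about what those scaling laws claim: loss falls off roughly as a power law in model size and training data, with an irreducible floor. Here is a tiny sketch using the Chinchilla-style form; the constants are the Hoffmann et al. (2022) fits as I remember them, so treat the numbers as illustrative rather than authoritative.

```python
# Rough Chinchilla-style scaling curve: loss = irreducible floor + power-law terms
# in parameter count N and training tokens D. Constants are quoted from memory and
# are illustrative only.
def scaling_loss(n_params: float, n_tokens: float) -> float:
    E, A, alpha, B, beta = 1.69, 406.4, 0.34, 410.7, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

for n, d in [(1e9, 2e10), (7e10, 1.4e12), (5e11, 1e13)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss ~ {scaling_loss(n, d):.2f}")
```

The curve keeps going down as you scale, which is exactly why the "scaling is all you need" camp is optimistic; my argument below is about what the curve doesn't capture.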
Common sense
Factuality
Causality
Common Sense
As Yejin Choi puts it, common sense is the dark matter of AI. In our universe, only about 5% is normal matter; the other 95% is dark matter and dark energy. In AI, the normal matter is the visible text, the words and sentences, and the dark matter is the unspoken rules of how the world works.
Why common sense is missing from current systems
We don't even talk about it (it's unspoken). If I asked you how many eyes a horse has, you would say two without a doubt. Why would that fact show up in training data if it's so obvious that we never bother to write it down?
There are so many exceptions to the rules of thumb that describe our world that it's hard to list them all. If you want to teach a model that birds can fly, you need to spell out the exceptions (and there are many: penguins, ostriches, birds with injured wings). It's quite analogous to self-driving cars: there are endless edge cases for every scenario that could come up.
If I were to ask you, "If a nail is being pounded into a wall, is the nail positioned horizontally or vertically?", I'm sure you would say the former. As Dileep George mentions, humans answer this question with a mental simulation: we picture what the nail looks like against the wall as it is being pounded, and we immediately know it is horizontal.
Beyond this kind of physics/simulation question, models also struggle with quantitative reasoning (i.e. applying math to real-world problems). That's because this type of reasoning has many components: understanding the question, recalling the right formulas, and then applying them step by step in the right order, as in the toy example below. A mistake at any of these stages leads to poor quantitative reasoning and an incorrect output.
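To make those components concrete, here is a trivial worked example (the problem and numbers are made up): the model has to parse the quantities, recall the right formula, and apply it, and a slip at any step produces a wrong final answer.

```python
# Toy word problem solved the way a model is expected to solve it:
# parse the question, recall the relevant formula, then apply it step by step.
# Question: "A car travels 150 km in 2.5 hours. What is its average speed?"

distance_km = 150.0   # step 1: extract the quantities from the question
time_hours = 2.5

# step 2: recall the right formula (speed = distance / time)
speed_kmh = distance_km / time_hours

# step 3: apply it and report the result
print(f"Average speed = {speed_kmh:.1f} km/h")  # -> 60.0 km/h
```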
Progress in the common sense realm
World models - Yann LeCun
Machines need to be able to learn through observation. Much of the time, that's how we learn: if you saw someone touch a light bulb and get electrocuted, there's no way you would do the same (unless you're in it for the adventure 😉 I'm kidding, please don't try that). To replicate learning through observation, machines need to be able to represent the world and learn and predict based on that representation.
Yann LeCun believes that “Common sense can be seen as a collection of models of the world that can tell an agent what is likely, what is plausible, and what is impossible. Using such world models, animals can learn new skills with very few trials. They can predict the consequences of their actions, they can reason, plan, explore, and imagine new solutions to problems.” He believes that unsupervised or self-supervised learning is the approach to take for this. (https://openreview.net/pdf?id=BZ5a1r-kVsf)
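To make "learning a world model by prediction" concrete, here is a toy sketch of my own (the module names and sizes are made up, and this is not LeCun's proposed architecture): an encoder maps observations to a latent state, and a dynamics network is trained, with no labels at all, to predict the next latent state from the current state and an action. The prediction error is the only learning signal.

```python
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Toy latent world model: encode an observation, then predict the next
    latent state given an action. Trained purely by prediction (self-supervised)."""
    def __init__(self, obs_dim=32, act_dim=4, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                                      nn.Linear(64, latent_dim))

    def forward(self, obs, action):
        z = self.encoder(obs)
        z_next_pred = self.dynamics(torch.cat([z, action], dim=-1))
        return z, z_next_pred

model = TinyWorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake "observed" transitions (obs_t, action_t, obs_t+1). In a real setup these
# would come from watching an environment, not from random tensors.
obs_t, act_t, obs_t1 = torch.randn(8, 32), torch.randn(8, 4), torch.randn(8, 32)

_, z_next_pred = model(obs_t, act_t)
with torch.no_grad():
    z_next_target = model.encoder(obs_t1)
loss = nn.functional.mse_loss(z_next_pred, z_next_target)  # prediction error is the only signal
opt.zero_grad(); loss.backward(); opt.step()
```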
Neurosymbolic AI
The idea behind this is that humans learn by creating symbolic representations of the world and rules for dealing with them. These rules are essentially algorithms filled with our knowledge. For example, if you throw a ball and there's no wall, you know it will keep flying through the air until it falls; if there is a wall, it will bounce back.
Specifically, you have a structured representation of knowledge: high-level symbols and the connections between them. This forms a representation of the "world" (the symbolic part). However, the model still doesn't know what these symbols mean, so you use neural networks to link the symbols to their meanings (i.e. to "translations of raw sensory data").
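A minimal sketch of that split (everything here, including the symbols and rules, is made up for illustration): a stubbed-out "neural" perception module turns raw input into symbols, and a small hand-written rule base then reasons over those symbols, e.g. for the ball-and-wall example above.

```python
# Toy neurosymbolic pipeline: perception produces symbols, rules reason over them.
# The "neural" part is stubbed out; in practice it would be a trained network.

def neural_perception(image) -> dict:
    """Stand-in for a neural net that grounds raw pixels in symbols."""
    return {"object": "ball", "moving": True, "wall_ahead": True}

# Symbolic knowledge: hand-written rules over those symbols.
RULES = [
    (lambda s: s["object"] == "ball" and s["moving"] and s["wall_ahead"],
     "the ball will bounce back off the wall"),
    (lambda s: s["object"] == "ball" and s["moving"] and not s["wall_ahead"],
     "the ball will keep flying until gravity brings it down"),
]

def reason(image):
    symbols = neural_perception(image)      # neural: pixels -> symbols
    for condition, conclusion in RULES:     # symbolic: symbols -> inference
        if condition(symbols):
            return conclusion
    return "no rule applies"

print(reason(image=None))  # -> "the ball will bounce back off the wall"
```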
Mind’s Eye (from Google’s Brain Team)
I came across this paper that pushes reasoning-through-simulation forward, targeting exactly the point I mentioned earlier. It takes a physical reasoning question, simulates the possible outcomes in MuJoCo, and then feeds the simulation results into the language model's input so that its reasoning is grounded in the simulation.
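I haven't reproduced the paper's code, but the shape of the pipeline is easy to sketch. Both functions below are stubs I wrote for illustration; in the actual paper the simulator is MuJoCo and the final step is a real language model.

```python
# Sketch of a Mind's Eye-style loop: question -> physics simulation -> grounded prompt.

def run_physics_simulation(question: str) -> str:
    """Stub: render the question as a scene, simulate it, summarise the outcome."""
    return "Simulation result: the heavy ball and the light ball land at the same time."

def call_language_model(prompt: str) -> str:
    """Stub standing in for a call to a large language model."""
    return "They hit the ground at the same time (ignoring air resistance)."

question = "If you drop a heavy ball and a light ball from the same height, which lands first?"
evidence = run_physics_simulation(question)

# The simulator's output is injected into the prompt, grounding the model's answer.
prompt = f"{evidence}\nQuestion: {question}\nAnswer:"
print(call_language_model(prompt))
```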
Minerva
Google's team built a language model to address this lack of quantitative reasoning. The model, Minerva, can solve mathematical and scientific questions with step-by-step reasoning, generating solutions that use numerical calculations and symbolic manipulation.
Minerva combines several techniques: chain of thought (breaking a complex problem into intermediate steps), a scratchpad (to keep track of those steps), and majority voting (sampling multiple candidate solutions and choosing the most common final answer).
On the MATH dataset, a benchmark of competition-style math problems introduced by Hendrycks et al., Minerva 540B reached an accuracy of 50.3%, whereas PaLM 540B (the base language model it was fine-tuned from) scored 8.8%.
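Majority voting in particular is simple to sketch: sample several chain-of-thought solutions, extract each final answer, and keep the most common one. The sampler below is a stub standing in for the real model.

```python
from collections import Counter
import random

def sample_solution(question: str) -> str:
    """Stub for sampling one chain-of-thought solution from a model.
    Here it just returns a final answer, occasionally a wrong one."""
    return random.choice(["42", "42", "42", "40"])

def majority_vote(question: str, num_samples: int = 16) -> str:
    answers = [sample_solution(question) for _ in range(num_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    print(f"{count}/{num_samples} samples agreed on {answer}")
    return answer

majority_vote("What is 6 * 7?")
```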
Factuality
There's so much data that LLMs simply don't have, and because they have no world model, it's hard for them to infer missing facts from the data they do have. Even with the data that is present, these models sometimes say things that are inconsistent with their own training sets. For instance, Galactica stated that "Elon Musk died in a car crash" even though its training data contains plenty of evidence that he's alive. The model was "hallucinating".
This goes to show that misinformation is a huge problem with LLMs; the TruthfulQA paper is a good illustration of why this is concerning.
The paper "AI safety via debate" proposes a pretty interesting approach to this: a debate between competing agents, where one agent makes arguments, another agent pokes holes in them, and so on until there's enough information for a judge to decide whether the statement is true. Check out my notes on the paper: https://shizacharania.notion.site/shizacharania/AI-safety-via-debate-b54ca7a9d8464f8e8461569d937cf498. DeepMind has also put out a paper applying this idea to natural language. I'll be covering more of my thoughts on misinformation and these papers in a post coming soon 👀.
Another problem is that a model needs to actually understand the data in front of it before it can reason over it. DeepMind's Gato, for instance, which is a multi-modal, multi-task, multi-embodiment generalist agent, often fails to comprehend its inputs and is neither reliable nor factual. Here are some examples:
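The core loop of the debate setup is easy to sketch, even though the hard part is making the agents and the judge actually good. Everything below is a stub of my own; in the paper the judge is ultimately a human.

```python
# Toy debate loop: two agents argue over a claim for a fixed number of rounds,
# then a judge decides based on the transcript. All three roles are stubs.

def agent_argue(name: str, claim: str, transcript: list) -> str:
    """Stub for an agent producing its next argument given the debate so far."""
    return f"{name}: my strongest point about '{claim}' given the discussion so far."

def judge(transcript: list) -> str:
    """Stub judge; in the paper this role is played by a (human) judge."""
    return "claim accepted" if transcript else "claim rejected"

def debate(claim: str, rounds: int = 3) -> str:
    transcript = []
    for _ in range(rounds):
        transcript.append(agent_argue("Proponent", claim, transcript))  # makes the case
        transcript.append(agent_argue("Opponent", claim, transcript))   # pokes holes
    return judge(transcript)

print(debate("Elon Musk is alive."))
```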
Causality
Current ML/AI systems predict outputs based on correlations. With causal reasoning, a model can answer cause-and-effect questions like "if x happens, what will happen to y?". Such models also generalize better, because when something in the environment changes, the model can adjust its reasoning to account for the effect on the variables involved.
As humans, we're better at seeing what works and what doesn't. This ties back to world models as well: if you see a big, solid rock, you most likely won't kick it to try to move it, because you already know what will happen next.
Something really interesting that I came across when looking for ways to give models causal reasoning is Generative Flow Networks, or GFlowNets. This is something Yoshua Bengio is really excited about.
The idea behind GFlowNets is to mimic the way humans reason sequentially, adding one new piece of relevant information at each step. They combine reinforcement learning (the model learns a policy for constructing a solution) with generative modelling (it generates trajectories through a graph of states, step by step, until it reaches a terminal state).
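Here is a tiny example of the gap between correlation and causation, using a made-up structural causal model where Z causes both X and Y: observationally, X and Y move together, but intervening on X (setting it by force, i.e. do(X = x)) leaves Y unchanged, because X never caused Y.

```python
import random

def sample(do_x=None):
    """Toy structural causal model: Z -> X and Z -> Y. X does NOT cause Y."""
    z = random.gauss(0, 1)
    x = z + random.gauss(0, 0.1) if do_x is None else do_x  # intervention overrides X
    y = z + random.gauss(0, 0.1)
    return x, y

# Observationally, X and Y move together (both are driven by Z)...
obs = [sample() for _ in range(10_000)]
high_x = [(x, y) for x, y in obs if x > 1]
print("mean Y when X > 1 (observed):", sum(y for _, y in high_x) / len(high_x))

# ...but forcing X to a value leaves Y unchanged, because X never caused Y.
intervened = [sample(do_x=1.0) for _ in range(10_000)]
print("mean Y under do(X = 1):", sum(y for _, y in intervened) / len(intervened))
```

A purely correlational model would predict a high Y whenever it sees a high X; a causal model knows that setting X yourself changes nothing about Y.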
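Below is only the sequential-construction idea, not a full GFlowNet: an object gets built one piece at a time by a stochastic policy, and training (which I leave out) tunes that policy so the probability of ending up with a given object is proportional to its reward. All the names and the toy reward are mine.

```python
import random

ACTIONS = ["add_A", "add_B", "stop"]

def policy(state: list) -> str:
    """Stub policy; a real GFlowNet parameterises this with a neural network."""
    return random.choice(ACTIONS if len(state) < 5 else ["stop"])

def sample_trajectory() -> list:
    """Build an object one piece at a time until the policy chooses to stop."""
    state = []
    while True:
        action = policy(state)
        if action == "stop":
            return state          # terminal state = the finished object
        state.append(action)      # add one piece of the object

def reward(obj: list) -> float:
    """Toy reward: objects with more 'add_A' pieces are better."""
    return 1.0 + obj.count("add_A")

obj = sample_trajectory()
print(obj, "reward:", reward(obj))
# A trained GFlowNet would tune `policy` (e.g. with a trajectory-balance loss) so
# that, across many samples, P(object) is proportional to reward(object).
```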
To learn more about this, check out my YouTube video, where I explain the high-level architecture with my brother's Lego pieces 😅:
That concludes my primary thoughts on the three areas that, in my opinion, need to be worked on to get to AGI. I've just started exploring these different theories, so if you have any arguments against my reasoning, please feel free to reach out on Twitter, LinkedIn, or email me at charania.shiza@gmail.com.
P.S. Many points in this post were based on an AGI debate held by Gary Marcus that I watched a couple of days ago. I'd definitely give it a watch if you're into this sort of stuff :)