Please comment on what you believe will be a necessary development to reach AGI.
To start, I'll try to frame what we have now, compared against human intelligence, in such a way that it becomes apparent what is missing and how we might achieve it:
What we have:
- Verbal system 1 (intuitive, quick) thinkers: This is your normal gpt-4o. It fits the criteria for system 1 thinking and likely surpasses humans in almost all aspects of verbal system 1 thinking.
- Verbal system 2 (slow, deep) thinkers: These are the o-series models. They have yet to surpass humans, but progress is quick, and I deem it plausible that they will surpass humans through scale alone.
- Integrated long-term memory: LLMs have a memory far superior to that of humans. They have seen much more data, and their retention/retrieval outperforms almost any specialist.
- Integrated short/working memory: LLMs also have a far superior working memory, taking in and understanding about 32k tokens at once, as opposed to the ~7 items humans can hold.
What we miss:
- Visual system 1 thinkers: Currently, these models are already quite good, but not yet up to par with humans. Ask 4o to describe an ARC puzzle, and it will still fail to mention basic parts.
- Visual system 2 thinkers: These are missing entirely, and they would likely make visuo-spatial problems much easier to solve. ARC-AGI might be just one example of a benchmark that gets solved through this type of advancement.
- Memory consolidation / active learning: More specifically, storing information from short-term into long-term memory. LLMs currently can't do this, meaning they can't remember anything beyond their context length, and therefore can't handle projects that exceed it. Many believe LLMs need infinite memory or ever-bigger context lengths, but what we actually need is memory consolidation (a naive retrieval-based version is sketched right after this list).
- Agency/continuity: The ability to use tools/modules and switch between them continuously is a key missing ingredient in turning chatbots into workers and making a real economic impact.
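To make the naive version of memory consolidation concrete, here is a minimal Python sketch of the retrieval-style approach: chunks that scroll out of the context window get embedded and stored, then recalled by similarity. The `ExternalMemory` class and the toy hash-based embedding are invented for illustration (a real system would use an actual embedding model). Note that nothing here touches the model's weights, which is exactly the limitation I get to below.

```python
import numpy as np

# Toy bag-of-words hash embedding so the sketch is self-contained;
# a real system would use a learned embedding model instead.
def embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class ExternalMemory:
    """Stores chunks that have scrolled out of the context window."""
    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def consolidate(self, chunk: str) -> None:
        # "Consolidation" here is just embed-and-store; nothing is
        # integrated into the model's weights or understanding.
        self.texts.append(chunk)
        self.vectors.append(embed(chunk))

    def recall(self, query: str, k: int = 2) -> list[str]:
        # Return the k stored chunks most similar to the query.
        sims = [float(v @ embed(query)) for v in self.vectors]
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]

memory = ExternalMemory()
memory.consolidate("The user prefers Python for all code examples.")
memory.consolidate("The project deadline was moved to March.")
print(memory.recall("Which language should I use?"))
```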
How we might get there:
- Visual system 1 thinking will likely be solved by scale alone, as we have already seen massive improvements in vision models.
- As visual system 1 thinking gets closer to human capability, visual system 2 thinking becomes an achievable training goal, just as verbal system 2 models were built on top of strong verbal system 1 models.
- Memory consolidation is currently a big limitation of the architecture: it is hard to teach a model new things without it forgetting previous information (catastrophic forgetting). This is why training runs are done separately and from the ground up: GPT-3 was trained separately from GPT-2 and had to relearn everything GPT-2 already knew. This means there is a huge compute overhead for learning even the most trivial new information, so we need to find a solution to this problem.
- One solution might be a memory-retrieval/RAG system, but this is very different from how the brain stores information. The brain doesn't store information in a separate module but distributes it across the neocortex, meaning it gets directly integrated into understanding. With modularized memory, a model loses the ability to form connections with, and deeply understand, those memories. This might require an architecture shift, unless there is some way to have gradient descent deprioritize updating already-formed memories/connections (see the sketch after this list).
- It has been said that 2025 will be the year of agents. Models get trained end-to-end using reinforcement learning (RL) and can learn to use any tools, including their own system 1 and 2 thinking (a toy version of this loop is sketched below). Agency will also unlock the ability to do things like play Go perfectly, browse the web, and build web apps, all through the power of RL. Finding good reward signals that generalize sufficiently might be the biggest challenge, but this will get easier with more and more computing power.
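On the gradient-descent idea two bullets up: "deprioritizing already-formed memories/connections" is roughly what elastic weight consolidation (EWC; Kirkpatrick et al., 2017) tries to do. Here is a minimal PyTorch sketch; the tiny linear model and the all-ones importance estimate are stand-ins (real EWC estimates per-parameter importance from the Fisher information of the old task's loss):

```python
import torch
import torch.nn as nn

# Minimal sketch of elastic weight consolidation (EWC): a quadratic
# penalty makes gradient descent reluctant to move parameters that
# mattered for previously learned data, instead of freezing anything.

model = nn.Linear(4, 2)  # stand-in for a real network

# After the "old" training phase, snapshot the weights and estimate
# each parameter's importance. Real EWC uses squared gradients of the
# old loss (diagonal Fisher); we fake it with ones for brevity.
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

def ewc_penalty(model, strength=100.0):
    # Pull each weight back toward its old value, proportionally to
    # how important that weight was for the old memories.
    loss = torch.tensor(0.0)
    for n, p in model.named_parameters():
        loss = loss + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return strength * loss

# Training on new data then optimizes: new_task_loss + ewc_penalty(model)
x, y = torch.randn(8, 4), torch.randn(8, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss = nn.functional.mse_loss(model(x), y) + ewc_penalty(model)
loss.backward()
opt.step()
```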
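As for the agent loop itself, here is a deliberately toy sketch of what I mean mechanically; the tool names, the random policy, and the placeholder reward are all invented for illustration. It also makes the reward-signal problem concrete: writing `reward()` so that it generalizes is exactly the hard part.

```python
import random

# Toy agent loop: a policy (stand-in for the LLM) repeatedly picks a
# tool, observes the result, and a single episode-level reward is what
# end-to-end RL would optimize.

def search_web(query): return f"results for '{query}'"
def run_code(src): return "code output"
def think(prompt): return "intermediate reasoning"  # system 2 as a tool

TOOLS = {"search_web": search_web, "run_code": run_code, "think": think}

def policy(observation):
    # A trained model would choose here; this sketch picks randomly.
    return random.choice(list(TOOLS)), observation

def reward(trajectory):
    # The hard part: a signal that generalizes. Here, a placeholder.
    return 1.0 if any("results" in obs for _, obs in trajectory) else 0.0

trajectory, obs = [], "task: find the population of France"
for _ in range(3):  # one episode of tool calls
    tool, arg = policy(obs)
    obs = TOOLS[tool](arg)
    trajectory.append((tool, obs))
print(reward(trajectory))  # RL would update the policy from this
```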
If this year proves that agency is solved, then the only thing separating us from AGI is memory consolidation. This doesn't seem like an impossible problem, and I'm curious to hear if anyone already knows about methods/architectures that effectively handle memory consolidation while maintaining the transformer's benefits. If you believe there is something incorrect or missing in this list, let me know!