Today's conversational bots like Claude and GPT can chat impressively but aren't great at complex planning or executing technical tasks. To overcome this, new research from HKU builds open-source AI agents that blend natural language and coding skills. They're called Lemur and Lemur-Chat.
The researchers argue that versatile real-world agents require models that integrate fluid natural language ability with precise programming-language control. Humans use plain language to set higher-level goals, then switch to languages like Python when a plan has to be precise and executable; AI agents need both capacities too.
But most existing models specialize in either pure language or pure code, and that separation is limiting.
The team created Lemur by further pretraining the open-source Llama-2 on a massive mixed corpus containing roughly 10x more code than natural language. This strengthened its programming abilities while retaining its conversational strength. Instruction tuning on top of that produced Lemur-Chat, optimized for following free-form directions in natural language.
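To make the recipe concrete, here is a minimal sketch of how a code-heavy pretraining mixture like this could be assembled. Everything below is an assumption for illustration (the placeholder shards, loader, and batch sampler are not the authors' actual data pipeline); it only demonstrates the ~10:1 code-to-text mixing idea.

```python
import random

# Placeholder shards standing in for real pretraining data; the actual
# corpus, loaders, and sampling logic are assumptions for illustration.
code_docs = [f"def func_{i}(x): return x + {i}" for i in range(1_000)]
text_docs = [f"Natural-language document number {i}." for i in range(1_000)]

CODE_TO_TEXT_RATIO = 10  # target roughly 10 code documents per text document

def sample_batch(batch_size: int) -> list[str]:
    """Draw a training batch whose composition matches the target code:text ratio."""
    n_code = round(batch_size * CODE_TO_TEXT_RATIO / (CODE_TO_TEXT_RATIO + 1))
    batch = random.sample(code_docs, n_code) + random.sample(text_docs, batch_size - n_code)
    random.shuffle(batch)
    return batch

# Quick sanity check: a 110-document batch should contain ~100 code documents.
print(sum(doc.startswith("def ") for doc in sample_batch(110)))
```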
Experiments found Lemur surpassed specialized coding-only models like Codex in overall benchmarks. Lemur-Chat then exceeded Lemur by 15% after instruction tuning.
More importantly, Lemur-Chat won 12/13 new "agent tests" designed to mimic real-world challenges needing both language and programming prowess.
It beat alternatives at:
- Using tools like Python and Wikipedia to enhance reasoning
- Debugging code by leveraging error messages (a minimal sketch of this loop follows the list)
- Showing the largest gains from natural language feedback
- Exploring partially observable environments like cybersecurity and web browsing simulations.
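As referenced above, here is a minimal sketch of what a debug-from-error-messages loop can look like. The harness and prompts are assumptions for illustration, not the paper's evaluation code; `query_model` is a hypothetical stand-in for a call to whatever model you are running (e.g. Lemur-Chat).

```python
import subprocess
import sys
import tempfile

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; plug in your own model client here."""
    raise NotImplementedError("connect this to your model of choice")

def run_python(code: str) -> tuple[bool, str]:
    """Execute candidate code in a subprocess and capture any traceback."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True, timeout=30)
    return proc.returncode == 0, proc.stderr

def solve_with_debugging(task: str, max_turns: int = 4) -> str:
    """Ask the model for code, run it, and feed error messages back until it succeeds."""
    prompt = f"Write a Python program for this task:\n{task}"
    code = ""
    for _ in range(max_turns):
        code = query_model(prompt)
        ok, stderr = run_python(code)
        if ok:
            return code
        # Turn the interpreter's error message into natural-language feedback.
        prompt = (f"Your previous program failed with this error:\n{stderr}\n"
                  f"Please fix the code for the task:\n{task}")
    return code
```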
Lemur-Chat matched GPT-3.5 in many of these tests, narrowing the gap between commercial and open-source agents.
TLDR: New open-source AI agents combine coding and language skills. Experiments show the combination unlocks stronger performance across technical challenges.
Full summary is here. Paper is here.