Meta Research: Language Modeling in a Sentence Representation Space?

Link to paper: https://scontent-lax3-2.xx.fbcdn.net/v/t39.2365-6/470149925_936340665123313_5359535905316748287_n.pdf?_nc_cat=103&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=AiJtorpkuKQQ7kNvgFB1Vbe&_nc_zt=14&_nc_ht=scontent-lax3-2.xx&_nc_gid=Ai90Z2yRtNh1RpGRcmdT4WW&oh=00_AYA6kvrqMczaZ8FhsXj7XgTNvFMysDwGUfZq2WfpOSAvDw&oe=6763E4D2

Meta AI Research introduces a novel approach to language modeling with "Large Concept Models" (LCMs), shifting from the token-based operations of traditional Large Language Models (LLMs) to higher-level semantic representations called "concepts." These concepts, approximated as sentences, are modeled in the SONAR embedding space, which supports 200 languages across both text and speech. LCMs are trained for autoregressive sentence prediction using techniques such as MSE regression, diffusion-based generation, and quantized SONAR methods. Scaled from 1.6B to 7B parameters and trained on up to 2.7T tokens, the models perform well on tasks such as summarization and summary expansion, demonstrating strong zero-shot generalization across languages. This approach contrasts with the token-based, language-centric nature of current LLMs, which rely heavily on massive datasets and computational resources, often exceeding 400 billion parameters.
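To make the core idea concrete, here is a minimal toy sketch of the MSE-regression variant described above: given a sequence of sentence embeddings ("concepts"), learn a model that predicts the next embedding and train it by minimizing mean squared error. Everything here is illustrative, not the paper's implementation: the embeddings are random stand-ins for SONAR vectors, the dimensionality is tiny (SONAR uses much larger vectors), and the predictor is a single linear map rather than a transformer.

```python
import numpy as np

# Toy sketch of autoregressive concept prediction with an MSE objective.
# Random vectors stand in for SONAR sentence embeddings; a linear map
# stands in for the paper's transformer-based LCM.
rng = np.random.default_rng(0)
dim = 16                          # toy concept-embedding dimensionality
seq = rng.normal(size=(8, dim))   # 8 "sentences" as stand-in embeddings

W = np.zeros((dim, dim))          # linear next-concept predictor
lr = 0.05
for _ in range(200):
    pred = seq[:-1] @ W           # predict embedding t+1 from embedding t
    err = pred - seq[1:]
    loss = np.mean(err ** 2)      # MSE regression objective
    grad = seq[:-1].T @ err * (2 / err.size)
    W -= lr * grad                # plain gradient descent step

print(f"final MSE: {loss:.4f}")
```

In the paper's setup, generation then works by decoding each predicted embedding back into a sentence with the SONAR decoder, rather than sampling tokens one at a time.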

Current LLMs lack explicit hierarchical reasoning and planning at multiple abstraction levels, a characteristic inherent in human intelligence. Humans approach complex tasks through a top-down process, planning overall structures and refining details step by step. LCMs aim to fill this gap by modeling reasoning at a semantic level rather than language-specific tokens. This shift allows for more coherent and structured output, especially in tasks like document analysis, where humans navigate long texts by remembering relevant sections instead of processing every word. By focusing on abstract concepts, LCMs offer a promising direction for creating AI systems that are more flexible and capable of generalizing across languages and modalities.

The training code for LCMs is publicly available, encouraging further research and development. This open access aims to foster innovation in the field and provides a foundation for future work in hierarchical, language-agnostic AI models.

The main characteristics of our generic Large Concept Model approach are as follows:

• Reasoning at an abstract language- and modality-agnostic level beyond tokens:

– We model the underlying reasoning process, not its instantiation in a particular language.

– The LCM can be trained, i.e. acquire knowledge, on all languages and modalities at once, promising scalability in an unbiased way.

• Explicit hierarchical structure:

– Better readability of long-form output by a human.

– Facilitates local interactive edits by a user.

• Handling of long context and long-form output:

– The complexity of a vanilla transformer model increases quadratically with the sequence length. This makes handling of large context windows challenging and several techniques have been developed to alleviate this problem, e.g., sparse attention (Child et al., 2019) or LSH attention (Kitaev et al., 2020). Our LCM operates on sequences which are at least an order of magnitude shorter.

• Unparalleled zero-shot generalization:

– Independently of the language or modality the LCM is pre-trained and fine-tuned on, it can be applied to any language and modality supported by the SONAR encoders, without the need for additional data or fine-tuning. We report results for multiple languages in the text modality.

• Modularity and extensibility:

– Unlike multimodal LLMs that can suffer from modality competition (Aghajanyan et al., 2023; Chameleon team, 2024), concept encoders and decoders can be independently developed and optimized without any competition or interference.

– New languages or modalities can be easily added to an existing system.
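The long-context point above can be illustrated with back-of-the-envelope arithmetic (the numbers below are assumptions, not figures from the paper): full self-attention computes a number of query-key interactions that grows with the square of the sequence length, so representing each sentence of roughly 25 tokens as a single concept shrinks the attention matrix dramatically.

```python
# Illustrative cost comparison: attention over tokens vs. over concepts.
# Document size and tokens-per-sentence are assumed round numbers.

def attention_pairs(seq_len: int) -> int:
    """Query-key interactions in one full self-attention pass."""
    return seq_len * seq_len

doc_tokens = 5000           # hypothetical long document, in tokens
tokens_per_sentence = 25    # assumed average sentence length
doc_sentences = doc_tokens // tokens_per_sentence   # 200 concepts

token_cost = attention_pairs(doc_tokens)       # 25,000,000 pairs
concept_cost = attention_pairs(doc_sentences)  # 40,000 pairs
print(token_cost // concept_cost)              # 625x fewer interactions
```

A 10x shorter sequence, as the paper notes, yields roughly a 100x reduction in attention cost, which is why operating on sentence-level concepts eases long-context handling without sparse- or LSH-attention tricks.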

This effort emphasizes a shift towards higher abstraction in AI models, potentially paving the way for more versatile and efficient systems that align better with human-like processing.

submitted by /u/ninjasaid13