Google launches Gemini

Some details (source):

  • 32k context length

  • efficient attention mechanisms (e.g. multi-query attention (Shazeer, 2019))

  • audio input via Universal Speech Model (USM) (Zhang et al., 2023) features

  • no audio output? (Figure 2)

  • visual encoding of Gemini models is inspired by Google's own foundational work on Flamingo (Alayrac et al., 2022), CoCa (Yu et al., 2022a), and PaLI (Chen et al., 2022)

  • output images using discrete image tokens (Ramesh et al., 2021; Yu et al., 2022b)

  • supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)
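
To give a feel for why multi-query attention matters at a 32k context, here is a rough back-of-the-envelope sketch of key/value cache size. In multi-query attention all query heads share a single key/value head, so the cache shrinks by roughly the number of heads. The layer count, head count, head size and 2-byte values below are made-up placeholders, not Gemini's actual dimensions (those aren't public), and `kv_cache_bytes` is just a helper name for this illustration.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor of 2: one cached tensor for keys, one for values.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model shape, NOT Gemini's real configuration.
SEQ_LEN = 32 * 1024   # 32k-token context, as in the list above
N_LAYERS = 32         # assumed
N_HEADS = 32          # assumed number of query heads
HEAD_DIM = 128        # assumed

# Standard multi-head attention: one key/value head per query head.
mha = kv_cache_bytes(SEQ_LEN, N_LAYERS, n_kv_heads=N_HEADS, head_dim=HEAD_DIM)
# Multi-query attention: a single key/value head shared by all query heads.
mqa = kv_cache_bytes(SEQ_LEN, N_LAYERS, n_kv_heads=1, head_dim=HEAD_DIM)

print(f"multi-head KV cache:  {mha / 2**30:.2f} GiB")
print(f"multi-query KV cache: {mqa / 2**30:.2f} GiB")
print(f"reduction factor:     {mha // mqa}x")
```

With these placeholder numbers the per-sequence cache drops from 16 GiB to 0.5 GiB. The real savings depend on the actual model shape and on whatever other attention optimizations are in use.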

submitted by /u/becausecurious