Infinite context windows? Streaming LLMs can be extended to effectively infinite sequence lengths without any fine-tuning.
LLMs like GPT-3 struggle in streaming applications like chatbots because their performance tanks on long texts that exceed their training length. I checked out a new paper investigating why windowed attention fails here. By visualizing the attention maps, the authors find an "attention sink": models dump a disproportionate share of attention onto the first few tokens, almost regardless of their content. A plain sliding window evicts those tokens and perplexity blows up; keeping the KV states of a handful of initial tokens alongside the window of recent ones keeps performance stable on sequences far beyond the training length.
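As a rough mental model (my own sketch, not the authors' code), the cache policy boils down to: retain the first few "sink" positions plus a sliding window of the most recent positions, evicting everything in between. The n_sink and window values below are illustrative, not the paper's exact configuration.

    def streaming_kv_positions(seq_len: int, n_sink: int = 4, window: int = 1020) -> list[int]:
        """Sketch of which token positions a sink-aware streaming KV cache
        would retain after processing seq_len tokens: the first n_sink
        positions (attention sinks) plus the most recent `window` positions."""
        if seq_len <= n_sink + window:
            return list(range(seq_len))  # everything still fits in the cache
        sinks = list(range(n_sink))                    # always-kept initial tokens
        recent = list(range(seq_len - window, seq_len))  # sliding window of recent tokens
        return sinks + recent

    # After 10,000 tokens with a 1,024-slot cache:
    print(streaming_kv_positions(10_000)[:6])   # [0, 1, 2, 3, 8980, 8981]
    print(len(streaming_kv_positions(10_000)))  # 1024

The point of the sketch: cache size stays constant no matter how long the stream runs, and the sink tokens never get evicted even though they're the oldest.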