Faster LLMs with speculative decoding and AWS Inferentia2 – AWS Blog