If you're focused on output quality and evaluation in LLMs, I've created r/AIQuality, a community for those of us working to build reliable, hallucination-free systems.
Personally, I've faced constant challenges with evaluating my RAG pipeline. Should I use DSPy to build it? Which retrieval technique works best? Should I switch to a different generator model? And most importantly, how do I truly know whether my model is improving or regressing? These are the questions that make evaluation tough but crucial.
With RAG and LLMs evolving rapidly, there wasn't a space to dive deep into these evaluation struggles until now. That's why I created this community: to share insights, explore cutting-edge research, and tackle the real challenges of evaluating LLM/RAG systems.
If you’re navigating similar issues and want to improve your evaluation process, join us. https://www.reddit.com/r/AIQuality/