A Simple Checklist for Self-Evaluating Prompt Quality

How do you evaluate the quality of your prompt outputs? Here's a handy checklist. Let's have a look!

You can also join r/PromptWizards to find more tutorials and prompts!

Part 1: Understanding AI's Understanding

Once you've given your AI a prompt, the next questions are:

  1. Has the AI accurately grasped the context?
    1. If not, how can I help the LLM follow my context better? Should I be more direct and clear in my prompt? Can I phrase things less negatively (negative instructions tend to perform worse) and give the LLM more guidance instead?
  2. Do the responses directly address the question or topic?
    1. Were my query and task/instruction detailed enough for the LLM to understand what I expect?
  3. Are there any contradictions between different responses to the same prompt?
    1. If I run my prompt multiple times, is the output consistent and reliable?
  4. Are any repetitions apparent in the output, and if so, are they necessary?
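
Questions 3 and 4 can be checked roughly in code. Here's a minimal sketch, assuming you've already collected several outputs for the same prompt from whatever LLM client you use. The function names `consistency_score` and `repeated_sentences` are made up for illustration, and plain string similarity is only a crude stand-in for "do these answers agree?".

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(outputs):
    """Mean pairwise text similarity (0.0 to 1.0) across runs of the same prompt."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # zero or one output: nothing to disagree with
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


def repeated_sentences(text):
    """Return sentences that occur more than once (case-insensitive)."""
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(sentences)
    return [s for s, n in counts.items() if n > 1]


# Hypothetical outputs from three runs of the same prompt.
runs = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Paris is the capital of France.",
]
print(f"consistency: {consistency_score(runs):.2f}")
print(repeated_sentences("The sky is blue. Grass is green. The sky is blue."))
```

A low score doesn't prove the answers contradict each other, it just tells you which prompts deserve a closer read.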

Part 2: The Subtleties Matter

The AI's grasp of finer details can make a world of difference in the generated output. Reflect on these:

  1. Does the language match what you expect from the output?

  2. Were the AI's responses unbiased?

  3. Did the AI veer off-topic at any stage?

  4. Did the AI 'hallucinate', i.e. generate any misleading or incorrect information?

Part 3: Deep Evaluation of AI Output

A meaningful evaluation of your AI's output covers several key areas:

  1. Were the output's length and structure fitting for its intended use?

  2. Did the AI handle nuances, complexities, or subtleties effectively?

  3. Was the AI successful in executing multi-step tasks if they were part of the prompt?

  4. If relevant, were past context or conversations incorporated well into the response?

  5. Could additional guiding examples or context benefit the prompt?

  6. Can the response's creativity, novelty, or depth be improved?
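
The first question in this list (length and structure) is the easiest one to automate. A rough sketch, assuming you asked for a JSON object and have a word budget in mind; `check_length` and `check_json_structure` are hypothetical helper names, and the keys and limits are just examples.

```python
import json


def check_length(text, min_words, max_words):
    """True if the whitespace-separated word count falls within the budget."""
    n = len(text.split())
    return min_words <= n <= max_words


def check_json_structure(text, required_keys):
    """True if text parses as a JSON object containing every required key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(k in data for k in required_keys)


# Hypothetical model output for a prompt that asked for a title and a summary.
output = '{"title": "Prompt checklist", "summary": "A short list of questions."}'
print(check_json_structure(output, ["title", "summary"]))
print(check_length("A short list of questions.", min_words=3, max_words=50))
```

The other questions here (nuance, creativity, depth) still need a human read; this only catches the mechanical misses.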

And finally,

  1. Has the AI displayed a thorough understanding of the user's set goals?

  2. Did the AI abide by any given constraints in its responses?

  3. Was the AI's data or factual information accurate and useful?
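
For the constraints question, a simple phrase check can flag the obvious violations. This is only a sketch under the assumption that your constraints can be written as "must mention" and "must not mention" phrases; `check_constraints` is an illustrative name, and factual accuracy still has to be verified by hand.

```python
def check_constraints(text, forbidden=(), required=()):
    """Return a list of violations; an empty list means every constraint held."""
    lowered = text.lower()
    violations = []
    for phrase in forbidden:
        if phrase.lower() in lowered:
            violations.append(f"forbidden phrase present: {phrase!r}")
    for phrase in required:
        if phrase.lower() not in lowered:
            violations.append(f"required phrase missing: {phrase!r}")
    return violations


# Hypothetical output and constraints.
output = "As an AI, I think Python is great."
for problem in check_constraints(output, forbidden=["as an ai"], required=["python", "rust"]):
    print(problem)
```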

submitted by /u/LouisMittelstaedt