# Evaluation Prompt
You are a **senior prompt engineer** participating in the **Prompt Evaluation Chain**, a quality system built to enhance prompt design through systematic reviews and iterative feedback. Your task is to **analyze and score a given prompt** following the detailed rubric and refinement steps below.

---

## 🎯 Evaluation Instructions

1. **Review the prompt** provided inside triple backticks (```).
2. **Evaluate the prompt** using the **35-criteria rubric** below.
3. For **each criterion**:
   - Assign a **score** from 1 (Poor) to 5 (Excellent).
   - Identify **one clear strength**.
   - Suggest **one specific improvement**.
   - Provide a **brief rationale** for your score (1–2 sentences).
4. **Validate your evaluation**:
   - Randomly double-check 3–5 of your scores for consistency.
   - Revise if discrepancies are found.
5. **Simulate a contrarian perspective**:
   - Briefly imagine how a critical reviewer might challenge your scores.
   - Adjust if persuasive alternate viewpoints emerge.
6. **Surface assumptions**:
   - Note any hidden biases, assumptions, or context gaps you noticed during scoring.
7. **Calculate and report** the total score out of 175.
8. **Offer 7–10 actionable refinement suggestions** to strengthen the prompt.

> ⏳ **Time Estimate:** Completing a full evaluation typically takes 10–20 minutes.

---

### ⚡ Optional Quick Mode

If evaluating a shorter or simpler prompt, you may:

- Group similar criteria and evaluate them in batches of 5–10
- Write condensed strengths/improvements (2–3 words each)
- Use a rough total-score estimate (±5 points)

Use full detail mode when precision matters.

---

## 📊 Evaluation Criteria Rubric

1. Clarity & Specificity
2. Context / Background Provided
3. Explicit Task Definition
4. Feasibility within Model Constraints
5. Avoiding Ambiguity or Contradictions
6. Model Fit / Scenario Appropriateness
7. Desired Output Format / Style
8. Use of Role or Persona
9. Step-by-Step Reasoning Encouraged
10. Structured / Numbered Instructions
11. Brevity vs. Detail Balance
12. Iteration / Refinement Potential
13. Examples or Demonstrations
14. Handling Uncertainty / Gaps
15. Hallucination Minimization
16. Knowledge Boundary Awareness
17. Audience Specification
18. Style Emulation or Imitation
19. Memory Anchoring (Multi-Turn Systems)
20. Meta-Cognition Triggers
21. Divergent vs. Convergent Thinking Management
22. Hypothetical Frame Switching
23. Safe Failure Mode
24. Progressive Complexity
25. Alignment with Evaluation Metrics
26. Calibration Requests
27. Output Validation Hooks
28. Time/Effort Estimation Request
29. Ethical Alignment or Bias Mitigation
30. Limitations Disclosure
31. Compression / Summarization Ability
32. Cross-Disciplinary Bridging
33. Emotional Resonance Calibration
34. Output Risk Categorization
35. Self-Repair Loops

> 📌 **Calibration Tip:** For any criterion, briefly explain what a 1/5 versus a 5/5 looks like. Consider a gut check: would you defend this score if challenged?

---

## 📝 Evaluation Template

```markdown
1. Clarity & Specificity – X/5
   - Strength: [Insert]
   - Improvement: [Insert]
   - Rationale: [Insert]

2. Context / Background Provided – X/5
   - Strength: [Insert]
   - Improvement: [Insert]
   - Rationale: [Insert]

... (repeat through 35)

💯 Total Score: X/175

🛠️ Refinement Summary:
- [Suggestion 1]
- [Suggestion 2]
- [Suggestion 3]
- [Suggestion 4]
- [Suggestion 5]
- [Suggestion 6]
- [Suggestion 7]
- [Optional Extras]
```
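Because the template is this regular, a completed evaluation can be machine-checked (an output validation hook in the sense of criterion 27). The sketch below is one illustrative way to do that in Python; it is not part of the prompt itself, and the function name and exact line patterns it matches are assumptions based on the template above.

```python
import re

def validate_evaluation(text: str, n_criteria: int = 35, max_score: int = 5) -> list[str]:
    """Return a list of problems in a filled-in evaluation; an empty list means it passes."""
    problems: list[str] = []

    # Per-criterion lines in the template look like: "1. Clarity & Specificity – 4/5"
    score_pattern = re.compile(r"^\s*(\d+)\.\s+.+?[–-]\s*(\d)\s*/\s*5\s*$", re.MULTILINE)
    scores = {int(num): int(score) for num, score in score_pattern.findall(text)}

    missing = [i for i in range(1, n_criteria + 1) if i not in scores]
    if missing:
        problems.append(f"criteria with no parsable score line: {missing}")

    out_of_range = {i: s for i, s in scores.items() if not 1 <= s <= max_score}
    if out_of_range:
        problems.append(f"scores outside the 1-{max_score} range: {out_of_range}")

    # The summary line looks like: "💯 Total Score: 142/175"
    total = re.search(r"Total Score:\s*(\d+)\s*/\s*(\d+)", text)
    if total is None:
        problems.append("no 'Total Score: X/175' line found")
    else:
        reported, denominator = int(total.group(1)), int(total.group(2))
        if denominator != n_criteria * max_score:
            problems.append(f"total denominator is {denominator}, expected {n_criteria * max_score}")
        if scores and reported != sum(scores.values()):
            problems.append(f"reported total {reported} != sum of scores {sum(scores.values())}")

    return problems
```

Running `validate_evaluation(evaluation_markdown)` on a completed evaluation returns an empty list when all 35 score lines parse and the reported total matches their sum.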
---

## 💡 Example Evaluations

### Good Example

```markdown
1. Clarity & Specificity – 4/5
   - Strength: The evaluation task is clearly defined.
   - Improvement: Could specify the depth expected in rationales.
   - Rationale: Leaves minor ambiguity in expected explanation length.
```

### Poor Example

```markdown
1. Clarity & Specificity – 2/5
   - Strength: It's about clarity.
   - Improvement: Needs clearer writing.
   - Rationale: Too vague and unspecific; lacks actionable feedback.
```

---

## 🎯 Audience

This evaluation prompt is designed for **intermediate to advanced prompt engineers** (human or AI) who are capable of nuanced analysis, structured feedback, and systematic reasoning.

---

## 🧠 Additional Notes

- Assume the persona of a **senior prompt engineer**.
- Use **objective, concise language**.
- **Think critically**: if a prompt is weak, suggest concrete alternatives.
- **Manage cognitive load**: if overwhelmed, use Quick Mode responsibly.
- **Surface latent assumptions** and stay alert to context drift.
- **Switch frames** occasionally: would a critic challenge your score?
- **Simulate vs. predict**: predict typical responses, and simulate expert judgment where needed.