EvalSharp
EvalSharp is an extensible suite of LLM evaluation metrics built for the .NET ecosystem. Whether you're evaluating a chatbot, a summarization tool, or an agent-based workflow, EvalSharp gives you the tools to measure LLM performance with precision and transparency.
EvalSharp is inspired by DeepEval and brings the same high-level evaluation primitives to .NET developers.
Why EvalSharp?
Modern LLM applications require reliable, explainable, and repeatable evaluation. EvalSharp helps you:
- Validate model output with task-based, context-aware metrics
- Run automated tests in your CI pipeline
- Generate synthetic evaluation datasets
- Benchmark models using real-world and synthetic data
EvalSharp is designed to be developer-friendly, extensible, and production-ready, whether you're building internal tools or evaluating commercial AI solutions.
Key Features
- Built-in metrics: Task Completion, Answer Relevancy, Tool Correctness, Faithfulness, Hallucination Detection, and more
- LLM-as-a-judge architecture
- Easy integration with your existing test suites
- Generate and evaluate golden datasets
- Self-explaining evaluation outputs
Getting Started
To get started, see the Quickstart Guide.
Installation
Install the core package via NuGet:
dotnet add package EvalSharp