EvalSharp

EvalSharp is a powerful and extensible suite of LLM evaluation metrics built for the .NET ecosystem. Whether you're evaluating an intelligent chatbot, summarization tool, or agent-based workflow, EvalSharp gives you the tools to measure LLM performance with precision and transparency.

EvalSharp is inspired by DeepEval and brings the same high-level evaluation primitives to .NET developers.

Why EvalSharp?

Modern LLM applications require reliable, explainable, and repeatable evaluation. EvalSharp helps you:

Validate model output with task-based, context-aware metrics
Run automated tests in your CI pipeline
Generate synthetic evaluation datasets
Benchmark models using real-world and synthetic data

It is designed to be developer-friendly, extensible, and production-ready, whether you're building internal tools or evaluating commercial AI solutions.

Key Features

Built-in metrics for Task Completion, Answer Relevancy, Tool Correctness, Faithfulness, Hallucination Detection, and more
LLM-as-a-judge architecture
Easy integration into your test suites
Generate and evaluate golden datasets
Self-explaining evaluation outputs

Getting Started

To get started, see the Quickstart Guide.

Installation

Install the core package via NuGet:

dotnet add package EvalSharp
