EvalSharp

EvalSharp is a powerful and extensible suite of LLM evaluation metrics built for the .NET ecosystem. Whether you're evaluating an intelligent chatbot, summarization tool, or agent-based workflow, EvalSharp gives you the tools to measure LLM performance with precision and transparency.

EvalSharp is inspired by DeepEval and brings the same high-level evaluation primitives to .NET developers.

Why EvalSharp?

Modern LLM applications require reliable, explainable, and repeatable evaluation. EvalSharp helps you:

  • Validate model output with task-based, context-aware metrics
  • Run automated tests in your CI pipeline
  • Generate synthetic evaluation datasets
  • Benchmark models using real-world and synthetic data

It is designed to be developer-friendly, extensible, and production-ready—whether you're building internal tools or evaluating commercial AI solutions.

Key Features

  • Metrics for Task Completion, Answer Relevancy, Tool Correctness, Faithfulness, Hallucination Detection, and more
  • LLM-as-a-judge architecture
  • Easy integration with your existing test suites
  • Generate and evaluate golden datasets
  • Self-explaining evaluation outputs

Getting Started

To get started, see the Quickstart Guide.

Installation

Install the core package via NuGet:

dotnet add package EvalSharp