Roadmap

What we've shipped, what's in progress, and what we plan to build next.

Last Shipped

Jinja2 Template Support in the Playground
11/17/2025
Playground
Use Jinja2 templating in prompts to add conditional logic, filters, and template blocks. The template format is stored in the configuration schema, and the SDK handles rendering automatically.
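The stored template is ordinary Jinja2, so a plain jinja2 render shows the idea. The prompt text and variable names below are made up for illustration; this is not the Agenta SDK API, which handles the rendering for you.

```python
from jinja2 import Template

# Hypothetical prompt template: the content and variable names are illustrative only.
prompt_template = Template(
    "You are a support assistant for {{ product_name }}.\n"
    "{% if tone == 'formal' %}Respond formally and avoid slang.\n"
    "{% else %}Keep the tone friendly and casual.\n"
    "{% endif %}"
    "Known issues: {{ issues | join(', ') }}.\n"
    "Answer the user's question: {{ question }}"
)

rendered = prompt_template.render(
    product_name="Acme Cloud",
    tone="formal",
    issues=["slow uploads", "login timeouts"],
    question="Why is my upload stuck?",
)
print(rendered)
```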
Programmatic Evaluation through the SDK
11/11/2025
Evaluation
Run evaluations programmatically from code with full control over test data and evaluation logic. Evaluate agents built with any framework and view results in the Agenta dashboard.
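As a rough sketch of the idea (not the actual SDK calls), a programmatic evaluation boils down to looping over your test data, invoking your agent, and applying your own scoring logic. The agent function and exact-match scorer below are hypothetical placeholders; the Agenta SDK calls that record the run for the dashboard are not shown.

```python
# Hypothetical sketch of a programmatic evaluation loop.
# `call_my_agent` stands in for an agent built with any framework;
# `exact_match` stands in for your own evaluation logic.

test_cases = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "2 + 2 = ?", "expected": "4"},
]

def call_my_agent(question: str) -> str:
    # Placeholder: invoke your real agent/LLM here.
    return "Paris" if "France" in question else "4"

def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

results = []
for case in test_cases:
    output = call_my_agent(case["input"])
    results.append({
        "input": case["input"],
        "output": output,
        "score": exact_match(output, case["expected"]),
    })

print(f"mean score: {sum(r['score'] for r in results) / len(results):.2f}")
```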
Online Evaluation
11/11/2025
Evaluation
Automatically evaluate every request to your LLM application in production. Catch hallucinations and off-brand responses as they happen instead of discovering them through user complaints.
Customize LLM-as-a-Judge Output Schemas
11/10/2025
Evaluation
Configure LLM-as-a-Judge evaluators with custom output schemas. Use binary, multiclass, or custom JSON formats. Enable reasoning for better evaluation quality.
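To make the three formats concrete, here are illustrative JSON Schemas for binary, multiclass, and custom judge outputs. The field names are assumptions for the example, not Agenta's exact configuration format.

```python
import json

# Illustrative output schemas for an LLM-as-a-Judge evaluator.
# Field names and structure are examples only.

binary_schema = {
    "type": "object",
    "properties": {
        "verdict": {"type": "boolean"},
        "reasoning": {"type": "string"},  # optional reasoning for better judgments
    },
    "required": ["verdict"],
}

multiclass_schema = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ["correct", "partially_correct", "incorrect"]},
        "reasoning": {"type": "string"},
    },
    "required": ["label"],
}

custom_schema = {
    "type": "object",
    "properties": {
        "faithfulness": {"type": "number", "minimum": 0, "maximum": 1},
        "tone_ok": {"type": "boolean"},
    },
    "required": ["faithfulness", "tone_ok"],
}

print(json.dumps(multiclass_schema, indent=2))
```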
Structured Output Support in the Playground
4/15/2025
Playground
Define and validate structured output formats in the playground. Save structured output schemas as part of your prompt configuration.
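As an illustration of defining and validating a structured output schema outside the playground, the sketch below checks a model response against a made-up ticket schema with the jsonschema library. The field names and validation flow are assumptions, not the playground's configuration format.

```python
import json
from jsonschema import validate, ValidationError

# Hypothetical structured-output schema for a support-ticket classifier.
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "feature_request"]},
        "summary": {"type": "string"},
        "urgent": {"type": "boolean"},
    },
    "required": ["category", "summary", "urgent"],
    "additionalProperties": False,
}

# Pretend this string came back from the model.
model_output = '{"category": "bug", "summary": "Upload stalls at 99%", "urgent": true}'

try:
    validate(instance=json.loads(model_output), schema=ticket_schema)
    print("output matches the schema")
except ValidationError as err:
    print(f"schema violation: {err.message}")
```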
Vertex AI Provider Support
10/24/2025
Integration, Playground
Use Google Cloud's Vertex AI models including Gemini and partner models in the playground, Model Hub, and through Gateway endpoints.
Filtering Traces by Annotation
10/14/2025
Observability
Filter and search for traces based on their annotations. Find traces with low scores or feedback quickly using the rebuilt filtering system.

In Progress

Chat Session View in Observability
Observability
Display entire chat sessions in one consolidated view. Currently, each trace in a chat session appears in a separate tab. This feature will group traces by session ID and show the complete conversation in a single view.
Navigation Links from Traces to App/Environment/Variant
Observability
Add clickable links in the observability trace and drawer views to navigate to the application, variant, version, and environment used in each trace. This makes it easy to jump directly to the configuration that generated a specific trace.
Support for built-in LLM Tools (e.g. web search) in the Playground
Playground
Use built-in LLM tools such as web search directly from the playground.
Folders for Prompt Organization
Playground
Create folders and subfolders to organize prompts in the playground. Move prompts between folders and search within specific folders to structure prompt libraries.
Projects and Workspaces
Misc
Improve organization structure by adding projects. Create projects for different products and scope resources to specific projects.
PDF Support in the Playground
Playground
Add PDF inputs for models that accept them (OpenAI, Gemini, etc.) through base64 encoding, URLs, or file IDs. This extends to human evaluation, so reviewers can assess model responses on PDF inputs.
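Of the three options, base64 encoding is easy to sketch. The snippet below shows only the encoding step with a hypothetical file path; the exact request field the resulting data URL goes into varies by provider, so it is not shown here.

```python
import base64
from pathlib import Path

# Hypothetical file path; only the encoding step is shown because the request
# field the encoded PDF goes into depends on the provider's API.
pdf_path = Path("contract.pdf")
pdf_b64 = base64.b64encode(pdf_path.read_bytes()).decode("utf-8")

# Many APIs accept the document as a data URL alongside the text prompt.
pdf_data_url = f"data:application/pdf;base64,{pdf_b64}"
print(pdf_data_url[:60] + "...")
```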
Prompt Snippets
Playground
Create reusable prompt snippets that can be referenced across multiple prompts. Reference specific versions or always use the latest version to maintain consistency across prompt variants.
Date Range Filtering in Metrics Dashboard
Observability
Filter traces by date range in the metrics dashboard to focus on a specific time period.

Planned

Feature Requests

Upvote or comment on the features you care about, or request a new feature.