# Auto Prompt Optimization

> Understand how Improve generates candidates and how to read its safety gates, prompt diffs, regression reports, traffic comparisons, and release signals

Auto Prompt Optimization is the part of Improve that proposes prompt changes. Adaline generates candidate prompt snapshots, scores them against available evidence, rejects unsafe or regressing candidates, and packages the selected candidate for review.

<img src="https://mintcdn.com/adaline/o8h3k4eQQbaIV193/images/platform-v2/improve/cycle-review-top.png?fit=max&auto=format&n=o8h3k4eQQbaIV193&q=85&s=4e2effcdac95f3c72c646116ca578ee7" alt="Improve review page showing the selected prompt candidate, prompt diff, and real traffic comparison" title="Improve candidate signals" style={{ width: "100%" }} width="3350" height="1712" data-path="images/platform-v2/improve/cycle-review-top.png" />

## What can change

| Prompt area         | Example change                                         | Review concern                                        |
| ------------------- | ------------------------------------------------------ | ----------------------------------------------------- |
| **Instructions**    | Add or clarify a constraint.                           | Avoid broad rules that affect unrelated traffic.      |
| **Examples**        | Add a demonstration of the desired behavior.           | Avoid overfitting one customer or trace.              |
| **Variables**       | Clarify how runtime inputs should be used.             | Confirm variable mapping still works.                 |
| **Model settings**  | Adjust supported generation settings.                  | Check cost, latency, determinism, and output length.  |
| **Response schema** | Tighten structured output requirements.                | Confirm downstream consumers still accept the output. |
| **Tool guidance**   | Clarify when to call a tool and what arguments matter. | Fix broken tools or backends outside the prompt.      |

For tool-using and coding agents, optimization may also affect routing rules, verification policy, tool descriptions, and few-shot examples. Always review the full diff.

## Candidate exploration

The **Prompts** stage summarizes the search.

| Signal                 | Meaning                                                     |
| ---------------------- | ----------------------------------------------------------- |
| **Variants explored**  | Prompt candidates generated for the run.                    |
| **Passed safety gate** | Candidates that did not regress protected checks.           |
| **Failed safety gate** | Candidates blocked by constraints or evaluator regressions. |
| **Strong contenders**  | Candidates with positive evidence after scoring.            |
| **Selected candidate** | The candidate packaged for review.                          |

More variants are not automatically better. A strong run finds a narrow candidate that improves the target issue and preserves healthy behavior.
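
The relationship between these signals can be sketched in a few lines. This is a minimal illustration, not Adaline's implementation: the `Candidate` fields, the `summarize` function, and the rule that the selected candidate is the highest-gain contender that passed the gate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical record; field names are illustrative, not an Adaline schema.
    name: str
    target_gain: float          # improvement on the target issue (positive is better)
    protected_regressions: int  # number of protected checks that got worse


def summarize(candidates: list[Candidate]) -> dict:
    """Compute Prompts-stage style counts for one run (assumed rules)."""
    passed = [c for c in candidates if c.protected_regressions == 0]
    failed = [c for c in candidates if c.protected_regressions > 0]
    # Assumption: a strong contender passed the gate and shows positive gain.
    contenders = [c for c in passed if c.target_gain > 0]
    selected: Optional[Candidate] = max(
        contenders, key=lambda c: c.target_gain, default=None
    )
    return {
        "variants_explored": len(candidates),
        "passed_safety_gate": len(passed),
        "failed_safety_gate": len(failed),
        "strong_contenders": len(contenders),
        "selected_candidate": selected.name if selected else None,
    }


run = [
    Candidate("broad-rewrite", target_gain=0.30, protected_regressions=2),
    Candidate("narrow-constraint", target_gain=0.18, protected_regressions=0),
    Candidate("extra-example", target_gain=0.05, protected_regressions=0),
    Candidate("no-op", target_gain=0.0, protected_regressions=0),
]
summary = summarize(run)
print(summary)
```

In this made-up run, the candidate with the largest raw gain is blocked because it regresses protected checks, so a narrower candidate is selected instead.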

## Read the diff first

The diff is the source of truth for what will change. Use the diagnosis and scores to understand *why* the change exists, but use the diff to decide whether the change is acceptable.

| Diff pattern                  | Usually good                           | Usually risky                                            |
| ----------------------------- | -------------------------------------- | -------------------------------------------------------- |
| **Narrow constraint**         | Matches the failing behavior.          | Changes all outputs broadly.                             |
| **Tool-use clarification**    | Explains when and how to call a tool.  | Hides a bad tool contract.                               |
| **Added example**             | Covers the failure and a healthy path. | Encodes private or one-off context.                      |
| **Output format change**      | Matches downstream requirements.       | Breaks existing consumers.                               |
| **Generation setting change** | Improves reliability or consistency.   | Moves cost, latency, or output quality without coverage. |

Reject or edit candidates that try to solve retrieval, provider, data, or backend failures through prompt text.
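
If you export the baseline and candidate prompt text, you can reproduce a line-level diff locally to see exactly which lines were added or removed. The sketch below uses Python's standard `difflib` module; the two prompt strings are invented for illustration and do not come from Adaline.

```python
import difflib

# Illustrative prompt text only; substitute your own exported prompts.
baseline = """You are a support assistant.
Answer the user's question.
"""
candidate = """You are a support assistant.
Answer the user's question.
If the order ID is missing, ask for it before calling a tool.
"""

diff_lines = list(
    difflib.unified_diff(
        baseline.splitlines(),
        candidate.splitlines(),
        fromfile="baseline",
        tofile="candidate",
        lineterm="",
    )
)

# Separate real additions and removals from the "+++" / "---" file headers.
added = [
    line[1:] for line in diff_lines
    if line.startswith("+") and not line.startswith("+++")
]
removed = [
    line[1:] for line in diff_lines
    if line.startswith("-") and not line.startswith("---")
]

print("\n".join(diff_lines))
print(f"{len(added)} line(s) added, {len(removed)} line(s) removed")
```

A single added line with nothing removed is what a narrow constraint typically looks like. A diff that rewrites most of the prompt deserves closer scrutiny.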

## Check regression evidence

<img src="https://mintcdn.com/adaline/o8h3k4eQQbaIV193/images/platform-v2/improve/review-regression-runtime.png?fit=max&auto=format&n=o8h3k4eQQbaIV193&q=85&s=26c103009dc205f9955b35784e10790a" alt="Regression report showing evaluator movement, cost, token, and latency tradeoffs for the selected candidate" title="Regression safety" style={{ width: "100%" }} width="1318" height="1024" data-path="images/platform-v2/improve/review-regression-runtime.png" />

The regression report compares baseline and candidate behavior across authored evaluators, auto-generated evaluators, and validation cases.

Watch for:

* Protected evaluators moving down.
* Blank baseline or aggregate cells, which usually mean there is too little comparable scoring coverage.
* New generated checks that need review before becoming hard gates.
* Healthy dataset rows failing after the candidate improves the target issue.

An evaluator drop is not automatically fatal, but it needs a named owner and a reason.
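
The first two checks in this list can be expressed as a simple pass over report rows. This is a hedged sketch: the row layout, the evaluator names, and the `review_flags` helper are hypothetical and only show the kind of logic a reviewer applies, not how Adaline computes its report.

```python
# Hypothetical rows: (evaluator name, protected?, baseline score, candidate score).
# None stands for a blank cell in the report.
rows = [
    ("answer_accuracy", True, 0.82, 0.88),
    ("format_valid", True, 0.99, 0.95),
    ("tone", False, None, 0.90),
]


def review_flags(rows, tolerance=0.0):
    """Return (evaluator, reason) pairs that need a named owner and a reason."""
    flags = []
    for name, protected, baseline, candidate in rows:
        if baseline is None or candidate is None:
            # A blank cell means the two sides cannot be compared.
            flags.append((name, "missing score: weak comparable coverage"))
        elif protected and candidate < baseline - tolerance:
            drop = baseline - candidate
            flags.append((name, f"protected evaluator dropped by {drop:.2f}"))
    return flags


flags = review_flags(rows)
for name, reason in flags:
    print(f"{name}: {reason}")
```

The `tolerance` parameter is an assumption that lets a team ignore small score changes. Leave it at zero if every protected drop should be surfaced.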

## Inspect traffic comparison

<img src="https://mintcdn.com/adaline/o8h3k4eQQbaIV193/images/platform-v2/improve/traffic-comparison.png?fit=max&auto=format&n=o8h3k4eQQbaIV193&q=85&s=222b763aa3d8e803bd21da8614121d38" alt="Traffic comparison showing current prompt outputs beside improved candidate outputs for tested conversations" title="Traffic comparison" style={{ width: "100%" }} width="1310" height="784" data-path="images/platform-v2/improve/traffic-comparison.png" />

Traffic comparison answers the question metrics cannot fully answer: would you want users to receive this new output?

Use it to check tone, format, tool behavior, verbosity, structured output shape, and whether the original failure is actually fixed.

## Runtime tradeoffs

Prompt improvements can increase cost, token usage, and latency. Before approving a candidate, check:

| Signal            | Watch for                                                         |
| ----------------- | ----------------------------------------------------------------- |
| **Cost**          | More expensive model paths, longer generations, or extra calls.   |
| **Input tokens**  | Longer instructions, examples, or context packaging.              |
| **Output tokens** | More verbose answers or larger structured objects.                |
| **Latency**       | Slower model paths, extra tool calls, retries, or longer outputs. |

For high-volume prompts, cost or latency increases should be explicit release tradeoffs, not surprises.
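
One way to make these tradeoffs explicit is to compute the percentage change for each signal and mark anything above a threshold you have agreed on. The numbers, the metric keys, and the 10% threshold below are assumptions for illustration, not values reported by Adaline.

```python
# Illustrative per-request averages; replace with figures from your own report.
baseline = {"cost_usd": 0.0040, "input_tokens": 1200, "output_tokens": 300, "latency_ms": 900}
candidate = {"cost_usd": 0.0046, "input_tokens": 1500, "output_tokens": 310, "latency_ms": 950}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def tradeoffs(baseline, candidate, threshold_pct=10.0):
    """Map each signal to (rounded percent change, exceeds threshold?)."""
    report = {}
    for key in baseline:
        change = percent_change(baseline[key], candidate[key])
        report[key] = (round(change, 1), change > threshold_pct)
    return report


report = tradeoffs(baseline, candidate)
for key, (change, flagged) in report.items():
    marker = "REVIEW" if flagged else "ok"
    print(f"{key}: {change:+.1f}% [{marker}]")
```

For a high-volume prompt, multiplying the per-request cost change by expected request volume turns a small percentage into a concrete budget figure for the release decision.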

<CardGroup cols={2}>
  <Card title="Review a Cycle" icon="git-compare" href="/improve/review-a-cycle">
    Make the approval, edit, or rejection decision.
  </Card>

  <Card title="Deploy your prompt" icon="rocket" href="/deploy/deploy-your-prompt">
    Move from reviewed prompt version to production rollout.
  </Card>
</CardGroup>
