Thursday, January 23, 2025

Anthropic’s Claude Prompt Playground for AI App Improvement

Prompt engineering became a hot job last year in the AI industry, but it seems Anthropic is now developing tools to at least partially automate it.

Anthropic’s New Features

According to a company blog post, Anthropic released several new features on Tuesday to help developers create more useful applications with the startup’s language model, Claude. Developers can now use Claude 3.5 Sonnet to generate, test, and evaluate prompts, using prompt engineering techniques to create better inputs and improve Claude’s answers for specialized tasks.

Improving Prompt Effectiveness

Language models are fairly forgiving about how a task is phrased, but minor changes to the wording of a prompt can sometimes lead to significant improvements in the results. Usually, you’d have to work out that wording yourself or hire a prompt engineer; this new feature offers quick feedback that could make finding improvements easier.

Anthropic Console’s Evaluate Tab

The features are housed within the Anthropic Console under a new Evaluate tab. Console is the startup’s test kitchen for developers, created to attract businesses looking to build products with Claude. One of the features, unveiled in May, is Anthropic’s built-in prompt generator, which takes a short description of a task and constructs a much longer, fleshed-out prompt using Anthropic’s own prompt engineering techniques. While Anthropic’s tools may not replace prompt engineers altogether, the company said they would help new users and save time for experienced prompt engineers.

Testing and Evaluating Prompts

Within Evaluate, developers can test the effectiveness of their AI application’s prompts in a range of scenarios. They can upload real-world examples to a test suite or ask Claude to generate an array of test cases. Developers can then compare the effectiveness of various prompts side by side and rate sample answers on a five-point scale.
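To make the workflow concrete, here is a minimal Python sketch of the same idea: run several prompt variants against a shared set of test cases, then average reviewer ratings on a 1-to-5 scale. It is not Anthropic's implementation and does not call the Console or any real API. The names `fake_model`, `run_suite`, and `average_rating`, the prompt templates, and the rating values are all hypothetical, chosen only for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call (not a real API)."""
    if "detailed" in prompt:
        return "A detailed answer with several supporting sentences."
    return "Short answer."


def run_suite(
    templates: Dict[str, str],
    cases: List[Dict[str, str]],
    model: Callable[[str], str],
) -> Dict[str, List[str]]:
    """Run every prompt template against every test case."""
    return {
        name: [model(template.format(**case)) for case in cases]
        for name, template in templates.items()
    }


def average_rating(ratings: List[int]) -> float:
    """Average a list of 1-5 ratings, rejecting empty or out-of-range input."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be non-empty integers from 1 to 5")
    return mean(ratings)


# Two prompt variants that differ by one instruction (assumed examples).
templates = {
    "v1": "Answer the question: {question}",
    "v2": "Give a detailed answer to the question: {question}",
}
cases = [
    {"question": "What is a prompt?"},
    {"question": "What is a test suite?"},
]

outputs = run_suite(templates, cases, fake_model)

# Ratings would come from a human reviewer; these values are made up.
ratings = {"v1": [2, 2], "v2": [4, 5]}
for name in templates:
    print(name, outputs[name], average_rating(ratings[name]))
```

Because every variant runs against the same cases, changing one line of a template and re-running the suite shows its effect across all cases at once.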

Practical Example

In an example from Anthropic’s blog post, a developer found that their application was giving answers that were too short across several test cases. The developer could tweak a line in their prompt to make the answers longer and apply the change to all their test cases at once. That could save developers time and effort, especially those with little or no prompt engineering experience.

CEO’s Perspective

In a Google Cloud Next interview earlier this year, Anthropic CEO and co-founder Dario Amodei said prompt engineering was one of the most essential things for widespread enterprise adoption of generative AI. “It sounds simple, but 30 minutes with a prompt engineer can often make an application work when it wasn’t before,” said Amodei.
