Claude 3.5 Haiku on AWS Trainium2 and model distillation in Amazon Bedrock
As part of our expanded collaboration with AWS, we’ve begun optimizing Claude models to run on AWS Trainium2, their most advanced AI chip.
To preview what’s possible with Trainium2, Claude 3.5 Haiku now supports latency-optimized inference in Amazon Bedrock, making the model significantly faster without compromising accuracy.
We’re also adding support for model distillation in Amazon Bedrock, bringing the intelligence of larger Claude models to our faster and more cost-effective models.
Next-gen models on Trainium2
We are collaborating with AWS to build Project Rainier—an EC2 UltraCluster of Trn2 UltraServers containing hundreds of thousands of Trainium2 chips. This cluster will deliver more than five times the computing power (in exaflops) used to train our current generation of leading AI models.
Trainium2 enables us to offer faster models in Amazon Bedrock, starting with Claude 3.5 Haiku which now supports latency-optimized inference in public preview. By enabling latency optimization, Claude 3.5 Haiku can deliver up to 60% faster inference speed—making it the ideal choice for use cases ranging from code completions to real-time content moderation and chatbots.
This faster version of Claude 3.5 Haiku, powered by Trainium2, is available in the US East (Ohio) Region via cross-region inference and is offered at $1 per million input tokens and $5 per million output tokens.
Amazon Bedrock Model Distillation
We’re also enabling customers to get frontier performance from Claude 3 Haiku—our most cost-effective model from the last generation. With distillation, Claude 3 Haiku can now achieve significant performance gains, reaching Claude 3.5 Sonnet-like accuracy for specific tasks—at the same price and speed of our most cost-effective model.
This technique transfers knowledge from the "teacher" (Claude 3.5 Sonnet) to the "student" (Claude 3 Haiku), enabling customers to run sophisticated tasks like retrieval augmented generation (RAG) and data analysis at a fraction of the cost.
Unlike traditional fine-tuning, which requires developers to manually craft training examples and continuously adjust parameters, Amazon Bedrock Model Distillation automates the entire process by:
- Generating synthetic training data from Claude 3.5 Sonnet
- Training and evaluating Claude 3 Haiku
- Hosting the final distilled model for inference
Amazon Bedrock Model Distillation automatically applies different data synthesis methods—from generating similar prompts to creating new high-quality responses based on your example prompt-response pairs.
Distillation for Claude 3 Haiku in Amazon Bedrock is now available in preview. Learn more in the AWS launch blog and documentation.
Lower prices for Claude 3.5 Haiku
In addition to offering a faster version on Trainium2, customers can continue to access Claude 3.5 Haiku on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
To make this model even more accessible for a wide range of use cases, we’re lowering the price of Claude 3.5 Haiku to $0.80 per million input tokens and $4 per million output tokens across all platforms.
Get started
Starting today, model distillation and the faster Claude 3.5 Haiku are available in preview in Amazon Bedrock. For developers seeking the optimal balance of price, performance, and speed, you now have expanded model options with Claude:
- Claude 3.5 Haiku with latency optimization, powered by Trainium2, for general use cases
- Claude 3 Haiku, distilled with frontier performance, for high-volume, repetitive use cases
To get started, visit the Amazon Bedrock console. We can’t wait to see what you build.