Anyscale Endpoints is a fast and scalable API to integrate OSS LLMs into your app.  Use our growing list of high performance models or deploy your own.

Run, Fine Tune and Scale LLMs via production-ready APIs

Run on Anyscale Cloud <span>The best OSS LLMs at the best prices</span>

Private in your cloud<span>Any model, any data, anywhere</span>

Fast, cost-efficient, serverless APIs for LLM Serving and Fine Tuning

- Serverless
- Pay-as-you-go for $1 or less per million tokens
- Easy upgrade to Dedicated or Private Endpoints


Easy APIs you can query and fine-tune to power your apps without having to deal with infrastructure. Get started in minutes.

- Choose your accelerator, including A10, A100 and H100
- Deploy custom LLM and embedding models
- Network isolation and fine-grained access control

Dedicated GPUs to deploy your custom models and scale your production applications.

For $1 or less per million tokens, use our growing list of high performance models or deploy your own.

Validate your ideas with a familiar API, fine tune LLMs to get the quality you need, deploy your apps, and repeat.


llama-2-code

Anyscale Endpoints quickly adds new models, optimizations, and integrations to give you the best tools to build the best apps.

Use the same stack that powers the most household-name AI apps

Use your data in your cloud to build any LLM app

- A full-stack LLM API solution running in your Cloud
- Built-in alerting and observability
- Performance optimized at each layer
- Highly available and fault tolerant
- Enterprise-level security
- Built for ease of deployment and operations

Accelerate your Developers and your time to market with our production-ready LLM solution proven to maximize performance, minimize cost, and accelerate time to production.

- Built-in integrations
- Simple APIs
- Autoscaling
- Configurable optimizations
- Observability


Train, fine-tune, and run models where your data lives.

- Advanced access controls
- No public IPs
- Audit logs
- Cloud isolation

Technical Support, Model Refinement / Optimization teams, and customer success from the creators of Ray to make sure you unlock the value of LLMs for your use cases.

Anyscale Endpoints

Hosted Anyscale is in Private Preview. Click here to request access!

Anyscale is the leading AI application platform. With Anyscale, developers can build, run and scale AI applications instantly.

	Endpoints	Dedicated Endpoints
General
Infrastructure	Anyscale Cloud	Anyscale Cloud
Accelerator	Auto	Select from A10, A100, H100
Scaling	Auto	Configurable, up to 1000 GPUs
Rate Limiting	30 Concurrent requests, Cap increased upon request	No rate limiting
Compatibility	OpenAl Compatible API, Integration with Weights & Biases	OpenAl Compatible API, Integration with Weights & Biases
Pricing	$/Token	Based on compute and accelerator usage
Observability	Token Usage, System Status	Built in Grafana support, Alerting
Support	Fast response through email	Dedicated support
Customizability		Bring your custom model and optimize for throughput or latency
Performance Optimization
Models
Base Model Support	Llama2 Family (7B, 13B, 70B, Code Llama), Mistral 7B and more	Any compatible LLM or embedding model
Fine-Tuning
Fine-tuning
Security
Data Privacy	Data is never used for training.	Data is never used for training.
Security	API Key	SSO/SAML + API Key
Logging

Anyscale Endpoints

Get Started

Endpoints

Dedicated Endpoints

Product Comparison

Anyscale Endpoint Features

Use the best open-source LLMs

Llama-2-7b

Llama-2-13b

Llama-2-70b

Code Llama

Mistral 7B

Mixtral-8x7B

Llama-Guard-7b

gte-large

zephyr-7b

Start with the best open source LLMs in minutes

Use the same stack that powers the most household-name AI apps