Anyscale Endpoints

Run, Fine Tune and Scale LLMs via production-ready APIs

Get Started

Fast, cost-efficient, serverless APIs for LLM Serving and Fine Tuning

Start with

Endpoints

Easy APIs you can query and fine-tune to power your apps without having to deal with infrastructure. Get started in minutes.


  • Serverless
  • Pay-as-you-go for $1 or less per million tokens
  • Easy upgrade to Dedicated or Private Endpoints
Coming soon

Dedicated Endpoints

Dedicated GPUs to deploy your custom models and scale your production applications.


  • Choose your accelerator, including A10, A100 and H100
  • Deploy custom LLM and embedding models
  • Network isolation and fine-grained access control

Product Comparison

EndpointsDedicated Endpoints
General
InfrastructureAnyscale CloudAnyscale Cloud
Accelerator Auto Select from A10, A100, H100
ScalingAutoConfigurable, up to 1000 GPUs
Rate Limiting30 Concurrent requests,
Cap increased upon request
No rate limiting
CompatibilityOpenAl Compatible API, Integration with Weights & BiasesOpenAl Compatible API, Integration with Weights & Biases
Pricing$/TokenBased on compute and accelerator usage
ObservabilityToken Usage, System StatusBuilt in Grafana support, Alerting
SupportFast response through emailDedicated support
CustomizabilityBring your custom model and optimize for throughput or latency
Performance Optimization
Models
Base Model SupportLlama2 Family (7B, 13B, 70B, Code Llama), Mistral 7B and moreAny compatible LLM or embedding model
Fine-Tuning
Fine-tuning
Security
Data PrivacyData is never used for training.Data is never used for training.
SecurityAPI KeySSO/SAML + API Key
Logging

Anyscale Endpoint Features

Use the best open-source LLMs

For $1 or less per million tokens, use our growing list of high performance models or deploy your own.

Llama-2-7b

Llama-2-13b

Llama-2-70b

Code Llama

Mistral 7B

Mixtral-8x7B

Llama-Guard-7b

gte-large

zephyr-7b

llama-2-code
llama-2-code

Start with the best open source LLMs in minutes

Validate your ideas with a familiar API, fine tune LLMs to get the quality you need, deploy your apps, and repeat.

Use the same stack that powers the most household-name AI apps
Use the same stack that powers the most household-name AI apps

Use the same stack that powers the most household-name AI apps

Anyscale Endpoints quickly adds new models, optimizations, and integrations to give you the best tools to build the best apps.