# Deployments

> Learn how to deploy your containerized applications as serverless, auto-scaling API endpoints on Tensorfuse.

A **Deployment** is a serverless, auto-scaling API endpoint that runs your containerized application on Tensorfuse. You provide your code and a configuration, and Tensorfuse handles the entire lifecycle: building your container, provisioning infrastructure, deploying the application, and serving traffic.

This lets you turn any model or application into a scalable, production-ready service with a single command.

### Anatomy of a Deployment

Every Tensorfuse Deployment consists of three key components. This separation of concerns makes your projects clean, portable, and easy to manage.

1. **Application Code:** This is the core logic of your service. It can be a FastAPI app, a vLLM server for a large language model, or any other application that can be containerized.

2. **Environment (`Dockerfile`):** A **`Dockerfile`** defines your application's environment. It specifies the base image, system dependencies, Python packages, and the command needed to start your service. Tensorfuse uses this file to build a container image that is identical for development and production.

3. **Configuration (`deployment.yaml`):** This YAML file defines the **infrastructure and runtime settings** for your Deployment. Here, you specify the required resources (like GPU type and count), scaling parameters, secrets to inject, and health check endpoints.

### The Deployment Workflow

When you run the `tensorkube deploy` command, Tensorfuse performs the following steps automatically:

1. **Builds** your `Dockerfile` into a container image.
2. **Pushes** the image to a private container registry (ECR) inside your AWS account.
3. **Provisions** the hardware you requested in your configuration.
4. **Deploys** your container and connects it to the autoscaler.
5. **Exposes** a secure HTTPS endpoint to serve traffic.

### Configuring Your Deployment

You can configure your deployment's resources and behavior in two ways: command-line flags or a `deployment.yaml` file.

While CLI flags are useful for quick tests, **we strongly recommend using a `deployment.yaml` file** for production workloads. Keeping this file in your repository lets you version-control your infrastructure configuration alongside your code, following a GitOps approach.

To deploy with a configuration file, run:

```bash
tensorkube deploy --config-file deployment.yaml
```

#### Example Configuration

A typical `deployment.yaml` file specifies the required GPUs, attaches secrets, and defines a readiness probe. For a full list of
available configuration options, refer to the [Deployment Configuration Reference](/concepts/configuration).

```yaml
# Request 4 L40S GPUs for this deployment
gpus: 4
gpu_type: l40s

# Attach secrets containing API keys or tokens.
# These will be available as environment variables.
secret:
  - hugging-face-secret
  - vllm-token

# Define a health check to ensure the app is ready for traffic
readiness:
  httpGet:
    path: /health
    port: 80
```
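
Before deploying, it can help to confirm locally that your container answers on the path and port named in the `readiness` block. The sketch below polls a URL until it returns a successful status. The names `http_status` and `wait_until_ready` are illustrative and not part of the Tensorfuse CLI, and treating any status from 200 to 399 as "ready" is an assumption borrowed from Kubernetes-style `httpGet` probes, not something this page confirms.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status of a GET request, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Nothing is listening yet, or the request timed out.
        return None


def wait_until_ready(url, attempts=30, interval=1.0, fetch=http_status):
    """Poll `url` until it returns a success status or attempts run out."""
    for _ in range(attempts):
        status = fetch(url)
        # Assumption: 200-399 counts as ready, as in Kubernetes httpGet probes.
        if status is not None and 200 <= status < 400:
            return True
        time.sleep(interval)
    return False
```

With the container running locally and its port published, `wait_until_ready("http://localhost:80/health")` would return `True` once the app responds. The `fetch` parameter exists only so the polling logic can be exercised without a live server.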

### Readiness Probe

A readiness probe is a crucial part of a production-grade deployment. It's a health check endpoint that
Tensorfuse uses to determine if your application has started successfully and is ready to accept traffic.

If you don't configure a `readiness` endpoint, Tensorfuse cannot tell when your container has finished starting,
so requests may reach it before it can respond, and those requests can fail. Always include a `readiness` block
in your `deployment.yaml`.
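
On the application side, the probe needs a route that reports success only after startup work is complete. The following is a minimal sketch using Python's standard library, assuming the `/health` path and port `80` from the example configuration. Returning `503` while the app is still starting assumes the probe treats non-2xx responses as "not ready"; the names `ready`, `health_response`, `HealthHandler`, and `make_server` are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set this once startup work (for example, loading model weights) is done.
ready = threading.Event()


def health_response():
    """Return (status, body) for GET /health based on the ready flag."""
    if ready.is_set():
        return 200, b"ok"
    # Assumption: a non-2xx status tells the probe to keep waiting.
    return 503, b"starting"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            status, body = health_response()
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the logs


def make_server(port=80):
    """Bind the server; call serve_forever() on the result to start serving."""
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

In a real service you would start the server in a background thread, run your startup work, and then call `ready.set()`. Frameworks such as FastAPI can expose the same route with less code; the standard library is used here only to keep the sketch free of dependencies.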
