# Deploying SpeechT5 on serverless GPUs

> Deploy serverless GPU applications on your AWS account

Built with developer experience in mind, Tensorkube simplifies the process of deploying serverless GPU apps. In this guide,
we will walk you through the process of deploying SpeechT5 on it.

## Prerequisites

Before you begin, ensure you have configured Tensorkube on your AWS account. If you haven't done that yet, follow the [Getting Started](/getting-started-tensorkube) guide.

## Deploying SpeechT5 on Tensorfuse

Each Tensorkube deployment requires two things: your code and your environment (as a Dockerfile).
When deploying machine learning models, it helps to make the model part of your container image. This reduces cold-start times because the weights do not have to be downloaded when the container starts.

### Code files

We will write a small FastAPI app that loads the model and serves predictions. The app exposes two main endpoints, `/readiness` and `/tts`, plus a root endpoint that reports whether CUDA is available. Remember that the `/readiness` endpoint is used by Tensorkube to check the health of your deployments.

```python tts_deploy.py theme={null}
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset

app = FastAPI()
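# Run on the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the processor, model, and vocoder once at startup rather than on every request
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts").to(device)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

# Speaker embedding (x-vector) that selects the voice; also loaded once
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0).to(device)

class AppInput(BaseModel):
    text: str

@app.get("/")
async def root():
    is_cuda_available = torch.cuda.is_available()
    return {
        "message": "Hello World",
        "cuda_available": is_cuda_available,
    }

@app.get("/readiness")
async def readiness():
    return {"status": "ready"}

# Endpoint for text to speech. Declared with plain `def` so FastAPI runs the
# blocking inference in a worker thread and /readiness stays responsive.
@app.post("/tts")
def generate_speech(input: AppInput):
    inputs = processor(text=input.text, return_tensors="pt")
    input_ids = inputs["input_ids"].to(device)
    speech = model.generate_speech(input_ids, speaker_embeddings, vocoder=vocoder)

    # Move the waveform back to the CPU and return it as a list of float samples
    return {"speech": speech.cpu().numpy().tolist()}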

```

### Environment files (Dockerfile)

Next, create your `requirements.txt` file:

```txt requirements.txt theme={null}
torch
transformers
datasets
fastapi
uvicorn
pydantic
sentencepiece
```

Finally, create a Dockerfile for your FastAPI app. You can use the simple one below:

```dockerfile Dockerfile theme={null}
# Use the nvidia cuda base image
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04

RUN apt-get update && apt-get install -y \
    python3.10 \
    python3.10-dev \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3.10 /usr/bin/python

WORKDIR /app
COPY . .
RUN pip3 install -r requirements.txt

EXPOSE 80
CMD ["uvicorn", "tts_deploy:app", "--host", "0.0.0.0", "--port", "80"]
```
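
The Dockerfile above copies the code and installs the dependencies, but the model weights are still downloaded from the Hugging Face Hub the first time the app loads them. One way to make the model part of the image, as recommended earlier, is to fill the Hugging Face cache at build time. The sketch below is an illustration and not part of Tensorkube: the file name `download_model.py` and the `prefetch` function are hypothetical, and it assumes the build environment has network access and that the container later runs as the same user, so the cache location matches.

```python
# download_model.py (hypothetical file name, used only for this sketch)

# Repositories that tts_deploy.py loads
TTS_REPO = "microsoft/speecht5_tts"
VOCODER_REPO = "microsoft/speecht5_hifigan"
XVECTOR_DATASET = "Matthijs/cmu-arctic-xvectors"


def prefetch() -> None:
    """Download the model, vocoder, and speaker embeddings into the local cache."""
    # Imported inside the function so the module itself imports quickly
    from datasets import load_dataset
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Each call downloads the files if they are not cached yet
    SpeechT5Processor.from_pretrained(TTS_REPO)
    SpeechT5ForTextToSpeech.from_pretrained(TTS_REPO)
    SpeechT5HifiGan.from_pretrained(VOCODER_REPO)
    load_dataset(XVECTOR_DATASET, split="validation")
```

To use it, you could add a line such as `RUN python3 -c "from download_model import prefetch; prefetch()"` after the `pip3 install` step in the Dockerfile. The image becomes larger, but the container no longer needs to download the weights on a cold start.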

### Deploying the app

SpeechT5 is now ready to be deployed on Tensorkube. Navigate to your project root and run the following command:

```bash theme={null}
tensorkube deploy --gpus 1 --gpu-type a10g
```

SpeechT5 is now deployed on your AWS account. You can access your app at the URL provided in the output, or look it up by running:

```bash theme={null}
tensorkube list deployments
```

followed by

```bash theme={null}
tensorkube get deployment <deployment-id>
```
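
Once you have the deployment URL, you can call the `/tts` endpoint. It returns the waveform as a JSON list of float samples under the `speech` key, so the client has to turn that list into an audio file. The following is a minimal client sketch using only the Python standard library. The helper names `synthesize` and `write_wav` are made up for illustration, and the sketch assumes the URL is reachable without extra authentication headers and that samples lie in the range [-1.0, 1.0] (values outside it are clipped). SpeechT5 produces 16 kHz mono audio, which is why the sample rate defaults to 16000.

```python
import json
import struct
import urllib.request
import wave

SAMPLE_RATE = 16000  # SpeechT5 generates 16 kHz mono audio


def synthesize(base_url: str, text: str) -> list:
    """POST text to the /tts endpoint and return the list of float samples."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/tts",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["speech"]


def write_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write float samples to a 16-bit mono PCM WAV file."""
    # Clip to [-1.0, 1.0], then scale to signed 16-bit integers
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

For example, `write_wav(synthesize("<deployment-url>", "Hello from SpeechT5"), "speech.wav")` would save the generated speech to `speech.wav`, where `<deployment-url>` stands for the URL reported by Tensorkube.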

And that's it! You have successfully deployed SpeechT5 on serverless GPUs using Tensorkube. 🚀
