# Finetune Llama 3 70B on your AWS account

> Fine-tune LoRA adapters for popular models using axolotl-style declarative configs

# Fine-tuning Guide for Tensorfuse

This guide explains how to fine-tune Llama models using Tensorfuse's QLoRA implementation.

## Supported Models

| Model         | GPU Requirements      |
| ------------- | --------------------- |
| Llama 3.1 70B | 4x L40S (Recommended) |
| Llama 3.1 8B  | 1-2x A10G             |

## Dataset Preparation

Tensorfuse accepts datasets in JSONL format, where each line contains a valid JSON object.

The following example shows the format for a conversational dataset using the ChatML format:

```json theme={null}
{
  "messages": 
  [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    },
    {
      "role": "assistant",
      "content": "The capital of France is Paris."
    }
  ]
}
```
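The example above is pretty-printed for readability; in the JSONL file itself, each record has to sit on a single line. Below is a minimal sketch for checking records and serializing them one per line. The helper names (`validate_record`, `to_jsonl`) and the set of allowed roles are illustrative assumptions based on the example, not part of Tensorfuse.

```python
import json

# Assumption: only the three roles shown in the example above are expected.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Return a list of problems found in one dataset record (empty if none)."""
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems


def to_jsonl(records):
    """Serialize records as JSONL: one compact JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

jsonl_text = to_jsonl([example])
# To write the dataset file: open("data.jsonl", "w", encoding="utf-8").write(jsonl_text)
```

Running `validate_record` over every record before uploading catches malformed entries locally, which is cheaper than discovering them after a training run has been scheduled.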

## Dataset Commands

```bash theme={null}
# Create dataset
tensorkube datasets create --dataset-id my_dataset --path data.jsonl

# List datasets
tensorkube datasets list

# Delete dataset
tensorkube datasets delete --dataset-id my_dataset
```

After creating your dataset, you can start fine-tuning your model. First, though, you need an access token from Hugging Face.

## Authentication

Create the required secrets. Tensorkube uses Kubernetes Event-driven Autoscaling (KEDA) under the hood to scale and schedule training runs, so secrets used for training must be created in the `keda` environment:

```bash theme={null}
# Create Hugging Face token
tensorkube secret create hugging-face-secret HUGGING_FACE_HUB_TOKEN=your_token --env keda
```

## Programmatic Access

Tensorfuse allows you to interact with the TensorKube cluster using the Python SDK, which provides a straightforward interface for creating fine-tuning jobs.

### Authentication

First, you need to create access keys, which are required to authenticate with the TensorKube cluster deployed in your cloud.

Run the following command:

```bash theme={null}
tensorkube train create-user --name <user-name>
```

This will create a new user and provide you with access keys.

Next, export the AWS keys as environment variables in the shell where you will run the Python code:

```bash theme={null}
export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<AWS_SECRET>
```

The following code demonstrates how to create a fine-tuning job using the Python SDK. The `create_fine_tuning_job` call below uses `meta-llama/Llama-3.1-70B-Instruct` as the base model and trains on four L40S GPUs.

```python theme={null}
from tensorkube import create_fine_tuning_job

create_fine_tuning_job(  # creates a fine-tuning job
    job_name="fine-tuning-job",  # Job name. Required
    job_id="unique_id",  # Unique job ID. Required
    gpus=4,  # Number of GPUs. Required
    gpu_type="l40s",  # GPU type. Required
    max_scale=1,  # Maximum scale. Required
    base_model="meta-llama/Llama-3.1-70B-Instruct",  # Base model from Hugging Face. Required
    dataset="dataset-id",  # Dataset ID. Required
    epochs=10,  # Number of epochs. Required
    secrets=["hugging-face-secret"],  # List of secrets
    micro_batch_size=16,  # Micro batch size. Optional, default is 16
    lora_r=8,  # LoRA rank (r). Optional, default is 8
    learning_rate=0.00002,  # Learning rate. Optional, default is 0.00002
)
```

To check the status of a job, use the `get_job_status` function. It returns the status as `QUEUED`, `PROCESSING`, `COMPLETED`, or `FAILED`.

```python theme={null}
from tensorkube import get_job_status
status = get_job_status( # gets the status of the job
  job_name="fine-tuning-job", # Job Name. Required
  job_id="unique_id" # Unique Job ID. Required
)
```
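Because a job moves through `QUEUED` and `PROCESSING` before it finishes, a script usually has to poll until it sees `COMPLETED` or `FAILED`. The sketch below shows one way to do that. `wait_for_job` is a hypothetical helper, not part of the SDK, and it assumes the status arrives as one of the plain strings listed above. It takes any zero-argument callable, so it can be tried here with a stand-in status source.

```python
import time

# The two states after which the job status no longer changes.
TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(fetch_status, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal state or max_polls is reached."""
    status = None
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"job still {status!r} after {max_polls} polls")


# Demo with a stand-in status source instead of a real cluster.
fake_statuses = iter(["QUEUED", "PROCESSING", "COMPLETED"])
final = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

With the SDK, the callable would wrap the call shown above, for example `lambda: get_job_status(job_name="fine-tuning-job", job_id="unique_id")`, assuming the function returns the status string directly.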

Once the job completes, the adapter is uploaded to S3. To locate it in the S3 console:

* Find the S3 bucket whose name starts with `tensorkube-keda-train-bucket`. All your trained LoRA adapters reside here. The adapter path is built from your job name and job ID, so adapter URLs look like this:
  `s3://<bucket-name>/lora-adapter/<job_name>/<job_id>`

Below is an example adapter URL for a job with job name `fine-tuning-job` and job ID `unique_id`, trained on `4` GPUs of type `l40s`:

```
s3://tensorkube-keda-train-bucket-d473253e-d692-4a15/lora-adapter/fine-tuning-job/unique_id
```
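If you need the adapter URL in code, it can be assembled from the same three parts. This is a small sketch following the path pattern shown above; `adapter_url` is a hypothetical helper, and the bucket name is the one from the example (yours will have a different suffix).

```python
def adapter_url(bucket_name, job_name, job_id):
    """Build the S3 URL of a trained adapter from the pattern documented above."""
    return f"s3://{bucket_name}/lora-adapter/{job_name}/{job_id}"


url = adapter_url(
    "tensorkube-keda-train-bucket-d473253e-d692-4a15",  # bucket from the example above
    "fine-tuning-job",
    "unique_id",
)
print(url)
```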

## Model Deployment

1. Clone the Lorax repository:

```bash theme={null}
git clone https://github.com/tensorfuse/lorax
cd lorax/llama-70b
```

2. Deploy with the following command:

<Note>
  The deploy command below deploys the Lorax instance in the default environment. Make sure you have created `hugging-face-secret` in the default environment. You can do so by adding the `--env default` flag to the secret creation command.
</Note>

```bash theme={null}
tensorkube deploy --gpus 4 --gpu-type L40S --secret hugging-face-secret --secret aws-secret
```

This will deploy the base model with the `lorax` library.

3. Get your deployment URL using `tensorkube list deployments`.

## Inference

You can now use the deployment URL to make inference requests. Here is an example using `curl`. This will query the base model without any adapters.

```bash theme={null}
curl ${ENDPOINT}/generate -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "[INST] Your prompt here [/INST]",
    "parameters": {
      "max_new_tokens": 64
    }
  }'
```

To query the fine-tuned adapter, add `adapter_id` and `adapter_source` to `parameters`:

```bash theme={null}
curl ${ENDPOINT}/generate -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": "[INST] Your prompt here [/INST]",
    "parameters": {
      "max_new_tokens": 64,
      "adapter_id": "s3://your-bucket/lora-adapter/your-adapter-path",
      "adapter_source": "s3"
    }
  }'
```
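The same two requests can be built from Python using only the standard library. This is a sketch that mirrors the `curl` payloads above; `build_generate_request` is a hypothetical helper, and `https://example.invalid` is a placeholder for your deployment URL. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_generate_request(endpoint, prompt, max_new_tokens=64, adapter_id=None):
    """Build a POST request for the /generate route, optionally targeting an adapter."""
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        # Same fields as the second curl example above.
        parameters["adapter_id"] = adapter_id
        parameters["adapter_source"] = "s3"
    body = json.dumps({"inputs": prompt, "parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        endpoint.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "https://example.invalid",  # placeholder: replace with your deployment URL
    "[INST] Your prompt here [/INST]",
    adapter_id="s3://your-bucket/lora-adapter/your-adapter-path",
)
# To send it against a live deployment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Omitting `adapter_id` produces the base-model request from the first example, so one helper covers both cases.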
