
Supervised Fine-Tuning (SFT) with LoRA on Fireworks AI: Tutorial
By Fireworks AI|5/12/2025
Supervised Fine-Tuning (SFT) is critical for adapting general-purpose Large Language Models (LLMs) to domain-specific tasks, significantly improving performance in real-world applications. Fireworks AI facilitates easy and scalable SFT through its intuitive APIs and support for Low-Rank Adaptation (LoRA), allowing efficient fine-tuning without full parameter updates.
LoRA significantly reduces the computational cost of fine-tuning large models by updating only a small subset of parameters in a low-rank structure, making it particularly suitable for large models like LLaMA or DeepSeek.
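To make the low-rank idea concrete, here is a minimal Python sketch (plain NumPy, purely illustrative; the dimensions, rank, and scaling are assumptions, not Fireworks internals) of how a LoRA update adds two small trainable matrices on top of a frozen weight:

import numpy as np

# Illustrative dimensions: one 4096 x 4096 projection matrix from a transformer layer.
d_model, rank, alpha = 4096, 16, 32

# Frozen pretrained weight: never updated during LoRA fine-tuning.
W = np.random.randn(d_model, d_model).astype(np.float32)

# Trainable low-rank factors: only these values are learned.
A = np.random.randn(rank, d_model).astype(np.float32) * 0.01
B = np.zeros((d_model, rank), dtype=np.float32)  # zero-init so training starts from W unchanged

# Effective weight at inference: the frozen weight plus the scaled low-rank update.
W_eff = W + (alpha / rank) * (B @ A)

# Trainable parameters shrink from d_model**2 to 2 * d_model * rank.
full_params = d_model * d_model
lora_params = 2 * d_model * rank
print(f"full: {full_params:,}  lora: {lora_params:,}  ratio: {lora_params / full_params:.4%}")

Because each adapter is just the small A and B matrices, many adapters can share one frozen base model, which is what makes serving many LoRAs on a single deployment practical.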
qLoRA (Quantized LoRA) further improves efficiency by enabling fine-tuning of 4-bit and 8-bit quantized models (depending on the model type) without sacrificing performance, reducing memory requirements even further.
Fireworks AI supports both LoRA and qLoRA tuning, allowing up to 100 LoRA adaptations to run simultaneously on a dedicated deployment without extra cost.
Step-by-Step Guide to Fine-Tuning with Fireworks AI
Go to fireworks.ai > Model Library > Filter “Tunable”
You can also filter for “Serverless” models if you plan to run them serverlessly.
This ensures that you are selecting models that allow LoRA-based tuning and deployment. These models support uploading LoRA adapters even if they were trained on another platform. You can also upload custom models that share the same architecture as the tunable ones in the list, and those models will be tunable as well. For example, a DeepSeek-distilled Llama 8B model works just as well as a vanilla Llama 8B model.
Let's say we select “DeepSeek R1 Distill Llama 70B”
Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.
Minimum Requirements:
Message Schema: Each training sample must include a messages array, where each message is an object with two fields: role and content.
Here's an example conversation (one training example, pretty-printed for illustration; in JSONL it must be flattened onto a single line):
{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."}
  ]
}
Save this locally as trader_poe_sample_data.jsonl
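If you assemble the dataset programmatically, a short sketch like the following (the filename matches the sample above; the examples list is a placeholder for your own data) writes one flattened JSON object per line, as JSONL requires:

import json

# Each training example is one conversation: a list of role/content messages.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
            {"role": "assistant", "content": "Paris."},
        ]
    },
    # ...add the rest of your training conversations here
]

# JSONL: exactly one JSON object per line, no pretty-printing.
with open("trader_poe_sample_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")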
b. Upload the dataset
Then you can upload the dataset via the UI.
Go to Home > Datasets > Create Dataset
(Optional) You can also upload this dataset to Fireworks AI via firectl
To create a dataset, run:
firectl create dataset <DATASET_ID> path/to/dataset.jsonl
and you can check the dataset with:
firectl get dataset <DATASET_ID>
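Before uploading, it can help to sanity-check the file. Here is an optional sketch (not part of firectl; the filename is the one used above) that verifies every line is valid JSON and follows the messages schema:

import json

REQUIRED_KEYS = {"role", "content"}

# Verify every line parses and contains a non-empty messages array of role/content objects.
with open("trader_poe_sample_data.jsonl", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        record = json.loads(line)  # raises if the line is not valid JSON
        messages = record.get("messages")
        assert isinstance(messages, list) and messages, f"line {line_no}: missing messages array"
        for message in messages:
            assert REQUIRED_KEYS <= message.keys(), f"line {line_no}: message missing role/content"

print("dataset looks well-formed")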
Step a: Select the model you want to fine-tune
Click on “Finetune this model” and make sure the model appears in the drop-down list.
(This view also shows all the models that are available to fine-tune.)
Step b: Upload the dataset (Training & Eval Datasets)
You can upload the .jsonl file you saved on your local machine, or, if you created a Fireworks AI dataset in Step 2, select that dataset from the dropdown.
💡 Note: Explanations of each setting can be found on our docs page, or you can run firectl create sftj --help.
Once you run the fine-tuning job, you will see the job details on the “Fine-Tuning” page.
💡 Note: The batch size is the maximum number of tokens packed into each training batch (default: 32768).
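To get a rough sense of whether your examples fit comfortably under that limit, a crude sketch like the one below estimates per-example token counts with a simple word-count heuristic (the model's real tokenizer will count differently, so treat the numbers as approximate):

import json

MAX_BATCH_TOKENS = 32768  # default batch limit mentioned above

# Very rough heuristic: ~1.3 tokens per whitespace-separated word.
def estimate_tokens(text: str) -> int:
    return int(len(text.split()) * 1.3)

with open("trader_poe_sample_data.jsonl", encoding="utf-8") as f:
    sizes = [
        sum(estimate_tokens(m["content"]) for m in json.loads(line)["messages"])
        for line in f
    ]

print(f"examples: {len(sizes)}, longest ~{max(sizes)} tokens, "
      f"fits in one batch: {max(sizes) <= MAX_BATCH_TOKENS}")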
Once the fine-tuning job is completed, you will see a “Deploy the LoRA” option at the top right.
Select the LoRA model from the drop-down list.
Click “Continue”
Then select your deployment. You can deploy the LoRA model either serverlessly or via an on-demand deployment.
Deployment Options:
💡 PS: Since we are fine-tuning DeepSeek R1 Distill Llama 70B, we need to deploy the LoRA on-demand rather than serverless: use an existing on-demand deployment to load the LoRA model onto (instead of serverless).
Enter the model display name and click “Submit”
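Once the LoRA is deployed, you can query it through Fireworks' OpenAI-compatible chat completions endpoint. A minimal sketch using the openai Python package is shown below; the model identifier is a placeholder, so substitute the ID shown for your fine-tuned model on its deployment page:

from openai import OpenAI

# Fireworks exposes an OpenAI-compatible API; authenticate with your Fireworks API key.
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<FIREWORKS_API_KEY>",
)

response = client.chat.completions.create(
    # Placeholder: replace with the model ID of your deployed fine-tuned LoRA.
    model="accounts/<your-account>/models/<your-fine-tuned-model>",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)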
By following these steps, you can effectively adapt LLMs to your specific use case using Fireworks AI’s fine-tuning pipeline with LoRA. This approach ensures lower costs, faster training, and scalable deployment for real-world applications.