If you want to build your own proprietary small AI model, there are two steps:
Step 1: Choose an open-source AI model
Step 2: Fine-tune the model
When it comes to open-source models, we have to mention Hugging Face.
It is an open-source community dedicated to artificial intelligence models, providing a large number of pre-trained models and datasets. The site also offers features for calling large models directly, such as chat and image generation.
The following chart shows Hugging Face's popularity trend on Google Trends, which has been steadily rising.
Currently, 1,133,267 open-source models are hosted on Hugging Face.

Part 1: Training Approaches

1.1 Full model training
- Training a model from scratch: all model parameters are initialized and then updated from the training data. The initial approach was to train the model directly with Transformers, using the following stack:
- Ollama+Llama3.2
- Python 3.8+
- PyTorch
- Hugging Face Transformers
- Datasets
- CUDA
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset
# Load the model and tokenizer
model_name = "Llama-3.2-1B"  # replace with your model name
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, legacy=False)
# Check which tokenizer class was loaded
print(type(tokenizer))
# Make sure the tokenizer has a pad_token
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))
# Load the dataset
dataset = load_dataset("json", data_files="training_data.json")
# ...
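For reference, here is a minimal sketch of how the truncated training setup might continue. It follows on from the snippet above; the "text" field name, the hyperparameters, and the output paths are illustrative assumptions, not the original author's values.

def tokenize(examples):
    # Assumes each JSON record has a "text" field; adjust to your data
    tokens = tokenizer(examples["text"], truncation=True, max_length=512, padding="max_length")
    tokens["labels"] = tokens["input_ids"].copy()  # causal LM: labels mirror the inputs
    return tokens

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=dataset["train"].column_names)

training_args = TrainingArguments(
    output_dir="./output",            # illustrative output path
    per_device_train_batch_size=2,
    num_train_epochs=3,
    fp16=True,                        # requires a CUDA GPU
    save_strategy="epoch",
)

trainer = Trainer(model=model, args=training_args, train_dataset=tokenized)
trainer.train()
trainer.save_model("./output/final")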
The trained model will be very large: a model that starts at around 2 GB can exceed 4 GB after full training, and that is already after removing the checkpoints. Techniques such as parameter quantization or model compression can be used to shrink it; see the sketch below.
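For example, here is a minimal sketch of loading a model with 4-bit quantization via bitsandbytes; the model name and settings are illustrative assumptions:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization (requires the bitsandbytes package and a CUDA GPU)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute dtype for the dequantized matmuls
)
model = AutoModelForCausalLM.from_pretrained("Llama-3.2-1B", quantization_config=bnb_config)
print(model.get_memory_footprint())  # should be far smaller than the fp32 footprint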
1.2 Fine-tuning a model
On top of a pre-trained model, continue training it on data from a specific task so that it adapts to the new task.
LoRA is generally used for fine-tuning. Besides LoRA, Adapter Layers, Freeze, Prefix Tuning, Prompt Tuning, BitFit, and UniPELT are similar parameter-efficient fine-tuning techniques. A minimal LoRA setup is sketched below.
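As a concrete illustration, here is a hedged sketch of a LoRA setup with the peft library; the model name, target modules, and hyperparameters are typical values assumed for illustration:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Llama-3.2-1B")  # illustrative model
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections; names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights remain trainable

Because only the small adapter matrices are trained, the resulting checkpoint is a few megabytes instead of gigabytes.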
Part 2: Training the Model
My local environment is Windows 11, and the machine has an NVIDIA graphics card, so I trained the model directly on Windows.

I chose LLaMA Factory (about 34.6k stars on GitHub) to fine-tune the Qwen2-0.5B model, which is relatively small. That covers roughly all the knowledge and process involved in fine-tuning your own AI model.
From the explanation above, you can see that using open-source models and tools for training and fine-tuning can help us create our own AI applications. If you also want to master these technologies and apply them to practical projects, Zhihu Zhixuetang's AI application live course will be your best choice. The setup package in the course has been pre-packaged, which makes it relatively easy for everyone to get started quickly.
What follows is the purely self-built approach, without any packaging; it is relatively complex, harder to get started with, and requires a certain technical foundation.
Here are the detailed steps:
1. Environment Preparation
1.1 Install Python
Install Python 3.8 or later. You can download and install it from the official Python website.
1.2 Install CUDA
Download and install a CUDA version suited to your graphics card from the NVIDIA official website. First check the highest CUDA version your driver supports:
nvidia-smi
Here the output shows that CUDA <= 12.6 is supported.


You can find version 12.6.0 in the CUDA download archive.


nvcc -V
If this prints the CUDA compiler version, the installation was successful.
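As an extra sanity check, you can confirm from Python that PyTorch actually sees the GPU (assuming PyTorch is already installed):

import torch

print(torch.cuda.is_available())   # True means PyTorch can use the GPU
print(torch.version.cuda)          # CUDA version PyTorch was built against
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the NVIDIA card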

2. Install LLaMA Factory
2.1 Clone the LLaMA Factory repository
Open a command-line tool (such as PowerShell or CMD) and run the following command to clone the LLaMA Factory repository:
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
2.2 Create a virtual environment (optional)
Installing directly into the system Python works, but the installation can break your environment. To keep things isolated, it is recommended to create a virtual environment; if it does get broken, you can simply create a new one:
python -m venv llama_factory_env
.\llama_factory_env\Scripts\activate
2.3 Install dependencies
Install the required Python dependencies inside the virtual environment:
pip install -r requirements.txt
Generally speaking, most installation problems come down to dependencies failing to download.
Another issue worth noting: llamafactory-cli webui often reports the following error and the WebUI cannot open:
RuntimeError: Failed to import trl.trainer.ppo_config because of the following error (look up to see its traceback):
No module named 'tyro'
No matter how many times I reinstalled, it didn't work. I later found the solution from someone in the GitHub issues:
pip install tyro==0.8.14
Finally, launch the graphical interface:
llamafactory-cli webui
3. Download the Qwen2-0.5B model
3.1 Download the model
You can download the Qwen2-0.5B model from Hugging Face or another model repository. The steps below assume you have downloaded the model files and placed them in the models directory.
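For example, here is a hedged sketch of downloading the model with the huggingface_hub library; the repo id and target directory are assumptions, adjust them to your setup:

from huggingface_hub import snapshot_download

# Download all files of the model repo into a local directory
snapshot_download(
    repo_id="Qwen/Qwen2-0.5B",       # assumed Hugging Face repo id
    local_dir="models/Qwen2-0.5B",   # assumed local target directory
)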
4. Configure LLaMA Factory
4.1 Configuration file
You can configure the parameters directly on the page; the defaults are usually fine.
One thing to note: if a model trained with the page's default configuration performs poorly, try adjusting some of the parameters.
You can also save a set of training parameters that works for you, as shown in the figure below.
The following is the content of the saved YAML file:
top.booster: auto
top.checkpoint_path:
- train_2024-11-20-16-03-49
top.finetuning_type: lora
top.model_name: Qwen2-0.5B
top.quantization_bit: none
top.quantization_method: bitsandbytes
top.rope_scaling: none
top.template: default
train.additional_target: ''
train.badam_mode: layer
train.badam_switch_interval: 50
train.badam_switch_mode: ascending
train.badam_update_ratio: 0.05
train.batch_size: 2
train.compute_type: fp16
train.create_new_adapter: false
train.cutoff_len: 1024
train.dataset:
- identity
train.dataset_dir: data
train.ds_offload: false
train.ds_stage: none
train.extra_args: '{"optim": "adamw_torch"}'
train.freeze_extra_modules: ''
train.freeze_trainable_layers: 2
train.freeze_trainable_modules: all
train.galore_rank: 16
train.galore_scale: 0.25
train.galore_target: all
train.galore_update_interval: 200
train.gradient_accumulation_steps: 16
train.learning_rate: 1e-4
train.logging_steps: 5
train.lora_alpha: 32
train.lora_dropout: 0
train.lora_rank: 16
train.lora_target: ''
train.loraplus_lr_ratio: 0
train.lr_scheduler_type: cosine
train.mask_history: false
train.max_grad_norm: '1.0'
train.max_samples: '100000'
train.neat_packing: false
train.neftune_alpha: 0
train.num_train_epochs: '100.0'
train.packing: false
train.ppo_score_norm: false
train.ppo_whiten_rewards: false
train.pref_beta: 0.1
train.pref_ftx: 0
train.pref_loss: sigmoid
train.report_to: false
train.resize_vocab: false
train.reward_model: null
train.save_steps: 100
train.shift_attn: false
train.train_on_prompt: false
train.training_stage: Supervised Fine-Tuning
train.use_badam: false
train.use_dora: false
train.use_galore: false
train.use_llama_pro: false
train.use_pissa: false
train.use_rslora: false
train.val_size: 0
train.warmup_steps: 0
4.2 Data preparation
Place your training data in the data directory, or directly use the example training data that LLaMA Factory provides in the data directory. If you use the examples, you need to replace the variables in them with your own content, as illustrated below.
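For reference, LLaMA Factory's bundled identity example uses Alpaca-style JSON records along these lines; the {{name}} and {{author}} placeholders are the variables to replace with your own content (the exact wording here is illustrative):

[
  {
    "instruction": "Who are you?",
    "input": "",
    "output": "I am {{name}}, an AI assistant developed by {{author}}."
  }
]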
5. Start fine-tuning
5.1 Start fine-tuning
In the graphical interface, click the “Start Training” button, and LLaMA Factory will start fine-tuning the Qwen2-0.5B model using the LoRA method.
5.2 Monitor the training process
You can monitor the training process in real-time in the graphical interface and view metrics such as loss and learning rate.
GPU usage spikes while the training is running.
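If you prefer to monitor progress outside the browser, the sketch below reads the training log from the output directory. It assumes LLaMA Factory writes a trainer_log.jsonl file there with per-step loss entries; the path is illustrative:

import json

# Each line of trainer_log.jsonl is assumed to be a JSON object with step and loss fields
log_path = "saves/Qwen2-0.5B/lora/train_2024-11-20-16-03-49/trainer_log.jsonl"  # illustrative path
with open(log_path, encoding="utf-8") as f:
    for line in f:
        entry = json.loads(line)
        if "loss" in entry:
            print(entry.get("current_steps"), entry["loss"])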
6. Use the fine-tuned model
6.1 Save the model
After training completes, the fine-tuned model is saved in the output directory. To use it, select it directly from the checkpoint path; the folder name is based on the training start time.
6.2 Use the fine-tuned model
Use the "Chat" feature in the LLaMA Factory graphical interface directly to verify the effect of fine-tuning, or load the adapter programmatically as sketched below.
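For completeness, here is a minimal inference sketch that loads the base model plus the LoRA adapter with the peft library; both paths are illustrative assumptions:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "models/Qwen2-0.5B"                      # assumed base model directory
adapter_path = "path/to/train_2024-11-20-16-03-49"   # assumed LoRA checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(base_path)
base = AutoModelForCausalLM.from_pretrained(base_path)
model = PeftModel.from_pretrained(base, adapter_path)  # attach the LoRA adapter

inputs = tokenizer("Who are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))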
