How to run inference after SFT tuning? #2459
-
Hi, I am using Qwen1.5 for the fine-tune, and this is how I load the model: … Thanks!
-
Use the CLI demo to run inference on the trained model.
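For concreteness, a minimal invocation might look like the sketch below. The yaml path is the stock example quoted later in this thread; the values inside it (model path, adapter path, chat template) would need to be adapted to your own Qwen1.5 checkpoint.

```bash
# A sketch, assuming a recent LLaMA-Factory install; older versions exposed
# the demo as `python src/cli_demo.py` with equivalent flags.
# Edit model_name_or_path, adapter_name_or_path, and template in the yaml
# to match your Qwen1.5 SFT run.
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
```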
-
I tried increasing n_epochs from 3 to eventually 800 and finally got some meaningful responses. However, I worry that this will overfit the model. The loss is pretty low now. I appreciate your work and support very much, @hiyouga!
-
Hello, is there any way to pass an entire dataset to LLaMA-Factory's inference code and collect the responses? Any Python code would help. The command `llamafactory-cli chat examples/inference/llama3_lora_sft.yaml` prompts the user to enter an input interactively. I am looking for code where I pass my test data (entries of a JSON file) to the fine-tuned model, and the model gives me responses that I can save for my analysis. My test dataset has 11K entries in a .json file.
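Not an official LLaMA-Factory entry point, but one way to do this is to load the base model plus the LoRA adapter directly with transformers/peft and loop over the file. Below is a minimal sketch, assuming the test file is a JSON list of objects with an `instruction` field; every path, model name, and field name is a placeholder to adapt to your setup.

```python
# A minimal batch-inference sketch using transformers + peft directly,
# not LLaMA-Factory's own API. All paths below are placeholders.
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen1.5-7B-Chat"     # placeholder: your base model
ADAPTER_DIR = "saves/qwen1_5/lora/sft"  # placeholder: your SFT LoRA checkpoint
TEST_FILE = "test_data.json"            # list of {"instruction": ...} entries
OUT_FILE = "predictions.json"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_DIR)  # attach the LoRA weights
model.eval()

with open(TEST_FILE) as f:
    examples = json.load(f)

results = []
for ex in examples:
    # Build the chat prompt with the tokenizer's own chat template.
    messages = [{"role": "user", "content": ex["instruction"]}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Strip the prompt tokens and keep only the generated continuation.
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
    results.append({"instruction": ex["instruction"], "response": response})

with open(OUT_FILE, "w") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```

For 11K entries this loop will be slow; if I recall correctly, LLaMA-Factory also ships a batch-prediction mode (a training-style yaml with prediction enabled that writes generated outputs to the output dir), so checking the repo's predict examples may be worthwhile before rolling your own loop.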