NLLB is unable to translate into a complete long sentence in Chinese. #5549

logicvv · 2024-10-10T02:31:43Z

🐛 Bug

Hi, I tested NLLB on translating English sentences into Chinese. All of my sentences are shorter than 60 tokens. However, most sentences longer than 30 tokens are not translated completely: only half of the sentence, or less, is generated.

I also ran the same code for English to French, and it works: all sentences are generated completely.

I also tried setting min_length, but then, for short sentences, the last part of the sentence is sometimes generated repeatedly.
My code is below. Please help:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    r"nllb-200-distilled-600M", token=True, src_lang="eng_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(r"nllb-200-distilled-600M", token=True)

input_path = r"eng_test_short.txt"
output_path = "./nllb_chn.txt"

# Read one English sentence per line and write one translation per line.
with open(input_path, 'r', encoding='utf-8') as input_file, \
        open(output_path, 'w', encoding='utf-8') as f:
    for article in input_file:
        inputs = tokenizer(article, return_tensors="pt")
        # print(article)
        # print(inputs)
        translated_tokens = model.generate(
            # **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"), max_length=200
            **inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("zho_Hans"), max_length=512
        )
        print(tokenizer.convert_tokens_to_ids("zho_Hans"))

        output = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True, model_max_length=512)[0]

        print(output)
        f.write(output + '\n')
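To narrow this down, it may help to check whether generation stops because the model emits the end-of-sentence token early or because it reaches max_length. The following is only a rough sketch: `stop_reason` is a hypothetical helper (not part of transformers), and it assumes the generated sequence may begin with a decoder start token and may be right-padded. The token ids in the example are made up.

```python
from typing import Sequence


def stop_reason(
    token_ids: Sequence[int],
    eos_token_id: int,
    pad_token_id: int,
    max_length: int,
) -> str:
    """Guess why generation stopped for a single output sequence.

    Hypothetical helper: returns "eos", "max_length", or "unknown".
    """
    ids = list(token_ids)
    # Drop right padding, which appears when sequences are batched.
    while ids and ids[-1] == pad_token_id:
        ids.pop()
    # A trailing EOS means the model chose to stop. Require more than one
    # token so that a lone decoder start token is not mistaken for it.
    if len(ids) > 1 and ids[-1] == eos_token_id:
        return "eos"
    # No trailing EOS and the length limit was reached.
    if len(ids) >= max_length:
        return "max_length"
    return "unknown"


# Made-up ids: start token, language code, two tokens, EOS, padding.
print(stop_reason([2, 900, 10, 11, 2, 1, 1], eos_token_id=2, pad_token_id=1, max_length=512))
```

In the loop above it could be called as `stop_reason(translated_tokens[0].tolist(), tokenizer.eos_token_id, tokenizer.pad_token_id, 512)`. A result of "eos" on a cut-off translation would suggest the model itself ends the sentence early, not the length limit.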

The output looks like this:
input:
Politicians are loath to raise the tax even one penny when gas prices are high.
output:
政客们不愿意在高昂的燃油价格时,
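Cut-off outputs like this one tend to end in a comma and not in sentence-final punctuation, so a simple check can flag them when scanning a large output file. This is a minimal heuristic sketch, not a reliable test: `looks_truncated` is a hypothetical helper, and the punctuation sets are assumptions that are certainly not exhaustive.

```python
# Sentence-final punctuation, ASCII and full-width (assumed, non-exhaustive).
TERMINAL_PUNCTUATION = set(".!?。！？…")
# Closing quotes and brackets that may follow the final punctuation mark.
CLOSERS = "\"'”’」』）)]】"


def looks_truncated(translation: str) -> bool:
    """Return True if the text does not end with sentence-final punctuation."""
    # Ignore surrounding whitespace and any trailing closing quotes/brackets.
    stripped = translation.strip().rstrip(CLOSERS)
    if not stripped:
        return True
    return stripped[-1] not in TERMINAL_PUNCTUATION


for line in ["政客们不愿意在高昂的燃油价格时,", "你好。"]:
    print(looks_truncated(line), line)
```

A heuristic like this only counts how many lines are affected; it says nothing about why generation stopped.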


LiPengtao0504 · 2024-10-10T03:17:11Z

I also encountered this problem.
Src:"We now have 4-month-old mice that are non-diabetic that used to be diabetic," he added.

Tgt:他补充道：“我们现在有4个月大没有糖尿病的老鼠，但它们曾经得过该病。”

Predict:他补充说:"我们现在有4个月的小鼠,

logicvv added the bug and needs triage labels on Oct 10, 2024

logicvv changed the title ~~NLLN is unable to translate into a complete long sentence in Chinese.~~ NLLB is unable to translate into a complete long sentence in Chinese. Oct 10, 2024
