import transformers
import transformers_cfg
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers_cfg.grammar_utils import IncrementalGrammarConstraint
from transformers_cfg.generation.logits_process import GrammarConstrainedLogitsProcessor

if __name__ == "__main__":
    print('transformers version', transformers.__version__)
    print('transformers_cfg version, ', transformers_cfg)

    # Load model and tokenizer
    llama_tokenizer = AutoTokenizer.from_pretrained("saibo/llama-1B")
    llama_tokenizer.pad_token = llama_tokenizer.eos_token
    llama_model = AutoModelForCausalLM.from_pretrained("saibo/llama-1B")

    # Load JSON grammar
    with open("examples/grammars/json.ebnf", "r") as file:
        grammar_str = file.read()
    grammar = IncrementalGrammarConstraint(grammar_str, "root", llama_tokenizer)
    grammar_processor = GrammarConstrainedLogitsProcessor(grammar)

    # Generate from two prefixes as a batch
    prefix1 = "This is a valid json string for http request:"
    prefix2 = "This is a valid json string for shopping cart:"
    input_ids = llama_tokenizer(
        [prefix1, prefix2], add_special_tokens=False, return_tensors="pt", padding=True
    )["input_ids"]

    output = llama_model.generate(
        input_ids,
        do_sample=False,
        max_length=50,
        num_beams=1,
        logits_processor=[grammar_processor],
        repetition_penalty=1.0,
        num_return_sequences=1,
    )

    # Decode output
    generations = llama_tokenizer.batch_decode(output, skip_special_tokens=True)
    print(generations)
Context
saibo/llama-1B is a randomly initialized model intended for debugging. Although it is not a trained LLM, the grammar constraint should still force it to generate some structure, yet it fails to do so.
The problem seems to stem from the default padding configuration of the Llama Tokenizer, which is set to "left" padding instead of the more common "right" padding used by most large language model (LLM) tokenizers.
A straightforward solution is to adjust the padding side of the llama tokenizer by adding the line llama_tokenizer.padding_side = "right".
However, it's not yet clear which specific part of the code is affected by this setting. I plan to delve into this further. For now, the aforementioned fix is effective, and this issue seems to only impact llama models.
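For reference, here is a minimal sketch of how that workaround slots into the reproduction script above (same model and setup as in the report; only the padding side is changed):

from transformers import AutoModelForCausalLM, AutoTokenizer

llama_tokenizer = AutoTokenizer.from_pretrained("saibo/llama-1B")
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"  # workaround: override the default left padding
llama_model = AutoModelForCausalLM.from_pretrained("saibo/llama-1B")

The rest of the script (grammar constraint, batched generate call, decoding) stays unchanged.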
Note: The LLAMA-3 model already defaults the padding side to "right".
Padding on the left is the right way to go for batch processing of inputs (sending multiple sequences at a time), since each input needs to be the same length. If we pad on the right, we end up with a number of <|eot_id|> tokens following the assistant message, and the model will not generate anything.
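For completeness, a minimal sketch of batched generation that keeps left padding and passes the attention mask explicitly so the pad tokens are ignored; the prefixes, grammar processor, and model are the ones from the reproduction script, while the explicit attention_mask argument is an addition not present in the original code:

llama_tokenizer.padding_side = "left"  # keep left padding for batched generation
inputs = llama_tokenizer(
    [prefix1, prefix2], add_special_tokens=False, return_tensors="pt", padding=True
)
output = llama_model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # tell the model which positions are padding
    do_sample=False,
    max_length=50,
    logits_processor=[grammar_processor],
)
print(llama_tokenizer.batch_decode(output, skip_special_tokens=True))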