Problem with the ChatPDF feature: running simple_ui.py immediately raises an error #72

Open
kevinwei1975 opened this issue Aug 9, 2024 · 0 comments

Running python simple_ui.py immediately fails with the following error:
(mindspore) root@autodl-container-bff2469f3e-a4796232:~/autodl-tmp/ChatPDF# python simple_ui.py
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.559 seconds.
Prefix dict has been built successfully.
2024-08-09 09:59:49.937 | INFO | __main__:<module>:32 - Namespace(gen_model_type='auto', gen_model_name='./.mindnlp/model/01ai/Yi-6B-Chat', lora_model=None, rerank_model_name=None, corpus_files='sample.pdf', int4=False, int8=False, chunk_size=220, chunk_overlap=0, num_expand_context_chunk=1, server_name='0.0.0.0', server_port=8082, share=False)
The following parameters in checkpoint files are not loaded:
['embeddings.position_ids']
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]MindSpore do not support bfloat16 dtype, we will automaticlly convert to float16
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:18<00:00, 6.14s/it]
2024-08-09 10:00:12.427 | INFO | msimilarities.bert_similarity:add_corpus:105 - Start computing corpus embeddings, new docs: 212
Batches: 100%|██████████████████████████████████████████████████████████████████████| 7/7 [00:16<00:00, 2.39s/it]
2024-08-09 10:00:29.184 | INFO | msimilarities.bert_similarity:add_corpus:117 - Add 212 docs, total: 212, emb len: 212
2024-08-09 10:00:29.185 | INFO | msimilarities.literal_similarity:add_corpus:395 - Add corpus done, new docs: 212, all corpus size: 212
2024-08-09 10:00:29.336 | INFO | msimilarities.literal_similarity:build_bm25:405 - Total corpus: 212
2024-08-09 10:00:29.336 | DEBUG | chatpdf:add_corpus:281 - files: ['sample.pdf'], corpus size: 212, top3: ['Style Transfer from Non-Parallel Text byCross-AlignmentTianxiao Shen1Tao Lei2Regina Barzilay1Tommi Jaakkola11MIT CSAIL2ASAPP Inc.', '1{tianxiao, regina, tommi}@[email protected] paper focuses on style transfer on the basis of non-parallel text.', 'This is aninstance of a broad family of problems including machine translation, decipherment,and sentiment modification. The key challenge is to separate the content fromother aspects such as style.']
Traceback (most recent call last):
  File "/root/autodl-tmp/ChatPDF/simple_ui.py", line 34, in <module>
    model = ChatPDF(
  File "/root/autodl-tmp/ChatPDF/chatpdf.py", line 184, in __init__
    self.rerank_tokenizer = AutoTokenizer.from_pretrained(rerank_model_name_or_path, mirror='modelscope')
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/models/auto/tokenization_auto.py", line 775, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1723, in from_pretrained
    return cls._from_pretrained(
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_base.py", line 1942, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py", line 154, in __init__
    super().__init__(
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/tokenization_utils_fast.py", line 106, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 1388, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 533, in converted
    pre_tokenizer = self.pre_tokenizer(replacement, add_prefix_space)
  File "/root/miniconda3/envs/mindspore/lib/python3.9/site-packages/mindnlp/transformers/convert_slow_tokenizer.py", line 515, in pre_tokenizer
    return pre_tokenizers.Metaspace(replacement=replacement, add_prefix_space=add_prefix_space)
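
For anyone trying to isolate this: the failure appears to come from loading the rerank tokenizer alone, not from the PDF indexing (which completes in the log above). Below is a minimal reproduction sketch of that call; the model name is an assumption, since the log shows rerank_model_name=None and the actual default is resolved inside chatpdf.py.

```python
# Minimal reproduction sketch (assumption: the default rerank model resolves to an
# XLM-RoBERTa-style cross-encoder such as "BAAI/bge-reranker-base"; the real
# default is chosen inside chatpdf.py because rerank_model_name=None in the log).
from mindnlp.transformers import AutoTokenizer

# Same call as chatpdf.py line 184. Loading a fast tokenizer for this model goes
# through mindnlp's slow-to-fast conversion (convert_slow_tokenizer ->
# pre_tokenizers.Metaspace), which is where the traceback above ends.
rerank_tokenizer = AutoTokenizer.from_pretrained(
    "BAAI/bge-reranker-base",  # hypothetical default; substitute the real rerank model path
    mirror="modelscope",
)
```

If this single call reproduces the Metaspace error, the problem is in mindnlp's tokenizer conversion rather than in the ChatPDF code itself.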
