[Bug]: Chinese support not very well? #317
Comments
Try other embedding models: https://github.com/zilliztech/GPTCache/blob/main/gptcache/embedding/__init__.py
from gptcache.embedding import SBERT, Huggingface, FastText
I just replaced onnx with other models, but they all report the error below:
Delete the faiss.index file and the sqlite file.
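The cleanup suggested above can be sketched in a few lines of Python. This is only an illustration: the file names `faiss.index` and `sqlite.db` and the working-directory location are assumptions, and `clear_cache_files` is a hypothetical helper, not part of GPTCache. Adjust the names to whatever your data manager actually wrote.

```python
from pathlib import Path


def clear_cache_files(directory=".", names=("faiss.index", "sqlite.db")):
    """Delete leftover cache files and return the names that were removed.

    The default names are assumptions; pass your own if your cache
    files are stored under different names.
    """
    removed = []
    for name in names:
        path = Path(directory) / name
        if path.is_file():
            path.unlink()  # remove the stale file
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Run this once before re-initialising the cache with a new embedding model.
    print("Removed:", clear_cache_files())
```

Files that do not exist are skipped, so the helper can be run repeatedly without raising an error.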
What embedding model do you use? You can find some built-in embedding methods with examples in our docs: https://gptcache.readthedocs.io/en/latest/references/embedding.html
I found a lot of embedding models in the documentation, but which one is recommended for Chinese?
I use the following code:
import os
import time

from gptcache.embedding import Huggingface
from gptcache import cache
from gptcache.adapter import openai
from gptcache.manager import get_data_manager, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Chinese ALBERT model from Hugging Face used as the embedding function
huggingface = Huggingface(model='uer/albert-base-chinese-cluecorpussmall')
# The vector store dimension must match the embedding model's output dimension
vector_base = VectorBase('faiss', dimension=huggingface.dimension)
data_manager = get_data_manager('sqlite', vector_base)
cache.init(
    embedding_func=huggingface.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
os.environ['OPENAI_API_KEY'] = 'YOUR API KEY'
cache.set_openai_key()

questions = [
    '什么是Github',
    '你可以解释下什么是Github吗',
    '可以告诉我关于Github一些信息吗'
]


def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']


# Ask each question twice so the second call can be served from the cache
for question in questions:
    for _ in range(2):
        start_time = time.time()
        response = openai.ChatCompletion.create(
            model='gpt-3.5-turbo',
            messages=[
                {
                    'role': 'user',
                    'content': question
                }
            ],
        )
        print(f'Question: {question}')
        print('Time consuming: {:.2f}s'.format(time.time() - start_time))
        print(f'Answer: {response_text(response)}\n')
output:
You can always pass your own embedding function to GPTCache. There are also many open-source multilingual or Chinese-capable models available on Hugging Face. To use a model from Hugging Face, pass the model name as shown in the example here: https://gptcache.readthedocs.io/en/latest/references/embedding.html#module-gptcache.embedding.huggingface
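To illustrate what "your own embedding function" can look like, here is a minimal sketch of a callable that maps a string to a fixed-length vector. It is a toy based on character-bigram hashing, not a semantic model and not a recommendation for Chinese; `char_ngram_embedding`, `cosine`, and `DIMENSION` are hypothetical names chosen for this example. It returns a plain Python list to stay dependency-free; the built-in GPTCache embeddings return NumPy arrays, so a real function would probably need to convert the result (for example with `numpy.asarray(..., dtype='float32')`).

```python
import hashlib
import math

DIMENSION = 64  # hypothetical vector size; the vector store must use the same value


def char_ngram_embedding(text, **_):
    """Toy embedding: hash character bigrams into a fixed-size, L2-normalised vector."""
    vec = [0.0] * DIMENSION
    cleaned = text.strip()
    # Character bigrams work for Chinese too, since no word segmentation is needed.
    grams = [cleaned[i:i + 2] for i in range(len(cleaned) - 1)] or [cleaned]
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % DIMENSION
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q1 = char_ngram_embedding("什么是Github")
    q2 = char_ngram_embedding("你可以解释下什么是Github吗")
    print(len(q1), round(cosine(q1, q2), 3))
```

Following the pattern in the script earlier in this thread, such a function would be passed as `embedding_func=` to `cache.init`, with `dimension=DIMENSION` given to `VectorBase`.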
@ablozhou If the answer above has solved your problem, I will close this issue.
from gptcache.embedding import Huggingface
def get_content_func(data, **_):
cache_base = CacheBase('sqlite')
from langchain.embeddings.openai import OpenAIEmbeddings
loader = TextLoader('customer_data/data.txt', encoding="utf-8")
embeddings = OpenAIEmbeddings()
from langchain.chains.question_answering import load_qa_chain
from gptcache.adapter.langchain_models import LangChainLLMs
llm = LangChainLLMs(llm=OpenAI(temperature=0))
@SimFG Process finished with exit code 1
@EricKong1985 This does not seem to be caused by GPTCache, maybe a similar problem: https://stackoverflow.com/questions/2790828/python-cant-pickle-module-objects-error. |
Current Behavior
I tested the official similarity example from the README,
but it doesn't support Chinese very well. When I ask several different questions, it always returns the same answer:
How can I avoid this problem?
Thank you!
Expected Behavior
Match the right question and give the right answer.
Steps To Reproduce
Environment
Anything else?
using onnx