Non-streaming response chunks should be joined before parsing?
I am using Ollama 0.1.45. When requesting a non-streaming response (i.e. not passing a block to the chat method) and the response is large (more than ~4000 characters), Ollama sends multiple chunks of data.
In the current implementation each chunk is JSON.parse'd separately. For smaller responses that fit in a single chunk this is obviously not a problem, but for multiple chunks I need to join all chunks first and then JSON-parse the result.
Changing the code of Langchain::LLM::Ollama like this works for me:
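Roughly the following (a sketch of the change, not the exact langchainrb internals; the Faraday-style client and endpoint names are assumptions on my part):

```ruby
require "json"

# Sketch: collect every raw chunk and parse the joined string once, instead
# of calling JSON.parse on each chunk. Client/endpoint names are
# illustrative, not the actual Langchain::LLM::Ollama internals.
def chat(messages:, model: @defaults[:chat_completion_model_name])
  raw = +""

  client.post("api/chat") do |req|
    req.body = { model: model, messages: messages, stream: false }
    # Even with stream: false, a large response body can arrive in several
    # chunks, and an individual chunk is not necessarily valid JSON.
    req.options.on_data = proc { |chunk, _bytes_received| raw << chunk }
  end

  JSON.parse(raw)
end
```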
The Ollama docs say nothing about this behavior. It might be a bug in Ollama, or a feature.
This happens at least with llama3-8b-q8 and phi3-14b-q5 models.
Should langchainrb work around this, e.g. by checking whether a response chunk is a complete JSON document or not? One possible approach is sketched below.
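If it should, one defensive option (a sketch, assuming chunks arrive in order) is to buffer and only treat the accumulated data as complete once it parses:

```ruby
require "json"

# Sketch: buffer incoming chunks and attempt a parse on each one; a
# JSON::ParserError just means the document is not complete yet.
buffer = +""
result = nil

handle_chunk = proc do |chunk|
  buffer << chunk
  begin
    result = JSON.parse(buffer) # succeeds only once the document is complete
  rescue JSON::ParserError
    # Incomplete so far; keep buffering and wait for the next chunk.
  end
end
```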
Inherit from Langchain::LLM::OpenAI?
Since Ollama is compatible with OpenAI's API, isn't it easier to let Langchain::LLM::Ollama inherit from Langchain::LLM::OpenAI, overriding default values where needed?
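A minimal sketch of the idea (hypothetical, not actual langchainrb code; the constructor signature mirrors Langchain::LLM::OpenAI as I understand it):

```ruby
module Langchain
  module LLM
    # Hypothetical: reuse the OpenAI client against Ollama's
    # OpenAI-compatible endpoint, overriding only the defaults.
    class Ollama < OpenAI
      DEFAULTS = {
        chat_completion_model_name: "llama3",
        embeddings_model_name: "llama3",
        temperature: 0.0
      }.freeze

      def initialize(url: "http://localhost:11434", default_options: {})
        super(
          api_key: "ollama", # Ollama ignores the key, but the client requires one
          llm_options: { uri_base: "#{url}/v1" },
          default_options: DEFAULTS.merge(default_options)
        )
      end
    end
  end
end
```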
I can confirm the bug with chunks: once a large enough input is sent, I get `parse': 451: unexpected token at '' (JSON::ParserError)`, simply because a chunk ends in a way that makes the line invalid JSON on its own.
Temperature and seed parameters should be part of `options`
According to the docs, temperature and seed should be passed inside `options`:
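That is, the request body should look roughly like this (written as a Ruby hash for the JSON body):

```ruby
# Per the Ollama API docs: sampling parameters such as temperature and seed
# belong inside the "options" object, not at the top level of the request.
{
  model: "llama3",
  messages: [{ role: "user", content: "Hello" }],
  options: {
    temperature: 0.7,
    seed: 42
  }
}
```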
In the current implementation these are passed at the same level as top-level parameters like `model`.
Changing the code of Langchain::LLM::Ollama like this works, but it is probably not the best place to implement the fix.
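For reference, the change amounts to something like this (a sketch; the method and parameter names are mine, not the library's):

```ruby
# Sketch: lift temperature and seed out of the top level and nest them
# under :options before the request body is built.
def build_parameters(model:, messages:, temperature: nil, seed: nil)
  parameters = { model: model, messages: messages }

  options = { temperature: temperature, seed: seed }.compact
  parameters[:options] = options unless options.empty?

  parameters
end
```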