koboldcpp-1.81.1
New year, New Kobo edition
- NEW: Added WebSearch functionality: When enabled, KoboldCpp now optionally functions as a WebSearch proxy with a new `/api/extra/websearch` endpoint, allowing your queries to be augmented with web searches! Works with all models; it needs to be enabled both in Lite and in KoboldCpp, with `--websearch` or in the GUI. The web search is executed locally from the KoboldCpp instance, and is powered by DuckDuckGo. (A sample request is sketched below the changelog.)
- NEW: Heuristic chat templates: Added a bundled `AutoGuess.json` chat completions adapter. When this is selected, KoboldCpp will try to heuristically infer the correct instruct template to use for the chat completions endpoint, based on the Jinja template detected in the model. (Thanks to @kallewoof; see the launch example below the changelog.)
- Fixed issues with building quantization tools
- Compilation changes on Windows to unify the Windows and Linux build flags: You must now specify the desired build targets on Windows, just as on Linux. For example, to do a full nocuda build on Windows you now need `make LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1`, where previously you would just run `make`.
- Updated Kobold Lite, with multiple fixes and improvements:
  - NEW: TextDB Document Lookup - This is a very rudimentary form of browser-based RAG. You can access it from the Context > TextDB tab. It's powered by a text-based minisearch engine: you can paste a very large text document, which is chunked and stored into the database, and at runtime it will find relevant snippets to add to the context depending on the query/instruction you send to the AI. You can use the historical context as a document, or paste a custom text document to use. Note that this is NOT an embedding model; it uses lunr and minisearch for retrieval scoring instead. (Credits to @esolithe)
  - Increased the maximum supported browser save size by switching to IndexedDB for autosaves and save slots. Your existing localStorage autosave and save-slot data will be automatically converted and migrated over when you launch the new version. Note that you will no longer be able to access this new data from older versions of KoboldAI Lite. Downloaded .json savefiles will continue to be accessible in all versions.
  - Allowed more resolutions and aspect ratios for generated and uploaded images
  - Improved quality of multimodal image handling; larger and more detailed images can now be uploaded and recognized. Multimodal should work nicely on a typical screenshot.
  - Merged fixes and improvements from upstream, with Vulkan improvements and new model support.
Hotfix 1.81.1 - Fixed nocertify mode for websearch, and fixed the Aesthetic UI being broken.
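As a rough sketch of the new WebSearch endpoint (the request shape shown is an assumption based on the feature description above, not a documented schema): once KoboldCpp is running with `--websearch`, a search could be triggered with a plain JSON POST:

```bash
# Minimal sketch of querying the WebSearch proxy directly.
# Assumes KoboldCpp was launched with --websearch on the default port 5001;
# the payload field name "q" is an assumption, not a documented schema.
curl -s http://localhost:5001/api/extra/websearch \
  -H "Content-Type: application/json" \
  -d '{"q": "latest llama.cpp release"}'
```

In normal use, Kobold Lite calls this endpoint for you once the feature is enabled on both sides; direct calls like the above would mainly be useful for custom frontends.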
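Likewise, a sketch of picking the new heuristic template adapter from the command line, assuming the bundled file can be passed to the existing `--chatcompletionsadapter` flag (in the GUI, select `AutoGuess.json` as the chat completions adapter instead):

```bash
# Sketch: load a model and let KoboldCpp heuristically infer the instruct
# template for the chat completions endpoint. MyModel.gguf is a placeholder.
koboldcpp.exe --model MyModel.gguf --chatcompletionsadapter AutoGuess.json
```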
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern MacOS (M1, M2, M3) you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, we recommend trying the Vulkan option (available in all releases) first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here: https://github.com/YellowRoseCx/koboldcpp-rocm
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
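For example, a minimal launch (the model filename is a placeholder; `--port 5001` just makes the default explicit):

```bash
# Sketch: load a GGUF model on the default port with the new WebSearch
# proxy enabled. MyModel.gguf is a placeholder filename.
koboldcpp.exe --model MyModel.gguf --port 5001 --websearch
```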
Once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001
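To sanity-check the connection without a browser, you can hit the standard KoboldAI generate endpoint (a sketch; the field names follow the usual KoboldAI API and are assumptions as far as these notes go):

```bash
# Sketch: send a minimal generation request to a running KoboldCpp instance.
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, Kobold!", "max_length": 50}'
```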
For more information, be sure to run the program from the command line with the `--help` flag. You can also refer to the readme and the wiki.