Skip to content

koboldcpp-1.77

Compare
Choose a tag to compare
@LostRuins LostRuins released this 01 Nov 16:32
· 598 commits to concedo since this release

koboldcpp-1.77

the road not taken edition

logprobs

  • NEW: Token Probabilities (logprobs) are now available over the API! Currently only supplied over the sync API (non-streaming), but a second /api/extra/last_logprobs dedicated logprobs endpoint is also provided. Will work and provide a link to view alternate token probabilities for both streaming and non-streaming if "logprobs" is enabled in KoboldAI Lite settings. Will also work in SillyTavern when streaming is disabled, once the latest build is out.
  • Response prompt_tokens, completion_tokens and total_tokens are now accurate values instead of placeholders.
  • Enabled CUDA graphs for the cuda12 build, which can improve performance on some cards.
  • Fixed a bug where .wav audio files uploaded directly to the /v1/audio/transcriptions endpoint get fragmented and cut off early. Audio sent as base64 within JSON payloads are unaffected.
  • Fixed a bug where Whisper transcription blocked generation in non-multiuser mode.
  • Fixed a bug where trim_stop did not remove a stop sequence that was divided across multiple tokens in some cases.
  • Significantly increased the maximum limits for stop sequences, anti-slop token bans, logit biases and DRY sequence breakers, (thanks to @mayaeary for the PR which changes the way some parameters are passed to the CPP side)
  • Added link to help page if user fails to select a model.
  • Flash Attention GUI quick launcher toggle hidden by default if Vulkan is selected (usually reduced performance).
  • Updated Kobold Lite, multiple fixes and improvements
    • NEW: Experimental ComfyUI Support Added!: ComfyUI can now be used as an image generation backend API from within KoboldAI Lite. No workflow customization is necessary. Note: ComfyUI must be launched with the flags --listen --enable-cors-header '*' to enable API access. Then you may use it normally like any other Image Gen backend.
    • Clarified the option for selecting A1111/Forge/KoboldCpp as an image gen backend, since Forge is gradually superseding A1111. This option is compatible with all 3 of the above.
    • You are now able to generate images from instruct mode via natural language, similar to chatgpt. (e.g. Please generate an image of a bag of sand). This option requires having an image model loaded, it uses regex and is enabled by default, it can be disabled in settings.
    • Added support for Tavern "V3" character cards: Actually, V3 is not a real format, it's an augmented V2 card used by Risu that adds additional metadata chunks. These chunks are not supported in Lite, but the base "V2" card functionality will work.
    • Added new scenario "Interactive Storywriter": This is similar to story writing mode, but allows you to secretly steer the story with hidden instruction prompts.
    • Added Token Probability Viewer - You can now see a table of alternative token probabilities in responses. Disabled by default, enable in advanced settings.
    • Fixed JSON file selection problems in some mobile browsers.
    • Fixed Aetherroom importer.
    • Minor Corpo UI layout tweaks by @Ace-Lite
  • Merged fixes and improvements from upstream

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you have an Nvidia GPU, but use an old CPU and koboldcpp.exe does not work, try koboldcpp_oldcpu.exe
If you have a newer Nvidia GPU, you can use the CUDA 12 version koboldcpp_cu12.exe (much larger, slightly faster).
If you're using Linux, select the appropriate Linux binary file instead (not exe).
If you're on a modern MacOS (M1, M2, M3) you can try the koboldcpp-mac-arm64 MacOS binary.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag.