
Got "Illegal instruction" error while loading GGUF models after built #434

Closed

MichaelBui opened this issue Sep 11, 2023 · 9 comments


MichaelBui commented Sep 11, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

The app fails while loading models:

root@koboldcpp-cb947d9b7-jqrfp:/koboldcpp# python koboldcpp.py --model /app/models/mythalion-13b.Q8_0.gguf --threads 8 --noavx2 --debugmode
***
Welcome to KoboldCpp - Version 1.43
Attempting to use non-avx2 compatibility library.
Initializing dynamic library: koboldcpp_noavx2.so
==========
Namespace(model='/app/models/mythalion-13b.Q8_0.gguf', model_param='/app/models/mythalion-13b.Q8_0.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=8, blasthreads=8, psutil_set_threads=False, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], stream=False, smartcontext=False, unbantokens=False, bantokens=None, usemirostat=None, forceversion=0, nommap=False, usemlock=False, noavx2=True, debugmode=1, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None)
==========
Loading model: /app/models/mythalion-13b.Q8_0.gguf
[Threads: 8, BlasThreads: 8, SmartContext: False]
Illegal instruction

The build succeeds (DONE, with no error return code):

#6 19.06 I llama.cpp build info: 
#6 19.06 I UNAME_S:  Linux
#6 19.06 I UNAME_P:  unknown
#6 19.06 I UNAME_M:  x86_64
#6 19.06 I CFLAGS:   -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native
#6 19.06 I CXXFLAGS: -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread
#6 19.06 I LDFLAGS:  
#6 19.06 I CC:       cc (Debian 12.2.0-14) 12.2.0
#6 19.06 I CXX:      g++ (Debian 12.2.0-14) 12.2.0
#6 19.06 
#6 19.06 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c ggml.c -o ggml.o
#6 33.43 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c otherarch/ggml_v2.c -o ggml_v2.o
#6 45.55 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c otherarch/ggml_v1.c -o ggml_v1.o
#6 53.97 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread -c expose.cpp -o expose.o
#6 55.71 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread -c common/common.cpp -o common.o
#6 64.01 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread -c gpttype_adapter.cpp -o gpttype_adapter.o
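For context (an editor's note, not stated in the log): -march=native compiles for the instruction set of the build host. If the Docker image was built on a CPU with AVX2 and then run on the Ivy Bridge Xeon shown below, which advertises AVX but not AVX2, the first AVX2 instruction executed raises SIGILL ("Illegal instruction"). A minimal Python sketch (a hypothetical helper, not part of koboldcpp) for checking which extensions the running CPU advertises on Linux:

```python
# Diagnostic sketch: parse the "flags" line of /proc/cpuinfo and report which
# SIMD extensions the CPU running this process supports. A binary built with
# -march=native on a newer CPU will crash with "Illegal instruction" on a
# host that lacks any extension the compiler emitted.
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    flags = cpu_flags()
    for isa in ("sse4_2", "avx", "f16c", "avx2", "fma", "avx512f"):
        print(f"{isa}: {'yes' if isa in flags else 'no'}")
```

Run inside the container, this would show "avx2: no" on the CPU described below, which is consistent with a -march=native build from an AVX2-capable host crashing there.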

Environment and Context

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  36
  On-line CPU(s) list:   0-35
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        QEMU
  Model name:            Intel(R) Xeon(R) CPU E5-2470 v2 @ 2.40GHz
    BIOS Model name:     pc-i440fx-8.0  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          6
    Model:               62
    Thread(s) per core:  1
    Core(s) per socket:  18
    Socket(s):           2
    Stepping:            4
    BogoMIPS:            4799.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 cl
                         flush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc ar
                         ch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx
                         ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xs
                         ave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stib
                         p tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms xsav
                         eopt arat umip md_clear arch_capabilities
Virtualization features:
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   1.1 MiB (36 instances)
  L1i:                   1.1 MiB (36 instances)
  L2:                    144 MiB (36 instances)
  L3:                    32 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-17
  NUMA node1 CPU(s):     18-35
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB fil
                         ling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

  • Operating System: Debian Bookworm v12.1

$ uname -a
Linux koboldcpp-cb947d9b7-jqrfp 6.1.42-production+truenas #2 SMP PREEMPT_DYNAMIC Mon Aug 14 23:21:26 UTC 2023 x86_64 GNU/Linux

$ python3 --version
Python 3.10.13

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ g++ --version
g++ (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ git log | head -1
commit 2dc96687eb7c0fcacb2506e2fcf97dc909cc6cae

$ sha256sum /app/models/mythalion-13b.Q8_0.gguf
ed815d6d74783cc45a66beccceaa6e7d2e4642e38e333334e142e08446072a6e  /app/models/mythalion-13b.Q8_0.gguf
LostRuins (Owner) commented Sep 12, 2023

What are your build flags? Did you try just running a make clean and make with no extra parameters? Technically, since you are building on Linux, --noavx2 is not necessary, as the build uses -march=native and already knows which intrinsics are supported. It could also be a problem with your OpenBLAS setup, so try running without it. Just a plain make.

@bartowski1182

@LostRuins he's using my Dockerfile. I tried having him add NOAVX2_BUILD=1, and it just causes the make to fail with "no such file or directory".

@bartowski1182

On a related subject though, are you willing to try the make on bare metal, to check whether it's a Docker environment issue or a hardware issue? If it works on bare metal, then we can move the discussion completely to my repo and solve it there.

@LostRuins (Owner)

Make absolutely does work for me; otherwise, how would I have been able to create all the Windows binaries?

@LostRuins (Owner)

Unless I misunderstand the question?

@bartowski1182

Oh yeah, no, that question wasn't directed at you @LostRuins, sorry haha. It was directed at @MichaelBui

@MichaelBui (Author)

@noneabove1182 I don't have a chance to install it on a bare-metal server, but:

  • I've just created a new VM in Proxmox
  • Installed Debian Bookworm v12.1 (to be the same as the base Docker image), then
  • Installed KoboldCpp manually, following the same commands in the Dockerfile_cpu

This way it works as normal:
(.venv) root@ai-lab:/koboldcpp# python koboldcpp.py --model models/mythalion-13b.Q8_0.gguf --threads 8 --noavx2 --debugmode
***
Welcome to KoboldCpp - Version 1.43
Attempting to use non-avx2 compatibility library.
Initializing dynamic library: koboldcpp_noavx2.so
==========
Namespace(model='models/mythalion-13b.Q8_0.gguf', model_param='models/mythalion-13b.Q8_0.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=8, blasthreads=8, psutil_set_threads=False, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], stream=False, smartcontext=False, unbantokens=False, bantokens=None, usemirostat=None, forceversion=0, nommap=False, usemlock=False, noavx2=True, debugmode=1, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None)
==========
Loading model: /koboldcpp/models/mythalion-13b.Q8_0.gguf
[Threads: 8, BlasThreads: 8, SmartContext: False]

---
Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling (scale:1.000, base:10000.0)
System Info: AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /koboldcpp/models/mythalion-13b.Q8_0.gguf (version GGUF V2 (latest))
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 2048
llm_load_print_meta: n_embd         = 5120
llm_load_print_meta: n_head         = 40
llm_load_print_meta: n_head_kv      = 40
llm_load_print_meta: n_layer        = 40
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 13824
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 13B
llm_load_print_meta: model ftype    = mostly Q3_K - Large
llm_load_print_meta: model size     = 13.02 B
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MB
llm_load_tensors: mem required  = 13189.98 MB (+ 1600.00 MB per state)
....................................................................................................
llama_new_context_with_model: kv self size  = 1600.00 MB
llama_new_context_with_model: compute buffer total size =  191.47 MB
Load Model OK: True
Embedded Kobold Lite loaded.
Starting Kobold HTTP Server on port 5001
Please connect to custom endpoint at http://localhost:5001

So I guess the issue is actually related to the Docker environment, then.
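One way to confirm the Docker-environment theory would be to diff the CPU flags of the machine that built the image against the node running it. A sketch with a hypothetical helper and made-up flag strings (an editor's addition, not from the thread):

```python
# Sketch: given the space-separated flag lists (e.g. the "Flags:" line from
# lscpu) of the build host and the run host, list the extensions a
# -march=native binary may rely on that the run host lacks. Any hit is a
# candidate cause of the "Illegal instruction" crash.
def missing_flags(build_flags, run_flags):
    return set(build_flags.split()) - set(run_flags.split())

# Made-up example: an AVX2/FMA-capable build host vs. the Ivy Bridge
# run host described above (AVX and F16C, but no AVX2/FMA).
build_host = "sse4_1 sse4_2 avx f16c avx2 fma"
run_host = "sse4_1 sse4_2 avx f16c"
print(sorted(missing_flags(build_host, run_host)))  # ['avx2', 'fma']
```

A non-empty result would point at rebuilding the image on (or for) the target CPU rather than at a KoboldCpp bug.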

@bartowski1182

@MichaelBui Hmm okay, let's close this issue to not muddy up their repo, and I'll investigate. Thanks for checking!

@MichaelBui (Author)

Noted, closing this to continue at bartowski1182/koboldcpp-docker#2.
