Got "Illegal instruction" error while loading GGUF models after built #434 #2

Open
MichaelBui opened this issue Sep 12, 2023 · 12 comments

@MichaelBui

MichaelBui commented Sep 12, 2023

The app fails while loading models:

root@koboldcpp-cb947d9b7-jqrfp:/koboldcpp# python koboldcpp.py --model /app/models/mythalion-13b.Q8_0.gguf --threads 8 --noavx2 --debugmode
***
Welcome to KoboldCpp - Version 1.43
Attempting to use non-avx2 compatibility library.
Initializing dynamic library: koboldcpp_noavx2.so
==========
Namespace(model='/app/models/mythalion-13b.Q8_0.gguf', model_param='/app/models/mythalion-13b.Q8_0.gguf', port=5001, port_param=5001, host='', launch=False, lora=None, config=None, threads=8, blasthreads=8, psutil_set_threads=False, highpriority=False, contextsize=2048, blasbatchsize=512, ropeconfig=[0.0, 10000.0], stream=False, smartcontext=False, unbantokens=False, bantokens=None, usemirostat=None, forceversion=0, nommap=False, usemlock=False, noavx2=True, debugmode=1, skiplauncher=False, hordeconfig=None, noblas=False, useclblast=None, usecublas=None, gpulayers=0, tensor_split=None)
==========
Loading model: /app/models/mythalion-13b.Q8_0.gguf
[Threads: 8, BlasThreads: 8, SmartContext: False]
Illegal instruction

The build itself completes successfully (DONE, with no error code):

#6 19.06 I llama.cpp build info: 
#6 19.06 I UNAME_S:  Linux
#6 19.06 I UNAME_P:  unknown
#6 19.06 I UNAME_M:  x86_64
#6 19.06 I CFLAGS:   -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native
#6 19.06 I CXXFLAGS: -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread
#6 19.06 I LDFLAGS:  
#6 19.06 I CC:       cc (Debian 12.2.0-14) 12.2.0
#6 19.06 I CXX:      g++ (Debian 12.2.0-14) 12.2.0
#6 19.06 
#6 19.06 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c ggml.c -o ggml.o
#6 33.43 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c otherarch/ggml_v2.c -o ggml_v2.o
#6 45.55 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c otherarch/ggml_v1.c -o ggml_v1.o
#6 53.97 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread -c expose.cpp -o expose.o
#6 55.71 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread -c common/common.cpp -o common.o
#6 64.01 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread -c gpttype_adapter.cpp -o gpttype_adapter.o
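
A possible lead in the log above: the C objects are compiled with `-march=native`, which tells GCC to target the CPU of the machine doing the build. If the image later runs on a node with a different CPU, the binary can contain instructions that CPU does not support. Nothing in this thread confirms that as the cause. A minimal Python sketch that lists which object files were built that way; the function name `native_tuned_objects` and the shortened sample lines are illustrative and not part of KoboldCpp:

```python
def native_tuned_objects(build_log: str) -> list[str]:
    """Return the output files of compile commands that pass -march=native."""
    objects = []
    for line in build_log.splitlines():
        tokens = line.split()
        if "-march=native" in tokens and "-o" in tokens:
            idx = tokens.index("-o")
            # The token right after -o is the output file name.
            if idx + 1 < len(tokens):
                objects.append(tokens[idx + 1])
    return objects


# Two compile lines from the log above, shortened for readability.
SAMPLE_LOG = """\
#6 19.06 cc -Ofast -march=native -mtune=native -c ggml.c -o ggml.o
#6 53.97 g++ -Ofast -std=c++11 -c expose.cpp -o expose.o
"""

native_objects = native_tuned_objects(SAMPLE_LOG)
print(native_objects)
```

Feeding the full build log through the same function would show whether the `*_noavx2.o` and `*_failsafe.o` objects are also compiled with `-march=native`.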

Environment and Context

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  36
  On-line CPU(s) list:   0-35
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        QEMU
  Model name:            Intel(R) Xeon(R) CPU E5-2470 v2 @ 2.40GHz
    BIOS Model name:     pc-i440fx-8.0  CPU @ 2.0GHz
    BIOS CPU family:     1
    CPU family:          6
    Model:               62
    Thread(s) per core:  1
    Core(s) per socket:  18
    Socket(s):           2
    Stepping:            4
    BogoMIPS:            4799.99
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
                         mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon
                         rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx
                         ssse3 cx16 pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave
                         avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp
                         tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust smep erms xsaveopt
                         arat umip md_clear arch_capabilities
Virtualization features:
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):
  L1d:                   1.1 MiB (36 instances)
  L1i:                   1.1 MiB (36 instances)
  L2:                    144 MiB (36 instances)
  L3:                    32 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-17
  NUMA node1 CPU(s):     18-35
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
  Mds:                   Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling,
                         PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
  • Operating System: Debian Bookworm v12.1
$ uname -a
Linux koboldcpp-cb947d9b7-jqrfp 6.1.42-production+truenas #2 SMP PREEMPT_DYNAMIC Mon Aug 14 23:21:26 UTC 2023 x86_64 GNU/Linux

$ python3 --version
Python 3.10.13

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ g++ --version
g++ (Debian 12.2.0-14) 12.2.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ git log | head -1
commit 2dc96687eb7c0fcacb2506e2fcf97dc909cc6cae

$ sha256sum /app/models/mythalion-13b.Q8_0.gguf
ed815d6d74783cc45a66beccceaa6e7d2e4642e38e333334e142e08446072a6e  /app/models/mythalion-13b.Q8_0.gguf

I've also reported it here because I'm not sure whether this is a KoboldCpp-related issue or a Docker/Kubernetes-related one: LostRuins/koboldcpp#434

@bartowski1182
Owner

bartowski1182 commented Sep 12, 2023

Can you provide your hardware information? I'm wondering if it's related to AVX.

Never mind, I see now that it's in the printout. One sec.

@MichaelBui
Author

I tried to run with different flag combinations with --noblas, --noavx2, and --nommap. Each time it loaded a different koboldcpp_*.so file but all failed with the same error.
I guess the built files contain instructions that are incompatible with the hardware/OS, but I don't know how to debug further.
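
One way to start narrowing this down is to check which SIMD extensions the CPU actually advertises. A minimal sketch, assuming Linux-style flag names as printed by `lscpu` or `/proc/cpuinfo`; the helper `simd_report` and the feature list are illustrative choices, and the sample string is a subset of the flags reported in this issue:

```python
# Feature names as they appear in Linux CPU flag lists (illustrative selection).
SIMD_FEATURES = ("avx", "avx2", "fma", "f16c")

# Subset of the flags from the lscpu output in this issue.
SAMPLE_FLAGS = "sse sse2 pni ssse3 sse4_1 sse4_2 popcnt aes xsave avx f16c rdrand"


def simd_report(flags_text, features=SIMD_FEATURES):
    """Map each feature name to True if it appears as a whole word in flags_text."""
    present = set(flags_text.split())
    return {name: name in present for name in features}


report = simd_report(SAMPLE_FLAGS)
for name, supported in report.items():
    print(f"{name}: {'yes' if supported else 'no'}")
```

For this CPU the sketch reports `avx` and `f16c` as present and `avx2` and `fma` as absent, so a library that executes AVX2 or FMA instructions would be expected to crash on it.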

@bartowski1182
Owner

If you're building it from the Dockerfile can you try adding

ENV NOAVX2_BUILD=1

before make?

@MichaelBui
Author

The build failed with that ENV at this line: https://github.com/LostRuins/koboldcpp/blob/2dc96687eb7c0fcacb2506e2fcf97dc909cc6cae/Makefile#L439

Logs:

#6 [3/3] RUN apt-get update && apt-get install -y git     build-essential     libopenblas-dev     && git clone https://github.com/LostRuins/koboldcpp.git --branch v1.43 ./     && pip install --no-cache-dir --trusted-host pypi.python.org -r requirements.txt     && make LLAMA_OPENBLAS=1 NOAVX2_BUILD=1     && apt-get clean s&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
#6 0.139 Get:1 http://deb.debian.org/debian bookworm InRelease [151 kB]
#6 0.150 Get:2 http://deb.debian.org/debian bookworm-updates InRelease [52.1 kB]
#6 0.151 Get:3 http://deb.debian.org/debian-security bookworm-security InRelease [48.0 kB]
#6 0.220 Get:4 http://deb.debian.org/debian bookworm/main amd64 Packages [8906 kB]
#6 0.285 Get:5 http://deb.debian.org/debian bookworm-updates/main amd64 Packages [4952 B]
#6 0.340 Get:6 http://deb.debian.org/debian-security bookworm-security/main amd64 Packages [61.6 kB]
#6 1.390 Fetched 9224 kB in 1s (7277 kB/s)
#6 1.390 Reading package lists...
#6 1.937 Reading package lists...
#6 2.487 Building dependency tree...
#6 2.613 Reading state information...
#6 2.744 git is already the newest version (1:2.39.2-1.1).
#6 2.744 The following additional packages will be installed:
#6 2.746   libgfortran5 libopenblas-pthread-dev libopenblas0 libopenblas0-pthread
#6 2.825 The following NEW packages will be installed:
#6 2.826   build-essential libgfortran5 libopenblas-dev libopenblas-pthread-dev
#6 2.827   libopenblas0 libopenblas0-pthread
#6 2.847 0 upgraded, 6 newly installed, 0 to remove and 1 not upgraded.
#6 2.847 Need to get 12.6 MB of archives.
#6 2.847 After this operation, 106 MB of additional disk space will be used.
#6 2.847 Get:1 http://deb.debian.org/debian bookworm/main amd64 build-essential amd64 12.9 [7704 B]
#6 2.850 Get:2 http://deb.debian.org/debian bookworm/main amd64 libgfortran5 amd64 12.2.0-14 [793 kB]
#6 2.856 Get:3 http://deb.debian.org/debian bookworm/main amd64 libopenblas0-pthread amd64 0.3.21+ds-4 [6709 kB]
#6 2.896 Get:4 http://deb.debian.org/debian bookworm/main amd64 libopenblas0 amd64 0.3.21+ds-4 [32.6 kB]
#6 2.897 Get:5 http://deb.debian.org/debian bookworm/main amd64 libopenblas-pthread-dev amd64 0.3.21+ds-4 [4971 kB]
#6 2.926 Get:6 http://deb.debian.org/debian bookworm/main amd64 libopenblas-dev amd64 0.3.21+ds-4 [44.9 kB]
#6 3.068 debconf: delaying package configuration, since apt-utils is not installed
#6 3.094 Fetched 12.6 MB in 0s (133 MB/s)
#6 3.112 Selecting previously unselected package build-essential.
#6 3.112 (Reading database ... 23974 files and directories currently installed.)
#6 3.130 Preparing to unpack .../0-build-essential_12.9_amd64.deb ...
#6 3.132 Unpacking build-essential (12.9) ...
#6 3.154 Selecting previously unselected package libgfortran5:amd64.
#6 3.154 Preparing to unpack .../1-libgfortran5_12.2.0-14_amd64.deb ...
#6 3.157 Unpacking libgfortran5:amd64 (12.2.0-14) ...
#6 3.244 Selecting previously unselected package libopenblas0-pthread:amd64.
#6 3.246 Preparing to unpack .../2-libopenblas0-pthread_0.3.21+ds-4_amd64.deb ...
#6 3.247 Unpacking libopenblas0-pthread:amd64 (0.3.21+ds-4) ...
#6 3.655 Selecting previously unselected package libopenblas0:amd64.
#6 3.658 Preparing to unpack .../3-libopenblas0_0.3.21+ds-4_amd64.deb ...
#6 3.659 Unpacking libopenblas0:amd64 (0.3.21+ds-4) ...
#6 3.678 Selecting previously unselected package libopenblas-pthread-dev:amd64.
#6 3.681 Preparing to unpack .../4-libopenblas-pthread-dev_0.3.21+ds-4_amd64.deb ...
#6 101.3   658 |                                                  n_threads);
#6 101.3       |                                                  ~~~~~~~~~~
#6 101.3 In file included from gpttype_adapter.cpp:19:
#6 101.3 llama.cpp:5728:5: note: declared here
#6 101.3  5728 | int llama_apply_lora_from_file(struct llama_context * ctx, const char * path_lora, const char * path_base_model, int n_threads) {
#6 101.3       |     ^~~~~~~~~~~~~~~~~~~~~~~~~~
#6 130.6 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c k_quants.c -o k_quants_failsafe.o
#6 132.1 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml_failsafe.o ggml_v2_failsafe.o ggml_v1_failsafe.o expose.o common.o gpttype_adapter_failsafe.o k_quants_failsafe.o ggml-alloc.o -shared -o koboldcpp_failsafe.so 
#6 132.2 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -DGGML_USE_OPENBLAS -I/usr/local/include/openblas -c ggml.c -o ggml_openblas.o
#6 142.2 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -DGGML_USE_OPENBLAS -I/usr/local/include/openblas -c otherarch/ggml_v2.c -o ggml_v2_openblas.o
#6 150.3 g++ -I. -I./common -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c++11 -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -Wno-multichar -Wno-write-strings -pthread ggml_openblas.o ggml_v2_openblas.o ggml_v1.o expose.o common.o gpttype_adapter.o k_quants.o ggml-alloc.o  -lopenblas -shared -o koboldcpp_openblas.so 
#6 150.4 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c ggml.c -o ggml_noavx2.o
#6 160.3 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c otherarch/ggml_v2.c -o ggml_v2_noavx2.o
#6 168.3 cc  -I.              -I./include -I./include/CL -I./otherarch -I./otherarch/tools -Ofast -DNDEBUG -std=c11   -fPIC -DGGML_USE_K_QUANTS -DLOG_DISABLE_LOGS -pthread -s -pthread -march=native -mtune=native  -c k_quants.c -o k_quants_noavx2.o
#6 169.7 1
#6 169.7 make: 1: No such file or directory
#6 169.7 make: *** [Makefile:439: koboldcpp_noavx2] Error 127
#6 ERROR: process "/bin/sh -c apt-get update && apt-get install -y git     build-essential     libopenblas-dev     && git clone https://github.com/LostRuins/koboldcpp.git --branch v1.43 ./     && pip install --no-cache-dir --trusted-host pypi.python.org -r requirements.txt     && make LLAMA_OPENBLAS=1 NOAVX2_BUILD=1     && apt-get clean s&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*" did not complete successfully: exit code: 2
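
One plausible reading of the last lines of this log, which the thread does not confirm: the recipe at Makefile line 439 seems to expand the `NOAVX2_BUILD` variable as a command, so overriding it with `1` made make try to run a program literally named `1`. Exit status 127 is the conventional "command not found" code. A small Python sketch, assuming a POSIX `sh` is available, that reproduces the same status:

```python
import subprocess

# Ask the shell to run a program literally named "1",
# mirroring what make appears to have attempted.
result = subprocess.run(["sh", "-c", "1"], capture_output=True, text=True)
exit_code = result.returncode

print(exit_code)
print(result.stderr.strip())
```

If that reading is right, the failure comes from the variable override itself rather than from the compiler or the hardware.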

@MichaelBui
Author

To add: I even tried to build the Docker image inside the same k8s cluster (using Kaniko) to ensure the build environment is as close as possible (if not the same) to the running environment. However, the issue still occurs.

@bartowski1182
Owner

@MichaelBui Could you try to replace the make on line 14 with:

&& make OPENBLAS_NOAVX2_BUILD=1 LLAMA_CLBLAST=1 \

That compiled for me, so it may work for you.

@MichaelBui
Author

MichaelBui commented Sep 13, 2023

@noneabove1182 which branch is that? I've checked both master and v1.43, and OPENBLAS_NOAVX2_BUILD is not available in the Makefile.
Also, I don't have a GPU, so I'm not sure whether CLBlast is applicable.

@MichaelBui
Author

@noneabove1182 this version (~2 months ago) doesn't support GGUF.

@bartowski1182
Owner

That's just where it was committed

@bartowski1182
Owner

I did mess up, and I see what you're saying now, but I also think that build option is irrelevant anyway, sadly.

did you ever try building on that VM and running it with --noavx2 to see if it actually worked or if it's a hardware issue?

@MichaelBui
Author

MichaelBui commented Sep 17, 2023

did you ever try building on that VM and running it with --noavx2 to see if it actually worked or if it's a hardware issue?

Yes, I mentioned above:

I tried to run with different flag combinations with --noblas, --noavx2, and --nommap. Each time it loaded a different koboldcpp_*.so file but all failed with the same error.

I tried similar steps with oobabooga and succeeded without any errors.
