
Problem with LoRA fine-tuning Mamba-Codestral-7B-v0.1 #6434

Open
1 task done
tongzeliang opened this issue Dec 24, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

Comments

@tongzeliang

Reminder

  • I have read the README and searched the existing issues.

System Info

  • llamafactory version: 0.9.1.dev0
  • Platform: Linux-6.5.0-18-generic-x86_64-with-glibc2.35
  • Python version: 3.9.0
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.45.0
  • Datasets version: 2.21.0
  • Accelerate version: 0.34.2
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA GeForce RTX 4090

Reproduction

This is the error I encountered:

  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/autograd/function.py", line 575, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 264, in forward
    outputs = run_function(*args)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/transformers/models/mamba2/modeling_mamba2.py", line 649, in forward
    hidden_states = self.mixer(
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/transformers/models/mamba2/modeling_mamba2.py", line 608, in forward
    return self.torch_forward(hidden_states, cache_params, cache_position, attention_mask)
  File "/home/tzl/.conda/envs/SEM/lib/python3.9/site-packages/transformers/models/mamba2/modeling_mamba2.py", line 535, in torch_forward
    G_intermediate = C[:, :, :, None, :, :] * B[:, :, None, :, : ,:]  # shape: (b, c, l, s, h, n)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 GiB. GPU 0 has a total capacity of 23.64 GiB of which 9.08 GiB is free. Including non-PyTorch memory, this process has 14.56 GiB memory in use. Of the allocated memory 14.09 GiB is allocated by PyTorch, and 19.65 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
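
For reference, here is a rough back-of-the-envelope estimate of why this single broadcast can request 16 GiB. The chunk size, head count, and state size below are my assumptions based on typical Mamba2 defaults, not values read from the checkpoint config:

# Rough size estimate of G_intermediate, shape (b, c, l, s, h, n), in fp32.
# Assumed (unverified): chunk_size = 256, num_heads = 128, ssm_state_size = 128.
batch, num_chunks = 1, 4          # e.g. one ~1024-token sample split into 256-token chunks
chunk_len, num_heads, state = 256, 128, 128
elements = batch * num_chunks * chunk_len * chunk_len * num_heads * state
print(f"{elements * 4 / 2**30:.1f} GiB")  # -> 16.0 GiB, matching the failed allocation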

I have confirmed that no other processes are using the GPU, but I still get the CUDA out of memory error, so I would like to ask for your advice. Thank you.
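
One additional observation: the traceback goes through torch_forward (modeling_mamba2.py line 608), which as far as I can tell is the pure-PyTorch fallback path used when the fused mamba-ssm / causal-conv1d kernels are not installed, and that path materializes the full (b, c, l, s, h, n) intermediate shown above. A minimal sketch to check whether the kernel packages are present (package names are the PyPI module names):

# Check whether the fused Mamba kernels are importable; if either is missing,
# transformers falls back to the memory-hungry torch_forward path seen in the traceback.
import importlib.util

for pkg in ("mamba_ssm", "causal_conv1d"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")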

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 24, 2024