run llama failed (Apple M2 Ultra) #618

Open
hsoftxl opened this issue Dec 30, 2024 · 1 comment

hsoftxl commented Dec 30, 2024

python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

Traceback (most recent call last):
File "/opt/anaconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/anaconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in
main()
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
server = Server(
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/server/server.py", line 138, in init
is_reachable = check_direct_reachability(initial_peers=initial_peers, use_relay=False, **kwargs)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/server/reachability.py", line 78, in check_direct_reachability
return RemoteExpertWorker.run_coroutine(_check_direct_reachability())
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/hivemind/moe/client/remote_expert_worker.py", line 36, in run_coroutine
return future if return_future else future.result()
File "/opt/anaconda3/envs/py3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/opt/anaconda3/envs/py3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/server/reachability.py", line 59, in _check_direct_reachability
target_dht = await DHTNode.create(client_mode=True, **kwargs)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/hivemind/dht/node.py", line 192, in create
p2p = await P2P.create(**kwargs)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/hivemind/p2p/p2p_daemon.py", line 234, in create
await asyncio.wait_for(ready, startup_timeout)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: 2024/12/30 11:43:56 failed to connect to bootstrap peers
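
Note: the P2PDaemonError above means the hivemind daemon could not reach the public swarm's bootstrap peers, which is usually a network-level problem (firewall, DNS, or blocked outbound ports) rather than a model issue. A minimal sketch of two ways around it; the multiaddr below is a hypothetical placeholder, not a real peer:

# start a private swarm instead of joining the public one
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --new_swarm

# or join via a bootstrap peer that is reachable from this network
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct \
    --initial_peers /ip4/203.0.113.5/tcp/31337/p2p/QmExamplePeerID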

hsoftxl changed the title from "run llama failed" to "run llama failed (Apple M2 Ultra)" on Dec 30, 2024

hsoftxl commented Dec 31, 2024

python -m petals.cli.run_server tiiuae/falcon-180B-chat --new_swarm

transformers version 4.43.1

Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in run_code
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/cli/run_server.py", line 235, in
main()
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/cli/run_server.py", line 219, in main
server = Server(
^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/server.py", line 237, in init
throughput_info = get_server_throughput(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
cache[cache_key] = measure_throughput_info(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
"inference_rps": measure_compute_rps(
^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
cache = step(cache)
^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 215, in step
outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache if inference else None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/models/falcon/block.py", line 421, in forward
attention_mask = FalconModel._prepare_attn_mask(attention_mask, (batch_size, seq_length), past_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'FalconModel' has no attribute '_prepare_attn_mask'
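
Note: _prepare_attn_mask was a private helper on transformers' FalconModel that petals' falcon block still calls; newer transformers releases no longer provide it, so with transformers 4.43.1 the call fails. A quick sketch to confirm the version mismatch (assuming that is the cause), not a fix:

# compare the installed versions against petals' pinned transformers requirement
pip show petals transformers
# prints False on 4.43.1, which is why the falcon block breaks
python -c "from transformers.models.falcon.modeling_falcon import FalconModel; print(hasattr(FalconModel, '_prepare_attn_mask'))"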
