run llama failed (Apple M2 Ultra) #618

Open
hsoftxl opened this issue Dec 30, 2024 · 1 comment

hsoftxl commented Dec 30, 2024

python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct

Traceback (most recent call last):
File "/opt/anaconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/anaconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/cli/run_server.py", line 235, in
main()
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/cli/run_server.py", line 219, in main
server = Server(
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/server/server.py", line 138, in init
is_reachable = check_direct_reachability(initial_peers=initial_peers, use_relay=False, **kwargs)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/server/reachability.py", line 78, in check_direct_reachability
return RemoteExpertWorker.run_coroutine(_check_direct_reachability())
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/hivemind/moe/client/remote_expert_worker.py", line 36, in run_coroutine
return future if return_future else future.result()
File "/opt/anaconda3/envs/py3.10/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/opt/anaconda3/envs/py3.10/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/petals/server/reachability.py", line 59, in _check_direct_reachability
target_dht = await DHTNode.create(client_mode=True, **kwargs)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/hivemind/dht/node.py", line 192, in create
p2p = await P2P.create(**kwargs)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/site-packages/hivemind/p2p/p2p_daemon.py", line 234, in create
await asyncio.wait_for(ready, startup_timeout)
File "/opt/anaconda3/envs/py3.10/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
return fut.result()
hivemind.p2p.p2p_daemon_bindings.utils.P2PDaemonError: Daemon failed to start: 2024/12/30 11:43:56 failed to connect to bootstrap peers
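
Note: the P2PDaemonError above means the hivemind daemon could not reach the public swarm's bootstrap peers, which is usually a network-level problem (firewall, DNS, or blocked outbound ports) rather than a model issue. A minimal sketch of two ways around it; the multiaddr below is a hypothetical placeholder, not a real peer:

# start a private swarm instead of joining the public one
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct --new_swarm

# or join via a bootstrap peer that is reachable from this network
python3 -m petals.cli.run_server meta-llama/Meta-Llama-3.1-405B-Instruct \
    --initial_peers /ip4/203.0.113.5/tcp/31337/p2p/QmExamplePeerID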

hsoftxl changed the title from "run llama failed" to "run llama failed (Apple M2 Ultra)" on Dec 30, 2024

hsoftxl commented Dec 31, 2024

python -m petals.cli.run_server tiiuae/falcon-180B-chat --new_swarm

transformers version 4.43.1

Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in run_code
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/cli/run_server.py", line 235, in
main()
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/cli/run_server.py", line 219, in main
server = Server(
^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/server.py", line 237, in init
throughput_info = get_server_throughput(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 83, in get_server_throughput
cache[cache_key] = measure_throughput_info(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 123, in measure_throughput_info
"inference_rps": measure_compute_rps(
^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 218, in measure_compute_rps
cache = step(cache)
^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/server/throughput.py", line 215, in step
outputs = block.forward(dummy_input, use_cache=inference, layer_past=cache if inference else None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/tensor_parallel/tensor_parallel.py", line 99, in forward
return [self.module_shards[0](*args, **kwargs)][self.output_device_index]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/anaconda3/envs/py3.12/lib/python3.12/site-packages/petals/models/falcon/block.py", line 421, in forward
attention_mask = FalconModel._prepare_attn_mask(attention_mask, (batch_size, seq_length), past_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'FalconModel' has no attribute '_prepare_attn_mask'
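
Note: _prepare_attn_mask was a private helper on transformers' FalconModel that petals' falcon block still calls; newer transformers releases no longer provide it, so with transformers 4.43.1 the call fails. A quick sketch to confirm the version mismatch (assuming that is the cause), not a fix:

# compare the installed versions against petals' pinned transformers requirement
pip show petals transformers
# prints False on 4.43.1, which is why the falcon block breaks
python -c "from transformers.models.falcon.modeling_falcon import FalconModel; print(hasattr(FalconModel, '_prepare_attn_mask'))"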
