
main_cell.py is very slow #39

Open
yangfeizZZ opened this issue Jun 8, 2023 · 5 comments

@yangfeizZZ

Hello,
When I run main_cell.py, it is very slow. I started running it on Monday this week, but as of now it has only gotten this far:

[ Epoch 29 of 80 ]

  • (Training) BCE: 0.348 MSE: 0.719 Loss: 0.349 norm_ratio: 0.00: 32%|▎| 321/1000 [44:21<1:30:05, 7.96s/it]

So I would like to know how to make it run faster. Thank you very much.

@ruochiz
Collaborator

ruochiz commented Jun 8, 2023

Hey, did you train the model on a GPU or a CPU, and what is the CPU/GPU utilization?
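
(Not Higashi-specific, just a generic PyTorch sanity check you could run to see whether a CUDA device is visible at all:)

    import torch

    # Does PyTorch see a CUDA device at all?
    print("CUDA available:", torch.cuda.is_available())
    print("Device count:", torch.cuda.device_count())
    if torch.cuda.is_available():
        print("Device 0:", torch.cuda.get_device_name(0))

While training is running, nvidia-smi will also report the GPU utilization; if it stays near 0%, the model is most likely running on the CPU.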

@yangfeizZZ
Author

I used a GPU, but it gave this error:

[ Epoch 38 of 60 ]

  • (Training) bce: 0.1953, mse: 0.0000, acc: 98.688 %, pearson: 0.943, spearman: 0.643, elapse: 152.854 s
  • (Validation-hyper) bce: 0.1811, acc: 99.596 %,pearson: 0.968, spearman: 0.646,elapse: 0.101 s
    no improve 4
    [ Epoch 39 of 60 ]
  • (Training) bce: 0.1946, mse: 0.0000, acc: 98.729 %, pearson: 0.944, spearman: 0.643, elapse: 148.983 s
  • (Validation-hyper) bce: 0.1793, acc: 99.619 %,pearson: 0.971, spearman: 0.648,elapse: 0.122 s
    no improvement early stopping
  • (Validation-hyper) bce: 0.1806, acc: 99.606 %, auc: 0.966, aupr: 0.647,elapse: 0.564 s
    Traceback (most recent call last):
      File "/home/yangfei/Higashi/higashi/main_cell.py", line 1472, in
        select_gpus[i])
    TypeError: 'NoneType' object is not subscriptable

@ruochiz
Collaborator

ruochiz commented Jun 9, 2023

Could you try running nvidia-smi -q -d Memory |grep -A4 GPU|grep Free and nvidia-smi -q -d Memory |grep -A4 GPU on your command line and see what they return? Higashi uses a hacky way to figure out how many GPUs you have, and it can be incompatible with some CUDA versions.

Also, what did you put in the 'gpu_num' parameter in the config.JSON file, and how many GPU cards do you have on that machine?
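
For context, the auto-detection boils down to parsing nvidia-smi's memory report and picking GPUs with free memory. The sketch below is illustrative only, not Higashi's actual code: it uses nvidia-smi's CSV query interface instead of the grep pipeline above, and the helper name guess_free_gpus is made up. It just shows how detection can end up returning None, which is exactly what makes an expression like select_gpus[i] fail:

    import subprocess

    def guess_free_gpus(min_free_mib=2000):
        """Return indices of GPUs with enough free memory, or None if detection fails."""
        try:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=index,memory.free",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            ).stdout
        except (OSError, subprocess.CalledProcessError):
            return None  # nvidia-smi missing, or it exited with an error
        gpus = []
        for line in out.strip().splitlines():
            idx, free = [x.strip() for x in line.split(",")]
            if int(free) >= min_free_mib:
                gpus.append(int(idx))
        return gpus or None

    # If detection returns None (e.g. a driver/CUDA mismatch or an output format
    # the parser does not expect), any later indexing such as select_gpus[i]
    # raises: TypeError: 'NoneType' object is not subscriptable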

@yangfeizZZ
Author


I don't know what "no improvement early stopping" means. Does it mean the training completed successfully and therefore stopped?

@yangfeizZZ
Author


I set "gpu_num": 2, but it has same error. So I don't know when it means the end of training and can be visualized
