Possibly memory issues with SVC? #1010

Stack-it-up · 2022-06-01T21:21:18Z

Description
I'm trying to use Intelex to accelerate training of a SVC. My dataset is pretty tame (18 MB, in fact, I am attaching it, since it is a publicly available dataset - Universal Dependencies ISDT). I wasn't expecting my 16GB of ram (and 16gb of swap) to be filled by this task, so I wonder if this could be a bug. However, I am a student, so it may be an error on my part (if so, I'm sorry).

To Reproduce
Steps to reproduce the behavior:

Download attached files in the same folder
Change extension of train_parser from txt to py
Install NLTK
Run the python script
See error

Expected behavior
A new file should be created with the training output. Instead, an Out Of Memory error is raised.

Note on NLTK implementation
The code for the function train is pretty straightforward, see source code here: https://www.nltk.org/_modules/nltk/parse/transitionparser.html#TransitionParser.train

Environment:

OS: Ubuntu 20.04
Intelex 2021.5
Python 3.9.11
scikit-learn 1.0.2
NLTK 3.7
conda 4.13.0
CPU: i5-10500

Attachments
train_parser.txt
it_isdt-ud-train.txt

EDIT:
the svmlight file generated by NLTK is actually 62 MB and the memory used during sequential training (plain sklearn) is around 1GB

FischyM · 2022-06-02T19:50:15Z

How many threads are you using? SVM uses all available threads, so having N number of threads leads to consuming N times more ram. https://intel.github.io/scikit-learn-intelex/memory-requirements.html

Stack-it-up · 2022-06-02T19:58:19Z

Thank you for your reply. My processor has 12 virtual cores so I shouldn't be able to process more thna 12 threads at once, is that right? I'm not sure if there is a way to set the maximum number of threads from Intelex.

FischyM · 2022-06-02T20:03:08Z

I'm running into the same problem right now actually. I'm not sure which environmental variable controls the number of threads, so I came to this GitHub to find out! I'll let you know if I find what we are looking for.

I do have these 5 to test with to see if they control the number of threads that spawn from a single process, but I won't be able to test them until later tonight maybe.

export OMP_NUM_THREADS=1
export BLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1

EDIT:
It doesn't appear that any of those changes the thread usage of SKLEARNEX. I tried some variations of SKLEARNEX_THREADS=1 and SKLEARNEX_NUM_THREADS=1, but it did not change the thread behavior. Hopefully, someone more knowledgeable will be able to answer this.

plenoi · 2022-06-18T00:31:29Z

i also face the same problem and waiting for some help.
this always happen when i use SVC.

joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

EDIT:
I solve the problem by manually import SVC and remove patch_sklearn()
from daal4py import daalinit
daalinit(1)

from daal4py.sklearn.svm import SVC

lilybellesweet · 2022-08-09T13:12:44Z

Any update here? I would like to be able to set the number of threads, as some jobs misbehave on shared resources

Alexsandruss · 2022-08-11T13:56:16Z

Number of threads per SVM training/inference can be effectively limited with daalinit:

import daal4py as d4p
d4p.daalinit(1)

I checked that it works for SVM with python's multiprocessing.

Alexsandruss · 2022-08-11T14:27:49Z

However, limit of threads will not solve memory issues of SVM completely, because it is experiencing memory leak, which is under investigation.

lilybellesweet · 2022-08-12T11:52:44Z

I tried using daalinit (for RandomForestRegressor) and it did not work, the number of threads created was not affected.

Alexsandruss · 2022-08-13T15:08:01Z

I run RandomForestRegressor and it used number of threads set by daalinit. Did you check that RandomForestRegressor was patched using verbose mode?
What OS, python, scikit-learn and scikit-learn-intelex versions are you using?

lilybellesweet · 2022-08-16T10:01:24Z

It's running the sklearnex version, I checked.

OS: CentOS 7.9
python 3.8.12
scikit-learn 1.1.1
scikit-learn-intelex 2021.6.0

I set d4py.daalinit(2), then do patch_sklearn(), but always get threads per process equal to number of CPUs available.

Alexsandruss · 2022-08-16T21:28:28Z

I used same configuration and next script while trying to reproduce:

import logging
logging.getLogger().setLevel(logging.INFO)

from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
import daal4py as d4p

from multiprocessing import Pool
from sys import argv


def train_rfr(data):
    x, y = data
    rfr = RandomForestRegressor()
    rfr.fit(x, y)
    print('Score:', rfr.score(x, y))


if __name__ == '__main__':
    n_threads = int(argv[1])
    n_forests = int(argv[2])

    dataset = [make_regression(n_samples=20000, n_features=128) for i in range(n_forests)]

    d4p.daalinit(n_threads)
    with Pool(n_forests) as p:
        p.map(train_rfr, dataset)

n_threads x n_forests total threads were used every time for varying parameters.

lilybellesweet · 2022-08-22T09:46:05Z

Thank you for this effort! I am not sure why it is behaving like this for me, but despite using very similar code I am still having this issue where the same amount of threads are created per process as the total number of cores available, no matter how I set daalinit(). I am working on a SLURM system - could this be causing the issue?

FischyM · 2022-11-09T04:27:00Z

It doesn't appear to be a SLURM issue for me, as even using the same system with and without SLURM gives me an odd issue in that SVC is returning np.nan for different testing scores in sklearn's GridSearchCV. I'm wondering if it is a CPU-specific issue, because I don't have this issue on an intel CPU (Xeon E5-2630 v3), but I do on an AMD (Milan 7763). It appears that @Stack-it-up is using an intel CPU, but it's in the Core series. What CPU are you using @lilybellesweet for your SLURM system?

lange-martin · 2022-11-22T08:06:43Z

However, limit of threads will not solve memory issues of SVM completely, because it is experiencing memory leak, which is under investigation.

@Alexsandruss Is there any update on the memory leak for SVM? I found one post of yours here where you say the issue is on Python side. Does that mean it cannot be fixed?

Alexsandruss · 2022-11-23T00:04:49Z

However, limit of threads will not solve memory issues of SVM completely, because it is experiencing memory leak, which is under investigation.

@Alexsandruss Is there any update on the memory leak for SVM? I found one post of yours here where you say the issue is on Python side. Does that mean it cannot be fixed?

Fix for memory leak is not found yet, you can try to use SVM from daal4py.sklearn.svm namespace as temporary alternative. It is wrapper for legacy DAAL interface and memory leak is not expected here, however it might have outdated API comparing to latest sklearn versions of SVM

Stack-it-up · 2023-05-01T14:56:39Z

Any update on this?

Alexsandruss · 2023-05-02T07:16:02Z

Any update on this?

Currently - no update.

montagne5641 · 2024-06-15T13:29:55Z

We have confirmed that memory leaks occur in the same way with SVR. However, daal4py does not support SVR....
Does this mean that dealing with memory leaks is inherently difficult and there is no prospect of a solution in the future?

Stack-it-up added the bug Something isn't working label Jun 1, 2022

PivovarA self-assigned this Jun 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possibly memory issues with SVC? #1010

Possibly memory issues with SVC? #1010

Stack-it-up commented Jun 1, 2022 •

edited

Loading

FischyM commented Jun 2, 2022

Stack-it-up commented Jun 2, 2022

FischyM commented Jun 2, 2022 •

edited

Loading

plenoi commented Jun 18, 2022 •

edited

Loading

lilybellesweet commented Aug 9, 2022

Alexsandruss commented Aug 11, 2022 •

edited

Loading

Alexsandruss commented Aug 11, 2022

lilybellesweet commented Aug 12, 2022

Alexsandruss commented Aug 13, 2022

lilybellesweet commented Aug 16, 2022 •

edited

Loading

Alexsandruss commented Aug 16, 2022

lilybellesweet commented Aug 22, 2022

FischyM commented Nov 9, 2022 •

edited

Loading

lange-martin commented Nov 22, 2022

Alexsandruss commented Nov 23, 2022

Stack-it-up commented May 1, 2023

Alexsandruss commented May 2, 2023

montagne5641 commented Jun 15, 2024 •

edited

Loading

Possibly memory issues with SVC? #1010

Possibly memory issues with SVC? #1010

Comments

Stack-it-up commented Jun 1, 2022 • edited Loading

FischyM commented Jun 2, 2022

Stack-it-up commented Jun 2, 2022

FischyM commented Jun 2, 2022 • edited Loading

plenoi commented Jun 18, 2022 • edited Loading

lilybellesweet commented Aug 9, 2022

Alexsandruss commented Aug 11, 2022 • edited Loading

Alexsandruss commented Aug 11, 2022

lilybellesweet commented Aug 12, 2022

Alexsandruss commented Aug 13, 2022

lilybellesweet commented Aug 16, 2022 • edited Loading

Alexsandruss commented Aug 16, 2022

lilybellesweet commented Aug 22, 2022

FischyM commented Nov 9, 2022 • edited Loading

lange-martin commented Nov 22, 2022

Alexsandruss commented Nov 23, 2022

Stack-it-up commented May 1, 2023

Alexsandruss commented May 2, 2023

montagne5641 commented Jun 15, 2024 • edited Loading

Stack-it-up commented Jun 1, 2022 •

edited

Loading

FischyM commented Jun 2, 2022 •

edited

Loading

plenoi commented Jun 18, 2022 •

edited

Loading

Alexsandruss commented Aug 11, 2022 •

edited

Loading

lilybellesweet commented Aug 16, 2022 •

edited

Loading

FischyM commented Nov 9, 2022 •

edited

Loading

montagne5641 commented Jun 15, 2024 •

edited

Loading