Cucat Featurization base #486

Open
wants to merge 98 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 77 commits
Commits
Show all changes
98 commits
Select commit Hold shift + click to select a range
cf07249 cucat feat support (tanmoyio, May 15, 2023)
d73a2db cudf test env var added for test_feature_utils.py (tanmoyio, May 15, 2023)
382e18b some import fixes (tanmoyio, May 15, 2023)
44200ac passthru DT encode/umap, add back for timebar (dcolinmorgan, Jun 13, 2023)
777afd4 lint (dcolinmorgan, Jul 21, 2023)
c1bc6f1 updated cu-cat version for optional install (dcolinmorgan, Jul 26, 2023)
48e4017 type check without loading cudf, via getmodule (dcolinmorgan, Jul 28, 2023)
6b0b52b ok we still need the check_cudf def (dcolinmorgan, Jul 28, 2023)
e4b0c0a swap lazy import defs (dcolinmorgan, Jul 29, 2023)
7c0c0c6 working thru comments (dcolinmorgan, Aug 4, 2023)
f344dd8 address few issues (dcolinmorgan, Aug 6, 2023)
b6f6388 swap cudf=None type sig for lazy calls (dcolinmorgan, Aug 8, 2023)
f185a2f swap cudf=None type sig for lazy calls (dcolinmorgan, Aug 8, 2023)
410c40d swap cudf=None type sig for lazy calls (dcolinmorgan, Aug 8, 2023)
b9067c0 type check lint (dcolinmorgan, Aug 8, 2023)
8f0bc3a lint isinstance all over (dcolinmorgan, Aug 8, 2023)
b7b8e63 lint isinstance all over (dcolinmorgan, Aug 8, 2023)
e8eb85a rename lazy cucat to cuda (dcolinmorgan, Aug 8, 2023)
501ff3b cudf df constructor change (dcolinmorgan, Aug 9, 2023)
918ebee towards single engine=cuda flag (dcolinmorgan, Aug 9, 2023)
ccf6f47 towards single engine=cuda flag (dcolinmorgan, Aug 9, 2023)
60de1cf single cuda flag (dcolinmorgan, Aug 11, 2023)
0b66776 lint (dcolinmorgan, Aug 11, 2023)
9f086c8 robust logging for cu_cat (dcolinmorgan, Aug 11, 2023)
78015f1 single cuda flag (dcolinmorgan, Aug 11, 2023)
616009b assert after if (dcolinmorgan, Aug 11, 2023)
dc38d3b super > table (dcolinmorgan, Aug 11, 2023)
376890e Update feature_utils.py (dcolinmorgan, Aug 11, 2023)
b9828c5 rollback constant CUDA_CAT (dcolinmorgan, Aug 11, 2023)
8d13cbe rollback constant CUDA_CAT (dcolinmorgan, Aug 11, 2023)
92769bf else all (dcolinmorgan, Aug 11, 2023)
af0fc8a else all (dcolinmorgan, Aug 11, 2023)
4f78b76 else all (dcolinmorgan, Aug 11, 2023)
b8a0db2 feat pytest tweaks (dcolinmorgan, Aug 15, 2023)
6e11117 feat pytest tweaks (dcolinmorgan, Aug 15, 2023)
b0d36cd see if last commit induced numba install error (dcolinmorgan, Aug 15, 2023)
5677bea feat pytest tweaks (dcolinmorgan, Aug 15, 2023)
8e15e5e datetime passthrough for cudf (dcolinmorgan, Aug 17, 2023)
20200d6 add unadulterated dt back (dcolinmorgan, Aug 20, 2023)
26cd39c more flexible multi-dt column add (dcolinmorgan, Aug 21, 2023)
c4c1bd8 start DT test (dcolinmorgan, Aug 23, 2023)
d889581 start DT test (dcolinmorgan, Aug 24, 2023)
48a7308 Merge branch 'master' into feat/gpu-featurization (dcolinmorgan, Aug 24, 2023)
ba25c89 Merge branch 'feat/gpu-featurization' of https://github.com/graphistr… (dcolinmorgan, Aug 25, 2023)
8a0ab5c lint (dcolinmorgan, Aug 25, 2023)
151ab5b lint (dcolinmorgan, Aug 25, 2023)
d63d729 cucat may be erroneously involked (dcolinmorgan, Aug 28, 2023)
ada126e maybe fastencoder issue (dcolinmorgan, Aug 28, 2023)
21a475d defaulting to cucat, concrete mixedup perhaps (dcolinmorgan, Aug 29, 2023)
49976e8 defaulting to cucat, concrete mixedup perhaps (dcolinmorgan, Aug 29, 2023)
f24411e try basic assert isinstance (dcolinmorgan, Aug 30, 2023)
d303afb nope (dcolinmorgan, Aug 30, 2023)
b34ee85 nope (dcolinmorgan, Aug 30, 2023)
2456b70 type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
8fc0b22 type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
ee6c523 type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
4808428 defaulting to cucat, concrete mixedup perhaps (dcolinmorgan, Aug 30, 2023)
a22e85e type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
86fc662 type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
614fff4 type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
b88e3ea type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
a72d4b1 type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
4eef71c type checking node attributes causing issues (dcolinmorgan, Aug 30, 2023)
0522981 check which column is off (dcolinmorgan, Aug 30, 2023)
73ba5d1 trying everything (dcolinmorgan, Aug 30, 2023)
9da0b11 remove print, add print (dcolinmorgan, Aug 30, 2023)
f9e9260 same df every time, remove [cols] (dcolinmorgan, Aug 30, 2023)
58d1461 revert, remove +target_names_node from targets (dcolinmorgan, Aug 30, 2023)
d5acc1a revert, remove +target_names_node from targets (dcolinmorgan, Aug 30, 2023)
614d9f3 nan raising equality issues, filled with 0 (dcolinmorgan, Aug 31, 2023)
31b5f5e add feat tests back (dcolinmorgan, Sep 7, 2023)
bc4f290 Merge branch 'master' into feat/gpu-featurization (dcolinmorgan, Sep 7, 2023)
74a2460 Merge branch 'feat/gpu-featurization' of https://github.com/graphistr… (dcolinmorgan, Sep 7, 2023)
624c721 comment anxiety assert (dcolinmorgan, Sep 7, 2023)
2fc6be5 single cuda engine flag (dcolinmorgan, Sep 9, 2023)
178adba try constant substitution (dcolinmorgan, Sep 9, 2023)
90bd8b7 add cuda/gpu generic engine flag for full gpu pipeline (dcolinmorgan, Sep 19, 2023)
5d16a9e most comments (dcolinmorgan, Sep 21, 2023)
e931456 most comments (dcolinmorgan, Sep 21, 2023)
fc212a8 most comments (dcolinmorgan, Sep 21, 2023)
d4b1fbe most comments (dcolinmorgan, Sep 21, 2023)
498a4de most comments (dcolinmorgan, Sep 21, 2023)
aab2ad9 remove single engine flag, try in next PR (dcolinmorgan, Sep 21, 2023)
f0eb1bf latest cu-cat version (dcolinmorgan, Sep 21, 2023)
867874d edge concat interop (dcolinmorgan, Dec 29, 2023)
5a69233 Merge branch 'master' into feat/gpu-featurization (dcolinmorgan, Dec 29, 2023)
cdda3e7 better dc default (dcolinmorgan, Dec 29, 2023)
63398b3 renaming (dcolinmorgan, Jan 3, 2024)
b720bc1 renaming (dcolinmorgan, Jan 3, 2024)
ed824ec cupyx csr toarray for features_out (dcolinmorgan, Jan 4, 2024)
1735134 cupyx csr toarray for features_out (dcolinmorgan, Jan 4, 2024)
824d940 cupyx csr toarray for features_out (dcolinmorgan, Jan 4, 2024)
c7ce92c add gpu-umap test, allow cucat to test w/o gpu (dcolinmorgan, Jan 4, 2024)
30a04a4 add gpu-umap test, allow cucat to test w/o gpu (dcolinmorgan, Jan 4, 2024)
50df365 dirty_cat version with Table&SuperVectorizer (dcolinmorgan, Jan 4, 2024)
a654f9f dirty_cat version with Table&SuperVectorizer (dcolinmorgan, Jan 4, 2024)
a86be5c better dimension try (dcolinmorgan, Jan 5, 2024)
4bd056c Merge branch 'master' into feat/gpu-featurization (dcolinmorgan, Jul 10, 2024)
docker/test-gpu-local.sh (1 change: 0 additions & 1 deletion)
@@ -44,5 +44,4 @@ docker run \
     ${NETWORK} \
     graphistry/test-gpu:${TEST_CPU_VERSION} \
     --maxfail=1 \
-    --ignore=graphistry/tests/test_feature_utils.py \
     $@
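Dropping the `--ignore` flag means `graphistry/tests/test_feature_utils.py` now runs inside the GPU test container, and an early commit adds a cudf test environment variable for that file. A minimal sketch of gating GPU-only tests on an opt-in variable with stdlib `unittest`; the variable name `TEST_CUDF` and the test class are hypothetical, not taken from this PR:

```python
import os
import unittest

# Hypothetical opt-in flag: cudf tests run only when TEST_CUDF=1 is exported.
RUN_CUDF_TESTS = os.environ.get("TEST_CUDF", "0") == "1"


@unittest.skipUnless(RUN_CUDF_TESTS, "set TEST_CUDF=1 to run cudf tests")
class TestCudfFeaturize(unittest.TestCase):
    def test_cudf_roundtrip(self):
        # Imported inside the test so collecting the module never needs cudf.
        import cudf

        gdf = cudf.DataFrame({"a": [1, 2, 3]})
        self.assertEqual(gdf.to_pandas()["a"].tolist(), [1, 2, 3])
```

On a CPU-only runner the class is reported as skipped rather than failed; inside the GPU container it would be enabled with `TEST_CUDF=1 python -m unittest`.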
graphistry/constants.py (1 change: 1 addition & 0 deletions)
@@ -45,6 +45,7 @@
 # for preprocessors namespace
 # for dirty_cat params
 DIRTY_CAT = "dirty_cat"
+CUDA_CAT = "cu_cat"
 N_TOPICS_DEFAULT = 42
 N_TOPICS_TARGET_DEFAULT = 7
 N_HASHERS_DEFAULT = 100
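The new `CUDA_CAT` constant sits beside `DIRTY_CAT`, and several commits ("defaulting to cucat", "towards single engine=cuda flag") concern choosing between the two encoders. A minimal sketch of how such a choice could be resolved, preferring cu_cat when it is installed and falling back to dirty_cat; the function `resolve_cat_engine`, its `"auto"` mode, and the `"pandas"` fallback are hypothetical, not graphistry's API:

```python
import importlib.util
from typing import Callable, Optional

# Values mirror the constants in graphistry/constants.py.
DIRTY_CAT = "dirty_cat"
CUDA_CAT = "cu_cat"
PANDAS = "pandas"  # hypothetical "no categorical encoder" fallback


def _importable(module_name: str) -> bool:
    """True if the top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_cat_engine(
    engine: str = "auto",
    is_available: Optional[Callable[[str], bool]] = None,
) -> str:
    """Hypothetical resolver mapping a user-facing engine name to a backend."""
    check = is_available or _importable
    if engine in (CUDA_CAT, DIRTY_CAT, PANDAS):
        # Explicit requests are honoured as-is; a missing package surfaces
        # later as an ImportError at the point of use.
        return engine
    if engine != "auto":
        raise ValueError(f"unknown engine {engine!r}")
    # Prefer the GPU encoder, then the CPU one, then plain pandas.
    for candidate in (CUDA_CAT, DIRTY_CAT):
        if check(candidate):
            return candidate
    return PANDAS
```

`find_spec` only shows that a package is installed, not that a GPU is present, so a real resolver would likely also need a device check before choosing cu_cat.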
graphistry/embed_utils.py (33 changes: 20 additions & 13 deletions)
@@ -2,7 +2,7 @@
 import numpy as np
 import pandas as pd
 from typing import Optional, Union, Callable, List, TYPE_CHECKING, Any, Tuple
-
+from inspect import getmodule
 from .PlotterBase import Plottable
 from .compute.ComputeMixin import ComputeMixin
 
@@ -21,12 +21,14 @@ def lazy_embed_import_dep():
     except:
         return False, None, None, None, None, None, None, None
 
-def check_cudf():
-    try:
-        import cudf
-        return True, cudf
-    except:
-        return False, object
+# def lazy_isinstance(self._nodes, cudf):
+
+# def check_cudf():
+#     try:
+#         import cudf
+#         return True, cudf
+#     except:
+#         return False, object
 
 
 if TYPE_CHECKING:

dcolinmorgan marked this conversation as resolved.
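The removed `check_cudf` helper is the usual optional-dependency probe: attempt the import and return a `(found, module)` pair, with a placeholder when the package is absent. A minimal generalized sketch; the name `optional_import` is hypothetical, and unlike the original it returns `None` (not `object`) as the placeholder, so callers must test the flag before touching attributes:

```python
import importlib
import logging
from types import ModuleType
from typing import Optional, Tuple

logger = logging.getLogger(__name__)


def optional_import(name: str) -> Tuple[bool, Optional[ModuleType]]:
    """Try to import ``name``; return (True, module) or (False, None).

    Catching Exception instead of using a bare ``except:`` lets
    KeyboardInterrupt and SystemExit propagate, while a package that is
    installed but fails while importing (for example a GPU library on a
    machine without a usable driver) is still reported as unavailable.
    """
    try:
        return True, importlib.import_module(name)
    except Exception as exc:
        logger.debug("optional import of %s failed: %r", name, exc)
        return False, None
```

Usage mirrors the original call site: `has_cudf, cudf = optional_import("cudf")`.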
@@ -38,7 +40,7 @@ def check_cudf():
     MIXIN_BASE = object
     torch = Any
 
-has_cudf, cudf = check_cudf()
+# has_cudf, cudf = check_cudf()
 
 XSymbolic = Optional[Union[List[str], str, pd.DataFrame]]
 ProtoSymbolic = Optional[Union[str, Callable[[TT, TT, TT], TT]]]  # type: ignore

dcolinmorgan marked this conversation as resolved.
@@ -301,12 +303,14 @@ def embed(
         """
         # this is temporary, will be fixed in future releases
         try:
-            if isinstance(self._nodes, cudf.DataFrame):
+            # if isinstance(self._nodes, cudf.DataFrame):
+            if 'cudf' in str(getmodule(self._nodes)):
                 self._nodes = self._nodes.to_pandas()
         except:
             pass
         try:
-            if isinstance(self._edges, cudf.DataFrame):
+            # if isinstance(self._edges, cudf.DataFrame):
+            if 'cudf' in str(getmodule(self._edges)):
                 self._edges = self._edges.to_pandas()
         except:
             pass

Review threads on this hunk were marked as resolved by lmeyerov and dcolinmorgan.
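The replacement check `'cudf' in str(getmodule(obj))` avoids importing cudf by inspecting the string form of the module an object's class comes from. `inspect.getmodule` looks that module up in `sys.modules` by the class's `__module__` name, and a module's string form contains both its dotted name and, when it has one, its file path, so the substring test can match on either. A small stdlib-only demonstration with a stand-in module; the module name, path, and class below are made up for illustration:

```python
import sys
import types
from inspect import getmodule

# Stand-in for a non-cudf module whose file lives under a path that
# happens to contain "cudf" (for example, a conda env named after it).
fake = types.ModuleType("fakepandas.frame")
fake.__file__ = "/opt/conda/envs/cudf-env/lib/fakepandas/frame.py"
sys.modules["fakepandas.frame"] = fake


class FakeFrame:
    pass


# Pretend FakeFrame was defined in the stand-in module.
FakeFrame.__module__ = "fakepandas.frame"

obj = FakeFrame()
module_text = str(getmodule(obj))
print(module_text)
# The substring check from the diff fires on the file path, not the name.
print("cudf" in module_text)  # prints True
# Comparing the top-level package of the class's module does not.
print(type(obj).__module__.split(".")[0] == "cudf")  # prints False
```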
@@ -436,7 +440,8 @@ def predict_links(
         else:
             # this is temporary, will be removed after gpu feature utils
             try:
-                if isinstance(source, cudf.DataFrame):
+                # if isinstance(source, cudf.DataFrame):
+                if 'cudf' in str(getmodule(source)):
                     source = source.to_pandas()  # type: ignore
             except:
                 pass

dcolinmorgan marked this conversation as resolved.
@@ -448,7 +453,8 @@ def predict_links(
         else:
             # this is temporary, will be removed after gpu feature utils
             try:
-                if isinstance(relation, cudf.DataFrame):
+                # if isinstance(relation, cudf.DataFrame):
+                if 'cudf' in str(getmodule(relation)):
                     relation = relation.to_pandas()  # type: ignore
             except:
                 pass

dcolinmorgan marked this conversation as resolved.
@@ -460,7 +466,8 @@ def predict_links(
         else:
             # this is temporary, will be removed after gpu feature utils
             try:
-                if isinstance(destination, cudf.DataFrame):
+                # if isinstance(destination, cudf.DataFrame):
+                if 'cudf' in str(getmodule(destination)):
                     destination = destination.to_pandas()  # type: ignore
             except:
                 pass
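The three `predict_links` hunks repeat one try/convert/except block for `source`, `relation`, and `destination`. A minimal sketch of factoring that block into a single helper, assuming cudf objects report a class `__module__` under the top-level `cudf` package; `is_cudf_object`, `to_pandas_if_cudf`, and `normalize_inputs` are hypothetical names, and the check reads `__module__` directly instead of the string form of `getmodule(...)`:

```python
from typing import Any, Tuple


def is_cudf_object(obj: Any) -> bool:
    """True when obj's class was defined inside the top-level cudf package.

    Only the class's __module__ string is read, so cudf is never imported.
    """
    module_name = getattr(type(obj), "__module__", None) or ""
    return module_name.split(".")[0] == "cudf"


def to_pandas_if_cudf(obj: Any) -> Any:
    """Return obj.to_pandas() for cudf objects, and obj unchanged otherwise."""
    return obj.to_pandas() if is_cudf_object(obj) else obj


def normalize_inputs(source: Any, relation: Any, destination: Any) -> Tuple[Any, Any, Any]:
    # Hypothetical use inside predict_links: one call per argument
    # instead of three copies of the try/except block.
    return (
        to_pandas_if_cudf(source),
        to_pandas_if_cudf(relation),
        to_pandas_if_cudf(destination),
    )
```

Unlike the bare `except: pass` in the diff, a failing `to_pandas()` here raises instead of being silently skipped, and a module such as `dask_cudf` is not mistaken for `cudf`.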