Merge pull request #532 from graphistry/dev/cugfql

Dev/cugfql
graphistry · Dec 27, 2023 · e74123e
2 parents 7f0d32f + 9a9de51
commit e74123e

Showing 27 changed files with 5,376 additions and 100 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,22 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
+## [0.33.0 - 2023-12-26]
+
+### Added
+
+* GFQL: GPU acceleration of `chain`, `hop`, `filter_by_dict`
+* `AbstractEngine` to `engine.py::Engine` enum
+* `compute.typing.DataFrameT` to centralize df-lib-agnostic type checking
+
+### Refactor
+
+* GFQL and more of compute use generic dataframe methods and thread the engine through
+
+### Infra
+
+* GPU tester threads through LOG_LEVEL
+
 ## [0.32.0 - 2023-12-22]
 
 ### Added

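The changelog above mentions an `Engine` enum and threading an engine through compute, and the README changes below say `engine` defaults to `'auto'` with GPU execution chosen when RAPIDS dataframes are passed in. A minimal sketch of how such resolution could work follows. It is not the code from this PR: the `Engine` member names, the `resolve_engine` helper, and the module-name check are all assumptions made for illustration.

```python
from enum import Enum
from typing import Any


class Engine(Enum):
    # Hypothetical stand-in for the enum in engine.py; member names are assumed
    PANDAS = 'pandas'
    CUDF = 'cudf'


def resolve_engine(df: Any, engine: str = 'auto') -> Engine:
    """Honor an explicit engine name, else infer the engine from df's type."""
    if engine != 'auto':
        return Engine(engine)  # raises ValueError on unknown names
    # Inspect the defining module instead of importing cudf,
    # so CPU-only installs never need RAPIDS present
    root_module = type(df).__module__.split('.')[0]
    if root_module == 'cudf':
        return Engine.CUDF
    if root_module == 'pandas':
        return Engine.PANDAS
    raise ValueError(f'Unsupported dataframe type: {type(df)!r}')
```

Checking the module name rather than calling `isinstance` against `cudf.DataFrame` is one design choice that keeps the GPU dependency optional; the actual implementation may differ.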
diff --git a/README.md b/README.md
@@ -11,9 +11,9 @@
 [![Uptime Robot status](https://img.shields.io/uptimerobot/status/m787548531-e9c7b7508fc76fea927e2313?label=hub.graphistry.com)](https://status.graphistry.com/) [<img src="https://img.shields.io/badge/slack-Graphistry%20chat-orange.svg?logo=slack">](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g)
 [![Twitter Follow](https://img.shields.io/twitter/follow/graphistry)](https://twitter.com/graphistry)
 
-**PyGraphistry is a Python visual graph AI library to extract, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** Installing with optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your `time to graph` for going from raw data to visualizations and AI models down to three lines of code.
+**PyGraphistry is a dataframe-native Python visual graph AI library to extract, query, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** The GFQL query language supports running a large subset of the Cypher property graph query language without requiring external software and adds optional GPU acceleration. Installing PyGraphistry with the optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your **time to graph** for going from raw data to visualizations and AI models down to three lines of code.
 
-Graphistry gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors.
+The optional visual engine, Graphistry, gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors.
 
 The PyGraphistry Python client helps several kinds of usage modes:
 
@@ -147,14 +147,14 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
     g2.plot()
     ```
 
-* GFQL: Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb))
+* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb))
 
   Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL:
 
     ```python
     from graphistry import n, e_undirected, is_in
 
-    g2 = g.chain([
+    g2 = g1.chain([
       n({'user': 'Biden'}),
       e_undirected(),
       n(name='bridge'),
@@ -166,6 +166,17 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
     g2.plot()
     ```
 
+    Enable GFQL's optional automatic GPU acceleration for 43X+ speedups:
+
+    ```python
+    # Switch from Pandas CPU dataframes to RAPIDS GPU dataframes
+    import cudf
+    g2 = g1.edges(lambda g: cudf.DataFrame(g._edges))
+    # GFQL will automatically run on a GPU
+    g3 = g2.chain([n(), e_undirected(hops=3), n()])
+    g3.plot()
+    ```
+
 * [Spark](https://spark.apache.org/)/[Databricks](https://databricks.com/) ([ipynb demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.ipynb), [dbc demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.dbc))
 
     ```python
@@ -1163,6 +1174,8 @@ Both `hop()` and `chain()` (GFQL) match dictionary expressions support dataframe
 * numeric: gt, lt, ge, le, eq, ne, between, isna, notna
 * string: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull
 
+Both `hop()` and `chain()` run on GPUs when given RAPIDS (cuDF) dataframes. Pass `engine='cudf'` to request GPU execution explicitly.
+
 #### Table to graph
 
 ```python
@@ -1235,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns
 
 PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java
 
-See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)
+See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)
 
 Traverse within a graph, or expand one graph against another
 
@@ -1327,6 +1340,17 @@ pattern2 = Chain.from_json(pattern_json)
 g.chain(pattern2).plot()
 ```
 
+Benefit from automatic GPU acceleration by passing in GPU dataframes:
+
+```python
+import cudf
+
+g1 = graphistry.edges(cudf.read_csv('data.csv'), 's', 'd')
+g2 = g1.chain(..., engine='cudf')
+```
+
+The parameter `engine` is optional, defaulting to `'auto'`.
+
 #### Pipelining
 
 ```python