Merge pull request #532 from graphistry/dev/cugfql
Dev/cugfql
lmeyerov authored Dec 27, 2023
2 parents 7f0d32f + 9a9de51 commit e74123e
Showing 27 changed files with 5,376 additions and 100 deletions.
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,22 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## [Development]

## [0.33.0 - 2023-12-26]

### Added

* GFQL: GPU acceleration of `chain`, `hop`, `filter_by_dict`
* `AbstractEngine` to `engine.py::Engine` enum
* `compute.typing.DataFrameT` to centralize df-lib-agnostic type checking

### Refactor

* GFQL and more of compute use generic dataframe methods and thread the engine through

### Infra

* GPU tester threads through LOG_LEVEL

## [0.32.0 - 2023-12-22]

### Added
34 changes: 29 additions & 5 deletions README.md
@@ -11,9 +11,9 @@
[![Uptime Robot status](https://img.shields.io/uptimerobot/status/m787548531-e9c7b7508fc76fea927e2313?label=hub.graphistry.com)](https://status.graphistry.com/) [<img src="https://img.shields.io/badge/slack-Graphistry%20chat-orange.svg?logo=slack">](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g)
[![Twitter Follow](https://img.shields.io/twitter/follow/graphistry)](https://twitter.com/graphistry)

**PyGraphistry is a Python visual graph AI library to extract, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** Installing with optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your `time to graph` for going from raw data to visualizations and AI models down to three lines of code.
**PyGraphistry is a dataframe-native Python visual graph AI library to extract, query, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** The GFQL query language supports running a large subset of the Cypher property graph query language without requiring external software and adds optional GPU acceleration. Installing PyGraphistry with the optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your **time to graph** for going from raw data to visualizations and AI models down to three lines of code.

Graphistry gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors.
The optional visual engine, Graphistry, gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors.

The PyGraphistry Python client helps several kinds of usage modes:

@@ -147,14 +147,14 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
g2.plot()
```

* GFQL: Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb))
* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb))

Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL:

```python
from graphistry import n, e_undirected, is_in

g2 = g.chain([
g2 = g1.chain([
n({'user': 'Biden'}),
e_undirected(),
n(name='bridge'),
@@ -166,6 +166,17 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
g2.plot()
```

Enable GFQL's optional automatic GPU acceleration for 43X+ speedups:

```python
# Switch from Pandas CPU dataframes to RAPIDS GPU dataframes
import cudf
g2 = g1.edges(lambda g: cudf.DataFrame(g._edges))
# GFQL will automatically run on a GPU
g3 = g2.chain([n(), e(hops=3), n()])
g3.plot()
```

* [Spark](https://spark.apache.org/)/[Databricks](https://databricks.com/) ([ipynb demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.ipynb), [dbc demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.dbc))

```python
@@ -1163,6 +1174,8 @@ Both `hop()` and `chain()` (GFQL) match dictionary expressions support dataframe
* numeric: gt, lt, ge, le, eq, ne, between, isna, notna
* string: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull

Both `hop()` and `chain()` run on GPUs when given RAPIDS dataframes. Pass `engine='cudf'` to request the GPU engine explicitly rather than rely on automatic detection.
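
For intuition about the kind of work being accelerated, a bounded multi-hop traversal can be sketched in plain Python. This is only an illustration of frontier expansion over an edge list, not how `hop()` is implemented; the function name `bounded_hop`, its parameters, and the directed-edge assumption are all hypothetical.

```python
from collections import defaultdict


def bounded_hop(edges, seeds, hops):
    """Return the nodes reachable from `seeds` in at most `hops` directed steps.

    Hypothetical helper for illustration; not part of PyGraphistry.
    """
    # Index outgoing neighbors once so each expansion step is a lookup.
    out_neighbors = defaultdict(set)
    for src, dst in edges:
        out_neighbors[src].add(dst)

    reached = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        # Expand the current frontier by one hop.
        candidates = set()
        for node in frontier:
            candidates |= out_neighbors[node]
        # Keep only nodes not seen on an earlier hop.
        frontier = candidates - reached
        if not frontier:
            break
        reached |= frontier
    return reached


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
within_three = bounded_hop(edges, {"a"}, hops=3)
```

A dataframe engine would express each expansion step as a join between the frontier and the edge table, and that per-hop join is the step a GPU dataframe library can run in parallel.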

#### Table to graph

```python
@@ -1235,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns

PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java.

See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)
See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)

Traverse within a graph, or expand one graph against another

@@ -1327,6 +1340,17 @@ pattern2 = Chain.from_json(pattern_json)
g.chain(pattern2).plot()
```

Benefit from automatic GPU acceleration by passing in GPU dataframes:

```python
import cudf

g1 = graphistry.edges(cudf.read_csv('data.csv'), 's', 'd')
g2 = g1.chain(..., engine='cudf')
```

The parameter `engine` is optional, defaulting to `'auto'`.
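
One way to picture what an `'auto'` default could mean is the sketch below. It assumes, without confirmation from the source, that automatic selection looks at the type of the dataframe passed in and falls back to the CPU otherwise. `SketchEngine`, `resolve_engine`, and `FakeGpuFrame` are hypothetical names for illustration, not PyGraphistry APIs.

```python
from enum import Enum


class SketchEngine(Enum):
    """Hypothetical engine names; only 'auto' and 'cudf' appear in the text above."""
    PANDAS = "pandas"
    CUDF = "cudf"


def resolve_engine(df, engine="auto"):
    """Pick an engine for `df`; an explicit choice always wins over detection."""
    if engine != "auto":
        # Enum lookup by value raises ValueError for an unknown engine name.
        return SketchEngine(engine)
    # Inspect the defining module so cudf never has to be imported here.
    root_module = type(df).__module__.split(".")[0]
    # Assumption: anything that is not a cudf object runs on the CPU engine.
    return SketchEngine.CUDF if root_module == "cudf" else SketchEngine.PANDAS


class FakeGpuFrame:
    """Stand-in for a GPU dataframe so the sketch runs without a GPU."""
    __module__ = "cudf.core.dataframe"


auto_gpu = resolve_engine(FakeGpuFrame())
auto_cpu = resolve_engine([("a", "b")])
forced_gpu = resolve_engine([("a", "b")], engine="cudf")
```

Detecting by module name keeps the GPU library an optional dependency: a CPU-only install never needs to import it just to decide which engine to use.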

#### Pipelining

```python