From 59caacaf8f23066c79fb09a764a3ffd5ab64bf6f Mon Sep 17 00:00:00 2001 From: karmenrabar Date: Thu, 14 Sep 2023 09:57:35 +0200 Subject: [PATCH 001/104] Add memgraph tutorial --- README.md | 24 + .../memgraph/visualizing_iam_dataset.ipynb | 601 ++++++++++++++++++ 2 files changed, 625 insertions(+) create mode 100644 demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb diff --git a/README.md b/README.md index 4756c86654..73388abcdb 100644 --- a/README.md +++ b/README.md @@ -229,6 +229,30 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit g.plot() ``` +* [Memgraph](https://memgraph.com/) ([notebook demo](demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb)) + + ```python + from neo4j import GraphDatabase + MEMGRAPH = { + 'uri': "bolt://localhost:7687", + 'auth': (" ", " ") + } + graphistry.register(bolt=MEMGRAPH) + ``` + + ```python + driver = GraphDatabase.driver(**MEMGRAPH) + with driver.session() as session: + session.run(""" + CREATE (per1:Person {id: 1, name: "Julie"}) + CREATE (fil2:File {id: 2, name: "welcome_to_memgraph.txt"}) + CREATE (per1)-[:HAS_ACCESS_TO]->(fil2) """) + g = graphistry.cypher(""" + MATCH (node1)-[connection]-(node2) + RETURN node1, connection, node2;""") + g.plot() + ``` + * [Azure Cosmos DB (Gremlin)](https://azure.microsoft.com/en-us/services/cosmos-db/) ```python diff --git a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb new file mode 100644 index 0000000000..c552e82bee --- /dev/null +++ b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb @@ -0,0 +1,601 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tutorial: Visualizing Identity and Access Management data set with Memgraph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook showcases the utilization of Graphistry to visualize 
data from Memgraph using a sample dataset related to a company's Identity and Access Management. We'll demonstrate how Graphistry streamlines the visualization of Cypher queries, making it easier to analyze extensive data effectively." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### About the dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Identity and Access Management (IAM) outlines who can access what, why, and when. Each organization's unique identity and structure shape how access is managed, forming the company's IAM. If the current IAM system becomes slow and unresponsive – unable to handle changes in team roles and permissions – graph databases are a leading solution. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### About Memgraph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Memgraph](https://memgraph.com/) is an open-source, in-memory graph database. It is compatible with Neo4j's Bolt protocol and supports the widely used Cypher query language for interacting with the database. Cypher provides a powerful and expressive way to work with graph structures and perform various operations on the nodes and relationships within a graph database.\n", + "\n", + "A convenient entry point to kickstart your journey with Memgraph is through Docker. By simply entering the following command in your terminal, you can set up the Memgraph Platform within a Docker container:\n" + ] + }, + { + "cell_type": "raw", + "metadata": {}, + "source": [ + "docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MEMGRAPH=\" --bolt-server-name-for-init=Neo4j/\" memgraph/memgraph-platform " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If everything went well, after a couple of seconds you should see a message that Memgraph Lab is running at localhost:3000. 
You can access it through your web browser and start exploring!" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Configuration and installation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To begin, make sure to install the Graphistry Python client and the Neo4j Bolt drivers. To do so, remove the comment symbol (#) from the two lines in the provided code snippet." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "#!pip install --user graphistry\n", + "#!pip install --user graphistry[bolt]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, import the necessary dependencies, including pandas, graphistry, and GraphDatabase. These libraries are used to load and work with the data." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import graphistry\n", + "from neo4j import GraphDatabase" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Lastly, establish a connection with your Graphistry GPU server account. Make sure to substitute the username and password with your personal credentials. You can create your account [here](https://hub.graphistry.com/). For additional configuration options, refer to [GitHub](https://github.com/graphistry/pygraphistry#configure)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# To specify Graphistry account & server, use:\n", + "# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')\n", + "graphistry.register(api=3, username='k', password='123Rocky') " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Connecting to Graphistry and Memgraph" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We'll establish a connection to a Memgraph database using the Bolt protocol. The Bolt protocol is a binary communication protocol that facilitates interaction between the Python code and the Memgraph database.\n", + "\n", + "The URI includes the hostname (localhost) and the port number (7687) where the Memgraph database is listening for Bolt connections. The authentication part includes a tuple with the username and the password that you would use to authenticate and gain access to the Memgraph database. \n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "MEMGRAPH = {\n", + " 'uri': \"bolt://localhost:7687\", \n", + " 'auth': (\" \", \" \")\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After that, we can use the Graphistry library to register a connection to a database using the Bolt protocol and the provided configuration.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "graphistry.register(bolt=MEMGRAPH)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Uploading the dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now initialize a Memgraph driver instance. Following this, we'll be able to utilize the session.run() method to execute Cypher queries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "driver = GraphDatabase.driver(**MEMGRAPH)\n", + "\n", + "with driver.session() as session: \n", + " session.run(\"\"\" CREATE (per1:Person {id: 1, name: \"Julie\"})\n", + "CREATE (per2:Person {id: 2, name: \"Peter\"})\n", + "CREATE (per3:Person {id: 3, name: \"Anna\"})\n", + "CREATE (per4:Person {id: 4, name: \"Carl\"})\n", + "CREATE (tea1:Team {id: 1, name: \"Engineering\"})\n", + "CREATE (tea2:Team {id: 2, name: \"Operations\"})\n", + "CREATE (tea3:Team {id: 3, name: \"Marketing\"})\n", + "CREATE (rep1:Repository {id: 1, name: \"Memgraph\"})\n", + "CREATE (rep2:Repository {id: 2, name: \"MAGE\"})\n", + "CREATE (rep3:Repository {id: 3, name: \"Marketing\"})\n", + "CREATE (com1:Company {id: 1, name: \"Memgraph\"})\n", + "CREATE (sto1:Storage {id: 1, name: \"Google Drive\"})\n", + "CREATE (sto2:Storage {id: 2, name: \"Notion\"})\n", + "CREATE (fol1:Folder {id: 1, name: \"engineering_folder\"})\n", + "CREATE (fol2:Folder {id: 2, name: \"operations_folder\"})\n", + "CREATE (acc1:Account {id: 1, name: \"Facebook\"})\n", + "CREATE (acc2:Account {id: 2, name: \"LinkedIn\"})\n", + "CREATE (acc3:Account {id: 3, name: \"HackerNews\"}) \n", + "CREATE (fil1:File {id: 1, name: \"welcome_to_engineering.txt\"})\n", + "CREATE (fil2:File {id: 2, name: \"welcome_to_memgraph.txt\"})\n", + "CREATE (fil3:File {id: 3, name: \"operations101.txt\"})\n", + "CREATE (fil4:File {id: 4, name: \"expenses2022.csv\"})\n", + "CREATE (fil5:File {id: 5, name: \"salaries2022.csv\"})\n", + "CREATE (fil6:File {id: 6, name: \"engineering101.txt\"})\n", + "CREATE (fil7:File {id: 7, name: \"working_with_github.txt\"})\n", + "CREATE (fil8:File {id: 8, name: \"working_with_notion.txt\"})\n", + "CREATE (fil9:File {id: 9, name: \"welcome_to_marketing.txt\"})\n", + "CREATE (per1)-[:HAS_ACCESS_TO]->(fil2)\n", + "CREATE (per2)-[:HAS_ACCESS_TO]->(fil2) \n", + "CREATE (per2)-[:IS_PART_OF]->(tea1)\n", + 
"CREATE (per2)-[:IS_PART_OF]->(com1)\n", + "CREATE (per2)-[:IS_PART_OF]->(tea2)\n", + "CREATE (per3)-[:IS_PART_OF]->(tea2)\n", + "CREATE (per3)-[:IS_PART_OF]->(tea3)\n", + "CREATE (per3)-[:IS_PART_OF]->(com1)\n", + "CREATE (per4)-[:IS_PART_OF]->(tea1)\n", + "CREATE (per4)-[:IS_PART_OF]->(com1)\n", + "CREATE (per4)-[:HAS_ACCESS_TO]->(fil2)\n", + "CREATE (com1)-[:HAS_TEAM]->(tea1)\n", + "CREATE (com1)-[:HAS_TEAM]->(tea3)\n", + "CREATE (com1)-[:HAS_TEAM]->(tea2)\n", + "CREATE (fil1)-[:IS_STORED_IN]->(sto1)\n", + "CREATE (fil1)-[:IS_STORED_IN]->(sto2)\n", + "CREATE (fol2)-[:IS_STORED_IN]->(sto1)\n", + "CREATE (fil9)-[:IS_STORED_IN]->(sto1)\n", + "CREATE (fil9)-[:IS_STORED_IN]->(sto2)\n", + "CREATE (fol1)-[:IS_STORED_IN]->(sto1)\n", + "CREATE (fil2)-[:CREATED_BY]->(per3)\n", + "CREATE (fol1)-[:HAS_ACCESS_TO]->(fil6)\n", + "CREATE (fol1)-[:HAS_ACCESS_TO]->(fil7)\n", + "CREATE (fol1)-[:HAS_ACCESS_TO]->(fil8)\n", + "CREATE (fol2)-[:HAS_ACCESS_TO]->(fil3)\n", + "CREATE (fol2)-[:HAS_ACCESS_TO]->(fil4)\n", + "CREATE (fol2)-[:HAS_ACCESS_TO]->(fil5)\n", + "CREATE (tea2)-[:HAS_ACCESS_TO]->(fol2)\n", + "CREATE (rep3)-[:HAS_ACCESS_TO]->(acc1)\n", + "CREATE (rep3)-[:HAS_ACCESS_TO]->(acc2)\n", + "CREATE (rep3)-[:HAS_ACCESS_TO]->(acc3)\n", + "CREATE (rep3)-[:HAS_ACCESS_TO]->(fil9)\n", + "CREATE (tea1)-[:HAS_ACCESS_TO]->(rep1)\n", + "CREATE (tea1)-[:HAS_ACCESS_TO]->(rep2)\n", + "CREATE (tea1)-[:HAS_ACCESS_TO]->(rep3)\n", + "CREATE (tea1)-[:HAS_ACCESS_TO]->(fil1)\n", + "CREATE (tea1)-[:HAS_ACCESS_TO]->(fol1)\n", + " \"\"\")" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "g = graphistry.cypher(\"\"\" MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2;\n", + " \"\"\")" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Visualization of the data \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After populating Memgraph instance, 
it's time to visualize the dataset with Graphistry. But first, let's look at the graph schema in Memgraph Lab, which also offers interactive graph visualizations. The schema defines the structure of your data and its relationships, providing a blueprint for how your data elements are connected and organized within the graph database.\n" + ] + }, + { + "attachments": { + "Screenshot from 2023-08-31 13-12-54.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Screenshot from 2023-08-31 13-12-54.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Plotting with Graphistry is done with the following simple command:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "g.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Screenshot:" + ] + }, + { + "attachments": { + "Screenshot from 2023-08-31 18-48-56.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Screenshot from 2023-08-31 18-48-56.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can easily investigate which files Carl has access to." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "direct_file_access_Carl = graphistry.cypher(\"\"\" MATCH (j:Person {name:\"Carl\"})-[r:HAS_ACCESS_TO]->(n)\n", + "RETURN *; \"\"\")\n", + "direct_file_access_Carl.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Screenshot:\n" + ] + }, + { + "attachments": { + "Screenshot from 2023-08-31 18-50-30.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Screenshot from 2023-08-31 18-50-30.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Carl has direct access to a file. But since Team nodes have access to specific folders, Carl also has indirect access to every file in the folders of the teams he is part of. The next query performs a depth-first search from the node with the label Person and the name Carl to nodes with the label File. It finds paths from Carl to a file, either directly or through other nodes. In Memgraph, the symbol * denotes a depth-first expansion, and the number 3 is the maximum depth (the maximum number of hops)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_file_access_Carl = graphistry.cypher(\"\"\"\n", + "MATCH p=(:Person {name:\"Carl\"})-[* ..3]->(:File)\n", + "RETURN p;\n", + " \"\"\")\n", + "all_file_access_Carl.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Screenshot:" + ] + }, + { + "attachments": { + "Screenshot from 2023-08-31 18-51-35.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Screenshot from 2023-08-31 18-51-35.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The same can be done for all Person nodes by executing the following query. This is an example of why graph databases are a good fit for Identity and Access Management." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "all_file_access = graphistry.cypher(\"\"\"\n", + "MATCH p=(:Person)-[* ..3]->(:File)\n", + "RETURN p;\n", + " \"\"\")\n", + "all_file_access.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Screenshot:" + ] + }, + { + "attachments": { + "Screenshot from 2023-08-31 18-52-42.png": { + "image/png": "" + } + }, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Screenshot from 2023-08-31 18-52-42.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Takeaway and further reading" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "PyGraphistry complements Memgraph by providing a specialized tool for creating rich and interactive visualizations of graph data stored in Memgraph. It allows users to gain deeper insights into their graph data by leveraging the advanced visualization capabilities of the Graphistry platform, especially when dealing with complex and extensive graph data sets. 
\n", + "\n", + "Feel free to get your hands on Graphistry and Memgraph and share your insights or questions with us on [Discord](https://discord.com/invite/memgraph)!\n", + "\n", + "You can find out more about building and scaling modern IAM systems with Memgraph [here](https://memgraph.com/identity-access-management?utm_source=memgraph&utm_medium=referral&utm_campaign=bfb_blog&utm_content=iam) and in the blog posts [What Makes Memgraph Great for Real-Time Performance in IAM Systems](https://memgraph.com/blog/what-makes-memgraph-great-for-real-time-performance-in-iam-systems), [Benefits Graph Databases Bring to Identity and Access Management](https://memgraph.com/blog/benefits-graph-databases-bring-to-identity-and-access-management) and [How Graphs Solve Two Biggest Problems of Traditional IAM Systems](https://memgraph.com/blog/how-graphs-solves-two-biggest-problems-of-traditional-iam-systems).\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "vsc", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 568c83d0abdd1a006c364ab24bacdbb63f9246ad Mon Sep 17 00:00:00 2001 From: karmenrabar Date: Thu, 14 Sep 2023 11:26:48 +0200 Subject: [PATCH 002/104] Changed user/pass and updated screenshots --- .../memgraph/visualizing_iam_dataset.ipynb | 38 +++---------------- 1 file changed, 6 insertions(+), 32 deletions(-) diff --git a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb index c552e82bee..12e5062121 100644 --- a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb +++ b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb @@ 
-1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -118,7 +117,7 @@ "source": [ "# To specify Graphistry account & server, use:\n", "# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')\n", - "graphistry.register(api=3, username='k', password='123Rocky') " + "# graphistry.register(..., personal_key_id='pkey_id', personal_key_secret='pkey_secret') # Key instead of username+password+org_name" ] }, { @@ -283,15 +282,10 @@ ] }, { - "attachments": { - "Screenshot from 2023-08-31 13-12-54.png": { - "image/png": "" - } - }, "cell_type": "markdown", "metadata": {}, "source": [ - "![Screenshot from 2023-08-31 13-12-54.png]()" + "![Screenshot](https://github.com/karmenrabar/pygraphistry_images/blob/main/memgraphlab.png?raw=true)" ] }, { @@ -346,15 +340,10 @@ ] }, { - "attachments": { - "Screenshot from 2023-08-31 18-48-56.png": { - "image/png": "" - } - }, "cell_type": "markdown", "metadata": {}, "source": [ - "![Screenshot from 2023-08-31 18-48-56.png]()" + "![Screenshot](https://github.com/karmenrabar/pygraphistry_images/blob/main/allaccess.png?raw=true)" ] }, { @@ -411,15 +400,10 @@ ] }, { - "attachments": { - "Screenshot from 2023-08-31 18-50-30.png": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABogAAAIeCAYAAACIr41mAAAAi3pUWHRSYXcgcHJvZmlsZSB0eXBlIGV4aWYAAHjaVY7LDcQwCETvVJESho8Bl7OKHGk72PKD40jZvAMMCM1A4/c9aJswjKxFendHYd26fEokFgqwgGevuri7cimp8Z5JZQnvGbDn0Awvmnr6ERbhzXffpdxlqF6VFXRdTbP5Sn+MMuEr+r3nAfkPoBP6Ry0hbWvXWQAACghpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+Cjx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IlhNUCBDb3JlIDQuNC4wLUV4aXYyIj4KIDxyZGY6UkRGIHhtbG5zOnJkZj0iaHR0cDovL3d3dy53My5vcmcvMTk5OS8wMi8yMi1yZGYtc3ludGF4LW5zIyI+CiAgPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIKICAgIHhtbG5zOmV4aWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20vZXhpZi8xLjAvIgogICAgeG1sbnM6dGlmZj0iaHR0cDovL25zLmFkb2JlLmNvbS90aWZmLzEuMC8iCiAgIGV4aWY6UGl4ZWxYRGltZW5zaW9uPSIxNjcyIgogICBleGlmOlBpeGVsWURpbWVuc2lvbj0iNTQyIgogICB0aWZmOkltYWdlV2lkdGg9IjE2NzIiCiAgIHRpZmY6SW1hZ2VIZWlnaHQ9IjU0MiIKICAgdGlmZjpPcmllbnRhdGlvbj0iMSIvPgogPC9yZGY6UkRGPgo8L3g6eG1wbWV0YT4KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA
gCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgCiAgICAgICAgICAgICAgICAgICAgICAgICAgIAo8P3hwYWNrZXQgZW5kPSJ3Ij8+0+OO8gAAAARzQklUCAgICHwIZIgAACAASURBVHja7N1rkJ2Ffef53zl9U6sldbcujRCoW41AGNAFITBgxwiwY2eS9czGztRWDSAZKvNqaqsSO4nf5uV613Fqq7bmzU5sc5vanZnUZGbijS/YICA2MaArYCQu6m7dr32RutXXc/YFhORgx0GXVp/m+Xyq9OJ/dE6fp/7Pq6e+/Txd2rL5nmoAAAAAAAAojLIVAAAAAAAAFItABAAAAAAAUDACEQAAAAAAQMEIRAAAAAAAAAUjEAEAAAAAABSMQAQAAAAAAFAwAhEAAAAAAEDBCEQAAAA
AAAAFIxABAAAAAAAUjEAEAAAAAABQMAIRAAAAAABAwQhEAAAAAAAABSMQAQAAAAAAFIxABAAAAAAAUDACEQAAAAAAQME0Xu4PKJfLuWZlV5YsWZyZmZmcPTOYs2cHbRYAAAAAAKBOXXIgKpVK2bjpttx++8a0LGip+b8zZ87m7156JYcPHbFhAAAAAACAOlPasvme6sV+qFwu5wu/9dms7r7+n3xPtVrNSz/9efbte8OWAQAAAAAA6sgl3UF09z13fhCHjhw5mlde3pVTJ0+nsbEx3T3X5+6770zborbc86lPZnBo2J1EAAAAAAAAdeSiA9GSJYuzfsOtSZKD7/blmR89l2q1mg0bb0u5XM6e3fty7Ojx/M9f+mLa2hbm7nvuFIgAAAAAAADqSPliP7D2xhtSKpUyMzOTF1/4WarV955Qt6R9STo62pMko6Nj+buXXk6SLFu2NEuXdto0AAAAAABAnbjoO4iWL1+aJDlx/GQuXBjPuptvTENDQ5Z2dqRcLueWW29OZaaSd989+A+fWbEsZ88O2jYAAAAAAEAduOhA1NjUlCSZmJxMkqxbd2MamxqzeNGipFTKzZ+4KTMzMzlw4O1UKpWUy+U0vf8ZAAAAAAAA5t5FB6KxsbEk+eBxcn/9P76fJPn0Z+5NY0NDdjz3YpJk8eLFKZfL73/mgk0DAAAAAADUiYv+G0RHDh9NknR2duS661b9w39Uqx/8PaIk2bDx1iRJpVLJ8WPHbRoAAAAAAKBOXHQgOvhuX86fH02SPPi5+7Js2Xt/k2jnq7vz8ss7kyS33Hpzblt/S5LkwP63c+HCuE0DAAAAAADUiYt+xNzMTCU7nnsxv/07n09ra2t+98tfzDtvH8zpU6fT2NSY7u7VuWZl1wfvb21dkHK5nEqlYtsAAAAAAAB1oLRl8z3VS/lgz5ruPPjZ+9LU1PQr/396ejqNje/1p4MH+/PMD5+teQQdAAAAAAAAc6Nh1bXX/+mlfHB4aDj73zyQSqWSBS0L0tTUmEqlkjOnz2bf3tfz42d2pKNjSTo7Oz74d/Bgv40DAAAAAADMsUu+g+gj/fBSKZ/7/APp7e1Jkrz+2i/yty++ZOsAAAAAAABzqDybP7xarebHP3ou/X0DSZKlSzttHAAAAAAAYI7N6h1Ef69cLqe3tyfHj5/I6OiYrQMAAAAAAMyhxqvxJZVKJe+8c9C2AQAAAAAA6kDZCgAAAAAAAIpFIAIAAAAAACgYgQgAAAAAAKBgBCIAAAAAAICCEYgAAAAAAAAKRiACAAAAAAAoGIEIAAAAAACgYAQiAAAAAACAghGIAAAAAAAACkYgAgAAAAAAKBiBCAAAAAAAoGAarQAAACAplcspNTSkVC4npVJSTZJqqpVKqjMzqVYqlgQAAHxsCEQAAEBhlRob09KxNOW2tixc0pGmppb3QlHp/TdUk5lqJdMT4xkbGU5lbDSTQ2dTmZ62PAAAYF67IoGoqbkpi9raMj09k9HR0VT8Zh0AAFCnGhYsSEvnsizsXJrm1rZ0LV2a69f0ZPXqnixpa03rgpY0NjUl1Wqmp6YzPlPJ4NBQDh86lP4DB3Jm9HymxsZy/vTJTI4MpzIxbqkAAMC8c1mB6NpVK3PHHZuy6rprU3r/V+wmJyfzztsHs2vnnpw/P2rDAABAXSi3LEhb1zVZfM21uXbltbnl5puzfv36rFzZlcZywz//A+65OzOVSg4fOZLX9u3NL956O8ePHs3omVMZO3kiM5Pj7z+WDgAAoP6Vtmy+55IuYe765JZsvmPjP/n/U1NT+dEPn83hQ0dsGQAAmLuLnlIpLSuvTce1q9O1fHm+9OXfy9qe1R/8ktvleO0Xb+a//+f/lKGJyQwOHMyFk8ctHAAAmB/XSpcSiDZsvC33fuqTSZLR0dHs3f1aTp06k8amxqxefV1uve2WNDSUMzU1lf/2X7+Xs2cHbRoAALjqGhe2ZcnqnqxYdX2+8NkHs2HjxrS2tFzR7xi7cCG
v79+fH/zN93PiyOGcO9SX6fELlg8AANS1hlXXXv+nF/OB1oWt+fwXHkxDQ0NOnTyd//ZX38vRo8dz/vxoRkbO5fChIzly+GjW3tibpqamtHe0560Db9s0AABw9ZRKaVy8OMtv+kTWb9yY7du358be3jQ1Nl7xr2pqasp1K1dm3c3rcm7sQs5Xqpkev5AZf5sIAACoYxcdiG75xLr0rOlOtVrNX/+P72d0dOyX3jM6Opapqal0d1+fJUsW58CBdzI5MWnbAADAVdHc0Zmum2/L1vvvz//ye1+64ncN/SqL2tqyccP6jI6cy9mpmUyOjWZmXCQCAADq00X/+lzXyq4kyckTpzI8PJLOpR1pbKj9MZVKJW+//W4+/Rv3JElWXtOVcyPnbBsAAJh1TR2d6brpE/nt3/mdfOquO1NK6ap9d2NDQ770e1/O4qUdeaZayel3DmRqyCO3AQCA+nPRgailuTnJe8/ZTpLf/dIX09j4y4HoL/7vJ1KpVFIul9OyoMWmAQCAWdewcGGW33BTHnjgwdx37z1zcgzlUim/+dnPZmZ6Jj+pJmcOvJGp835hDgAAqC8XHYguvB+GlixenCT59n948le+r62tLeVy+f3PeKwCAAAwuxqaW7J83W2588478/nPPZBU5+5Yyinl87/5mznadzB7xkYz/NabqcxMO0kAAEDdKF/sB44eOZ4kWbZ8aa55/3Fzv8r6DbckSarVao4fO27TAADArFq4clWWL16Uf/GFL6RULc358TSWy3n40ceydt3NWbS6xwkCAADqykUHonff7cvY2FiS5HOfuz+dnR2/9J6bP3FTNm5a/9773zmY0dExmwYAAGZNS8fSLF/dk4d///ezZNGiujmuBc3N+fyDD2RJ18o0LVzkRAEAAHWjYdW11//pxXygUqlkaGgka2/sTXNLcz5xy7p0dLZn8aK2XHf9tbnrk3dk/YZbUyq99xt7Z88Mpq9vwKYBAIBZ07n2pnzyk3fn3jvvrLtjW7FiRc4MDebEyLmMnz2dVKtOGAAAMOcuOhAlyfDwSIaGhrK6e3UaGxuzbNnSrO6+PtddtyqLF7/3W3HT09Mpl8tZuqwzixcvSn//IdsGAACuuJb2znStuSH/+ktfTtvChXV5jEs7OrJz775MjIxkZsLfaAUAAObeJQWiJBkcHMr+N99KUk1Lc3Oam5tTqVRy5vTZ7Nv7en78zI50dCxJZ2dHli1fKhIBAABXXKlczqKe3nz6U7+RLZtvr9vjbFvUloED+3Pq3GgmBs84cQAAwNxfT23ZfM+sPd+gVCrlc59/IL297/1B1n17X8/PfvpzWwcAAK6IhtaFWbnh9vy7f/tv093dXdfHemp4OH/+Z3+eo/tezfSYv9MKAADMrfJs/vBqtZof/+i59L//N4iWr1hu4wAAwBWzoKMzS5a0p2vlyro/1uXt7enqWpGmxe1OHAAAMOcaZ/sLKpVKfvTDZ9Pb25Pjx0/aOAAAcMU0LVqcDTevy4Lm5ro/1lKS7tXX590DS3PhxDEnDwAAmFONV+NLKpVK3nnnoG0DAABXTLmpKQuXtGfLvffOm2Ne092dhnkQswAAgAJcU1kBAAAwHzUuaE3bwrZ0LVs6b475+p6eNLe2ptwkEgEAAHNLIAIAAOanxqYsbGtLU7lh3hxy+5LFWbRgQRpbFzp/AADAnBKIAACAeanU1JjWhfMrtDQ3NqattTUNLQucQAAAYE4JRAAAwLzU2NiclpaWeXXMpZSysG1RSg0uxQAAgLnlqgQAAJinVzPllMvz75KmoaEh1VLJ+QMAAOb2ksoKAACA+aharaRSmZlfB12qZmZmOtVUnUAAAGBOCUQAAMC8ND05mfHxiXl1zJVKMnbuXDI17QQCAABzSiACAADmp+npjF8Ym1eHPDU9nZGRkVQnJpw/AABgTglEAADAvFSdmsr54eFMzaPHzA2NjGRsZiZT46NOIAAAMKcEIgAAYF6aGb+Q0fHxHD12Yt4c86G
DBzM1MZ6KR8wBAABzTCACAADmpcr0VC6MDOfln/103hzzO2+/lZmJSScPAACYcwIRAAAwb02dP5fX33o745PzI7ocPnEq586cdOIAAIA5JxABAADz1sTIcEZHR3Pk0KG6P9bDx0/k1IkTqY76+0MAAMDcu6KBqKGhIQ0NmhMAAHB1VCbGc2F4KLv27ku1jo9zplLJD773vYycPZ3p8QtOHAAAMOeuWM1ZtmxpHvv9R/KVxx5Ja+sCmwUAAGZdtVLJ+eOHs2vPnpw4fbpuj/P4sWM50HcwoyePOWkAAEBduGKBaMGClpRKpTQ0lNPc0mKzAADAVTE1MpJzZ0/nJz/+cV0eXyXV7HjhxZw7eyaVC+4eAgAA6kPDqmuv/9PL+QELFizIpz9zT+686440NDQkSW668YY0Nzfl+LETqVartgwAAMyq6cnJXKhU07t2bTo7Ourq2Pbs3pNnnn02w+++nZnJSScLAACoC5d1B1FbW1u+/K//VZYsXpzdu/Z+8Porr+zKDWt781v/4nMplUq2DAAAzKqpc8MZOnE8Tzz9HzNy/nzdHNd0tZLvP/PjDB0/lqmxUScKAACoG5d1B9Fv/fbnMjw8ku//zTMZOXc+y5Z1ZmR4JC///NUc2P9ONm/ZlHK5nBPHT9o0AAAwq6bGRlNpac3o8HBuWb8+DeXynB7P+MREvvvkU3nnwP6cH+hLtVJxkgAAgLpxSYFo6dLOrLx2ZTZuWp+fv/RKzp07n4mJibx14J289dY7qVQqaWlpTlNTU9bdvDYnjp9MqVzK1OSUjQMAALOiWqlkZvR8hqZnMj05mZvWrZuzJxpMTU/nBz/8YV559dUMvnMglckJJwgAAKgrFx2IurpW5Eu/9y+zdm1vSqVSbrzphiTVHD16vOZ9v/vlL2bNmu60tLTklltvztob1mTf3tdtHAAAmDWV6alMjV/IqbHxTIyNzlkk+qv/8p/z4kt/l8G39mf6wpgTAwAA1J1GKwAAAD5OJoeHcvad/Xk+lYyPT+SL/+pfprWl5ap89/mxsfz3730vr778Sgb7D2Zq7LwTAgAA1KWLvoNobOxCDr7blyNHjuWGG9bkB9//cQ7sfzvVarXmfX0H+1OtJm1trfn//vqH2bfvjUxNecQcAAAw+2bGxzN5/lxOnjufgb6+rO5enUVti2b1bqKBw4fzn/7LX+a1PXsy+O5bmTo34kQAAAB166LvIKpWqzl7djBnzw7m+PETuWnd2hwaOJy2RW3ZuvXTqVareeZHz2V6ejo3rF2TN15/M6dOnbZpAADgqpo8N5LB/W/k9fELGTh2PF/43GezdevWNFzhSDQ9M5NnfvKTvPDTn2Xk1IkMvb3/l36BDgAAoN5c9B1E/9jRI8dy5113ZPXq69LY2JBbb/tE2tuXZHR0NPdt/XTOjZzL3774ki0DAABzolqpZHJkKFMXLqT/+Im8uX9/picnsrzrmjQ3Xd4Tt6enJjJy6mj+/V88nld37czZg29n9PjRVCsViwcAAOpeacvmey7rV9sWLFiQe+69K2tv7E1DQ0OSZGJiMm+8/mZefWVXKi6OAACAOtDQ3JLmZcvTtrwrXUuXZsMdd+TWG2/M8muuyaK2tjSUy7/28zOVSs6PjubUsSM5e+Z0nv3Zz9NUrWRl+6L84L/+VWamPVIbAACYPy47EP29665fld/5n76QJPl//5+/zPCQ520DAAD1acGyFVmwvCttS9rT0NiQFR0d6V69Otf1rMmKrq4sWLAgSXLhwoWcOXkyh/r6cvDtt3L2wnhmKjNZd921ue3G1eldtSrlcjlf+8M/8Vg5AABgXmm8Uj9o/MJ4qtVqKpVKJicmbRYAAKhb42dOZWLwTM41N6ehpTWDixbn7QP7U2psTFNzSxoaG5NqMjMzncmJ8WR6JpWJ8UycH8nM+HjuWLk0a6+//oOft237w3n8u09aLAAAMG9csUB05szZfPs/PJmkmpkZj5UDAADqW7VSycz4eGbGxzM
5PPjei6VSSu//S5JKtZr8isdmv7bvtTz42fs/mDfdvjEdHR0ZGhqyWAAAYF4oX8kfNjMzIw4BAADzV7WaaqWSysxMKjMzvzIOJUlfX3/yoUfKrentsT8AAGDeKFsBAADAxXv88adq5q1bP2MpAADAvCEQAQAAXII9u/dm+B89Uq5nTU/WrHEXEQAAMD8IRAAAAJeo72B/zXz/A/dZCgAAMC8IRAAAAJfouR0v1Mw9Pe4gAgAA5geBCAAA4BL19/Wnv+8f7iJq72jP7bdvshgAAKDuCUQAAACX4blnn6+Zt33lYUsBAADqnkAEAABwGfbs2ZtUqzWv3b7ZXUQAAEB9E4gAAAAu0549e1MqldLY2JiGhob0rlljKQAAQF0rbdl8T9UaAACAj6vGxsYsW7Y0y5cvz4LWBalWqxkbHcupU6dy9uxgqtXLvyQqlUr5v/79/5lly5Zmeno6g4ND+V//3R9YPgAAUL/XSlYAAAB8HJVKpSxatCgbN23I2rU3pLOzIy0tLalWq7lw4UJOnzqd/Qfeymv7Xs/U1NRlf9e777ybtWtvyMTEREZGRvKVRx/Jd7/zpBMBAADUJYEIAAD4WFq4sDV3bNmcT3/63rS2tubQoUMZGDiUUqmUFStW5Lb1t2bltSuTarJr1+7L/r5jR4/VzO3t7U4CAABQtwQiAADgY6dcLmfFihW5++670tramjfffDM//duXcvbs4PuBaHlu37wpt9zyiXT3rM6+fa+lUqmko7Pjg7uNGsoNOXf+fAb6B3Lo0OFUq9W0trZm3bobs3jx4pw4eSpNTY255pquHDzYn1df3ZVtX3nkg2PoWdOTUql0RR5hBwAAcKUJRAAAwMdOc3NzrrtuVZYtX5bTp8/kpZ/9PAcP9n3w/+fOjWR4eDiHDh3OqVOnkiQdHe154IGt6e1dk6mpqUxOTqa9vT2H1/TkhedfTH//QBa0Lsj6DevT3b06AwOHsnRpZxYsWJCJ8YmcOXM2/X39792V9L4/+/P/PV/9gz92QgAAgLojEAEAAB87zc3NWdG1IqkmY6OjOXTocM3/z8xUcvLkqZw8+V4camhoSHt7e7q6ujI8PJxXXt6Zqemp3HvvPbnpprUZPDuY/v6BlJI0NJSzeMnirFx5TQ4e7MuhgUM5dux42traMjX9y3/LqKOjI0NDQ04KAABQV8pWAAAAfOwudMrltLS0pFqtZnLyvbuBfp1qtZrBwaE8//wLefWVXTl16lQmxicyPDyc5uaWdF3TVfPecqmcsbELefYnO/Lyy6/m6Pt/f2hsdCz50CPl1vT2OCEAAEDdcQcRAADwsVOtVjI9PZ1SKWlobEi5XE6lUvk176+mVCqla8WKdF3TldvW35JSqZxly5elVC6lubmp5v3TM9MZHBzM4ODgL/2sN37xZtrbl3wwb936mezetcdJAQAA6oo7iAAAgI+dqanpnD07mFKplIULF9bcAZQkpVIpS5YsyZ13bUl39+osWbIkmzZtyG985tO5fvX1OXrkWAYGBjI0OJTSe5+o+XxlppKJ8Ylf+d2nT52umXvWuIMIAACoPwIRAADwsTMxMZEjh4/k/PnRLFm8OFu2bE5bW1uS9+LQ4sWLc9cnt+TBB+/PnXdtSXv7kty07sY0Nzenv68/O55/IXt27834+Pj7n/nl76h86FFy/9gbr/+iZv7Ko9ucFAAAoK54xBwAAPCxMzMzk+PHj2fXrt25++67smnTxjQ3N+f0qdMplUpZ0bUi69bdlKamppw7dz6TU1OpVKopl8vvxaIbb0zXNV1ZufKaJElra2tWrVqVqekpywUAAD4WBCIAAOBj6fz50fzdSy+nUqnk5pvX5Y4tmzM1+V7gaWxqzOlTp/PG67/Izp27cuHCePbvP5BVq65Nd093WlpaMjJyLm+88WZKpXI6Oztz76fuzquv7PxI3/2Xf/lX+eM/+eoH88ZNG1IqlVL9NXcdAQAAXE0CEQAA8LFUqVRy+vSpvPD836a
" - } - }, "cell_type": "markdown", "metadata": {}, "source": [ - "![Screenshot from 2023-08-31 18-50-30.png]()" + "![Screenshot](https://github.com/karmenrabar/pygraphistry_images/blob/main/access3.png?raw=true)" ] }, 
{ @@ -478,15 +462,10 @@ ] }, { - "attachments": { - "Screenshot from 2023-08-31 18-51-35.png": { - "image/png": "" - } - }, "cell_type": "markdown", "metadata": {}, "source": [ - "![Screenshot from 2023-08-31 18-51-35.png]()" + "![Screenshot](https://github.com/karmenrabar/pygraphistry_images/blob/main/access2.png?raw=true)" ] }, { @@ -545,15 +524,10 @@ ] }, { - "attachments": { - "Screenshot from 2023-08-31 18-52-42.png": { - "image/png": "" - } - }, "cell_type": "markdown", "metadata": {}, "source": [ - "![Screenshot from 2023-08-31 18-52-42.png]()" + "![Screenshot](https://github.com/karmenrabar/pygraphistry_images/blob/main/access.png?raw=true)" ] }, { From a5511d7c2ff693c630f030fdc35385f7a7b28b83 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Thu, 14 Sep 2023 18:07:52 -0400 Subject: [PATCH 003/104] docs(memgraph demo): update text --- .../memgraph/visualizing_iam_dataset.ipynb | 20 +++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb index 12e5062121..ec46361f35 100644 --- a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb +++ b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb @@ -11,7 +11,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This notebook showcases the utilization of Graphistry to visualize data from Memgraph using a sample dataset related to a company's Identity and Access Management. We'll demonstrate how Graphistry streamlines the visualization of Cypher queries, making it easier to analyze extensive data effectively." + "This notebook showcases using Graphistry to visualize data in Memgraph for a sample dataset of a company's Identity and Access Management records. We'll demonstrate how Graphistry streamlines the visualization of Cypher queries, making it easier and more effective to analyze rich and potentially large data in Memgraph." 
] }, { @@ -32,6 +32,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "#### About Graphistry\n", + "\n", + "[Graphistry](https://www.graphistry.com) is a visual graph AI platform featuring rich point-and-click visual analytics and end-to-end GPU acceleration for exploring and analyzing many relationships. The OSS [PyGraphistry](https://github.com/graphistry/pygraphistry) library enables quickly visualizing large data from Memgraph, and provides a rich and easy dataframe-centric library for intermediate graph processing steps like data shaping, graph algorithms, graph layouts, autoML, autoAI, and data-driven visualization configuration. If you have a GPU where your PyGraphistry client is running, it supports automatic GPU acceleration for the locally executed steps. PyGraphistry is often used directly within data science notebooks and as a Python toolkit for building custom dashboards and webapps.\n", + "\n", "#### About Memgraph" ] }, @@ -106,7 +110,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Lastly, establish a connection with your Graphistry GPU server account. Make sure to substitute the connection string and password with your personal credentials. You can create your account [here](https://hub.graphistry.com/). For additional configuration options, refer to [GitHub](https://github.com/graphistry/pygraphistry#configure)." + "Lastly, establish a connection with your Graphistry GPU server account. Make sure to substitute the connection string and password with your personal credentials. You can create your account [here](https://www.graphistry.com/get-started). For additional configuration options, refer to [GitHub](https://github.com/graphistry/pygraphistry#configure)." ] }, { @@ -278,7 +282,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "After populating Memgraph instance, it's time to visualize the dataset with graphistry. But first, let's see the graph schema in Memgraph Lab. 
It defines the structure of your data and its relationships, providing a blueprint for how your data elements are connected and organized within the graph database and offers interactive graph visualizations.\n" + "After populating your Memgraph instance, it's time to visualize the dataset with Graphistry. But first, let's see the graph schema in Memgraph Lab. It defines the structure of your data and its relationships, providing a blueprint for how your data elements are connected and organized within the graph database and offers interactive graph visualizations.\n" ] }, { @@ -542,11 +546,15 @@ "metadata": {}, "source": [ "\n", - "Pygraphistry complements Memgraph by providing a specialized tool for creating rich and interactive visualizations of graph data stored in Memgraph. It allows users to gain deeper insights into their graph data by leveraging the advanced visualization capabilities of the Graphistry platform, especially when dealing with complex and extensive graph data sets. \n", + "PyGraphistry complements Memgraph by providing a specialized tool for creating rich and interactive visualizations of graph data stored in Memgraph. 
It allows users to gain deeper insights into their graph data by leveraging the advanced visualization capabilities of the Graphistry platform, especially when dealing with complex and extensive graph data sets.\n", + "\n", + "The [PyGraphistry README.md](https://github.com/graphistry/pygraphistry) shares examples for how to take your Memgraph query result and perform on-the-fly steps like filtering, Pandas dataframe analysis, graph algorithm enrichments, autoML & autoAI analysis, new layouts, and configuring data-driven visualizations.\n", + "\n", + "Feel free to get your hands on Graphistry and Memgraph and share your insights or questions with us on the [Memgraph Discord](https://discord.com/invite/memgraph) and [Graphistry community Slack](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g)!\n", "\n", - "Feel free to get your hands on Graphistry and Memgraph and share your insights or questions with us on [Discord](https://discord.com/invite/memgraph) !\n", + "You can find out more about building and scaling modern IAM systems with Memgraph [here](https://memgraph.com/identity-access-management?utm_source=memgraph&utm_medium=referral&utm_campaign=bfb_blog&utm_content=iam) and on blogposts [What Makes Memgraph Great for Real-Time Performance in IAM Systems](https://memgraph.com/blog/what-makes-memgraph-great-for-real-time-performance-in-iam-systems), [Benefits Graph Databases Bring to Identity and Access Management](https://memgraph.com/blog/benefits-graph-databases-bring-to-identity-and-access-management) and [How Graphs Solve Two Biggest Problems of Traditional IAM Systems](https://memgraph.com/blog/how-graphs-solves-two-biggest-problems-of-traditional-iam-systems).\n", "\n", - "You can find out more about building and scaling modern IAM systems with Memgraph [here](https://memgraph.com/identity-access-management?utm_source=memgraph&utm_medium=referral&utm_campaign=bfb_blog&utm_content=iam) and on blogposts [What 
Makes Memgraph Great for Real-Time Performance in IAM Systems](https://memgraph.com/blog/what-makes-memgraph-great-for-real-time-performance-in-iam-systems), [Benefits Graph Databases Bring to Identity and Access Management](https://memgraph.com/blog/benefits-graph-databases-bring-to-identity-and-access-management) and [How Graphs Solve Two Biggest Problems of Traditional IAM Systems](https://memgraph.com/blog/how-graphs-solves-two-biggest-problems-of-traditional-iam-systems).\n" + "The [PyGraphistry demos folder](https://github.com/graphistry/pygraphistry/tree/master/demos) has more examples of how security operations and security data science teams are using Graphistry, including a free [GPU graph visualization & AI security analytics training from Nvidia GTC 2022](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51954/). You may also want to explore how [Louie.AI](https://www.louie.ai) is enabling analyst teams to talk directly to their data silos in natural language and get back analyses and visualizations, including Graphistry graph and AI visualizations.\n" ] } ], From 5490ea29cdf868e3f67312732bc214a113a17081 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Thu, 14 Sep 2023 18:09:33 -0400 Subject: [PATCH 004/104] docs(memgraph demo): update text 2 --- .../memgraph/visualizing_iam_dataset.ipynb | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb index ec46361f35..6c03d5a4b5 100644 --- a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb +++ b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb @@ -548,13 +548,11 @@ "\n", "PyGraphistry complements Memgraph by providing a specialized tool for creating rich and interactive visualizations of graph data stored in Memgraph. 
It allows users to gain deeper insights into their graph data by leveraging the advanced visualization capabilities of the Graphistry platform, especially when dealing with complex and extensive graph data sets.\n", "\n", - "The [PyGraphistry README.md](https://github.com/graphistry/pygraphistry) shares examples for how to take your Memgraph query result and perform on-the-fly steps like filtering, Pandas dataframe analysis, graph algorithm enrichments, autoML & autoAI analysis, new layouts, and configuring data-driven visualizations.\n", - "\n", "Feel free to get your hands on Graphistry and Memgraph and share your insights or questions with us on the [Memgraph Discord](https://discord.com/invite/memgraph) and [Graphistry community Slack](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g)!\n", "\n", "You can find out more about building and scaling modern IAM systems with Memgraph [here](https://memgraph.com/identity-access-management?utm_source=memgraph&utm_medium=referral&utm_campaign=bfb_blog&utm_content=iam) and on blogposts [What Makes Memgraph Great for Real-Time Performance in IAM Systems](https://memgraph.com/blog/what-makes-memgraph-great-for-real-time-performance-in-iam-systems), [Benefits Graph Databases Bring to Identity and Access Management](https://memgraph.com/blog/benefits-graph-databases-bring-to-identity-and-access-management) and [How Graphs Solve Two Biggest Problems of Traditional IAM Systems](https://memgraph.com/blog/how-graphs-solves-two-biggest-problems-of-traditional-iam-systems).\n", "\n", - "The [PyGraphistry demos folder](https://github.com/graphistry/pygraphistry/tree/master/demos) has more examples of how security operations and security data science teams are using Graphistry, including a free [GPU graph visualization & AI security analytics training from Nvidia GTC 2022](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51954/). 
You may also want to explore how [Louie.AI](https://www.louie.ai) is enabling analyst teams to talk directly to their data silos in natural language and get back analyses and visualizations, including Graphistry graph and AI visualizations.\n" + "The [PyGraphistry README.md](https://github.com/graphistry/pygraphistry) shares examples for how to take your Memgraph query result and perform on-the-fly steps like filtering, Pandas dataframe analysis, graph algorithm enrichments, autoML & autoAI analysis, new layouts, and configuring data-driven visualizations. The [PyGraphistry demos folder](https://github.com/graphistry/pygraphistry/tree/master/demos) has more examples of how security operations and security data science teams are using Graphistry, including a free [GPU graph visualization & AI security analytics training from Nvidia GTC 2022](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51954/). You may also want to explore how [Louie.AI](https://www.louie.ai) is enabling analyst teams to talk directly to their data silos in natural language and get back analyses and visualizations, including Graphistry graph and AI visualizations.\n" ] } ], From d64b71adf3abf7df7ef395a652555fbac4390d5f Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Thu, 14 Sep 2023 18:15:07 -0400 Subject: [PATCH 005/104] docs(memgraph demo): update text 3 --- .../demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb index 6c03d5a4b5..9d2658d427 100644 --- a/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb +++ b/demos/demos_databases_apis/memgraph/visualizing_iam_dataset.ipynb @@ -552,7 +552,7 @@ "\n", "You can find out more about building and scaling modern IAM systems with Memgraph 
[here](https://memgraph.com/identity-access-management?utm_source=memgraph&utm_medium=referral&utm_campaign=bfb_blog&utm_content=iam) and on blogposts [What Makes Memgraph Great for Real-Time Performance in IAM Systems](https://memgraph.com/blog/what-makes-memgraph-great-for-real-time-performance-in-iam-systems), [Benefits Graph Databases Bring to Identity and Access Management](https://memgraph.com/blog/benefits-graph-databases-bring-to-identity-and-access-management) and [How Graphs Solve Two Biggest Problems of Traditional IAM Systems](https://memgraph.com/blog/how-graphs-solves-two-biggest-problems-of-traditional-iam-systems).\n", "\n", - "The [PyGraphistry README.md](https://github.com/graphistry/pygraphistry) shares examples for how to take your Memgraph query result and perform on-the-fly steps like filtering, Pandas dataframe analysis, graph algorithm enrichments, autoML & autoAI analysis, new layouts, and configuring data-driven visualizations. The [PyGraphistry demos folder](https://github.com/graphistry/pygraphistry/tree/master/demos) has more examples of how security operations and security data science teams are using Graphistry, including a free [GPU graph visualization & AI security analytics training from Nvidia GTC 2022](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51954/). You may also want to explore how [Louie.AI](https://www.louie.ai) is enabling analyst teams to talk directly to their data silos in natural language and get back analyses and visualizations, including Graphistry graph and AI visualizations.\n" + "The [PyGraphistry README.md](https://github.com/graphistry/pygraphistry) shares examples for how to take your Memgraph query result and perform on-the-fly steps like filtering, Pandas dataframe analysis, graph algorithm enrichments, autoML & autoAI analysis, new layouts, and configuring data-driven visualizations. 
The [PyGraphistry demos folder](https://github.com/graphistry/pygraphistry/tree/master/demos) has more examples of how security operations and security data science teams are using Graphistry, including a free [GPU graph visualization & AI security analytics training from Nvidia GTC 2022](https://www.nvidia.com/en-us/on-demand/session/gtcspring23-dlit51954/). You may also want to explore how [Louie.AI](https://www.louie.ai) is enabling analyst teams to talk directly to their data silos in natural language and get back analyses and visualizations, including Graphistry graph and AI visualizations. Finally, you may consider [graph-app-kit](https://github.com/graphistry/graph-app-kit) as a maintained OSS Streamlit distribution and reference for building PyData dashboards with Graphistry and your Memgraph data.\n" ] } ], From e14b8396a200d7e5fedc83d7b0a8acaac2ddd240 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Sat, 16 Sep 2023 18:01:58 -0400 Subject: [PATCH 006/104] docs(changelog); add memgraph tutorial --- CHANGELOG.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index bf30f599cb..5269a61ad9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Docs + +* Memgraph: Add tutorial (https://github.com/graphistry/pygraphistry/pull/507 by https://github.com/karmenrabar) + ## [0.29.5 - 2023-08-23] ### Fixed From d4a3136d326888787fc18a7e21d9b02db313fffb Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Thu, 12 Oct 2023 16:01:31 +0800 Subject: [PATCH 007/104] fix (sso): In databricks, the HTML is not displayable, print out the sign in url to let user manually copy and paste in browser --- graphistry/pygraphistry.py | 1 + 1 file changed, 1 insertion(+) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index 9b4430d9d8..7be979818a 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -268,6 +268,7 
@@ def _handle_auth_url(auth_url, sso_timeout): from IPython.display import display, HTML display(HTML(f'Login SSO')) print("Please click the above link to open browser to login") + print(f"If you cannot see the link, please open browser, browse to this link: {auth_url}") print("Please close browser tab after SSO login to back to notebook") # return HTML(make_iframe(auth_url, 20, extra_html=extra_html, override_html_style=override_html_style)) else: From 38f401a007a91dbee8ea604e3053c6dc548e25e2 Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Thu, 12 Oct 2023 22:30:53 +0800 Subject: [PATCH 008/104] wip (sso): Add explicit SSO login prompt option, display, browser or None (just print auth url) --- graphistry/pygraphistry.py | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index 7be979818a..4ab0460a91 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -202,7 +202,7 @@ def pkey_login(personal_key_id, personal_key_secret, org_name=None, fail_silent= return PyGraphistry.api_token() @staticmethod - def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SECONDS): + def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SECONDS, opt_into_type=None): """Authenticate with SSO and set token for reuse (api=3). :param org_name: Set login organization's name(slug). Defaults to user's personal organization. @@ -211,8 +211,8 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC :type idp_name: Optional[str] :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] - :returns: None. - :rtype: None + :returns: token or auth_url + :rtype: Optional[str] SSO Login logic. 
@@ -244,11 +244,12 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC auth_url = arrow_uploader.sso_auth_url # print("auth_url : {}".format(auth_url)) if auth_url and not PyGraphistry.api_token(): - PyGraphistry._handle_auth_url(auth_url, sso_timeout) + PyGraphistry._handle_auth_url(auth_url, sso_timeout, opt_into_type) + return auth_url @staticmethod - def _handle_auth_url(auth_url, sso_timeout): + def _handle_auth_url(auth_url, sso_timeout, opt_into_type): """Internal function to handle what to do with the auth_url based on the client mode python/ipython console or notebook. @@ -256,14 +257,14 @@ def _handle_auth_url(auth_url, sso_timeout): :type auth_url: str :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] - :returns: None. - :rtype: None + :returns: token + :rtype: token: Optional[str] SSO Login logic. """ - if in_ipython() or in_databricks(): # If run in notebook, just display the HTML + if in_ipython() or in_databricks() or opt_into_type == 'display': # If run in notebook, just display the HTML # from IPython.core.display import HTML from IPython.display import display, HTML display(HTML(f'Login SSO')) @@ -271,13 +272,16 @@ def _handle_auth_url(auth_url, sso_timeout): print(f"If you cannot see the link, please open browser, browse to this link: {auth_url}") print("Please close browser tab after SSO login to back to notebook") # return HTML(make_iframe(auth_url, 20, extra_html=extra_html, override_html_style=override_html_style)) - else: + elif opt_into_type == 'browser': print("Please minimize browser after SSO login to back to pygraphistry") import webbrowser input("Press Enter to open browser ...") # open browser to auth_url webbrowser.open(auth_url) + else: + print(f"Please open browser, browse to this link: {auth_url}") + print("Please run graphistry.sso_get_token() to complete the 
authentication if you get timeout error") if sso_timeout is not None: time.sleep(1) @@ -563,7 +567,8 @@ def register( org_name: Optional[str] = None, idp_name: Optional[str] = None, is_sso_login: Optional[bool] = False, - sso_timeout: Optional[int] = SSO_GET_TOKEN_ELAPSE_SECONDS + sso_timeout: Optional[int] = SSO_GET_TOKEN_ELAPSE_SECONDS, + sso_opt_into_type: Optional[str] = None ): """API key registration and server selection @@ -691,7 +696,7 @@ def register( PyGraphistry.api_token(token or PyGraphistry._config['api_token']) elif not (org_name is None) or is_sso_login: print(MSG_REGISTER_ENTER_SSO_LOGIN) - PyGraphistry.sso_login(org_name, idp_name, sso_timeout=sso_timeout) + PyGraphistry.sso_login(org_name, idp_name, sso_timeout=sso_timeout, sso_opt_into_type=None) @staticmethod def __check_login_type_to_reset_token_creds( From 5d365fec8e36a4a9df21dafb03b97d958bf3f70e Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Thu, 12 Oct 2023 22:43:59 +0800 Subject: [PATCH 009/104] fix (mypy): fix error of mypy check --- graphistry/arrow_uploader.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/arrow_uploader.py b/graphistry/arrow_uploader.py index 765fe3067e..e9e735be8a 100644 --- a/graphistry/arrow_uploader.py +++ b/graphistry/arrow_uploader.py @@ -656,7 +656,7 @@ def post_arrow(self, arr: pa.Table, graph_type: str, opts: str = ''): raise Exception('No success indicator in server response') return out except requests.exceptions.HTTPError as e: - logger.error('Failed to post arrow to %s (%s)', sub_path, e.request.url, exc_info=True) + logger.error('Failed to post arrow to %s (%s)', sub_path, "{}/{}{}".format(self.server_base_path, sub_path, f"?{opts}" if len(opts) > 0 else ""), exc_info=True) logger.error('%s', e) logger.error('%s', e.response.text) raise e From d3e1a2735319998c4de80e5d44fe3552b223f938 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Thu, 12 Oct 2023 11:33:11 -0400 Subject: [PATCH 010/104] fix(uploader): guard against potential 
null deref in exn handler --- CHANGELOG.md | 4 ++++ graphistry/arrow_uploader.py | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 5269a61ad9..173231098e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * Memgraph: Add tutorial (https://github.com/graphistry/pygraphistry/pull/507 by https://github.com/karmenrabar) +### Fixed + +* Guard against potential `requests`` null dereference in uploader error handling + ## [0.29.5 - 2023-08-23] ### Fixed diff --git a/graphistry/arrow_uploader.py b/graphistry/arrow_uploader.py index 765fe3067e..0651f192d8 100644 --- a/graphistry/arrow_uploader.py +++ b/graphistry/arrow_uploader.py @@ -656,7 +656,7 @@ def post_arrow(self, arr: pa.Table, graph_type: str, opts: str = ''): raise Exception('No success indicator in server response') return out except requests.exceptions.HTTPError as e: - logger.error('Failed to post arrow to %s (%s)', sub_path, e.request.url, exc_info=True) + logger.error('Failed to post arrow to %s (%s)', sub_path, e.request.url if e.request is not None else "(No request)", exc_info=True) logger.error('%s', e) logger.error('%s', e.response.text) raise e From 7bfb22e69f4ed9b2732822510b7fa33dfe6250d9 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Thu, 12 Oct 2023 11:59:20 -0400 Subject: [PATCH 011/104] fix(sso display): correct val passing and add docs --- graphistry/pygraphistry.py | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index 4ab0460a91..d4f3330791 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -211,6 +211,8 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC :type idp_name: Optional[str] :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. 
Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] + :param opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() + :type opt_into_type: Optional[Literal["display", "browser"]] :returns: token or auth_url :rtype: Optional[str] @@ -257,6 +259,8 @@ def _handle_auth_url(auth_url, sso_timeout, opt_into_type): :type auth_url: str :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] + :param opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() + :type opt_into_type: Optional[Literal["display", "browser"]] :returns: token :rtype: token: Optional[str] @@ -568,7 +572,7 @@ def register( idp_name: Optional[str] = None, is_sso_login: Optional[bool] = False, sso_timeout: Optional[int] = SSO_GET_TOKEN_ELAPSE_SECONDS, - sso_opt_into_type: Optional[str] = None + sso_opt_into_type: Optional[Literal["display", "browser"]] = None ): """API key registration and server selection @@ -612,6 +616,8 @@ def register( :type idp_name: Optional[str] :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] + :param sso_opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() + :type sso_opt_into_type: Optional[Literal["display", "browser"]] :returns: None. 
:rtype: None @@ -627,6 +633,14 @@ def register( import graphistry graphistry.register(api=3, protocol='http', server='200.1.1.1', org_name="org-name") + **Example: Override SSO url display method to use `display()`, `webbrowser.open()`, or just `print()`** + :: + + import graphistry + graphistry.register(api=3, protocol='http', server='200.1.1.1', org_name="org-name", sso_opt_into_type="display") + graphistry.register(api=3, protocol='http', server='200.1.1.1', org_name="org-name", sso_opt_into_type="browser") + graphistry.register(api=3, protocol='http', server='200.1.1.1', org_name="org-name", sso_opt_into_type=None) + **Example: Standard (2.0 api by username/password with org_name)** :: @@ -696,7 +710,7 @@ def register( PyGraphistry.api_token(token or PyGraphistry._config['api_token']) elif not (org_name is None) or is_sso_login: print(MSG_REGISTER_ENTER_SSO_LOGIN) - PyGraphistry.sso_login(org_name, idp_name, sso_timeout=sso_timeout, sso_opt_into_type=None) + PyGraphistry.sso_login(org_name, idp_name, sso_timeout=sso_timeout, opt_into_type=sso_opt_into_type) @staticmethod def __check_login_type_to_reset_token_creds( From 20680b23edc24494cf1e705ecc6d32c7e2a841fa Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Fri, 13 Oct 2023 08:40:46 +0800 Subject: [PATCH 012/104] fix (sso_login): fix missing parameter --- graphistry/pygraphistry.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index 4ab0460a91..d0c638a6af 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -202,7 +202,7 @@ def pkey_login(personal_key_id, personal_key_secret, org_name=None, fail_silent= return PyGraphistry.api_token() @staticmethod - def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SECONDS, opt_into_type=None): + def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SECONDS, sso_opt_into_type=None): """Authenticate with SSO and set token for reuse 
(api=3). :param org_name: Set login organization's name(slug). Defaults to user's personal organization. From b0b8a03f21638c1227a3c260fe1e370a0b00ce70 Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Fri, 13 Oct 2023 08:42:56 +0800 Subject: [PATCH 013/104] fix (sso_login): refactor opt_in_type --- graphistry/pygraphistry.py | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index d0c638a6af..f636873d6d 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -220,7 +220,7 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC if PyGraphistry._config['store_token_creds_in_memory']: PyGraphistry.relogin = lambda: PyGraphistry.sso_login( - org_name, idp_name, sso_timeout + org_name, idp_name, sso_timeout, sso_opt_into_type ) PyGraphistry._is_authenticated = False @@ -244,12 +244,12 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC auth_url = arrow_uploader.sso_auth_url # print("auth_url : {}".format(auth_url)) if auth_url and not PyGraphistry.api_token(): - PyGraphistry._handle_auth_url(auth_url, sso_timeout, opt_into_type) + PyGraphistry._handle_auth_url(auth_url, sso_timeout, sso_opt_into_type) return auth_url @staticmethod - def _handle_auth_url(auth_url, sso_timeout, opt_into_type): + def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): """Internal function to handle what to do with the auth_url based on the client mode python/ipython console or notebook. 
@@ -264,7 +264,7 @@ def _handle_auth_url(auth_url, sso_timeout, opt_into_type): """ - if in_ipython() or in_databricks() or opt_into_type == 'display': # If run in notebook, just display the HTML + if in_ipython() or in_databricks() or sso_opt_into_type == 'display': # If run in notebook, just display the HTML # from IPython.core.display import HTML from IPython.display import display, HTML display(HTML(f'Login SSO')) @@ -272,7 +272,7 @@ def _handle_auth_url(auth_url, sso_timeout, opt_into_type): print(f"If you cannot see the link, please open browser, browse to this link: {auth_url}") print("Please close browser tab after SSO login to back to notebook") # return HTML(make_iframe(auth_url, 20, extra_html=extra_html, override_html_style=override_html_style)) - elif opt_into_type == 'browser': + elif sso_opt_into_type == 'browser': print("Please minimize browser after SSO login to back to pygraphistry") import webbrowser From 0f0c745821d0ec592ee0a57ef6e86ba09be97e50 Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Fri, 13 Oct 2023 08:48:56 +0800 Subject: [PATCH 014/104] fix (refactor variable): Refactor opt_into_type to sso_opt_into_type --- graphistry/pygraphistry.py | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index cee85ea3fd..58918c36ab 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -211,8 +211,8 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC :type idp_name: Optional[str] :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. 
:type sso_timeout: Optional[int] - :param opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() - :type opt_into_type: Optional[Literal["display", "browser"]] + :param sso_opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() + :type sso_opt_into_type: Optional[Literal["display", "browser"]] :returns: token or auth_url :rtype: Optional[str] @@ -259,8 +259,8 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): :type auth_url: str :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] - :param opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() - :type opt_into_type: Optional[Literal["display", "browser"]] + :param sso_opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() + :type sso_opt_into_type: Optional[Literal["display", "browser"]] :returns: token :rtype: token: Optional[str] @@ -710,7 +710,7 @@ def register( PyGraphistry.api_token(token or PyGraphistry._config['api_token']) elif not (org_name is None) or is_sso_login: print(MSG_REGISTER_ENTER_SSO_LOGIN) - PyGraphistry.sso_login(org_name, idp_name, sso_timeout=sso_timeout, opt_into_type=sso_opt_into_type) + PyGraphistry.sso_login(org_name, idp_name, sso_timeout=sso_timeout, sso_opt_into_type=sso_opt_into_type) @staticmethod def __check_login_type_to_reset_token_creds( From 8f540e4b7893ef6039bc8a0741a9b11914501275 Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Fri, 13 Oct 2023 08:51:01 +0800 Subject: [PATCH 015/104] fix (sso): Change text when got a token --- graphistry/pygraphistry.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index 58918c36ab..aa9667fae8 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -314,7 +314,7 @@ def _handle_auth_url(auth_url, sso_timeout, 
sso_opt_into_type): # set org_name to sso org PyGraphistry._config['org_name'] = org_name - print("Successfully get a token") + print("Successfully got a token") return PyGraphistry.api_token() else: return None From fd0f0ad2cf8c7d2eb61b60210c59e355f43ed591 Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Fri, 13 Oct 2023 09:21:45 +0800 Subject: [PATCH 016/104] fix (sso): Add more logging to debug, refactor logging to use lazy logging --- graphistry/arrow_uploader.py | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/graphistry/arrow_uploader.py b/graphistry/arrow_uploader.py index e9e735be8a..c464604f8e 100644 --- a/graphistry/arrow_uploader.py +++ b/graphistry/arrow_uploader.py @@ -183,12 +183,12 @@ def __init__(self, # check current org_name from .pygraphistry import PyGraphistry if 'org_name' in PyGraphistry._config: - logger.debug("@ArrowUploader.__init__: There is an org_name : {}".format(PyGraphistry._config['org_name'])) + logger.debug("@ArrowUploader.__init__: There is an org_name : %s", PyGraphistry._config['org_name']) self.__org_name = PyGraphistry._config['org_name'] else: self.__org_name = None - logger.debug("2. @ArrowUploader.__init__: After set self.org_name: {}, self.__org_name : {}".format(self.org_name, self.__org_name)) + logger.debug("2. 
@ArrowUploader.__init__: After set self.org_name: %s, self.__org_name : %s", self.org_name, self.__org_name) def login(self, username, password, org_name=None): @@ -254,7 +254,7 @@ def _handle_login_response(self, out, org_name): del PyGraphistry._config['org_name'] else: if org_name in PyGraphistry._config: - logger.debug("@ArrowUploder, handle login reponse, org_name: {}".format(PyGraphistry._config['org_name'])) + logger.debug("@ArrowUploder, handle login reponse, org_name: %s", PyGraphistry._config['org_name']) PyGraphistry._config['org_name'] = logged_in_org_name # PyGraphistry.org_name(logged_in_org_name) except Exception: @@ -287,17 +287,18 @@ def sso_login(self, org_name=None, idp_name=None): url, data={'client-type': 'pygraphistry'}, verify=self.certificate_validation ) - # print(out.text) + json_response = None try: + logger.debug("@ArrowUploader.sso_login, out.text: %s", out.text) json_response = out.json() - logger.debug("@ArrowUploader.sso_login, json_response: {}".format(json_response)) + logger.debug("@ArrowUploader.sso_login, json_response: %s", json_response) self.token = None if not ('status' in json_response): raise Exception(out.text) else: if json_response['status'] == 'OK': - logger.debug("@ArrowUploader.sso_login, json_data : {}".format(json_response['data'])) + logger.debug("@ArrowUploader.sso_login, json_data : %s", json_response['data']) if 'state' in json_response['data']: self.sso_state = json_response['data']['state'] self.sso_auth_url = json_response['data']['auth_url'] @@ -336,7 +337,7 @@ def sso_get_token(self, state): if 'token' in json_response['data']: self.token = json_response['data']['token'] if 'active_organization' in json_response['data']: - logger.debug("@ArrowUploader.sso_get_token, org_name: {}".format(json_response['data']['active_organization']['slug'])) + logger.debug("@ArrowUploader.sso_get_token, org_name: %s", json_response['data']['active_organization']['slug']) self.org_name = 
json_response['data']['active_organization']['slug'] except Exception as e: @@ -382,7 +383,7 @@ def create_dataset(self, json): # noqa: F811 tok = self.token if self.org_name: json['org_name'] = self.org_name - logger.debug("@ArrowUploder create_dataset json: {}".format(json)) + logger.debug("@ArrowUploder create_dataset json: %s", json) res = requests.post( self.server_base_path + '/api/v2/upload/datasets/', verify=self.certificate_validation, @@ -490,7 +491,7 @@ def post(self, as_files: bool = True, memoize: bool = True): """ Note: likely want to pair with self.maybe_post_share_link(g) """ - logger.debug("@ArrowUploader.post, self.org_name : {}".format(self.org_name)) + logger.debug("@ArrowUploader.post, self.org_name : %s", self.org_name) if as_files: file_uploader = ArrowFileUploader(self) From 4eb13f402f0cd6f5afe7ce9627e25924b2cee59f Mon Sep 17 00:00:00 2001 From: Vaim Dev Date: Fri, 13 Oct 2023 09:25:07 +0800 Subject: [PATCH 017/104] fix (sso): Add additional message to tell user about the error --- graphistry/arrow_uploader.py | 1 + 1 file changed, 1 insertion(+) diff --git a/graphistry/arrow_uploader.py b/graphistry/arrow_uploader.py index c464604f8e..fa60202fb3 100644 --- a/graphistry/arrow_uploader.py +++ b/graphistry/arrow_uploader.py @@ -309,6 +309,7 @@ def sso_login(self, org_name=None, idp_name=None): except Exception: logger.error('Error: %s', out, exc_info=True) + print("\nThere is error with the sso login, please check your SSO and IDP configuration") raise return self From 222e064871fccbd5f624902e91bff25bad211861 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Thu, 12 Oct 2023 21:43:12 -0400 Subject: [PATCH 018/104] docs(sso auth): improve messages --- graphistry/arrow_uploader.py | 2 +- graphistry/pygraphistry.py | 15 +++++++-------- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/graphistry/arrow_uploader.py b/graphistry/arrow_uploader.py index fa60202fb3..111e349e3e 100644 --- a/graphistry/arrow_uploader.py +++ 
b/graphistry/arrow_uploader.py @@ -309,7 +309,7 @@ def sso_login(self, org_name=None, idp_name=None): except Exception: logger.error('Error: %s', out, exc_info=True) - print("\nThere is error with the sso login, please check your SSO and IDP configuration") + print("\nThere is error with the SSO login, please check your SSO and IDP configuration") raise return self diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index aa9667fae8..f4d00a1b23 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -211,7 +211,7 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC :type idp_name: Optional[str] :param sso_timeout: Set sso login getting token timeout in seconds (blocking mode), set to None if non-blocking mode. Default as SSO_GET_TOKEN_ELAPSE_SECONDS. :type sso_timeout: Optional[int] - :param sso_opt_into_type: Show the SSO url with display(), webbrowser.open(), or print() + :param sso_opt_into_type: Show the SSO URL with display(), webbrowser.open(), or print() :type sso_opt_into_type: Optional[Literal["display", "browser"]] :returns: token or auth_url :rtype: Optional[str] @@ -247,7 +247,6 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC # print("auth_url : {}".format(auth_url)) if auth_url and not PyGraphistry.api_token(): PyGraphistry._handle_auth_url(auth_url, sso_timeout, sso_opt_into_type) - return auth_url @staticmethod @@ -272,20 +271,20 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): # from IPython.core.display import HTML from IPython.display import display, HTML display(HTML(f'Login SSO')) - print("Please click the above link to open browser to login") - print(f"If you cannot see the link, please open browser, browse to this link: {auth_url}") + print("Please click the above URL to open browser to login") + print(f"If you cannot see the URL, please open browser, browse to this URL: {auth_url}") print("Please close browser tab after 
SSO login to back to notebook") # return HTML(make_iframe(auth_url, 20, extra_html=extra_html, override_html_style=override_html_style)) elif sso_opt_into_type == 'browser': - print("Please minimize browser after SSO login to back to pygraphistry") + print("Please minimize browser after your SSO login and go back to pygraphistry") import webbrowser input("Press Enter to open browser ...") # open browser to auth_url webbrowser.open(auth_url) else: - print(f"Please open browser, browse to this link: {auth_url}") - print("Please run graphistry.sso_get_token() to complete the authentication if you get timeout error") + print(f"Please open a browser, browse to this URL, and sign in: {auth_url}") + print("After, if you get timeout error, run graphistry.sso_get_token() to complete the authentication") if sso_timeout is not None: time.sleep(1) @@ -314,7 +313,7 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): # set org_name to sso org PyGraphistry._config['org_name'] = org_name - print("Successfully got a token") + print("Successfully logged in") return PyGraphistry.api_token() else: return None From 812a61f4693e8c69bc544ba4a45b633415f91d1b Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 12 Oct 2023 21:04:20 -0700 Subject: [PATCH 019/104] docs(changelog) --- CHANGELOG.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 173231098e..08ee62447d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +## [0.29.6 - 2023-10-23] + ### Docs * Memgraph: Add tutorial (https://github.com/graphistry/pygraphistry/pull/507 by https://github.com/karmenrabar) @@ -15,6 +17,11 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * Guard against potential `requests`` null dereference in uploader error handling +### Security + +* Add control `register(..., sso_opt_into_type='browser' | 
'display' | None)` +* Fix display of SSO URL + ## [0.29.5 - 2023-08-23] ### Fixed From 3bdf4387f184bdc380288ad0377a81c971484821 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 28 Oct 2023 07:21:17 -0400 Subject: [PATCH 020/104] fix(ci) --- CHANGELOG.md | 5 +++++ graphistry/arrow_uploader.py | 2 +- graphistry/tests/test_hyper_dask.py | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 08ee62447d..70591ce6df 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,11 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Fixed + +* Type error in arrow uploader exception handler +* Parsing error in hypergraph dask tests + ## [0.29.6 - 2023-10-23] ### Docs diff --git a/graphistry/arrow_uploader.py b/graphistry/arrow_uploader.py index 111e349e3e..a96b17c5b3 100644 --- a/graphistry/arrow_uploader.py +++ b/graphistry/arrow_uploader.py @@ -660,7 +660,7 @@ def post_arrow(self, arr: pa.Table, graph_type: str, opts: str = ''): except requests.exceptions.HTTPError as e: logger.error('Failed to post arrow to %s (%s)', sub_path, "{}/{}{}".format(self.server_base_path, sub_path, f"?{opts}" if len(opts) > 0 else ""), exc_info=True) logger.error('%s', e) - logger.error('%s', e.response.text) + logger.error('%s', e.response.text if e.response else None) raise e except Exception as e: logger.error('Failed to post arrow to %s', sub_path, exc_info=True) diff --git a/graphistry/tests/test_hyper_dask.py b/graphistry/tests/test_hyper_dask.py index 9cd1abc664..54e256d74b 100644 --- a/graphistry/tests/test_hyper_dask.py +++ b/graphistry/tests/test_hyper_dask.py @@ -71,7 +71,7 @@ def honeypot_pdf() -> pd.DataFrame: #'graphistry/tests/data/honeypot.csv', dtype=base_csv_dtypes, parse_dates=["time(max)", "time(min)"], - date_parser=lambda v: pd.to_datetime(int(v)), + date_parser=lambda v: pd.to_datetime(int(float(v))), ) assert df.dtypes.to_dict() == base_dtypes assert len(df) == 
HONEYPOT_ROWS From 819c2bd8594650eed15f3c153d7afbc1aa54985d Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 28 Oct 2023 08:22:10 -0400 Subject: [PATCH 021/104] fix(igraph): ensure arrow friendly by default --- CHANGELOG.md | 11 ++++++++++- graphistry/PlotterBase.py | 4 ++-- graphistry/plugins/igraph.py | 16 ++++++++++++---- graphistry/tests/plugins/test_igraph.py | 16 ++++++++++++++-- 4 files changed, 38 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 70591ce6df..fd34ec256d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,10 +7,19 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Added + +* igraph: support `compute_igraph('community_optimal_modularity')` + ### Fixed * Type error in arrow uploader exception handler -* Parsing error in hypergraph dask tests +* igraph: default coerce Graph-type node labels to strings, enabling plotting of g.compute_igraph('k_core') + +### Infra + +* dask: Fixed parsing error in hypergraph dask tests +* igraph: Ensure in compute_igraph tests that default mode results coerce to arrow tables ## [0.29.6 - 2023-10-23] diff --git a/graphistry/PlotterBase.py b/graphistry/PlotterBase.py index 0af2641d2c..50833f4741 100644 --- a/graphistry/PlotterBase.py +++ b/graphistry/PlotterBase.py @@ -1462,9 +1462,9 @@ def to_igraph(self, def compute_igraph(self, - alg: str, out_col: Optional[str] = None, directed: Optional[bool] = None, use_vids: bool = False, params: dict = {} + alg: str, out_col: Optional[str] = None, directed: Optional[bool] = None, use_vids: bool = False, params: dict = {}, stringify_rich_types: bool = True ): - return compute_igraph_base(self, alg, out_col, directed, use_vids, params) + return compute_igraph_base(self, alg, out_col, directed, use_vids, params, stringify_rich_types) compute_igraph.__doc__ = compute_igraph_base.__doc__ diff --git a/graphistry/plugins/igraph.py b/graphistry/plugins/igraph.py index b7bdc0d405..d865c0f1e8 
100644 --- a/graphistry/plugins/igraph.py +++ b/graphistry/plugins/igraph.py @@ -288,7 +288,8 @@ def compute_igraph( out_col: Optional[str] = None, directed: Optional[bool] = None, use_vids=False, - params: dict = {} + params: dict = {}, + stringify_rich_types=True ) -> Plottable: """Enrich or replace graph using igraph methods @@ -307,6 +308,9 @@ def compute_igraph( :param params: Any named parameters to pass to the underlying igraph method :type params: dict + :param stringify_rich_types: When rich types like igraph.Graph are returned, which may be problematic for downstream rendering, coerce them to strings + :type stringify_rich_types: bool + :returns: Plotter :rtype: Plotter @@ -374,10 +378,14 @@ def compute_igraph( return from_igraph(self, out) elif isinstance(out, list) and self._nodes is None: raise ValueError("No g._nodes table found; use .bind(), .nodes(), .materialize_nodes()") - elif len(out) == len(self._nodes): - clustering = out + elif isinstance(out, list) and len(out) == len(self._nodes): + if stringify_rich_types and len(out) > 0 and all((isinstance(c, igraph.Graph) for c in out)): + #ex: k_core + clustering = [str(c) for c in out] + else: + clustering = out else: - raise RuntimeError(f'Unexpected output type "{type(out)}"; should be VertexClustering, VertexDendrogram, Graph, or list_<|V|>') + raise RuntimeError(f'Unexpected output type "{type(out)}"; should be VertexClustering, VertexDendrogram, Graph, or list_<|V|>') ig.vs[out_col] = clustering diff --git a/graphistry/tests/plugins/test_igraph.py b/graphistry/tests/plugins/test_igraph.py index 94c2c282db..89617d02bf 100644 --- a/graphistry/tests/plugins/test_igraph.py +++ b/graphistry/tests/plugins/test_igraph.py @@ -1,3 +1,5 @@ +import pyarrow as pa + import graphistry, logging, pandas as pd, pytest, warnings from graphistry.tests.common import NoAuthTestCase from graphistry.constants import SRC, DST, NODE @@ -508,13 +510,23 @@ def test_all_calls(self): with warnings.catch_warnings(record=True) 
as w: # Cause all warnings to always be triggered. warnings.simplefilter("always") - assert compute_igraph(g, alg, **opts) is not None + g2 = compute_igraph(g, alg, **opts) + assert g2 is not None + assert g2._nodes is not None + assert g2._edges is not None + pa.Table.from_pandas(g2._nodes) + pa.Table.from_pandas(g2._edges) #assert len(w) == 1 assert issubclass(w[-1].category, DeprecationWarning) else: with warnings.catch_warnings(): warnings.filterwarnings("ignore", category=FutureWarning) - assert compute_igraph(g, alg, **opts) is not None + g2 = compute_igraph(g, alg, **opts) + assert g2 is not None + assert g2._nodes is not None + assert g2._edges is not None + pa.Table.from_pandas(g2._nodes) + pa.Table.from_pandas(g2._edges) @pytest.mark.skipif(not has_igraph, reason="Requires igraph") class Test_igraph_layouts(NoAuthTestCase): From 55aca238e8eafb9cd6bac96c1810b48214979735 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 28 Oct 2023 08:22:22 -0400 Subject: [PATCH 022/104] feat(igraph): expose community_optimal_modularity --- graphistry/plugins/igraph.py | 1 + 1 file changed, 1 insertion(+) diff --git a/graphistry/plugins/igraph.py b/graphistry/plugins/igraph.py index d865c0f1e8..7639d1ca74 100644 --- a/graphistry/plugins/igraph.py +++ b/graphistry/plugins/igraph.py @@ -267,6 +267,7 @@ def to_igraph( 'community_leading_eigenvector', 'community_leiden', 'community_multilevel', + 'community_optimal_modularity', 'community_spinglass', 'community_walktrap', 'constraint', From 4f78c5c1d016f04b53f1f25ae2a2d0451cb3fff8 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 28 Oct 2023 08:47:56 -0400 Subject: [PATCH 023/104] feat(igraph): add articulation_points --- CHANGELOG.md | 1 + graphistry/plugins/igraph.py | 14 +++++++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fd34ec256d..e1f7272f60 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,7 @@ This project adheres to [Semantic 
Versioning](https://semver.org/spec/v2.0.0.htm ### Added * igraph: support `compute_igraph('community_optimal_modularity')` +* igraph: `compute_igraph('articulation_points')` labels nodes that are articulation points ### Fixed diff --git a/graphistry/plugins/igraph.py b/graphistry/plugins/igraph.py index 7639d1ca74..e69da424d0 100644 --- a/graphistry/plugins/igraph.py +++ b/graphistry/plugins/igraph.py @@ -250,6 +250,7 @@ def to_igraph( compute_algs = [ + 'articulation_points', 'authority_score', 'betweenness', 'bibcoupling', @@ -379,6 +380,12 @@ def compute_igraph( return from_igraph(self, out) elif isinstance(out, list) and self._nodes is None: raise ValueError("No g._nodes table found; use .bind(), .nodes(), .materialize_nodes()") + elif alg == 'articulation_points': + assert isinstance(out, list) # List[int] + membership = [0] * len(ig.vs) + for i in out: + membership[i] = 1 + clustering = membership elif isinstance(out, list) and len(out) == len(self._nodes): if stringify_rich_types and len(out) > 0 and all((isinstance(c, igraph.Graph) for c in out)): #ex: k_core @@ -386,7 +393,12 @@ def compute_igraph( else: clustering = out else: - raise RuntimeError(f'Unexpected output type "{type(out)}"; should be VertexClustering, VertexDendrogram, Graph, or list_<|V|>') + if isinstance(out, list) and len(out) > 0: + xtra = f" (element 0 type: {type(out[0])})" + else: + xtra = "" + + raise RuntimeError(f'Unexpected output type "{type(out)}"{xtra}; should be VertexClustering, VertexDendrogram, Graph, or list_<|V|>') ig.vs[out_col] = clustering From 0441ab64cb770d8bac11c83fcc8d2033f0264539 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Mon, 30 Oct 2023 10:11:21 -0400 Subject: [PATCH 024/104] test(igraph): chaining --- CHANGELOG.md | 3 ++- graphistry/tests/plugins/test_igraph.py | 29 +++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e1f7272f60..af40dbc0a4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md 
@@ -20,7 +20,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Infra * dask: Fixed parsing error in hypergraph dask tests -* igraph: Ensure in compute_igraph tests that default mode results coerce to arrow tables +* igraph: Ensure in compute_igraph tests that default mode results coerce to arrow tables +* igraph: Test chaining ## [0.29.6 - 2023-10-23] diff --git a/graphistry/tests/plugins/test_igraph.py b/graphistry/tests/plugins/test_igraph.py index 89617d02bf..377cefc300 100644 --- a/graphistry/tests/plugins/test_igraph.py +++ b/graphistry/tests/plugins/test_igraph.py @@ -478,6 +478,35 @@ def test_enrich_with_stat_direct(self): @pytest.mark.skipif(not has_igraph, reason="Requires igraph") class Test_igraph_compute(NoAuthTestCase): + def chain_1_rename(self, alg: str) -> None: + + g = graphistry.edges(edges3_df, 'a', 'b').materialize_nodes() + + g2 = compute_igraph(g, alg) + assert alg in g2._nodes + + g3 = compute_igraph(g2, alg, f'{alg}2') + assert f'{alg}2' in g3._nodes + assert g2._nodes[alg].equals(g3._nodes[alg]) + assert g2._nodes[alg].equals(g3._nodes[f'{alg}2']) + + g3b = compute_igraph(g2, alg) + assert alg in g3b._nodes + assert g3b._nodes.shape == g2._nodes.shape + + def test_chain_1_rename_pagerank(self): + self.chain_1_rename('pagerank') + + def test_chain_2_rename_articulation_points(self): + self.chain_1_rename('articulation_points') + + def test_chain_3_seq(self): + g = graphistry.edges(edges3_df, 'a', 'b').materialize_nodes() + g2 = compute_igraph(g, 'pagerank') + g3 = compute_igraph(g2, 'articulation_points') + assert 'pagerank' in g3._nodes + assert 'articulation_points' in g3._nodes + def test_all_calls(self): overrides = { 'bipartite_projection': { From c4521bc4d25a0a8ff5ecc9e06a0689f6082f4488 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Wed, 1 Nov 2023 09:09:09 -0400 Subject: [PATCH 025/104] harden(igraph): warn on bad invalid coercion calls --- graphistry/plugins/igraph.py | 2 ++ 1 file 
changed, 2 insertions(+) diff --git a/graphistry/plugins/igraph.py b/graphistry/plugins/igraph.py index e69da424d0..09538f6fab 100644 --- a/graphistry/plugins/igraph.py +++ b/graphistry/plugins/igraph.py @@ -105,6 +105,8 @@ def from_igraph(self, nodes_df = nodes_df[ node_attributes ] if g._nodes is not None and merge_if_existing: + if g._node is None: + raise ValueError('Non-None g._nodes and merge_if_existing=True, yet no g._node is defined') if len(g._nodes) != len(nodes_df): logger.warning('node tables do not match in length; switch merge_if_existing to False or load_nodes to False or add missing nodes') From 36ef22de0544041531c73f2f041abe12d4615f42 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Wed, 1 Nov 2023 09:13:24 -0400 Subject: [PATCH 026/104] fix(igraph): smarter integer index handling --- CHANGELOG.md | 1 + graphistry/plugins/igraph.py | 15 +- graphistry/tests/plugins/test_igraph.py | 174 ++++++++++++++++++++++++ 3 files changed, 189 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index af40dbc0a4..2425ce9bf4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,6 +16,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * Type error in arrow uploader exception handler * igraph: default coerce Graph-type node labels to strings, enabling plotting of g.compute_igraph('k_core') +* igraph: fix coercions when using numeric IDs that were confused by igraph swizzling ### Infra diff --git a/graphistry/plugins/igraph.py b/graphistry/plugins/igraph.py index 09538f6fab..f70b96f6c4 100644 --- a/graphistry/plugins/igraph.py +++ b/graphistry/plugins/igraph.py @@ -128,8 +128,21 @@ def from_igraph(self, node_id_col = None elif node_col in ig_vs_df: node_id_col = node_col + #User to_igraph() with numeric IDs may swizzle id mappings (ex: sparse numeric) so try to un-swizzle + #FIXME: how to handle dense edge case's swizzling? 
elif g._node is not None and g._nodes[g._node].dtype.name == ig_vs_df.reset_index()['vertex ID'].dtype.name: - node_id_col = None + found = False + #FIXME: This seems quite error prone... what if any fields already exist? + for c in ['name', 'id', 'idx', NODE]: + if c in ig_vs_df.columns: + if g._nodes[g._node].min() == ig_vs_df[c].min() and g._nodes[g._node].max() == ig_vs_df[c].max(): + if g._nodes[g._node].sort_values().equals(ig_vs_df[c].sort_values()): + node_id_col = c + found = True + break + if not found: + logger.debug('lacks matching sortable dimension, likely passed integers-as-vids, continue without remapping') + node_id_col = None elif 'name' in ig_vs_df: node_id_col = 'name' else: diff --git a/graphistry/tests/plugins/test_igraph.py b/graphistry/tests/plugins/test_igraph.py index 377cefc300..d6d191caaf 100644 --- a/graphistry/tests/plugins/test_igraph.py +++ b/graphistry/tests/plugins/test_igraph.py @@ -21,6 +21,14 @@ edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 4)] names = ["my", "list", "of", "five", "edges"] +edges_sparse = [(2, 3), (3, 4), (6, 2)] +edges_sparse_renamed = [(0, 1), (1, 2), (3, 0)] +names_sparse = ['ab', 'bc', 'da'] +nodes_sparse = [2, 3, 4, 6] +nodes_sparse_renamed = [0, 1, 2, 3] +names_sparse_v = ['a', 'b', 'c', 'd'] +names_dense_v = ['u0', 'u1', 'a', 'b', 'c', 'u5', 'd'] + nodes = [0, 1, 2, 3, 4] names_v = ["eggs", "spam", "ham", "bacon", "yello"] @@ -47,6 +55,14 @@ 't': [0, 1, 0, 1] }) +edges4_df = pd.DataFrame({ + #no 0 + 's': [5, 6, 2, 5, 4, 2, 9, 4, 7, 10], + 'd': [10, 1, 8, 7, 10, 10, 3, 10, 1, 3], + 'w': [5.58851127, 9.12320228, 4.58717668, 6.59665844, 8.62772521, + 2.48654683, 1.4533045 , 4.47252362, 3.38562727, 9.16188751] +}) + @pytest.mark.skipif(not has_igraph, reason="Requires igraph") class Test_from_igraph(NoAuthTestCase): @@ -59,6 +75,15 @@ def test_minimal_edges(self): assert len(g._edges[g._source].dropna()) == len(edges) assert len(g._edges[g._destination].dropna()) == len(edges) + def 
test_minimal_edges_sparse(self): + ig = igraph.Graph(edges_sparse) + g = graphistry.from_igraph(ig, load_nodes=False) + assert g._nodes is None and g._node is None + assert len(g._edges) == len(edges_sparse) + assert g._source is not None and g._destination is not None + assert len(g._edges[g._source].dropna()) == len(edges_sparse) + assert len(g._edges[g._destination].dropna()) == len(edges_sparse) + def test_minimal_attributed_edges(self): ig = igraph.Graph(edges) ig.es["name"] = names @@ -70,6 +95,17 @@ def test_minimal_attributed_edges(self): assert len(g._edges[g._destination].dropna()) == len(edges) assert (g._edges['name'] == pd.Series(names)).all() + def test_minimal_attributed_edges_sparse(self): + ig = igraph.Graph(edges_sparse) + ig.es["name"] = names_sparse + g = graphistry.from_igraph(ig, load_nodes=False) + assert g._nodes is None and g._node is None + assert len(g._edges) == len(edges_sparse) + assert g._source is not None and g._destination is not None + assert len(g._edges[g._source].dropna()) == len(edges_sparse) + assert len(g._edges[g._destination].dropna()) == len(edges_sparse) + assert (g._edges['name'] == pd.Series(names_sparse)).all() + def test_minimal_nodes(self): ig = igraph.Graph(edges) g = graphistry.from_igraph(ig) @@ -82,6 +118,19 @@ def test_minimal_nodes(self): assert len(g._edges[g._source].dropna()) == len(edges) assert len(g._edges[g._destination].dropna()) == len(edges) + def test_minimal_nodes_sparse(self): + ig = igraph.Graph(edges_sparse) + g = graphistry.from_igraph(ig) + assert g._node is not None and g._nodes is not None + assert len(g._nodes) == max(nodes_sparse) + 1 + assert len(g._nodes) == len(names_dense_v) + assert g._nodes[g._node].sort_values().to_list() == list(range(max(nodes_sparse) + 1)) + assert g._nodes.columns == [ g._node ] + assert len(g._edges) == len(edges_sparse) + assert g._source is not None and g._destination is not None + assert len(g._edges[g._source].dropna()) == len(edges_sparse) + assert 
len(g._edges[g._destination].dropna()) == len(edges_sparse) + def test_minimal_nodes_attributed(self): ig = igraph.Graph(edges) ig.vs["name"] = names_v @@ -96,6 +145,21 @@ def test_minimal_nodes_attributed(self): assert len(g._edges[g._source].dropna()) == len(edges) assert len(g._edges[g._destination].dropna()) == len(edges) + def test_minimal_nodes_attributed_sparse(self): + ig = igraph.Graph(edges_sparse) + ig.vs["name"] = names_dense_v + g = graphistry.from_igraph(ig) + assert g._node is not None and g._nodes is not None + assert g._node == NODE + assert len(g._nodes) == max(nodes_sparse) + 1 + assert sorted(g._nodes.columns) == sorted([ NODE ]) + assert len(g._nodes) == len(names_dense_v) + assert g._nodes[g._node].sort_values().to_list() == sorted(names_dense_v) + assert len(g._edges) == len(edges_sparse) + assert g._source is not None and g._destination is not None + assert len(g._edges[g._source].dropna()) == len(edges_sparse) + assert len(g._edges[g._destination].dropna()) == len(edges_sparse) + def test_merge_existing_nodes(self): ig = igraph.Graph(edges) ig.vs["idx"] = ['a', 'b', 'c', 'd', 'e'] @@ -236,6 +300,104 @@ def test_minimal_edges(self): })) assert g2._node == NODE + def test_sparse_edges_renamed(self): + g = graphistry.edges(pd.DataFrame([{'s': s, 'd': d} for (s, d) in edges_sparse]), 's', 'd') + ig = g.to_igraph() + logger.debug('ig: %s', ig) + g2 = graphistry.from_igraph(ig) + assert g2._edges.shape == g._edges.shape + assert g2._source == SRC_IGRAPH + assert g2._destination == DST_IGRAPH + assert g2._edge is None + logger.debug('g2._nodes: %s', g2._nodes) + assert sorted(g2._nodes[g2._node].to_list()) == sorted(nodes_sparse) + assert g2._node == NODE + + def test_swizzles_1_none(self): + g = graphistry.edges(pd.DataFrame({'s': ['a', 'b'], 'd': ['b', 'a'], 'v': ['aa', 'bb']}), 's', 'd') + ig = g.to_igraph() + g2 = g.from_igraph(ig) + assert g2._edges.equals(g._edges) + + gb = g.nodes(pd.DataFrame({'n': ['a', 'b'], 'v': ['aa', 'bb']}), 'n') + 
ig = gb.to_igraph() + gb2 = gb.from_igraph(ig) + assert gb2._nodes.equals(gb._nodes) + + gc = g.nodes(pd.DataFrame({'n': ['b', 'a'], 'v': ['bb', 'aa']}), 'n') + ig = gc.to_igraph() + gc2 = gc.from_igraph(ig) + assert gc2._nodes.equals(gc._nodes) + + gd = g.materialize_nodes() + ig = gd.to_igraph() + gd2 = gd.from_igraph(ig) + assert gd2._nodes.equals(gd._nodes) + + def test_swizzles_1_none_numeric(self): + g = graphistry.edges(pd.DataFrame({'s': [0, 1], 'd': [0, 1], 'v': ['aa', 'bb']}), 's', 'd') + ig = g.to_igraph() + g2 = g.from_igraph(ig) + assert g2._edges.equals(g._edges) + + gb = g.nodes(pd.DataFrame({'n': [0, 1], 'v': ['aa', 'bb']}), 'n') + ig = gb.to_igraph() + gb2 = gb.from_igraph(ig) + assert gb2._nodes.equals(gb._nodes) + + gc = g.nodes(pd.DataFrame({'n': [1, 0], 'v': ['bb', 'aa']}), 'n') + ig = gc.to_igraph() + gc2 = gc.from_igraph(ig) + assert gc2._nodes.equals(gc._nodes) + + gd = g.materialize_nodes() + ig = gd.to_igraph() + gd2 = gd.from_igraph(ig) + assert gd2._nodes.equals(gd._nodes) + + def test_swizzles_2_sparse(self): + g = graphistry.edges(pd.DataFrame({'s': [1, 2], 'd': [1, 2], 'v': ['11', '22']}), 's', 'd') + ig = g.to_igraph() + g2 = g.from_igraph(ig) + assert g2._edges.equals(g._edges) + + gb = g.nodes(pd.DataFrame({'n': [1, 2], 'v': ['11', '22']}), 'n') + ig = gb.to_igraph() + gb2 = gb.from_igraph(ig) + assert gb2._nodes.equals(gb._nodes) + + gc = g.nodes(pd.DataFrame({'n': [2, 1], 'v': ['22', '11']}), 'n') + ig = gc.to_igraph() + gc2 = gc.from_igraph(ig) + assert gc2._nodes.equals(gc._nodes) + + gd = g.materialize_nodes() + ig = gd.to_igraph() + gd2 = gd.from_igraph(ig) + assert gd2._nodes.equals(gd._nodes) + + def test_swizzles_2_dense(self): + g = graphistry.edges(pd.DataFrame({'s': [1, 0], 'd': [1, 0], 'v': ['11', '00']}), 's', 'd') + ig = g.to_igraph() + g2 = g.from_igraph(ig) + assert g2._edges.equals(g._edges) + + gb = g.nodes(pd.DataFrame({'n': [1, 0], 'v': ['11', '00']}), 'n') + ig = gb.to_igraph() + gb2 = gb.from_igraph(ig) + 
assert gb2._nodes.equals(gb._nodes) + + gc = g.nodes(pd.DataFrame({'n': [0, 1], 'v': ['00', '11']}), 'n') + ig = gc.to_igraph() + gc2 = gc.from_igraph(ig) + assert gc2._nodes.equals(gc._nodes) + + gd = g.materialize_nodes() + ig = gd.to_igraph() + gd2 = gd.from_igraph(ig) + assert gd2._nodes.equals(gd._nodes) + + def test_minimal_edges_renamed(self): g = (graphistry .edges(pd.DataFrame({ @@ -507,6 +669,18 @@ def test_chain_3_seq(self): assert 'pagerank' in g3._nodes assert 'articulation_points' in g3._nodes + def test_chain_4_sparse(self): + + #From https://github.com/graphistry/pygraphistry/pull/513#issuecomment-1784161313 + + g = graphistry.edges(edges4_df, 's', 'd').materialize_nodes() + g2 = g.compute_igraph('articulation_points') + assert 'articulation_points' in g2._nodes + g2b = g.compute_igraph('community_optimal_modularity') + assert 'community_optimal_modularity' in g2b._nodes + g3 = g2.compute_igraph('community_optimal_modularity') + assert g3._nodes.community_optimal_modularity.equals(g2b._nodes.community_optimal_modularity) + def test_all_calls(self): overrides = { 'bipartite_projection': { From 61f497dbfd58ae4b6d65c482bdc795ce042fad02 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Wed, 1 Nov 2023 09:14:56 -0400 Subject: [PATCH 027/104] infra(tests): mount source folder --- CHANGELOG.md | 1 + docker/test-cpu-local.sh | 1 + docker/test-gpu-local.sh | 1 + 3 files changed, 3 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 2425ce9bf4..248f45f117 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -23,6 +23,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * dask: Fixed parsing error in hypergraph dask tests * igraph: Ensure in compute_igraph tests that default mode results coerce to arrow tables * igraph: Test chaining +* tests: mount source folders to enable dev iterations without rebuilding ## [0.29.6 - 2023-10-23] diff --git a/docker/test-cpu-local.sh b/docker/test-cpu-local.sh index 
8925048a62..9d7392605d 100755 --- a/docker/test-cpu-local.sh +++ b/docker/test-cpu-local.sh @@ -46,6 +46,7 @@ docker run \ -e WITH_TYPECHECK=$WITH_TYPECHECK \ -e WITH_BUILD=$WITH_BUILD \ -e WITH_TEST=$WITH_TEST \ + -v "`pwd`/../graphistry:/opt/pygraphistry/graphistry:ro" \ --rm \ ${NETWORK} \ graphistry/test-cpu:${TEST_CPU_VERSION} \ diff --git a/docker/test-gpu-local.sh b/docker/test-gpu-local.sh index d481054c47..d0d239d023 100755 --- a/docker/test-gpu-local.sh +++ b/docker/test-gpu-local.sh @@ -39,6 +39,7 @@ docker run \ -e WITH_TYPECHECK=$WITH_TYPECHECK \ -e WITH_TEST=$WITH_TEST \ -e WITH_BUILD=$WITH_BUILD \ + -v "`pwd`/../graphistry:/opt/pygraphistry/graphistry:ro" \ --security-opt seccomp=unconfined \ --rm \ ${NETWORK} \ From 11e838dd38cb4e2d6f48ebdb00cdacb296166249 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 2 Nov 2023 11:04:55 -0400 Subject: [PATCH 028/104] docs(changelog): 0.29.7 --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 248f45f117..396671f961 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +## [0.29.7 - 2023-11-02] + ### Added * igraph: support `compute_igraph('community_optimal_modularity')` From ee2de1d0c965c16b774cfde1c60643af26f16b3f Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Fri, 17 Nov 2023 14:51:27 -0600 Subject: [PATCH 029/104] fix: element_id not defined in Neptune bolt python results --- graphistry/bolt_util.py | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/graphistry/bolt_util.py b/graphistry/bolt_util.py index 361ae401aa..4e1dd460b5 100644 --- a/graphistry/bolt_util.py +++ b/graphistry/bolt_util.py @@ -34,8 +34,8 @@ def bolt_graph_to_edges_dataframe(graph): { relationship_id_key: relationship.element_id, # noqa: E241 relationship_type_key: relationship.type, # noqa: E241 - start_node_id_key: 
relationship.start_node.element_id, # noqa: E241 - end_node_id_key: relationship.end_node.element_id # noqa: E241 + start_node_id_key: relationship.start_node.element_id if 'element_id' in relationship.start_node else relationship.start_node.id, # noqa: E241 + end_node_id_key: relationship.end_node.element_id if 'element_id' in relationship.end_node else relationship.end_node.id, # noqa: E241 } ) for relationship in graph.relationships @@ -56,9 +56,9 @@ def bolt_graph_to_nodes_dataframe(graph) -> pd.DataFrame: util.merge_two_dicts( { key: value for (key, value) in node.items() }, util.merge_two_dicts( - { - node_id_key: node.element_id, - node_type_key: ",".join(sorted([str(label) for label in node.labels])) + { + node_id_key: node.element_id, + node_type_key: ",".join(sorted([str(label) for label in node.labels])) }, { node_label_prefix_key + str(label): True for label in node.labels })) for node in graph.nodes @@ -171,12 +171,12 @@ def flatten_spatial(df : pd.DataFrame, col) -> pd.DataFrame: all_t0 = (with_vals.apply(lambda s: s.__class__) == t0.__class__).all() # type: ignore except: all_t0 = False - + if all_t0: out_df = flatten_spatial_col(df, col) else: out_df[col] = df[col].apply(stringify_spatial) - + return out_df From c60448a24669280a59e6ef992df38b1aa45185e3 Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Fri, 17 Nov 2023 16:59:49 -0600 Subject: [PATCH 030/104] fix: use id if element_id is missing from Neptune --- graphistry/bolt_util.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/bolt_util.py b/graphistry/bolt_util.py index 4e1dd460b5..6d138fa491 100644 --- a/graphistry/bolt_util.py +++ b/graphistry/bolt_util.py @@ -32,7 +32,7 @@ def bolt_graph_to_edges_dataframe(graph): util.merge_two_dicts( { key: value for (key, value) in relationship.items() }, { - relationship_id_key: relationship.element_id, # noqa: E241 + relationship_id_key: relationship.element_id if 'element_id' in relationship else relationship.id, # noqa: 
E241 relationship_type_key: relationship.type, # noqa: E241 start_node_id_key: relationship.start_node.element_id if 'element_id' in relationship.start_node else relationship.start_node.id, # noqa: E241 end_node_id_key: relationship.end_node.element_id if 'element_id' in relationship.end_node else relationship.end_node.id, # noqa: E241 From a09c865e420e1659cddc9b408c077528897bbff1 Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Fri, 17 Nov 2023 17:30:11 -0600 Subject: [PATCH 031/104] PR review: use hasattr move conditional check outside of loop --- graphistry/bolt_util.py | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/graphistry/bolt_util.py b/graphistry/bolt_util.py index 6d138fa491..a201631902 100644 --- a/graphistry/bolt_util.py +++ b/graphistry/bolt_util.py @@ -28,15 +28,28 @@ def to_bolt_driver(driver=None): #TODO catch additional encodings def bolt_graph_to_edges_dataframe(graph): + + for relationship in graph.relationships: + if hasattr(relationship, 'element_id'): + map_dict = { + relationship_id_key: relationship.element_id, # noqa: E241 + relationship_type_key: relationship.type, # noqa: E241 + start_node_id_key: relationship.start_node.element_id, # noqa: E241 + end_node_id_key: relationship.end_node.element_id, # noqa: E241 + } + else: + map_dict = { + relationship_id_key: relationship.id, # noqa: E241 + relationship_type_key: relationship.type, # noqa: E241 + start_node_id_key: relationship.start_node.id, # noqa: E241 + end_node_id_key: relationship.end_node.id, # noqa: E241 + } + break + df = pd.DataFrame([ util.merge_two_dicts( { key: value for (key, value) in relationship.items() }, - { - relationship_id_key: relationship.element_id if 'element_id' in relationship else relationship.id, # noqa: E241 - relationship_type_key: relationship.type, # noqa: E241 - start_node_id_key: relationship.start_node.element_id if 'element_id' in relationship.start_node else relationship.start_node.id, # noqa: E241 - 
end_node_id_key: relationship.end_node.element_id if 'element_id' in relationship.end_node else relationship.end_node.id, # noqa: E241 - } + map_dict ) for relationship in graph.relationships ]) From 01efc6a47fa99ae2541f5cd6d69e2a278825d0ca Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Fri, 17 Nov 2023 17:40:28 -0600 Subject: [PATCH 032/104] fix: neptune: check if element_id exists, if not, use id --- graphistry/bolt_util.py | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/graphistry/bolt_util.py b/graphistry/bolt_util.py index a201631902..2542979744 100644 --- a/graphistry/bolt_util.py +++ b/graphistry/bolt_util.py @@ -65,12 +65,19 @@ def bolt_graph_to_edges_dataframe(graph): def bolt_graph_to_nodes_dataframe(graph) -> pd.DataFrame: + for node in graph.nodes: + if hasattr(node, 'element_id'): + map_id_col = node.element_id + else: + map_id_col = node.id + break + df = pd.DataFrame([ util.merge_two_dicts( { key: value for (key, value) in node.items() }, util.merge_two_dicts( { - node_id_key: node.element_id, + node_id_key: map_id_col, node_type_key: ",".join(sorted([str(label) for label in node.labels])) }, { node_label_prefix_key + str(label): True for label in node.labels })) From b86bab4e9bf13869eb550b41e787194bc9f07db5 Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Sat, 18 Nov 2023 17:19:03 -0600 Subject: [PATCH 033/104] fix: break conditional out of node and edge loops --- graphistry/bolt_util.py | 90 ++++++++++++++++++++++++++--------------- 1 file changed, 58 insertions(+), 32 deletions(-) diff --git a/graphistry/bolt_util.py b/graphistry/bolt_util.py index 2542979744..d80b88d9ec 100644 --- a/graphistry/bolt_util.py +++ b/graphistry/bolt_util.py @@ -29,31 +29,40 @@ def to_bolt_driver(driver=None): #TODO catch additional encodings def bolt_graph_to_edges_dataframe(graph): + is_neptune=False + for relationship in graph.relationships: - if hasattr(relationship, 'element_id'): - map_dict = { - relationship_id_key: 
relationship.element_id, # noqa: E241 - relationship_type_key: relationship.type, # noqa: E241 - start_node_id_key: relationship.start_node.element_id, # noqa: E241 - end_node_id_key: relationship.end_node.element_id, # noqa: E241 - } - else: - map_dict = { - relationship_id_key: relationship.id, # noqa: E241 - relationship_type_key: relationship.type, # noqa: E241 - start_node_id_key: relationship.start_node.id, # noqa: E241 - end_node_id_key: relationship.end_node.id, # noqa: E241 - } + # neptune results not returing element_id, so use id instead + is_neptune = not hasattr(relationship, 'element_id') break + if is_neptune: + map_dict_df = pd.DataFrame ([ + { + relationship_id_key: relationship.id, # noqa: E241 + relationship_type_key: relationship.type, # noqa: E241 + start_node_id_key: relationship.start_node.id, # noqa: E241 + end_node_id_key: relationship.end_node.id, # noqa: E241 + } + for relationship in graph.relationships]) + else: + map_dict_df = pd.DataFrame ([ + { + relationship_id_key: relationship.element_id, # noqa: E241 + relationship_type_key: relationship.type, # noqa: E241 + start_node_id_key: relationship.start_node.element_id, # noqa: E241 + end_node_id_key: relationship.end_node.element_id, # noqa: E241 + } + for relationship in graph.relationships]) + df = pd.DataFrame([ - util.merge_two_dicts( - { key: value for (key, value) in relationship.items() }, - map_dict - ) + { key: value for (key, value) in relationship.items() } for relationship in graph.relationships ]) - if len(df) == 0: + + joined_df = map_dict_df.join(df) + + if len(joined_df) == 0: util.warn('Query returned no edges; may have surprising visual results or need to add missing columns for encodings') return pd.DataFrame({ relationship_id_key: pd.Series([], dtype='int32'), @@ -61,35 +70,52 @@ def bolt_graph_to_edges_dataframe(graph): start_node_id_key: pd.Series([], dtype='int32'), end_node_id_key: pd.Series([], dtype='int32') }) - return neo_df_to_pd_df(df) + return 
neo_df_to_pd_df(joined_df) def bolt_graph_to_nodes_dataframe(graph) -> pd.DataFrame: + + is_neptune=False + for node in graph.nodes: - if hasattr(node, 'element_id'): - map_id_col = node.element_id - else: - map_id_col = node.id + # neptune results not returing element_id, so use id instead + is_neptune = not hasattr(node, 'element_id') break + if is_neptune: + map_dict_df = pd.DataFrame ([ + { + node_id_key: node.id, + node_type_key: ",".join(sorted([str(label) for label in node.labels])) + } + for node in graph.nodes + ]) + else: + map_dict_df = pd.DataFrame ([ + { + node_id_key: node.element_id, + node_type_key: ",".join(sorted([str(label) for label in node.labels])) + } + for node in graph.nodes + ]) + df = pd.DataFrame([ util.merge_two_dicts( { key: value for (key, value) in node.items() }, - util.merge_two_dicts( - { - node_id_key: map_id_col, - node_type_key: ",".join(sorted([str(label) for label in node.labels])) - }, - { node_label_prefix_key + str(label): True for label in node.labels })) + { node_label_prefix_key + str(label): True for label in node.labels }) for node in graph.nodes ]) - if len(df) == 0: + + joined_df = map_dict_df.merge(df, how='outer', left_index=True, right_index=True) + + if len(joined_df) == 0: util.warn('Query returned no nodes') return pd.DataFrame({ node_id_key: pd.Series([], dtype='int32'), node_type_key: pd.Series([], dtype='object') }) - return neo_df_to_pd_df(df) + return neo_df_to_pd_df(joined_df) + # Knowing a col is all-spatial, flatten into primitive cols From 46846ee8121225d955ca5218e77696fb6099baa1 Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Mon, 20 Nov 2023 15:12:28 -0600 Subject: [PATCH 034/104] new notebook for neptune cypher using bolt --- .../neptune_cypher_viz_using_bolt.ipynb | 211 ++++++++++++++++++ 1 file changed, 211 insertions(+) create mode 100755 demos/demos_databases_apis/neptune/neptune_cypher_viz_using_bolt.ipynb diff --git a/demos/demos_databases_apis/neptune/neptune_cypher_viz_using_bolt.ipynb 
b/demos/demos_databases_apis/neptune/neptune_cypher_viz_using_bolt.ipynb new file mode 100755 index 0000000000..49cf8e156a --- /dev/null +++ b/demos/demos_databases_apis/neptune/neptune_cypher_viz_using_bolt.ipynb @@ -0,0 +1,211 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "10436f61-3f82-4316-b9be-b6a70746d4f7", + "metadata": {}, + "source": [ + "## Graphistry for Neptune using pygraphistry bolt connector \n", + "\n", + "#### This example uses the pygraphistry bolt helper class to run queries against AWS Neptune and retrieve query results as a graph; the bolt helper function then extracts all the nodes and edges into dataframes automatically. The resulting datasets are then visualized using Graphistry. \n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b55398ab", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install --user neo4j" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8481cc10-0407-4675-a966-4b09c411cb45", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!pip install --user awswrangler" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3f53062", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install --user graphistry" + ] + }, + { + "cell_type": "markdown", + "id": "e7daa787", + "metadata": {}, + "source": [ + "## make sure to restart kernel after pip install " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c499ed6d-a4fc-44a6-9bd6-62f65dadc329", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import awswrangler as wr\n", + "import pandas as pd\n", + "import graphistry\n", + "graphistry.__version__" + ] + }, + { + "cell_type": "markdown", + "id": "c509ab26-78fb-4a50-accd-9a5f06a7f0b3", + "metadata": {}, + "source": [ + "### Configure graphistry connection " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3cd273d7-e81b-4a2e-81f8-09a2f1bf3a11", + "metadata": {}, +
"outputs": [], + "source": [ + "# To specify Graphistry account & server, use:\n", + "# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')\n", + "\n", + "# To run from a graphistry-host jupyter notebook: \n", + "# graphistry.register(api=3, username=\"...\", password=\"...\", protocol=\"http\", server=\"nginx\") \n", + "\n", + "# to use personal keys:\n", + "# graphistry.register(api=3, protocol=\"...\", server=\"...\", personal_key_id='pkey_id', personal_key_secret='pkey_secret') # Key instead of username+password+org_name\n", + "\n", + "# For more options, see https://github.com/graphistry/pygraphistry#configure\n", + "\n", + "graphistry.register(api=3, username=\"...\", password=\"...\", protocol=\"...\", server=\"...\") " + ] + }, + { + "cell_type": "markdown", + "id": "909e7d08", + "metadata": {}, + "source": [ + "## Configure Neptune connection endpoint: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e2d83081", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# update with your Neptune endpoint name: \n", + "url='NEPTUNE_NAME.REGION.neptune.amazonaws.com' " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a053c49b-95e7-4ad3-9c16-853738437c8f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "iam_enabled = False # Set to True/False based on the configuration of your cluster\n", + "neptune_port = 8182 # Set to the Neptune Cluster Port, Default is 8182\n", + "neptune_region = 'us-east-1' # Set to neptune region\n", + "\n", + "client = wr.neptune.connect(url, neptune_port, iam_enabled=iam_enabled, region=neptune_region)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "50391086-8ec6-4dc8-b77d-a75b8822b0e6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# check status of neptune connection: \n", + "client.status()" + ] + }, + { + "cell_type": "markdown", + "id": "5404d9b2", + 
"metadata": {}, + "source": [ + "## Connect to Neptune using pygraphistry bolt connector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b81686d7", + "metadata": {}, + "outputs": [], + "source": [ + "from neo4j import GraphDatabase\n", + "uri = f\"bolt://{url}:8182\"\n", + "driver = GraphDatabase.driver(uri, auth=(\"ignored\", \"ignored\"), encrypted=True)\n", + "\n", + "graphistry.register(bolt=driver)\n", + "g = graphistry.cypher(\"MATCH (a)-[r]->(b) return a, r, b limit 10000\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37270027", + "metadata": {}, + "outputs": [], + "source": [ + "g.plot()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c48999ee", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 3d1b650f91c68c2bb15b5dbc2ac3581b35c8e823 Mon Sep 17 00:00:00 2001 From: Thomas Cook Date: Mon, 20 Nov 2023 19:52:21 -0600 Subject: [PATCH 035/104] cleanup: linter issues --- graphistry/bolt_util.py | 44 +++++++++++++++++++++-------------------- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/graphistry/bolt_util.py b/graphistry/bolt_util.py index d80b88d9ec..0529458734 100644 --- a/graphistry/bolt_util.py +++ b/graphistry/bolt_util.py @@ -29,7 +29,7 @@ def to_bolt_driver(driver=None): #TODO catch additional encodings def bolt_graph_to_edges_dataframe(graph): - is_neptune=False + is_neptune = False for relationship in graph.relationships: # neptune results not returing element_id, so use id instead @@ -37,26 +37,28 @@ def 
bolt_graph_to_edges_dataframe(graph): break if is_neptune: - map_dict_df = pd.DataFrame ([ + map_dict_df = pd.DataFrame([ { relationship_id_key: relationship.id, # noqa: E241 relationship_type_key: relationship.type, # noqa: E241 - start_node_id_key: relationship.start_node.id, # noqa: E241 - end_node_id_key: relationship.end_node.id, # noqa: E241 + start_node_id_key: relationship.start_node.id, # noqa: E241 + end_node_id_key: relationship.end_node.id, # noqa: E241 } for relationship in graph.relationships]) else: - map_dict_df = pd.DataFrame ([ + map_dict_df = pd.DataFrame([ { relationship_id_key: relationship.element_id, # noqa: E241 relationship_type_key: relationship.type, # noqa: E241 - start_node_id_key: relationship.start_node.element_id, # noqa: E241 - end_node_id_key: relationship.end_node.element_id, # noqa: E241 + start_node_id_key: relationship.start_node.element_id, # noqa: E241 + end_node_id_key: relationship.end_node.element_id, # noqa: E241 } for relationship in graph.relationships]) df = pd.DataFrame([ - { key: value for (key, value) in relationship.items() } + { + key: value for (key, value) in relationship.items() + } for relationship in graph.relationships ]) @@ -75,7 +77,7 @@ def bolt_graph_to_edges_dataframe(graph): def bolt_graph_to_nodes_dataframe(graph) -> pd.DataFrame: - is_neptune=False + is_neptune = False for node in graph.nodes: # neptune results not returing element_id, so use id instead @@ -83,20 +85,20 @@ def bolt_graph_to_nodes_dataframe(graph) -> pd.DataFrame: break if is_neptune: - map_dict_df = pd.DataFrame ([ - { - node_id_key: node.id, - node_type_key: ",".join(sorted([str(label) for label in node.labels])) - } - for node in graph.nodes + map_dict_df = pd.DataFrame([ + { + node_id_key: node.id, + node_type_key: ",".join(sorted([str(label) for label in node.labels])) + } + for node in graph.nodes ]) else: - map_dict_df = pd.DataFrame ([ - { - node_id_key: node.element_id, - node_type_key: ",".join(sorted([str(label) for label 
in node.labels])) - } - for node in graph.nodes + map_dict_df = pd.DataFrame([ + { + node_id_key: node.element_id, + node_type_key: ",".join(sorted([str(label) for label in node.labels])) + } + for node in graph.nodes ]) df = pd.DataFrame([ From cbce06ea0522fb679f4d0a762f2fe59ea51a6707 Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Fri, 1 Dec 2023 00:39:04 -0800 Subject: [PATCH 036/104] docs(changelog) --- CHANGELOG.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 396671f961..bbd3cfb220 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,14 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Added + +* Neptune: Can now use PyGraphistry OpenCypher/BOLT bindings with Neptune, in addition to existing Gremlin bindings + +### Docs + +* Neptune: Initial tutorial for using PyGraphistry with Amazon Neptune's OpenCypher/BOLT bindings + ## [0.29.7 - 2023-11-02] ### Added From a3d56b23a743a9c3bef90b318eecbb71622e35bc Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 1 Dec 2023 01:45:36 -0800 Subject: [PATCH 037/104] fix(hop) --- CHANGELOG.md | 4 +++ graphistry/compute/hop.py | 2 +- graphistry/tests/test_compute_hops.py | 42 +++++++++++++++++++++++++++ 3 files changed, 47 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 396671f961..9d8bfa9149 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Fixed + +* chain/hop: source_node_match was being mishandled when multiple node attributes exist + ## [0.29.7 - 2023-11-02] ### Added diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 365cedbd88..396715515e 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -59,7 +59,7 @@ def hop(self: Plottable, raise ValueError('Source and destination binding cannot be None, please set g._source and 
g._destination via bind() or edges()') hops_remaining = hops - wave_front = filter_by_dict(nodes[[ g2._node ]], source_node_match) + wave_front = filter_by_dict(nodes, source_node_match)[[ g2._node ]] matches_nodes = None matches_edges = edges_indexed[[EDGE_ID]][:0] diff --git a/graphistry/tests/test_compute_hops.py b/graphistry/tests/test_compute_hops.py index fd53fe9dcb..02c696472b 100644 --- a/graphistry/tests/test_compute_hops.py +++ b/graphistry/tests/test_compute_hops.py @@ -138,3 +138,45 @@ def test_hop_pre_post_match_1(self): assert (g2._nodes[g2._node].sort_values().to_list() == # noqa: W504 sorted(['e', 'l'])) assert g2._edges.shape == (1, 3) + + def test_hop_filter_types(self): + + e_df = pd.DataFrame({ + 's': ['a', 'a', 'd', 'd', 'f', 'f'], + 'd': ['b', 'b', 'e', 'e', 'g', 'g'], + 't': ['x', 'h', 'x', 'h', 'x', 'h'] + }) + n_df = pd.DataFrame({ + 'n': ['a', 'b', 'd', 'e', 'f', 'g'], + 't': ['x', 'm', 'x', 'n', 'x', 'o'] + }) + g = CGFull().edges(e_df, 's', 'd').nodes(n_df, 'n') + + g2a = g.hop(source_node_match={'n': 'a'}) + assert g2a._nodes.shape == (2, 2) + assert g2a._edges.shape == (2, 3) + + g2b = g.hop(source_node_match={'t': 'm'}, direction='forward') + assert g2b._nodes.shape == (0, 2) + assert g2b._edges.shape == (0, 3) + + g3a = g.hop(edge_match={'t': 'h', 's': 'a'}) + assert g3a._nodes.shape == (2, 2) + assert g3a._edges.shape == (1, 3) + + #TODO investigate + #g4a = g.hop(destination_node_match={'t': 'n'}, direction='reverse') + #assert g4a._nodes.shape == (2, 2) + #assert g4a._edges.shape == (2, 3) + + g4a = g.hop(destination_node_match={'t': 'n'}) + assert g4a._nodes.shape == (2, 2) + assert g4a._edges.shape == (2, 3) + + #TODO investigate setting to reverse + g5a = g.hop( + source_node_match={'t': 'x', 'n': 'a'}, + edge_match={'t': 'h'}, + destination_node_match={'t': 'm'}) + assert g5a._nodes.shape == (2, 2) + assert g5a._edges.shape == (1, 3) From 462d35d68020decef3e14fc044351341834f6a54 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich 
Date: Fri, 1 Dec 2023 03:35:31 -0800 Subject: [PATCH 038/104] feat(is_in) --- CHANGELOG.md | 4 +++ README.md | 8 +++-- graphistry/__init__.py | 3 +- graphistry/compute/__init__.py | 1 + graphistry/compute/ast.py | 3 +- graphistry/compute/chain.py | 15 ++++++++++ graphistry/compute/filter_by_dict.py | 30 ++++++++++++++++--- graphistry/tests/test_compute_chain.py | 6 +++- .../tests/test_compute_filter_by_dict.py | 16 +++++++++- graphistry/tests/test_compute_hops.py | 6 +++- 10 files changed, 80 insertions(+), 12 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9d8bfa9149..c227f37572 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Added + +* chain/hop: `is_in()` membership predicate, `.chain([ n({'type': is_in(['a', 'b'])}) ])` + ### Fixed * chain/hop: source_node_match was being mishandled when multiple node attributes exist diff --git a/README.md b/README.md index 73388abcdb..bf67192a51 100644 --- a/README.md +++ b/README.md @@ -1108,7 +1108,8 @@ g2.plot() # nodes are values from cols s, d, k1 destination_node_match={"k2": 2}) .chain([ # filter to subgraph n(), - n({'k2': 0}), + n({'k2': 0, "m": 'ok'}), #specific values + n({'type': is_in(["type1", "type2"])}), #multiple valid values n(name="start"), # add column 'start':bool e_forward({'k1': 'x'}, hops=1), # same API as hop() e_undirected(name='second_edge'), @@ -1200,10 +1201,11 @@ g5.plot() Rich compound patterns are enabled via `.chain()`: ```python -from graphistry import n, e_forward, e_reverse, e_undirected +from graphistry import n, e_forward, e_reverse, e_undirected, is_in g2.chain([ n() ]) -g2.chain([ n({"v": 1, "y": True}) ]) +g2.chain([ n({"x": 1, "y": True}) ]), +g2.chain([ n({"z": is_in([1,2,4,'z'])}) ]), # multiple valid values g2.chain([ e_forward({"type": "x"}, hops=2) ]) # simple multi-hop g3 = g2.chain([ n(name="start"), # tag node matches diff --git 
a/graphistry/__init__.py b/graphistry/__init__.py index 64c2f29ba0..02e467afea 100644 --- a/graphistry/__init__.py +++ b/graphistry/__init__.py @@ -50,7 +50,8 @@ ) from graphistry.compute import ( - n, e_forward, e_reverse, e_undirected + n, e_forward, e_reverse, e_undirected, + is_in, IsIn ) from graphistry.Engine import Engine diff --git a/graphistry/compute/__init__.py b/graphistry/compute/__init__.py index 3c7c7f45e3..0b26750541 100644 --- a/graphistry/compute/__init__.py +++ b/graphistry/compute/__init__.py @@ -2,3 +2,4 @@ from .ast import ( n, e_forward, e_reverse, e_undirected ) +from .filter_by_dict import is_in, IsIn diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 2f08271c6c..c81d583bf9 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -1,7 +1,8 @@ -from typing import Any, Optional +from typing import Any, List, Optional import pandas as pd from graphistry.Plottable import Plottable +from .filter_by_dict import is_in, IsIn import logging logger = logging.getLogger(__name__) diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 4920b74c9f..0ff193f58e 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -141,6 +141,21 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: ]) print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ])) + **Example: Filter by multiple node types at each step using is_in** + + :: + + from graphistry.ast import n, e_forward, e_reverse, is_in + + g_risky = g.chain([ + n({"type": is_in(["person", "company"])}), + e_forward({"e_type": is_in(["owns", "reviews"])}, to_fixed=True), + n({"type": is_in(["transaction", "account"])}, name="hit"), + e_reverse(to_fixed=True), + n({"risk2": True}) + ]) + print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ])) + """ if len(ops) == 0: diff --git a/graphistry/compute/filter_by_dict.py b/graphistry/compute/filter_by_dict.py index 678353dcd9..5aa9ef77fc 100644 --- 
a/graphistry/compute/filter_by_dict.py +++ b/graphistry/compute/filter_by_dict.py @@ -1,9 +1,17 @@ -from typing import Optional, TYPE_CHECKING +from typing import Any, Dict, List, Optional, TYPE_CHECKING import pandas as pd from graphistry.Plottable import Plottable +class IsIn(): + def __init__(self, options: List[Any]) -> None: + self.options = options + +def is_in(options: List[Any]) -> IsIn: + return IsIn(options) + + def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: """ return df where rows match all values in filter_dict @@ -12,11 +20,25 @@ def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: if filter_dict is None or filter_dict == {}: return df - for col in filter_dict.keys(): + ins: Dict[str, IsIn] = {} + for col, val in filter_dict.items(): if col not in df.columns: raise ValueError(f'Key "{col}" not in columns of df, available columns are: {df.columns}') - - hits = (df[list(filter_dict)] == pd.Series(filter_dict)).all(axis=1) + if isinstance(val, IsIn): + ins[col] = val + filter_dict_concrete = filter_dict if not ins else { + k: v + for k, v in filter_dict.items() + if not isinstance(v, IsIn) + } + + if filter_dict_concrete: + hits = (df[list(filter_dict_concrete)] == pd.Series(filter_dict_concrete)).all(axis=1) + else: + hits = df[[]].assign(x=True).x + if ins: + for col, val in ins.items(): + hits = hits & df[col].isin(val.options) return df[hits] diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index f66f3151b5..5779b21056 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -2,7 +2,7 @@ from common import NoAuthTestCase from graphistry.tests.test_compute_hops import hops_graph -from graphistry.compute.ast import n, e_forward, e_reverse, e_undirected +from graphistry.compute.ast import n, e_forward, e_reverse, e_undirected, is_in class TestComputeChainMixin(NoAuthTestCase): @@ -100,3 +100,7 @@ def 
test_chain_named(self): assert sorted(g2._edges[ g2._edges.e2 ][g2._source].to_list()) == ["g", "l"] assert sorted(g2._edges[ g2._edges.e2 ][g2._destination].to_list()) == ["a", "b"] assert sorted(g2._nodes[ g2._nodes.n2 ][g2._node].to_list()) == ["a", "b"] + + def test_chain_is_in(self): + g = hops_graph() + assert g.chain([n({'node': is_in(['e', 'k'])})])._nodes.shape == (2, 2) diff --git a/graphistry/tests/test_compute_filter_by_dict.py b/graphistry/tests/test_compute_filter_by_dict.py index e76f0c1598..5babdd9211 100644 --- a/graphistry/tests/test_compute_filter_by_dict.py +++ b/graphistry/tests/test_compute_filter_by_dict.py @@ -1,7 +1,7 @@ import pandas as pd from functools import lru_cache -from graphistry.compute.filter_by_dict import filter_by_dict +from graphistry.compute.filter_by_dict import filter_by_dict, is_in, IsIn from graphistry.tests.test_compute import CGFull @lru_cache(maxsize=1) @@ -106,3 +106,17 @@ def test_kv_multiple_good(self): def test_kv_multiple_bad(self): g = hops_graph() assert g.filter_edges_by_dict({'i': -100, 'type': 'e'})._edges.equals(g._edges[:0]) + +class TestIsIn(object): + + def test_standalone(self): + g = hops_graph() + assert g.filter_nodes_by_dict({'node': is_in(['a'])})._nodes.equals(g._nodes[:1]) + assert g.filter_nodes_by_dict({'node': is_in(['a', 'b'])})._nodes.equals(g._nodes[:2]) + + def test_combined(self): + g = hops_graph() + assert g.filter_nodes_by_dict({'node': is_in(['a', 'b']), 'type': 'n'})._nodes.equals(g._nodes[:2]) + assert g.filter_nodes_by_dict({'node': is_in(['a', 'b']), 'type': 'bad'})._nodes.equals(g._nodes[:0]) + assert g.filter_nodes_by_dict({'node': is_in(['a', 'bad']), 'type': 'n'})._nodes.equals(g._nodes[:1]) + assert g.filter_nodes_by_dict({'node': is_in(['a', 'bad']), 'type': 'bad'})._nodes.equals(g._nodes[:0]) diff --git a/graphistry/tests/test_compute_hops.py b/graphistry/tests/test_compute_hops.py index 02c696472b..5db33ff6c1 100644 --- a/graphistry/tests/test_compute_hops.py +++ 
b/graphistry/tests/test_compute_hops.py @@ -2,9 +2,9 @@ from common import NoAuthTestCase from functools import lru_cache +from graphistry.compute.filter_by_dict import is_in from graphistry.tests.test_compute import CGFull - @lru_cache(maxsize=1) def hops_graph(): nodes_df = pd.DataFrame([ @@ -180,3 +180,7 @@ def test_hop_filter_types(self): destination_node_match={'t': 'm'}) assert g5a._nodes.shape == (2, 2) assert g5a._edges.shape == (1, 3) + + def test_is_in(self): + g = hops_graph() + assert g.hop(source_node_match={'node': is_in(['e', 'k'])})._edges.shape == (3, 3) From fb4a038eb469213bd163fe08f8d77ae2493473a1 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 1 Dec 2023 03:46:08 -0800 Subject: [PATCH 039/104] fix(docs): IsIn --- docs/source/conf.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/conf.py b/docs/source/conf.py index 5b421716ad..b7748c38a2 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -68,6 +68,7 @@ ('py:class', 'graphistry.embed_utils.HeterographEmbedModuleMixin'), ('py:class', 'graphistry.PlotterBase.PlotterBase'), ('py:class', 'graphistry.compute.ast.ASTObject'), + ('py:class', 'graphistry.compute.filter_by_dict.IsIn'), ('py:class', 'Plotter'), ('py:class', 'Plottable'), ('py:class', 'CuGraphKind'), From 9aae0f9bd0fa718371446a53e49c73e9cc863d49 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 1 Dec 2023 19:26:22 -0800 Subject: [PATCH 040/104] fix(logger defaults): do not override default to DEBUG --- CHANGELOG.md | 1 + graphistry/compute/ast.py | 2 +- graphistry/compute/chain.py | 2 +- graphistry/compute/collapse.py | 4 ++-- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index c227f37572..7c3f1b0b17 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Fixed * chain/hop: source_node_match was being mishandled when multiple node attributes exist +* compute logging 
no longer default-overrides level to DEBUG ## [0.29.7 - 2023-11-02] diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index c81d583bf9..68e447d660 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -6,7 +6,7 @@ import logging logger = logging.getLogger(__name__) -logger.setLevel(logging.DEBUG) +#logger.setLevel(logging.DEBUG) ############################################################################## diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 0ff193f58e..6d68a1e14e 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -7,7 +7,7 @@ import logging logger = logging.getLogger(__name__) -logger.setLevel(logging.DEBUG) +#logger.setLevel(logging.DEBUG) ############################################################################### diff --git a/graphistry/compute/collapse.py b/graphistry/compute/collapse.py index e9b06e512c..ad29a6f4c1 100644 --- a/graphistry/compute/collapse.py +++ b/graphistry/compute/collapse.py @@ -4,12 +4,12 @@ from graphistry.PlotterBase import Plottable logger = logging.getLogger("collapse") -logger.setLevel(logging.DEBUG) +#logger.setLevel(logging.DEBUG) # create console handler and set level to debug # best for development or debugging consoleHandler = logging.StreamHandler() -consoleHandler.setLevel(logging.DEBUG) +#consoleHandler.setLevel(logging.DEBUG) # create formatter formatter = logging.Formatter(': %(message)s') From b3ebd2272f49a202e80f2ea9db0ca9c5760a8950 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 2 Dec 2023 18:37:52 -0800 Subject: [PATCH 041/104] refactor(ast reverse): more legible, maybe a fix --- graphistry/compute/ast.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 68e447d660..4eb7a1ba79 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -136,6 +136,12 @@ def __call__(self, g: Plottable, prev_node_wavefront: 
Optional[pd.DataFrame]) -> def reverse(self) -> 'ASTEdge': # updates both edges and nodes + if self._direction == 'reverse': + direction = 'forward' + elif self._direction == 'forward': + direction = 'reverse' + else: + direction = 'undirected' return ASTEdge( direction=( 'forward' if self._direction == 'reverse' else 'reverse' From 8989e3be6e88051dcd2005ca26e84e2797acdbaa Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 2 Dec 2023 18:38:07 -0800 Subject: [PATCH 042/104] refactor(ast reverse): more legible, maybe a fix --- graphistry/compute/ast.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 4eb7a1ba79..1f526fef8c 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -143,9 +143,7 @@ def reverse(self) -> 'ASTEdge': else: direction = 'undirected' return ASTEdge( - direction=( - 'forward' if self._direction == 'reverse' else 'reverse' - ) if self._direction != 'undirected' else 'undirected', + direction=direction, edge_match=self._edge_match, hops=self._hops, to_fixed_point=self._to_fixed_point, From 26caa13fc2743d5ab8e33a70bfeb10f763480301 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 2 Dec 2023 18:39:57 -0800 Subject: [PATCH 043/104] refactor(chain): make step clearer --- graphistry/compute/chain.py | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 6d68a1e14e..cd2220816d 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -190,14 +190,17 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: #forwards g_stack : List[Plottable] = [] for op in ops: + prev_step_nodes = ( # start from only prev step's wavefront node + None # first uses full graph + if len(g_stack) == 0 + else g_stack[-1]._nodes + ) g_step = ( op( - g=g, - prev_node_wavefront=( - None # first uses full graph - if len(g_stack) == 0 - else 
g_stack[-1]._nodes - ))) + g=g, # transition via any original edge + prev_node_wavefront=prev_step_nodes, + ) + ) g_stack.append(g_step) encountered_nodes_df = pd.concat([ From ad3567dbc0af5cac1477306ed960b81b34e2a2cf Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sat, 2 Dec 2023 18:43:49 -0800 Subject: [PATCH 044/104] refactor(chain backwards): clearer indexing --- graphistry/compute/chain.py | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index cd2220816d..464d630aa6 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -210,9 +210,14 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: logger.debug('============ BACKWARDS ============') - #backwards - g_stack_reverse : List[Plottable] = [g_stack[-1]] + g_stack_reverse : List[Plottable] = [] for (op, g_step) in zip(reversed(ops), reversed(g_stack)): + prev_loop_step = g_stack[-1] if len(g_stack_reverse) == 0 else g_stack_reverse[-1] + if len(g_stack_reverse) == len(g_stack) - 1: + prev_orig_step = None + else: + prev_orig_step = g_stack[-(len(g_stack_reverse) + 2)] + assert prev_loop_step._nodes is not None g_step_reverse = ( (op.reverse())( @@ -227,10 +232,10 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: g_stack_reverse.append(g_step_reverse) logger.debug('============ COMBINE NODES ============') - final_nodes_df = combine_steps(g, 'nodes', list(zip(reversed(ops), g_stack_reverse[1:]))) + final_nodes_df = combine_steps(g, 'nodes', list(zip(ops, reversed(g_stack_reverse)))) logger.debug('============ COMBINE EDGES ============') - final_edges_df = combine_steps(g, 'edges', list(zip(reversed(ops), g_stack_reverse[1:]))) + final_edges_df = combine_steps(g, 'edges', list(zip(ops, reversed(g_stack_reverse)))) if added_edge_index: final_edges_df = final_edges_df.drop(columns=['index']) From 0cdbaee80719200185dfc4f3e40cd4936f722ac1 Mon Sep 17 00:00:00 2001 From: Leo 
Meyerovich Date: Sat, 2 Dec 2023 18:55:24 -0800 Subject: [PATCH 045/104] fix(chain): add target_wave_front check --- graphistry/Plottable.py | 3 +- graphistry/compute/ast.py | 16 +- graphistry/compute/chain.py | 21 +- graphistry/compute/hop.py | 28 ++- graphistry/tests/test_compute_chain.py | 300 +++++++++++++++++++++++++ 5 files changed, 351 insertions(+), 17 deletions(-) diff --git a/graphistry/Plottable.py b/graphistry/Plottable.py index 1f20ca35a6..4157781e46 100644 --- a/graphistry/Plottable.py +++ b/graphistry/Plottable.py @@ -209,7 +209,8 @@ def hop(self, edge_match: Optional[dict] = None, source_node_match: Optional[dict] = None, destination_node_match: Optional[dict] = None, - return_as_wave_front: bool = False + return_as_wave_front: bool = False, + target_wave_front: Optional[pd.DataFrame] = None ) -> 'Plottable': if 1 + 1: raise RuntimeError('should not happen') diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 1f526fef8c..38e91766d1 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -1,4 +1,4 @@ -from typing import Any, List, Optional +from typing import Any, List, Optional, cast import pandas as pd from graphistry.Plottable import Plottable @@ -20,7 +20,7 @@ def __init__(self, name: Optional[str] = None): self._name = name pass - def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame]) -> Plottable: + def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: raise RuntimeError('__call__ not implemented') def reverse(self) -> 'ASTObject': @@ -45,12 +45,17 @@ def __init__(self, filter_dict: Optional[dict] = None, name: Optional[str] = Non def __repr__(self) -> str: return f'ASTNode(filter_dict={self._filter_dict}, name={self._name})' - def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame]) -> Plottable: + def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], 
target_wave_front: Optional[pd.DataFrame]) -> Plottable: out_g = (g .nodes(prev_node_wavefront if prev_node_wavefront is not None else g._nodes) .filter_nodes_by_dict(self._filter_dict) .edges(g._edges[:0]) ) + if target_wave_front is not None: + assert g._node is not None + reduced_nodes = cast(pd.DataFrame, out_g._nodes).merge(target_wave_front[[g._node]], on=g._node, how='inner') + out_g = out_g.nodes(reduced_nodes) + if self._name is not None: out_g = out_g.nodes(out_g._nodes.assign(**{self._name: True})) @@ -111,7 +116,7 @@ def __init__( def __repr__(self) -> str: return f'ASTEdge(direction={self._direction}, edge_match={self._edge_match}, hops={self._hops}, to_fixed_point={self._to_fixed_point}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, name={self._name})' - def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame]) -> Plottable: + def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: out_g = g.hop( nodes=prev_node_wavefront, @@ -121,7 +126,8 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame]) -> source_node_match=self._source_node_match, edge_match=self._edge_match, destination_node_match=self._destination_node_match, - return_as_wave_front=True + return_as_wave_front=True, + target_wave_front=target_wave_front ) if self._name is not None: diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 464d630aa6..4efdc47647 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -30,8 +30,14 @@ def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable logger.debug('EDGES << recompute forwards given reduced set') steps = [ ( - op, - op(g=g.edges(g_step._edges), prev_node_wavefront=g_step._nodes) + op, # forward op + op( + g=g.edges(g_step._edges), # transition via any found edge + prev_node_wavefront=g_step._nodes, # 
start from where backwards step says is reachable + + #target_wave_front=steps[i+1][1]._nodes # end at where next backwards step says is reachable + target_wave_front=None # ^^^ optimization: valid transitions already limit to known-good ones + ) ) for (op, g_step) in steps ] @@ -199,6 +205,7 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: op( g=g, # transition via any original edge prev_node_wavefront=prev_step_nodes, + target_wave_front=None # implicit any ) ) g_stack.append(g_step) @@ -221,12 +228,16 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: g_step_reverse = ( (op.reverse())( - # all encountered nodes + step's edges - g=g_step.nodes(encountered_nodes_df), + # Edges: edges used in step (subset matching prev_node_wavefront will be returned) + # Nodes: nodes reached in step (subset matching prev_node_wavefront will be returned) + g=g_step, # check for hits against fully valid targets - prev_node_wavefront=g_stack_reverse[-1]._nodes + # ast will replace g.node() with this as its starting points + prev_node_wavefront=prev_loop_step._nodes, + # only allow transitions to these nodes (vs prev_node_wavefront) + target_wave_front=prev_orig_step._nodes if prev_orig_step is not None else None ) ) g_stack_reverse.append(g_step_reverse) diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 396715515e..68b6c1479c 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -6,27 +6,37 @@ def hop(self: Plottable, - nodes: Optional[pd.DataFrame] = None, + nodes: Optional[pd.DataFrame] = None, # chain: incoming wavefront hops: Optional[int] = 1, to_fixed_point: bool = False, direction: str = 'forward', edge_match: Optional[dict] = None, source_node_match: Optional[dict] = None, destination_node_match: Optional[dict] = None, - return_as_wave_front = False + return_as_wave_front = False, + target_wave_front: Optional[pd.DataFrame] = None # chain: limit hits to these for reverse pass ) -> Plottable: """ Given 
a graph and some source nodes, return subgraph of all paths within k-hops from the sources g: Plotter nodes: dataframe with id column matching g._node. None signifies all nodes (default). - hops: how many hops to consider, if any bound (default 1) + hops: consider paths of length 1 to 'hops' steps, if any (default 1). to_fixed_point: keep hopping until no new nodes are found (ignores hops) direction: 'forward', 'reverse', 'undirected' edge_match: dict of kv-pairs to exact match (see also: filter_edges_by_dict) source_node_match: dict of kv-pairs to match nodes before hopping destination_node_match: dict of kv-pairs to match nodes after hopping (including intermediate) return_as_wave_front: Only return the nodes/edges reached, ignoring past ones (primarily for internal use) + target_wave_front: Only consider these nodes for reachability (primarily for internal use by reverse pass) + """ + + """ + When called by chain() during reverse phase: + - return_as_wave_front: True + - this hop will be `op.reverse()` + - nodes will be the wavefront of the next step + """ if not to_fixed_point and not isinstance(hops, int): @@ -82,8 +92,9 @@ def hop(self: Plottable, new_node_ids_forward = hop_edges_forward[[g2._destination]].rename(columns={g2._destination: g2._node}).drop_duplicates() if destination_node_match is not None: + base_nodes = target_wave_front if target_wave_front is not None else g2._nodes new_node_ids_forward = filter_by_dict( - g2._nodes.merge(new_node_ids_forward, on=g2._node, how='inner'), + base_nodes.merge(new_node_ids_forward, on=g2._node, how='inner'), destination_node_match )[[g2._node]] hop_edges_forward = hop_edges_forward.merge( @@ -105,8 +116,9 @@ def hop(self: Plottable, new_node_ids_reverse = hop_edges_reverse[[g2._source]].rename(columns={g2._source: g2._node}).drop_duplicates() if destination_node_match is not None: + base_nodes = target_wave_front if target_wave_front is not None else g2._nodes new_node_ids_reverse = filter_by_dict( - 
g2._nodes.merge(new_node_ids_reverse, on=g2._node, how='inner'), + base_nodes.merge(new_node_ids_reverse, on=g2._node, how='inner'), destination_node_match )[[g2._node]] hop_edges_reverse = hop_edges_reverse.merge( @@ -161,7 +173,11 @@ def hop(self: Plottable, #hydrate nodes if self._nodes is not None: - final_nodes = self._nodes.merge( + if target_wave_front is not None: + rich_nodes = target_wave_front + else: + rich_nodes = self._nodes + final_nodes = rich_nodes.merge( matches_nodes if matches_nodes is not None else wave_front[:0], on=self._node, how='inner') diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index 5779b21056..e65da06e4d 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -1,9 +1,29 @@ +from functools import lru_cache +from typing import Dict, List import pandas as pd from common import NoAuthTestCase +from graphistry.tests.test_compute import CGFull from graphistry.tests.test_compute_hops import hops_graph from graphistry.compute.ast import n, e_forward, e_reverse, e_undirected, is_in +import logging +logger = logging.getLogger() +logger.setLevel(logging.DEBUG) + + +@lru_cache(maxsize=1) +def chain_graph(): + return CGFull().edges( + pd.DataFrame({ + 's': ['a', 'b', 'c'], + 'd': ['b', 'c', 'd'] + }), + 's', 'd' + ).nodes( + pd.DataFrame({'n': ['a', 'b', 'c', 'd']}), + 'n' + ) class TestComputeChainMixin(NoAuthTestCase): @@ -104,3 +124,283 @@ def test_chain_named(self): def test_chain_is_in(self): g = hops_graph() assert g.chain([n({'node': is_in(['e', 'k'])})])._nodes.shape == (2, 2) + + def test_post_hop_node_match(self): + + ns = pd.DataFrame({ + 'n': [1, 5], + 'category': ['Port', 'Other'], + }) + + es = pd.DataFrame({ + 's': [1, 1], + 'd': [1, 5] + }) + + g = CGFull().edges(es, 's', 'd').nodes(ns, 'n') + + g2 = g.chain([ + n({'category': 'Port'}), + e_undirected(), + n({'category': 'Port'}) + ]) + assert len(g2._nodes) == 1 + + +def compare_graphs(g, 
nodes: List[Dict[str, str]], edges: List[Dict[str, str]]) -> None: + assert g._nodes.sort_values(by='n').to_dict(orient='records') == nodes + assert g._edges.sort_values(by=['s', 'd']).to_dict(orient='records') == edges + + +class TestComputeChainWavefront1Mixin(NoAuthTestCase): + """ + Test individual steps for 0-hop and 1-hop + """ + + def test_hop_chain_0(self): + + g = chain_graph() + + g2 = g.chain([ + n({'n': 'a'}) + ]) + + assert g2._nodes.to_dict(orient='records') == [{'n': 'a'}] + assert g2._edges.to_dict(orient='records') == [] + + g3 = g.chain([ + n({'n': 'd'}) + ]) + + assert g3._nodes.to_dict(orient='records') == [{'n': 'd'}] + assert g3._edges.to_dict(orient='records') == [] + + def test_hop_chain_1_forward(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'b'}] + g_out_nodes = [{'n': 'a'}, {'n': 'b'}] + g_out_edges = [{'s': 'a', 'd': 'b'}] + + g2_forward = g.hop( + nodes = pd.DataFrame({'n': ['a']}), + hops = 1, + to_fixed_point = False, + direction = 'forward', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g2_forward, g_out_nodes_hop, g_out_edges) + + g2_forward_triple = g.chain([ + e_forward({}, source_node_match={'n': 'a'}, hops=1) + ]) + compare_graphs(g2_forward_triple, g_out_nodes, g_out_edges) + + g2_forward_chain = g.chain([ + n({'n': 'a'}), + e_forward({}, hops=1) + ]) + compare_graphs(g2_forward_chain, g_out_nodes, g_out_edges) + + g2_forward_chain_closed = g.chain([ + n({'n': 'a'}), + e_forward({}, hops=1), + n({}) + ]) + compare_graphs(g2_forward_chain_closed, g_out_nodes, g_out_edges) + + def test_hop_chain_1_reverse(self): + + g = chain_graph() + + g_out_nodes_hop = [] + g_out_nodes = [] + g_out_edges = [] + + g2_reverse = g.hop( + nodes = pd.DataFrame({'n': ['a']}), + hops = 1, + to_fixed_point = False, + direction = 'reverse', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) 
+ compare_graphs(g2_reverse, g_out_nodes_hop, g_out_edges) + + g2_reverse_triple = g.chain([ + e_reverse({}, source_node_match={'n': 'a'}, hops=1) + ]) + compare_graphs(g2_reverse_triple, g_out_nodes, g_out_edges) + + g2_reverse_chain = g.chain([ + n({'n': 'a'}), + e_reverse({}, hops=1) + ]) + compare_graphs(g2_reverse_chain, g_out_nodes, g_out_edges) + + g2_reverse_chain_closed = g.chain([ + n({'n': 'a'}), + e_reverse({}, hops=1), + n({}) + ]) + compare_graphs(g2_reverse_chain_closed, g_out_nodes, g_out_edges) + + def test_hop_chain_1_undirected(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'b'}] + g_out_nodes = [{'n': 'a'}, {'n': 'b'}] + g_out_edges = [{'s': 'a', 'd': 'b'}] + + g2_undirected = g.hop( + nodes = pd.DataFrame({'n': ['a']}), + hops = 1, + to_fixed_point = False, + direction = 'undirected', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g2_undirected, g_out_nodes_hop, g_out_edges) + + g2_undirected_triple = g.chain([ + e_undirected({}, source_node_match={'n': 'a'}, hops=1) + ]) + compare_graphs(g2_undirected_triple, g_out_nodes, g_out_edges) + + g2_undirected_chain = g.chain([ + n({'n': 'a'}), + e_undirected({}, hops=1) + ]) + compare_graphs(g2_undirected_chain, g_out_nodes, g_out_edges) + + g2_undirected_chain_closed = g.chain([ + n({'n': 'a'}), + e_undirected({}, hops=1), + n({}) + ]) + compare_graphs(g2_undirected_chain_closed, g_out_nodes, g_out_edges) + + def test_hop_chain_1_end_forward(self): + + g = chain_graph() + + g_out_nodes_hop = [] + g_out_nodes = [] + g_out_edges = [] + + g3_forward = g.hop( + nodes = pd.DataFrame({'n': ['d']}), + hops = 2, + to_fixed_point = False, + direction = 'forward', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g3_forward, g_out_nodes_hop, g_out_edges) + + g3_forward_triple = g.chain([ + e_forward({}, source_node_match={'n': 'd'}, 
hops=1) + ]) + compare_graphs(g3_forward_triple, g_out_nodes, g_out_edges) + + g3_forward_chain = g.chain([ + n({'n': 'd'}), + e_forward({}, hops=1) + ]) + compare_graphs(g3_forward_chain, g_out_nodes, g_out_edges) + + g3_forward_chain_closed = g.chain([ + n({'n': 'd'}), + e_forward({}, hops=1), + n({}) + ]) + compare_graphs(g3_forward_chain_closed, g_out_nodes, g_out_edges) + + def test_hop_chain_1_end_reverse(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'c'}] + g_out_nodes = [{'n': 'c'}, {'n': 'd'}] + g_out_edges = [{'s': 'c', 'd': 'd'}] + + g3_reverse = g.hop( + nodes = pd.DataFrame({'n': ['d']}), + hops = 1, + to_fixed_point = False, + direction = 'reverse', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g3_reverse, g_out_nodes_hop, g_out_edges) + + g3_reverse_triple = g.chain([ + e_reverse({}, source_node_match={'n': 'd'}, hops=1) + ]) + compare_graphs(g3_reverse_triple, g_out_nodes, g_out_edges) + + g3_reverse_chain = g.chain([ + n({'n': 'd'}), + e_reverse({}, hops=1) + ]) + compare_graphs(g3_reverse_chain, g_out_nodes, g_out_edges) + + g3_reverse_chain_closed = g.chain([ + n({'n': 'd'}), + e_reverse({}, hops=1), + n({}) + ]) + compare_graphs(g3_reverse_chain_closed, g_out_nodes, g_out_edges) + + def test_hop_chain_1_end_undirected(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'c'}] + g_out_nodes = [{'n': 'c'}, {'n': 'd'}] + g_out_edges = [{'s': 'c', 'd': 'd'}] + + g3_undirected = g.hop( + nodes = pd.DataFrame({'n': ['d']}), + hops = 1, + to_fixed_point = False, + direction = 'undirected', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g3_undirected, g_out_nodes_hop, g_out_edges) + + g3_undirected_triple = g.chain([ + e_undirected({}, source_node_match={'n': 'd'}, hops=1) + ]) + compare_graphs(g3_undirected_triple, g_out_nodes, g_out_edges) + + g3_undirected_chain = 
g.chain([ + n({'n': 'd'}), + e_undirected({}, hops=1) + ]) + compare_graphs(g3_undirected_chain, g_out_nodes, g_out_edges) + + g3_undirected_chain_closed = g.chain([ + n({'n': 'd'}), + e_undirected({}, hops=1), + n({}) + ]) + compare_graphs(g3_undirected_chain_closed, g_out_nodes, g_out_edges) + + From edc1f11379af7551da79190755cc8c989e937fa5 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 16:05:25 -0800 Subject: [PATCH 046/104] fix(test): order-agnostic --- graphistry/tests/test_compute_chain.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index e65da06e4d..a02af0a59f 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -101,7 +101,7 @@ def test_chain_multi(self): e_forward({}, hops=1) ]) - assert g2b._nodes.equals(g2a._nodes) + assert g2b._nodes.sort_values(by=['node']).reset_index(drop=True).equals(g2a._nodes.sort_values(by=['node']).reset_index(drop=True)) assert g2b._edges.equals(g2b._edges) def test_chain_named(self): From 149dad5cac0f6afcab22b5afcda5b30cde3faabe Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 16:05:46 -0800 Subject: [PATCH 047/104] test(hop): 2-step --- graphistry/tests/test_compute_chain.py | 242 +++++++++++++++++++++++++ 1 file changed, 242 insertions(+) diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index a02af0a59f..839e32c7fc 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -404,3 +404,245 @@ def test_hop_chain_1_end_undirected(self): compare_graphs(g3_undirected_chain_closed, g_out_nodes, g_out_edges) +class TestComputeChainWavefront2Mixin(NoAuthTestCase): + """ + Test individual steps for 2-hop + """ + + def test_hop_chain_2(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'b'}, {'n': 'c'}] + g_out_nodes = [{'n': 'a'}, {'n': 'b'}, {'n': 'c'}] + 
g_out_edges = [{'s': 'a', 'd': 'b'}, {'s': 'b', 'd': 'c'}] + + g2_forward = g.hop( + nodes = pd.DataFrame({'n': ['a']}), + hops = 2, + to_fixed_point = False, + direction = 'forward', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g2_forward, g_out_nodes_hop, g_out_edges) + + # source _node_match would require each hop to start with {'n': 'a'} + #g2_forward_triple = g.chain([ + # e_forward({}, source_node_match={'n': 'a'}, hops=2) + #]) + #compare_graphs(g2_forward_triple, g_out_nodes, g_out_edges) + + g2_forward_chain = g.chain([ + n({'n': 'a'}), + e_forward({}, hops=2) + ]) + compare_graphs(g2_forward_chain, g_out_nodes, g_out_edges) + + g2_forward_chain_closed = g.chain([ + n({'n': 'a'}), + e_forward({}, hops=2), + n({}) + ]) + compare_graphs(g2_forward_chain_closed, g_out_nodes, g_out_edges) + + + def test_hop_chain_2_reverse(self): + + g = chain_graph() + + g_out_nodes_hop = [] + g_out_nodes = [] + g_out_edges = [] + + g2_reverse = g.hop( + nodes = pd.DataFrame({'n': ['a']}), + hops = 2, + to_fixed_point = False, + direction = 'reverse', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g2_reverse, g_out_nodes_hop, g_out_edges) + + # source _node_match would require each hop to start with {'n': 'a'} + #g2_reverse_triple = g.chain([ + # e_reverse({}, source_node_match={'n': 'a'}, hops=2) + #]) + #compare_graphs(g2_reverse_triple, g_out_nodes, g_out_edges) + + g2_reverse_chain = g.chain([ + n({'n': 'a'}), + e_reverse({}, hops=2) + ]) + compare_graphs(g2_reverse_chain, g_out_nodes, g_out_edges) + + g2_reverse_chain_closed = g.chain([ + n({'n': 'a'}), + e_reverse({}, hops=2), + n({}) + ]) + compare_graphs(g2_reverse_chain_closed, g_out_nodes, g_out_edges) + + def test_hop_chain_2_undirected(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'a'}, {'n': 'b'}, {'n': 'c'}] + g_out_nodes = 
[{'n': 'a'}, {'n': 'b'}, {'n': 'c'}] + g_out_edges = [{'s': 'a', 'd': 'b'}, {'s': 'b', 'd': 'c'}] + + g2_undirected = g.hop( + nodes = pd.DataFrame({'n': ['a']}), + hops = 2, + to_fixed_point = False, + direction = 'undirected', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g2_undirected, g_out_nodes_hop, g_out_edges) + + # source _node_match would require each hop to start with {'n': 'a'} + #g2_undirected_triple = g.chain([ + # e_undirected({}, source_node_match={'n': 'a'}, hops=2) + #]) + #compare_graphs(g2_undirected_triple, g_out_nodes, g_out_edges) + + g2_undirected_chain = g.chain([ + n({'n': 'a'}), + e_undirected({}, hops=2) + ]) + compare_graphs(g2_undirected_chain, g_out_nodes, g_out_edges) + + g2_undirected_chain_closed = g.chain([ + n({'n': 'a'}), + e_undirected({}, hops=2), + n({}) + ]) + compare_graphs(g2_undirected_chain_closed, g_out_nodes, g_out_edges) + + + def test_hop_chain_2_end(self): + + g = chain_graph() + + g_out_nodes_hop = [] + g_out_nodes = [] + g_out_edges = [] + + g3_forward = g.hop( + nodes = pd.DataFrame({'n': ['d']}), + hops = 2, + to_fixed_point = False, + direction = 'forward', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g3_forward, g_out_nodes_hop, g_out_edges) + + # source _node_match would require each hop to start with {'n': 'd'} + #g3_forward_triple = g.chain([ + # e_forward({}, source_node_match={'n': 'd'}, hops=2) + #]) + #compare_graphs(g3_forward_triple, g_out_nodes, g_out_edges) + + g3_forward_chain = g.chain([ + n({'n': 'd'}), + e_forward({}, hops=2) + ]) + compare_graphs(g3_forward_chain, g_out_nodes, g_out_edges) + + g3_forward_chain_closed = g.chain([ + n({'n': 'd'}), + e_forward({}, hops=2), + n({}) + ]) + compare_graphs(g3_forward_chain_closed, g_out_nodes, g_out_edges) + + + def test_hop_chain_2_end_reverse(self): + + g = chain_graph() + + 
g_out_nodes_hop = [{'n': 'b'}, {'n': 'c'}] + g_out_nodes = [{'n': 'b'}, {'n': 'c'}, {'n': 'd'}] + g_out_edges = [{'s': 'b', 'd': 'c'}, {'s': 'c', 'd': 'd'}] + + g3_reverse = g.hop( + nodes = pd.DataFrame({'n': ['d']}), + hops = 2, + to_fixed_point = False, + direction = 'reverse', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g3_reverse, g_out_nodes_hop, g_out_edges) + + # source _node_match would require each hop to start with {'n': 'd'} + # g3_reverse_triple = g.chain([ + # e_reverse({}, source_node_match={'n': 'd'}, hops=2) + #]) + #compare_graphs(g3_reverse_triple, g_out_nodes, g_out_edges) + + g3_reverse_chain = g.chain([ + n({'n': 'd'}), + e_reverse({}, hops=2) + ]) + compare_graphs(g3_reverse_chain, g_out_nodes, g_out_edges) + + g3_reverse_chain_closed = g.chain([ + n({'n': 'd'}), + e_reverse({}, hops=2), + n({}) + ]) + compare_graphs(g3_reverse_chain_closed, g_out_nodes, g_out_edges) + + + def test_hop_chain_2_end_undirected(self): + + g = chain_graph() + + g_out_nodes_hop = [{'n': 'b'}, {'n': 'c'}, {'n': 'd'}] + g_out_nodes = [{'n': 'b'}, {'n': 'c'}, {'n': 'd'}] + g_out_edges = [{'s': 'b', 'd': 'c'}, {'s': 'c', 'd': 'd'}] + + g3_undirected = g.hop( + nodes = pd.DataFrame({'n': ['d']}), + hops = 2, + to_fixed_point = False, + direction = 'undirected', + source_node_match = None, + edge_match = None, + destination_node_match = None, + return_as_wave_front = True + ) + compare_graphs(g3_undirected, g_out_nodes_hop, g_out_edges) + + # source _node_match would require each hop to start with {'n': 'd'} + #g3_undirected_triple = g.chain([ + # e_undirected({}, source_node_match={'n': 'd'}, hops=2) + #]) + #compare_graphs(g3_undirected_triple, g_out_nodes, g_out_edges) + + g3_undirected_chain = g.chain([ + n({'n': 'd'}), + e_undirected({}, hops=2) + ]) + compare_graphs(g3_undirected_chain, g_out_nodes, g_out_edges) + + g3_undirected_chain_closed = g.chain([ + n({'n': 'd'}), + 
e_undirected({}, hops=2), + n({}) + ]) + compare_graphs(g3_undirected_chain_closed, g_out_nodes, g_out_edges) From 7c9bf31b315b321ab7fd5ec4161c744d024a3933 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 16:08:28 -0800 Subject: [PATCH 048/104] fix(hop): source_node_match within multi-hop not just first hop --- graphistry/compute/hop.py | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 68b6c1479c..049fed8b2e 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -69,21 +69,36 @@ def hop(self: Plottable, raise ValueError('Source and destination binding cannot be None, please set g._source and g._destination via bind() or edges()') hops_remaining = hops - wave_front = filter_by_dict(nodes, source_node_match)[[ g2._node ]] + + wave_front = nodes[[g2._node]][:0] + matches_nodes = None matches_edges = edges_indexed[[EDGE_ID]][:0] + #richly-attributed subset for dest matching & return-enriching + base_target_nodes = target_wave_front if target_wave_front is not None else g2._nodes + + first_iter = True while True: if not to_fixed_point and hops_remaining is not None: if hops_remaining < 1: break hops_remaining = hops_remaining - 1 + + assert len(wave_front.columns) == 1, "just indexes" + wave_front_iter : pd.DataFrame = ( + filter_by_dict( + nodes if first_iter else wave_front.merge(nodes, on=g2._node, how='left'), + source_node_match + )[[ g2._node ]] + ) + first_iter = False hop_edges_forward = None new_node_ids_forward = None if direction in ['forward', 'undirected']: hop_edges_forward = ( - wave_front.merge( + wave_front_iter.merge( edges_indexed[[g2._source, g2._destination, EDGE_ID]].assign(**{g2._node: edges_indexed[g2._source]}), how='inner', on=g2._node) @@ -94,7 +109,7 @@ def hop(self: Plottable, if destination_node_match is not None: base_nodes = target_wave_front if target_wave_front is not None else g2._nodes 
new_node_ids_forward = filter_by_dict( - base_nodes.merge(new_node_ids_forward, on=g2._node, how='inner'), + base_target_nodes.merge(new_node_ids_forward, on=g2._node, how='inner'), destination_node_match )[[g2._node]] hop_edges_forward = hop_edges_forward.merge( @@ -106,8 +121,9 @@ def hop(self: Plottable, hop_edges_reverse = None new_node_ids_reverse = None if direction in ['reverse', 'undirected']: + #TODO limit by target_wave_front if exists? hop_edges_reverse = ( - wave_front.merge( + wave_front_iter.merge( edges_indexed[[g2._destination, g2._source, EDGE_ID]].assign(**{g2._node: edges_indexed[g2._destination]}), how='inner', on=g2._node) @@ -118,7 +134,7 @@ def hop(self: Plottable, if destination_node_match is not None: base_nodes = target_wave_front if target_wave_front is not None else g2._nodes new_node_ids_reverse = filter_by_dict( - base_nodes.merge(new_node_ids_reverse, on=g2._node, how='inner'), + base_target_nodes.merge(new_node_ids_reverse, on=g2._node, how='inner'), destination_node_match )[[g2._node]] hop_edges_reverse = hop_edges_reverse.merge( From c2766631026f01d0d371b317d8ad0968032d7abc Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 16:09:15 -0800 Subject: [PATCH 049/104] docs(changelog) --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 7c3f1b0b17..ffcbfdf5ff 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Fixed * chain/hop: source_node_match was being mishandled when multiple node attributes exist +* chain: backwards validation pass was too permissive; add `target_wave_front` check +* hop: multi-hops with `source_node_match` specified were not checking intermediate hops * compute logging no longer default-overrides level to DEBUG ## [0.29.7 - 2023-11-02] From 8452edb5f11c62196ac1350d6efe698512bdd370 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023
17:24:15 -0800 Subject: [PATCH 050/104] feat(hop chain): df query --- CHANGELOG.md | 4 ++ README.md | 54 ++++++++++++++++++++++-- graphistry/Plottable.py | 3 ++ graphistry/compute/ast.py | 58 ++++++++++++++++++++------ graphistry/compute/hop.py | 55 ++++++++++++++---------- graphistry/tests/test_compute_chain.py | 44 +++++++++++++++++++ graphistry/tests/test_compute_hops.py | 21 +++++++++- 7 files changed, 200 insertions(+), 39 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ffcbfdf5ff..452346e1d5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Added * chain/hop: `is_in()` membership predicate, `.chain([ n({'type': is_in(['a', 'b'])}) ])` +* hop: optional df queries - `hop(..., source_node_query='...', edge_query='...', destination_node_query='...')` +* chain: optional df queries: + - `chain([n(query='...')])` + - `chain([e_forward(..., source_node_query='...', edge_query='...', destination_node_query='...')])` ### Fixed diff --git a/README.md b/README.md index bf67192a51..70c8692891 100644 --- a/README.md +++ b/README.md @@ -1073,7 +1073,7 @@ g.addStyle(logo={ The below methods let you quickly manipulate graphs directly and with dataframe methods: Search, pattern mine, transform, and more: ```python -from graphistry import n, e_forward, e_reverse, e_undirected +from graphistry import n, e_forward, e_reverse, e_undirected, is_in g = (graphistry .edges(pd.DataFrame({ 's': ['a', 'b'], @@ -1101,19 +1101,41 @@ g2.plot() # nodes are values from cols s, d, k1 .hop( # filter to subgraph #almost all optional direction='forward', # 'reverse', 'undirected' - hops=1, # number or None if to_fixed_point + hops=2, # number (1..n hops, inclusive) or None if to_fixed_point to_fixed_point=False, - source_node_match={"k2": 0}, + + #every edge source node must match these + source_node_match={"k2": 0, "k3": is_in(['a', 'b', 3, 4])}, + source_node_query='k2 == 0', + + 
#every edge must match these edge_match={"k1": "x"}, - destination_node_match={"k2": 2}) + edge_query='k1 == "x"', + + #every edge destination node must match these + destination_node_match={"k2": 2}, + destination_node_query='k2 == 2 or k2 == 4', + ) .chain([ # filter to subgraph n(), n({'k2': 0, "m": 'ok'}), #specific values n({'type': is_in(["type1", "type2"])}), #multiple valid values + n(query='k2 == 0 or k2 == 4'), #dataframe query n(name="start"), # add column 'start':bool e_forward({'k1': 'x'}, hops=1), # same API as hop() e_undirected(name='second_edge'), + e_reverse( + {'k1': 'x'}, # edge property match + hops=2, # 1 to 2 hops + #same API as hop() + source_node_match={"k2": 2}, + source_node_query='k2 == 2 or k2 == 4', + edge_match={"k1": "x"}, + edge_query='k1 == "x"', + destination_node_match={"k2": 0}, + destination_node_query='k2 == 0') ]) + # replace as one node the node w/ given id + transitively connected nodes w/ col=attr .collapse(node='some_id', column='some_col', attribute='some val') ``` @@ -1126,6 +1148,30 @@ g = hg['graph'] # g._edges: | src, dst, user, email, org, time, ... | g.plot() ``` +```python +hg = graphistry.hypergraph( + df, + ['from_user', 'to_user', 'email', 'org'], + direct=True, + opts={ + + # when direct=True, can define src -> [ dst1, dst2, ...] 
edges + 'EDGES': { + 'org': ['from_user'], # org->from_user + 'from_user': ['email', 'to_user'], #from_user->email, from_user->to_user + }, + + 'CATEGORIES': { + # determine which columns share the same namespace for node generation: + # - if user 'louie' is both a from_user and to_user, show as 1 node + # - if a user & org are both named 'louie', they will appear as 2 different nodes + 'user': ['from_user', 'to_user'] + } +}) +g = hg['graph'] +g.plot() +``` + #### Generate node table ```python diff --git a/graphistry/Plottable.py b/graphistry/Plottable.py index 4157781e46..56cb22cc9b 100644 --- a/graphistry/Plottable.py +++ b/graphistry/Plottable.py @@ -209,6 +209,9 @@ def hop(self, edge_match: Optional[dict] = None, source_node_match: Optional[dict] = None, destination_node_match: Optional[dict] = None, + source_node_query: Optional[str] = None, + destination_node_query: Optional[str] = None, + edge_query: Optional[str] = None, return_as_wave_front: bool = False, target_wave_front: Optional[pd.DataFrame] = None ) -> 'Plottable': diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 38e91766d1..c7edcf9eb9 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -34,13 +34,14 @@ class ASTNode(ASTObject): """ Internal, not intended for use outside of this module. 
""" - def __init__(self, filter_dict: Optional[dict] = None, name: Optional[str] = None): + def __init__(self, filter_dict: Optional[dict] = None, name: Optional[str] = None, query: Optional[str] = None): super().__init__(name) if filter_dict == {}: filter_dict = None self._filter_dict = filter_dict + self._query = query def __repr__(self) -> str: return f'ASTNode(filter_dict={self._filter_dict}, name={self._name})' @@ -49,6 +50,7 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta out_g = (g .nodes(prev_node_wavefront if prev_node_wavefront is not None else g._nodes) .filter_nodes_by_dict(self._filter_dict) + .nodes(lambda g_dynamic: g_dynamic._nodes.query(self._query) if self._query is not None else g_dynamic._nodes) .edges(g._edges[:0]) ) if target_wave_front is not None: @@ -92,6 +94,9 @@ def __init__( to_fixed_point: bool = DEFAULT_FIXED_POINT, source_node_match: Optional[dict] = DEFAULT_FILTER_DICT, destination_node_match: Optional[dict] = DEFAULT_FILTER_DICT, + source_node_query: Optional[str] = None, + destination_node_query: Optional[str] = None, + edge_query: Optional[str] = None, name: Optional[str] = None ): @@ -112,9 +117,12 @@ def __init__( self._source_node_match = source_node_match self._edge_match = edge_match self._destination_node_match = destination_node_match + self._source_node_query = source_node_query + self._destination_node_query = destination_node_query + self._edge_query = edge_query def __repr__(self) -> str: - return f'ASTEdge(direction={self._direction}, edge_match={self._edge_match}, hops={self._hops}, to_fixed_point={self._to_fixed_point}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, name={self._name})' + return f'ASTEdge(direction={self._direction}, edge_match={self._edge_match}, hops={self._hops}, to_fixed_point={self._to_fixed_point}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, 
name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: @@ -127,7 +135,10 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta edge_match=self._edge_match, destination_node_match=self._destination_node_match, return_as_wave_front=True, - target_wave_front=target_wave_front + target_wave_front=target_wave_front, + source_node_query=self._source_node_query, + destination_node_query=self._destination_node_query, + edge_query=self._edge_query ) if self._name is not None: @@ -154,7 +165,10 @@ def reverse(self) -> 'ASTEdge': hops=self._hops, to_fixed_point=self._to_fixed_point, source_node_match=self._destination_node_match, - destination_node_match=self._source_node_match + destination_node_match=self._source_node_match, + source_node_query=self._destination_node_query, + destination_node_query=self._source_node_query, + edge_query=self._edge_query ) e = ASTEdge # noqa: E305 @@ -168,7 +182,10 @@ def __init__(self, source_node_match: Optional[dict] = DEFAULT_FILTER_DICT, destination_node_match: Optional[dict] = DEFAULT_FILTER_DICT, to_fixed_point: bool = DEFAULT_FIXED_POINT, - name: Optional[str] = None + name: Optional[str] = None, + source_node_query: Optional[str] = None, + destination_node_query: Optional[str] = None, + edge_query: Optional[str] = None ): super().__init__( direction='forward', @@ -177,11 +194,14 @@ def __init__(self, source_node_match=source_node_match, destination_node_match=destination_node_match, to_fixed_point=to_fixed_point, - name=name + name=name, + source_node_query=source_node_query, + destination_node_query=destination_node_query, + edge_query=edge_query ) def __repr__(self) -> str: - return f'ASTEdgeForward(edge_match={self._edge_match}, hops={self._hops}, 
source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name})' + return f'ASTEdgeForward(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' e_forward = ASTEdgeForward # noqa: E305 @@ -195,7 +215,10 @@ def __init__(self, source_node_match: Optional[dict] = DEFAULT_FILTER_DICT, destination_node_match: Optional[dict] = DEFAULT_FILTER_DICT, to_fixed_point: bool = DEFAULT_FIXED_POINT, - name: Optional[str] = None + name: Optional[str] = None, + source_node_query: Optional[str] = None, + destination_node_query: Optional[str] = None, + edge_query: Optional[str] = None ): super().__init__( direction='reverse', @@ -204,11 +227,14 @@ def __init__(self, source_node_match=source_node_match, destination_node_match=destination_node_match, to_fixed_point=to_fixed_point, - name=name + name=name, + source_node_query=source_node_query, + destination_node_query=destination_node_query, + edge_query=edge_query ) def __repr__(self) -> str: - return f'ASTEdgeReverse(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name})' + return f'ASTEdgeReverse(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' e_reverse = ASTEdgeReverse # noqa: E305 @@ -222,7 +248,10 @@ def __init__(self, 
source_node_match: Optional[dict] = DEFAULT_FILTER_DICT, destination_node_match: Optional[dict] = DEFAULT_FILTER_DICT, to_fixed_point: bool = DEFAULT_FIXED_POINT, - name: Optional[str] = None + name: Optional[str] = None, + source_node_query: Optional[str] = None, + destination_node_query: Optional[str] = None, + edge_query: Optional[str] = None ): super().__init__( direction='undirected', @@ -231,10 +260,13 @@ def __init__(self, source_node_match=source_node_match, destination_node_match=destination_node_match, to_fixed_point=to_fixed_point, - name=name + name=name, + source_node_query=source_node_query, + destination_node_query=destination_node_query, + edge_query=edge_query ) def __repr__(self) -> str: - return f'ASTEdgeUndirected(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name})' + return f'ASTEdgeUndirected(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' e_undirected = ASTEdgeUndirected # noqa: E305 diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 049fed8b2e..c0ade90e53 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -5,6 +5,12 @@ from .filter_by_dict import filter_by_dict +def query_if_not_none(query: Optional[str], df: pd.DataFrame) -> pd.DataFrame: + if query is None: + return df + return df.query(query) + + def hop(self: Plottable, nodes: Optional[pd.DataFrame] = None, # chain: incoming wavefront hops: Optional[int] = 1, @@ -13,6 +19,9 @@ def hop(self: Plottable, edge_match: Optional[dict] = None, source_node_match: Optional[dict] = None, destination_node_match: 
Optional[dict] = None, + source_node_query: Optional[str] = None, + destination_node_query: Optional[str] = None, + edge_query: Optional[str] = None, return_as_wave_front = False, target_wave_front: Optional[pd.DataFrame] = None # chain: limit hits to these for reverse pass ) -> Plottable: @@ -25,8 +34,11 @@ def hop(self: Plottable, to_fixed_point: keep hopping until no new nodes are found (ignores hops) direction: 'forward', 'reverse', 'undirected' edge_match: dict of kv-pairs to exact match (see also: filter_edges_by_dict) - source_node_match: dict of kv-pairs to match nodes before hopping + source_node_match: dict of kv-pairs to match nodes before hopping (including intermediate) destination_node_match: dict of kv-pairs to match nodes after hopping (including intermediate) + source_node_query: dataframe query to match nodes before hopping (including intermediate) + destination_node_query: dataframe query to match nodes after hopping (including intermediate) + edge_query: dataframe query to match edges before hopping (including intermediate) return_as_wave_front: Only return the nodes/edges reached, ignoring past ones (primarily for internal use) target_wave_front: Only consider these nodes for reachability (primarily for internal use by reverse pass) """ @@ -56,10 +68,10 @@ def hop(self: Plottable, if g2._edge is None: if 'index' in g2._edges.columns: raise ValueError('Edges cannot have column "index", please remove or set as g._edge via bind() or edges()') - edges_indexed = g2.filter_edges_by_dict(edge_match)._edges.reset_index() + edges_indexed = query_if_not_none(edge_query, g2.filter_edges_by_dict(edge_match)._edges).reset_index() EDGE_ID = 'index' else: - edges_indexed = g2.filter_edges_by_dict(edge_match)._edges + edges_indexed = query_if_not_none(edge_query, g2.filter_edges_by_dict(edge_match)._edges) EDGE_ID = g2._edge if g2._node is None: @@ -86,12 +98,13 @@ def hop(self: Plottable, hops_remaining = hops_remaining - 1 assert len(wave_front.columns) == 
1, "just indexes" - wave_front_iter : pd.DataFrame = ( - filter_by_dict( - nodes if first_iter else wave_front.merge(nodes, on=g2._node, how='left'), - source_node_match - )[[ g2._node ]] - ) + wave_front_iter : pd.DataFrame = query_if_not_none( + source_node_query, + filter_by_dict( + nodes if first_iter else wave_front.merge(nodes, on=g2._node, how='left'), + source_node_match + ) + )[[ g2._node ]] first_iter = False hop_edges_forward = None @@ -106,12 +119,12 @@ def hop(self: Plottable, ) new_node_ids_forward = hop_edges_forward[[g2._destination]].rename(columns={g2._destination: g2._node}).drop_duplicates() - if destination_node_match is not None: - base_nodes = target_wave_front if target_wave_front is not None else g2._nodes - new_node_ids_forward = filter_by_dict( - base_target_nodes.merge(new_node_ids_forward, on=g2._node, how='inner'), - destination_node_match - )[[g2._node]] + new_node_ids_forward = query_if_not_none( + destination_node_query, + filter_by_dict( + base_target_nodes.merge(new_node_ids_forward, on=g2._node, how='inner'), + destination_node_match + ))[[g2._node]] hop_edges_forward = hop_edges_forward.merge( new_node_ids_forward.rename(columns={g2._node: g2._destination}), how='inner', @@ -131,12 +144,12 @@ def hop(self: Plottable, ) new_node_ids_reverse = hop_edges_reverse[[g2._source]].rename(columns={g2._source: g2._node}).drop_duplicates() - if destination_node_match is not None: - base_nodes = target_wave_front if target_wave_front is not None else g2._nodes - new_node_ids_reverse = filter_by_dict( - base_target_nodes.merge(new_node_ids_reverse, on=g2._node, how='inner'), - destination_node_match - )[[g2._node]] + new_node_ids_reverse = query_if_not_none( + destination_node_query, + filter_by_dict( + base_target_nodes.merge(new_node_ids_reverse, on=g2._node, how='inner'), + destination_node_match + ))[[g2._node]] hop_edges_reverse = hop_edges_reverse.merge( new_node_ids_reverse.rename(columns={g2._node: g2._source}), how='inner', diff 
--git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index 839e32c7fc..55f112b840 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -646,3 +646,47 @@ def test_hop_chain_2_end_undirected(self): n({}) ]) compare_graphs(g3_undirected_chain_closed, g_out_nodes, g_out_edges) + +class TestComputeChainQuery(NoAuthTestCase): + + def test_node_query(self): + + g = chain_graph() + + g2 = g.chain([ + n(query='n == "a"') + ]) + + assert g2._nodes.to_dict(orient='records') == [{'n': 'a'}] + assert g2._edges.to_dict(orient='records') == [] + + def test_edge_query(self): + + g = chain_graph() + + g2 = g.chain([ + e_forward(edge_query='s == "a"') + ]) + + assert g2._nodes.to_dict(orient='records') == [{'n': 'a'}, {'n': 'b'}] + assert g2._edges.to_dict(orient='records') == [{'s': 'a', 'd': 'b'}] + + def test_edge_source_query(self): + + g = chain_graph() + + g2 = g.chain([ + e_forward(source_node_query='n == "a"') + ]) + assert g2._nodes.to_dict(orient='records') == [{'n': 'a'}, {'n': 'b'}] + assert g2._edges.to_dict(orient='records') == [{'s': 'a', 'd': 'b'}] + + def test_edge_destination_query(self): + + g = chain_graph() + + g2 = g.chain([ + e_forward(destination_node_query='n == "b"') + ]) + assert g2._nodes.to_dict(orient='records') == [{'n': 'a'}, {'n': 'b'}] + assert g2._edges.to_dict(orient='records') == [{'s': 'a', 'd': 'b'}] diff --git a/graphistry/tests/test_compute_hops.py b/graphistry/tests/test_compute_hops.py index 5db33ff6c1..01ff225c3e 100644 --- a/graphistry/tests/test_compute_hops.py +++ b/graphistry/tests/test_compute_hops.py @@ -49,7 +49,6 @@ def hops_graph(): return CGFull().nodes(nodes_df, 'node').edges(edges_df, 's', 'd') - class TestComputeHopMixin(NoAuthTestCase): @@ -184,3 +183,23 @@ def test_hop_filter_types(self): def test_is_in(self): g = hops_graph() assert g.hop(source_node_match={'node': is_in(['e', 'k'])})._edges.shape == (3, 3) + +class 
TestComputeHopMixinQuery(NoAuthTestCase): + + def test_hop_source_query(self): + g = hops_graph() + g2 = g.hop(source_node_query='node == "d"', direction='forward', hops=1) + assert g2._nodes.shape == (6, 2) + assert g2._edges.shape == (5, 3) + + def test_hop_destination_query(self): + g = hops_graph() + g2 = g.hop(destination_node_query='node == "d"', direction='reverse', hops=1) + assert g2._nodes.shape == (6, 2) + assert g2._edges.shape == (5, 3) + + def test_hop_edge_query(self): + g = hops_graph() + g2 = g.hop(edge_query='s == "d"', direction='forward', hops=1) + assert g2._nodes.shape == (6, 2) + assert g2._edges.shape == (5, 3) From 3c31c0d35043013d9adf2c113955455ec214711a Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 17:32:19 -0800 Subject: [PATCH 051/104] docs(hop): comments --- graphistry/compute/chain.py | 9 ++++++++- graphistry/compute/hop.py | 5 +++-- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 4efdc47647..5d7f70cc66 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -193,7 +193,11 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: logger.debug('============ FORWARDS ============') - #forwards + # Forwards + # This computes valid path *prefixes*, where each g_step's nodes/edges form the path wavefront: + # g_step._nodes: The nodes reached in this step + # g_step._edges: The edges used to reach those nodes + # As the paths are prefixes, wavefront nodes may be invalid wrt subsequent steps (e.g., halt early) g_stack : List[Plottable] = [] for op in ops: prev_step_nodes = ( # start from only prev step's wavefront node @@ -217,6 +221,9 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: logger.debug('============ BACKWARDS ============') + # Backwards + # Compute reverse and thus complete paths. Dropped nodes/edges are thus the incomplete path prefixes.
+ # Each g node/edge represents a valid wavefront entry for that step. g_stack_reverse : List[Plottable] = [] for (op, g_step) in zip(reversed(ops), reversed(g_stack)): prev_loop_step = g_stack[-1] if len(g_stack_reverse) == 0 else g_stack_reverse[-1] diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index c0ade90e53..edca8fa9e3 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -169,8 +169,9 @@ def hop(self: Plottable, + ( [ new_node_ids_forward ] if new_node_ids_forward is not None else mt ) # noqa: W503 + ( [ new_node_ids_reverse] if new_node_ids_reverse is not None else mt ), # noqa: W503 ignore_index=True, sort=False).drop_duplicates() - - # Finally add initial nodes as confirmed also match edge + post-node predicates, not just pre-node predicates + # Finally include all initial root nodes matched against, now that edge triples satisfy all source/dest/edge predicates + # Only run first iteration b/c root nodes already accounted for in subsequent + # In wavefront mode, skip, as we only want to return reached nodes if matches_nodes is None: if return_as_wave_front: matches_nodes = new_node_ids[:0] From 575c4fd909856a9c7d36cea78d312d60c601d1fd Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 17:36:13 -0800 Subject: [PATCH 052/104] fix(lint) --- graphistry/compute/chain.py | 5 ----- graphistry/compute/hop.py | 2 ++ 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 5d7f70cc66..c8be5d12c1 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -214,11 +214,6 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: ) g_stack.append(g_step) - encountered_nodes_df = pd.concat([ - g_step._nodes - for g_step in g_stack - ]).drop_duplicates(subset=[g._node]) - logger.debug('============ BACKWARDS ============') # Backwards diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 
edca8fa9e3..b14a7bc08a 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -119,6 +119,7 @@ def hop(self: Plottable, ) new_node_ids_forward = hop_edges_forward[[g2._destination]].rename(columns={g2._destination: g2._node}).drop_duplicates() + if destination_node_query is not None or destination_node_match is not None: new_node_ids_forward = query_if_not_none( destination_node_query, filter_by_dict( @@ -144,6 +145,7 @@ def hop(self: Plottable, ) new_node_ids_reverse = hop_edges_reverse[[g2._source]].rename(columns={g2._source: g2._node}).drop_duplicates() + if destination_node_query is not None or destination_node_match is not None: new_node_ids_reverse = query_if_not_none( destination_node_query, filter_by_dict( From b1bfac7b5ff5f035164fe3a1043736b7491c7802 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 17:51:59 -0800 Subject: [PATCH 053/104] refactor(is_in, astpredicate) --- CHANGELOG.md | 6 ++++++ graphistry/compute/__init__.py | 3 +-- graphistry/compute/ast.py | 5 ++++- graphistry/compute/filter_by_dict.py | 9 +-------- graphistry/compute/predicates/ASTPredicate.py | 6 ++++++ graphistry/compute/predicates/is_in.py | 9 +++++++++ graphistry/tests/test_compute_filter_by_dict.py | 3 ++- graphistry/tests/test_compute_hops.py | 2 +- 8 files changed, 30 insertions(+), 13 deletions(-) create mode 100644 graphistry/compute/predicates/ASTPredicate.py create mode 100644 graphistry/compute/predicates/is_in.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 452346e1d5..5f6e35e36a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * chain: optional df queries: - `chain([n(query='...')])` - `chain([e_forward(..., source_node_query='...', edge_query='...', destination_node_query='...')])` +* `ASTPredicate` base class for filter matching ### Fixed @@ -22,6 +23,11 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm 
* hop: multi-hops with `source_node_match` specified was not checking intermediate hops * compute logging no longer default-overrides level to DEBUG +### Changed + +* refactor: move `is_in`, `IsIn` implementations to `graphistry.ast.predicates`; old imports preserved +* `IsIn` now implements `ASTPredicate` + ## [0.29.7 - 2023-11-02] ### Added diff --git a/graphistry/compute/__init__.py b/graphistry/compute/__init__.py index 0b26750541..cd5a814c7c 100644 --- a/graphistry/compute/__init__.py +++ b/graphistry/compute/__init__.py @@ -1,5 +1,4 @@ from .ComputeMixin import ComputeMixin from .ast import ( - n, e_forward, e_reverse, e_undirected + n, e_forward, e_reverse, e_undirected, is_in, IsIn ) -from .filter_by_dict import is_in, IsIn diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index c7edcf9eb9..a481336e7f 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -2,7 +2,9 @@ import pandas as pd from graphistry.Plottable import Plottable -from .filter_by_dict import is_in, IsIn +from .predicates.ASTPredicate import ASTPredicate +from .predicates.is_in import is_in, IsIn +from .filter_by_dict import filter_by_dict import logging logger = logging.getLogger(__name__) @@ -15,6 +17,7 @@ class ASTObject(object): """ Internal, not intended for use outside of this module. 
+ These are operator-level expressions used as g.chain(List) """ def __init__(self, name: Optional[str] = None): self._name = name diff --git a/graphistry/compute/filter_by_dict.py b/graphistry/compute/filter_by_dict.py index 5aa9ef77fc..22c294815f 100644 --- a/graphistry/compute/filter_by_dict.py +++ b/graphistry/compute/filter_by_dict.py @@ -2,14 +2,7 @@ import pandas as pd from graphistry.Plottable import Plottable - - -class IsIn(): - def __init__(self, options: List[Any]) -> None: - self.options = options - -def is_in(options: List[Any]) -> IsIn: - return IsIn(options) +from .predicates.is_in import IsIn def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: diff --git a/graphistry/compute/predicates/ASTPredicate.py b/graphistry/compute/predicates/ASTPredicate.py new file mode 100644 index 0000000000..135940e3c6 --- /dev/null +++ b/graphistry/compute/predicates/ASTPredicate.py @@ -0,0 +1,6 @@ +class ASTPredicate(): + """ + Internal, not intended for use outside of this module. 
+ These are fancy columnar predicates used in {k: v, ...} node/edge df matching when going beyond primitive equality + """ + pass diff --git a/graphistry/compute/predicates/is_in.py b/graphistry/compute/predicates/is_in.py new file mode 100644 index 0000000000..9d337b5e94 --- /dev/null +++ b/graphistry/compute/predicates/is_in.py @@ -0,0 +1,9 @@ +from typing import Any, List +from .ASTPredicate import ASTPredicate + +class IsIn(ASTPredicate): + def __init__(self, options: List[Any]) -> None: + self.options = options + +def is_in(options: List[Any]) -> IsIn: + return IsIn(options) diff --git a/graphistry/tests/test_compute_filter_by_dict.py b/graphistry/tests/test_compute_filter_by_dict.py index 5babdd9211..570e14e865 100644 --- a/graphistry/tests/test_compute_filter_by_dict.py +++ b/graphistry/tests/test_compute_filter_by_dict.py @@ -1,7 +1,8 @@ import pandas as pd from functools import lru_cache -from graphistry.compute.filter_by_dict import filter_by_dict, is_in, IsIn +from graphistry.compute.ast import is_in, IsIn +from graphistry.compute.filter_by_dict import filter_by_dict from graphistry.tests.test_compute import CGFull @lru_cache(maxsize=1) diff --git a/graphistry/tests/test_compute_hops.py b/graphistry/tests/test_compute_hops.py index 01ff225c3e..157cebfa8e 100644 --- a/graphistry/tests/test_compute_hops.py +++ b/graphistry/tests/test_compute_hops.py @@ -2,7 +2,7 @@ from common import NoAuthTestCase from functools import lru_cache -from graphistry.compute.filter_by_dict import is_in +from graphistry.compute.ast import is_in from graphistry.tests.test_compute import CGFull @lru_cache(maxsize=1) From 66e9dbe3ce8835979d8387793f98d39c538f0ac4 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 19:24:24 -0800 Subject: [PATCH 054/104] feat(predicates): many predicates for hop/chain --- CHANGELOG.md | 9 + README.md | 33 +++ graphistry/__init__.py | 38 +++- graphistry/compute/__init__.py | 44 +++- graphistry/compute/ast.py | 43 +++- 
graphistry/compute/filter_by_dict.py | 20 +- graphistry/compute/predicates/ASTPredicate.py | 9 +- graphistry/compute/predicates/categorical.py | 17 ++ graphistry/compute/predicates/is_in.py | 6 + graphistry/compute/predicates/numeric.py | 121 ++++++++++ graphistry/compute/predicates/str.py | 210 ++++++++++++++++++ graphistry/compute/predicates/temporal.py | 82 +++++++ graphistry/tests/test_compute_chain.py | 2 +- .../tests/test_compute_filter_by_dict.py | 2 +- graphistry/tests/test_compute_hops.py | 2 +- 15 files changed, 621 insertions(+), 17 deletions(-) create mode 100644 graphistry/compute/predicates/categorical.py create mode 100644 graphistry/compute/predicates/numeric.py create mode 100644 graphistry/compute/predicates/str.py create mode 100644 graphistry/compute/predicates/temporal.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 5f6e35e36a..2120896df7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -15,6 +15,11 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm - `chain([n(query='...')])` - `chain([e_forward(..., source_node_query='...', edge_query='...', destination_node_query='...')])` * `ASTPredicate` base class for filter matching +* Additional predicates for hop and chain match expressions: + - categorical: is_in (example above), duplicated + - temporal: is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end, is_leap_year + - numeric: gt, lt, ge, le, eq, ne, between, isna, notna + - str: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull ### Fixed @@ -28,6 +33,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * refactor: move `is_in`, `IsIn` implementations to `graphistry.ast.predicates`; old imports preserved * `IsIn` now implements `ASTPredicate` +### Docs + +* hop/chain: new query and predicate forms + ## [0.29.7 - 2023-11-02] ### Added diff --git 
a/README.md b/README.md index 70c8692891..cede790c01 100644 --- a/README.md +++ b/README.md @@ -1139,6 +1139,15 @@ g2.plot() # nodes are values from cols s, d, k1 .collapse(node='some_id', column='some_col', attribute='some val') ``` +Match dictionary expressions in both `hop()` and `chain()` support dataframe series *predicates*. The above examples show `is_in([x, y, z, ...])`. Additional predicates include: + +* categorical: is_in, duplicated +* temporal: is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end, is_leap_year +* numeric: gt, lt, ge, le, eq, ne, between, isna, notna +* string: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull + + + #### Table to graph ```python @@ -1225,16 +1234,37 @@ Method `.hop()` enables slightly more complicated edge filters: ```python +from graphistry import is_in, gt + # (a)-[{"v": 1, "type": "z"}]->(b) based on g g2b = g2.hop( source_node_match={g2._node: "a"}, edge_match={"v": 1, "type": "z"}, destination_node_match={g2._node: "b"}) +g2b = g2.hop( + source_node_query='n == "a"', + edge_query='v == 1 and type == "z"', + destination_node_query='n == "b"') + +# (a {x in [1,2] and y > 3})-[e]->(b) based on g +g2c = g2.hop( + source_node_match={ + g2._node: "a", + "x": is_in([1,2]), + "y": gt(3) + }, + destination_node_match={g2._node: "b"} +) # (a or b)-[1 to 8 hops]->(anynode), based on graph g2 g3 = g2.hop(pd.DataFrame({g2._node: ['a', 'b']}), hops=8) +# (a or b)-[1 to 8 hops]->(anynode), based on graph g2 +g3 = g2.hop(pd.DataFrame({g2._node: is_in(['a', 'b'])}), hops=8) + # (c)<-[any number of hops]-(any node), based on graph g3 +# Note multihop matches check source/destination/edge match/query predicates +# against every encountered edge for it to be included g4 = g3.hop(source_node_match={"node": "c"}, direction='reverse', to_fixed_point=True) # (c)-[incoming or outgoing edge]-(any node), @@ -1251,6 +1281,7 @@ from
graphistry import n, e_forward, e_reverse, e_undirected, is_in g2.chain([ n() ]) g2.chain([ n({"x": 1, "y": True}) ]), +g2.chain([ n(query='x == 1 and y == True') ]), g2.chain([ n({"z": is_in([1,2,4,'z'])}) ]), # multiple valid values g2.chain([ e_forward({"type": "x"}, hops=2) ]) # simple multi-hop g3 = g2.chain([ @@ -1264,6 +1295,8 @@ print('# end nodes: ', len(g3._nodes[ g3._nodes.end ])) print('# end edges: ', len(g3._edges[ g3._edges.final_edge ])) ``` +See table above for more predicates like `is_in()` and `gt()` + #### Pipelining ```python diff --git a/graphistry/__init__.py b/graphistry/__init__.py index 02e467afea..246fdf6cb7 100644 --- a/graphistry/__init__.py +++ b/graphistry/__init__.py @@ -51,7 +51,43 @@ from graphistry.compute import ( n, e_forward, e_reverse, e_undirected, - is_in, IsIn + + is_in, IsIn, + + duplicated, Duplicated, + + is_month_start, IsMonthStart, + is_month_end, IsMonthEnd, + is_quarter_start, IsQuarterStart, + is_quarter_end, IsQuarterEnd, + is_year_start, IsYearStart, + is_leap_year, IsLeapYear, + + gt, GT, + lt, LT, + ge, GE, + le, LE, + eq, EQ, + ne, NE, + between, Between, + isna, IsNA, + notna, NotNA, + + contains, Contains, + startswith, Startswith, + endswith, Endswith, + match, Match, + isnumeric, IsNumeric, + isalpha, IsAlpha, + isdigit, IsDigit, + islower, IsLower, + isupper, IsUpper, + isspace, IsSpace, + isalnum, IsAlnum, + isdecimal, IsDecimal, + istitle, IsTitle, + isnull, IsNull, + notnull, NotNull, ) from graphistry.Engine import Engine diff --git a/graphistry/compute/__init__.py b/graphistry/compute/__init__.py index cd5a814c7c..d321b0915e 100644 --- a/graphistry/compute/__init__.py +++ b/graphistry/compute/__init__.py @@ -1,4 +1,46 @@ from .ComputeMixin import ComputeMixin from .ast import ( - n, e_forward, e_reverse, e_undirected, is_in, IsIn + n, e_forward, e_reverse, e_undirected +) +from .predicates.is_in import ( + is_in, IsIn +) +from .predicates.categorical import ( + duplicated, Duplicated, +) +from 
.predicates.temporal import ( + is_month_start, IsMonthStart, + is_month_end, IsMonthEnd, + is_quarter_start, IsQuarterStart, + is_quarter_end, IsQuarterEnd, + is_year_start, IsYearStart, + is_leap_year, IsLeapYear +) +from .predicates.numeric import ( + gt, GT, + lt, LT, + ge, GE, + le, LE, + eq, EQ, + ne, NE, + between, Between, + isna, IsNA, + notna, NotNA +) +from .predicates.str import ( + contains, Contains, + startswith, Startswith, + endswith, Endswith, + match, Match, + isnumeric, IsNumeric, + isalpha, IsAlpha, + isdigit, IsDigit, + islower, IsLower, + isupper, IsUpper, + isspace, IsSpace, + isalnum, IsAlnum, + isdecimal, IsDecimal, + istitle, IsTitle, + isnull, IsNull, + notnull, NotNull, ) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index a481336e7f..39695edc21 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -3,7 +3,48 @@ from graphistry.Plottable import Plottable from .predicates.ASTPredicate import ASTPredicate -from .predicates.is_in import is_in, IsIn +from .predicates.is_in import ( + is_in, IsIn +) +from .predicates.categorical import ( + duplicated, Duplicated, +) +from .predicates.temporal import ( + is_month_start, IsMonthStart, + is_month_end, IsMonthEnd, + is_quarter_start, IsQuarterStart, + is_quarter_end, IsQuarterEnd, + is_year_start, IsYearStart, + is_leap_year, IsLeapYear +) +from .predicates.numeric import ( + gt, GT, + lt, LT, + ge, GE, + le, LE, + eq, EQ, + ne, NE, + between, Between, + isna, IsNA, + notna, NotNA +) +from .predicates.str import ( + contains, Contains, + startswith, Startswith, + endswith, Endswith, + match, Match, + isnumeric, IsNumeric, + isalpha, IsAlpha, + isdigit, IsDigit, + islower, IsLower, + isupper, IsUpper, + isspace, IsSpace, + isalnum, IsAlnum, + isdecimal, IsDecimal, + istitle, IsTitle, + isnull, IsNull, + notnull, NotNull +) from .filter_by_dict import filter_by_dict import logging diff --git a/graphistry/compute/filter_by_dict.py 
b/graphistry/compute/filter_by_dict.py index 22c294815f..db59d2605d 100644 --- a/graphistry/compute/filter_by_dict.py +++ b/graphistry/compute/filter_by_dict.py @@ -1,8 +1,8 @@ -from typing import Any, Dict, List, Optional, TYPE_CHECKING +from typing import Dict, Optional import pandas as pd from graphistry.Plottable import Plottable -from .predicates.is_in import IsIn +from .predicates.ASTPredicate import ASTPredicate def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: @@ -13,25 +13,25 @@ def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: if filter_dict is None or filter_dict == {}: return df - ins: Dict[str, IsIn] = {} + predicates: Dict[str, ASTPredicate] = {} for col, val in filter_dict.items(): if col not in df.columns: raise ValueError(f'Key "{col}" not in columns of df, available columns are: {df.columns}') - if isinstance(val, IsIn): - ins[col] = val - filter_dict_concrete = filter_dict if not ins else { + if isinstance(val, ASTPredicate): + predicates[col] = val + filter_dict_concrete = filter_dict if not predicates else { k: v for k, v in filter_dict.items() - if not isinstance(v, IsIn) + if not isinstance(v, ASTPredicate) } if filter_dict_concrete: hits = (df[list(filter_dict_concrete)] == pd.Series(filter_dict_concrete)).all(axis=1) else: hits = df[[]].assign(x=True).x - if ins: - for col, val in ins.items(): - hits = hits & df[col].isin(val.options) + if predicates: + for col, op in predicates.items(): + hits = hits & op(df[col]) return df[hits] diff --git a/graphistry/compute/predicates/ASTPredicate.py b/graphistry/compute/predicates/ASTPredicate.py index 135940e3c6..24c2d08bc8 100644 --- a/graphistry/compute/predicates/ASTPredicate.py +++ b/graphistry/compute/predicates/ASTPredicate.py @@ -1,6 +1,13 @@ +from abc import abstractmethod +import pandas as pd + + class ASTPredicate(): """ Internal, not intended for use outside of this module. 
These are fancy columnar predicates used in {k: v, ...} node/edge df matching when going beyond primitive equality """ - pass + + @abstractmethod + def __call__(self, s: pd.Series) -> pd.Series: + pass diff --git a/graphistry/compute/predicates/categorical.py b/graphistry/compute/predicates/categorical.py new file mode 100644 index 0000000000..3e786d3153 --- /dev/null +++ b/graphistry/compute/predicates/categorical.py @@ -0,0 +1,17 @@ +from typing import Literal, Optional +import pandas as pd + +from .ASTPredicate import ASTPredicate + +class Duplicated(ASTPredicate): + def __init__(self, keep: Literal['first', 'last', False] = 'first') -> None: + self.keep = keep + + def __call__(self, s: pd.Series) -> pd.Series: + return s.duplicated(keep=self.keep) + +def duplicated(keep: Literal['first', 'last', False] = 'first') -> Duplicated: + """ + Return whether a given value is duplicated + """ + return Duplicated(keep=keep) diff --git a/graphistry/compute/predicates/is_in.py b/graphistry/compute/predicates/is_in.py index 9d337b5e94..77c9f2505a 100644 --- a/graphistry/compute/predicates/is_in.py +++ b/graphistry/compute/predicates/is_in.py @@ -1,9 +1,15 @@ from typing import Any, List +import pandas as pd + from .ASTPredicate import ASTPredicate + class IsIn(ASTPredicate): def __init__(self, options: List[Any]) -> None: self.options = options + + def __call__(self, s: pd.Series) -> pd.Series: + return s.isin(self.options) def is_in(options: List[Any]) -> IsIn: return IsIn(options) diff --git a/graphistry/compute/predicates/numeric.py b/graphistry/compute/predicates/numeric.py new file mode 100644 index 0000000000..d17b07bc0c --- /dev/null +++ b/graphistry/compute/predicates/numeric.py @@ -0,0 +1,121 @@ +from typing import Optional +import pandas as pd + +from .ASTPredicate import ASTPredicate + +class GT(ASTPredicate): + def __init__(self, val: float) -> None: + self.val = val + + def __call__(self, s: pd.Series) -> pd.Series: + return s > self.val + +def gt(val: float) 
-> GT: + """ + Return whether a given value is greater than a threshold + """ + return GT(val) + +class LT(ASTPredicate): + def __init__(self, val: float) -> None: + self.val = val + + def __call__(self, s: pd.Series) -> pd.Series: + return s < self.val + +def lt(val: float) -> LT: + """ + Return whether a given value is less than a threshold + """ + return LT(val) + +class GE(ASTPredicate): + def __init__(self, val: float) -> None: + self.val = val + + def __call__(self, s: pd.Series) -> pd.Series: + return s >= self.val + +def ge(val: float) -> GE: + """ + Return whether a given value is greater than or equal to a threshold + """ + return GE(val) + +class LE(ASTPredicate): + def __init__(self, val: float) -> None: + self.val = val + + def __call__(self, s: pd.Series) -> pd.Series: + return s <= self.val + +def le(val: float) -> LE: + """ + Return whether a given value is less than or equal to a threshold + """ + return LE(val) + +class EQ(ASTPredicate): + def __init__(self, val: float) -> None: + self.val = val + + def __call__(self, s: pd.Series) -> pd.Series: + return s == self.val + +def eq(val: float) -> EQ: + """ + Return whether a given value is equal to a threshold + """ + return EQ(val) + +class NE(ASTPredicate): + def __init__(self, val: float) -> None: + self.val = val + + def __call__(self, s: pd.Series) -> pd.Series: + return s != self.val + +def ne(val: float) -> NE: + """ + Return whether a given value is not equal to a threshold + """ + return NE(val) + +class Between(ASTPredicate): + def __init__(self, lower: float, upper: float, inclusive: bool = True) -> None: + self.lower = lower + self.upper = upper + self.inclusive = inclusive + + def __call__(self, s: pd.Series) -> pd.Series: + if self.inclusive: + return (s >= self.lower) & (s <= self.upper) + else: + return (s > self.lower) & (s < self.upper) + +def between(lower: float, upper: float, inclusive: bool = True) -> Between: + """ + Return whether a given value is between a lower and upper 
threshold + """ + return Between(lower, upper, inclusive) + +class IsNA(ASTPredicate): + def __call__(self, s: pd.Series) -> pd.Series: + return s.isna() + +def isna() -> IsNA: + """ + Return whether a given value is NA + """ + return IsNA() + + +class NotNA(ASTPredicate): + def __call__(self, s: pd.Series) -> pd.Series: + return s.notna() + +def notna() -> NotNA: + """ + Return whether a given value is not NA + """ + return NotNA() diff --git a/graphistry/compute/predicates/str.py b/graphistry/compute/predicates/str.py new file mode 100644 index 0000000000..14a8ae2de5 --- /dev/null +++ b/graphistry/compute/predicates/str.py @@ -0,0 +1,210 @@ +from typing import Optional +import pandas as pd + +from .ASTPredicate import ASTPredicate + + +class Contains(ASTPredicate): + def __init__(self, pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None, regex: bool = True) -> None: + self.pat = pat + self.case = case + self.flags = flags + self.na = na + self.regex = regex + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.contains(self.pat, self.case, self.flags, self.na, self.regex) + +def contains(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None, regex: bool = True) -> Contains: + """ + Return whether a given pattern or regex is contained within a string + """ + return Contains(pat, case, flags, na, regex) + + +class Startswith(ASTPredicate): + def __init__(self, pat: str, na: Optional[str] = None) -> None: + self.pat = pat + self.na = na + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.startswith(self.pat, self.na) + +def startswith(pat: str, na: Optional[str] = None) -> Startswith: + """ + Return whether a given pattern is at the start of a string + """ + return Startswith(pat, na) + +class Endswith(ASTPredicate): + def __init__(self, pat: str, na: Optional[str] = None) -> None: + self.pat = pat + self.na = na + + def __call__(self, s: pd.Series) -> pd.Series: + """ + Return whether a given pattern is 
at the end of a string + """ + return s.str.endswith(self.pat, self.na) + +def endswith(pat: str, na: Optional[str] = None) -> Endswith: + return Endswith(pat, na) + +class Match(ASTPredicate): + def __init__(self, pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None) -> None: + self.pat = pat + self.case = case + self.flags = flags + self.na = na + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.match(self.pat, self.case, self.flags, self.na) + +def match(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None) -> Match: + """ + Return whether a given regex pattern matches at the start of a string + """ + return Match(pat, case, flags, na) + +class IsNumeric(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isnumeric() + +def isnumeric() -> IsNumeric: + """ + Return whether a given string is numeric + """ + return IsNumeric() + +class IsAlpha(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isalpha() + +def isalpha() -> IsAlpha: + """ + Return whether a given string is alphabetic + """ + return IsAlpha() + +class IsDigit(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isdigit() + +def isdigit() -> IsDigit: + """ + Return whether a given string consists only of digits + """ + return IsDigit() + +class IsLower(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.islower() + +def islower() -> IsLower: + """ + Return whether a given string is lowercase + """ + return IsLower() + +class IsUpper(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isupper() + +def isupper() -> IsUpper: + """ + Return whether a given string is uppercase + """ + return IsUpper() + +class IsSpace(ASTPredicate): + def __init__(self) -> 
None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isspace() + +def isspace() -> IsSpace: + """ + Return whether a given string is whitespace + """ + return IsSpace() + +class IsAlnum(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isalnum() + +def isalnum() -> IsAlnum: + """ + Return whether a given string is alphanumeric + """ + return IsAlnum() + +class IsDecimal(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.isdecimal() + +def isdecimal() -> IsDecimal: + """ + Return whether a given string is decimal + """ + return IsDecimal() + +class IsTitle(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.str.istitle() + +def istitle() -> IsTitle: + """ + Return whether a given string is title case + """ + return IsTitle() + +class IsNull(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.isnull() + +def isnull() -> IsNull: + """ + Return whether a given string is null + """ + return IsNull() + +class NotNull(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.notnull() + +def notnull() -> NotNull: + """ + Return whether a given string is not null + """ + return NotNull() diff --git a/graphistry/compute/predicates/temporal.py b/graphistry/compute/predicates/temporal.py new file mode 100644 index 0000000000..b18984fe97 --- /dev/null +++ b/graphistry/compute/predicates/temporal.py @@ -0,0 +1,82 @@ +from typing import Optional +import pandas as pd + +from .ASTPredicate import ASTPredicate + +class IsMonthStart(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_month_start + +def is_month_start() -> IsMonthStart: + """ + Return whether a given value is a 
month start + """ + return IsMonthStart() + +class IsMonthEnd(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_month_end + +def is_month_end() -> IsMonthEnd: + """ + Return whether a given value is a month end + """ + return IsMonthEnd() + +class IsQuarterStart(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_quarter_start + +def is_quarter_start() -> IsQuarterStart: + """ + Return whether a given value is a quarter start + """ + return IsQuarterStart() + +class IsQuarterEnd(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_quarter_end + +def is_quarter_end() -> IsQuarterEnd: + """ + Return whether a given value is a quarter end + """ + return IsQuarterEnd() + +class IsYearStart(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_year_start + +def is_year_start() -> IsYearStart: + """ + Return whether a given value is a year start + """ + return IsYearStart() + +class IsLeapYear(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_leap_year + +def is_leap_year() -> IsLeapYear: + """ + Return whether a given value is a leap year + """ + return IsLeapYear() diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index 55f112b840..8d92852447 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -121,7 +121,7 @@ def test_chain_named(self): assert sorted(g2._edges[ g2._edges.e2 ][g2._destination].to_list()) == ["a", "b"] assert sorted(g2._nodes[ g2._nodes.n2 ][g2._node].to_list()) == ["a", "b"] - def test_chain_is_in(self): + def test_chain_predicate_is_in(self): g = hops_graph() assert g.chain([n({'node': is_in(['e', 'k'])})])._nodes.shape == (2, 
2) diff --git a/graphistry/tests/test_compute_filter_by_dict.py b/graphistry/tests/test_compute_filter_by_dict.py index 570e14e865..07224a23bb 100644 --- a/graphistry/tests/test_compute_filter_by_dict.py +++ b/graphistry/tests/test_compute_filter_by_dict.py @@ -108,7 +108,7 @@ def test_kv_multiple_bad(self): g = hops_graph() assert g.filter_edges_by_dict({'i': -100, 'type': 'e'})._edges.equals(g._edges[:0]) -class TestIsIn(object): +class TestPredicateIsIn(object): def test_standalone(self): g = hops_graph() diff --git a/graphistry/tests/test_compute_hops.py b/graphistry/tests/test_compute_hops.py index 157cebfa8e..cf2c8e6b02 100644 --- a/graphistry/tests/test_compute_hops.py +++ b/graphistry/tests/test_compute_hops.py @@ -180,7 +180,7 @@ def test_hop_filter_types(self): assert g5a._nodes.shape == (2, 2) assert g5a._edges.shape == (1, 3) - def test_is_in(self): + def test_predicate_is_in(self): g = hops_graph() assert g.hop(source_node_match={'node': is_in(['e', 'k'])})._edges.shape == (3, 3) From 29a2d27f29bcdd8c8ae7c0f06b138fe41122177c Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 19:27:09 -0800 Subject: [PATCH 055/104] fix(py 3.7): use old Literal import --- graphistry/compute/predicates/categorical.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/compute/predicates/categorical.py b/graphistry/compute/predicates/categorical.py index 3e786d3153..bcc08c7c84 100644 --- a/graphistry/compute/predicates/categorical.py +++ b/graphistry/compute/predicates/categorical.py @@ -1,4 +1,4 @@ -from typing import Literal, Optional +from typing_extensions import Literal import pandas as pd from .ASTPredicate import ASTPredicate From 5b3795c773b29edd1426a84293a3b7d024db262b Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 19:52:41 -0800 Subject: [PATCH 056/104] fix(modules): prediciates __init__.py --- graphistry/compute/predicates/__init__.py | 1 + 1 file changed, 1 insertion(+) create mode 100644 
graphistry/compute/predicates/__init__.py diff --git a/graphistry/compute/predicates/__init__.py b/graphistry/compute/predicates/__init__.py new file mode 100644 index 0000000000..8b13789179 --- /dev/null +++ b/graphistry/compute/predicates/__init__.py @@ -0,0 +1 @@ + From 053f66544c43571c0207e6878fe1c97bf3297ab8 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 22:17:51 -0800 Subject: [PATCH 057/104] docs(hop and chain): tutorial --- CHANGELOG.md | 1 + README.md | 23 + .../hop_and_chain_graph_pattern_mining.ipynb | 2736 +++++++++++++++++ 3 files changed, 2760 insertions(+) create mode 100644 demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb diff --git a/CHANGELOG.md b/CHANGELOG.md index 2120896df7..af4b890248 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -36,6 +36,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Docs * hop/chain: new query and predicate forms +* hop/chain graph pattern mining tutorial: [ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) ## [0.29.7 - 2023-11-02] diff --git a/README.md b/README.md index cede790c01..0f0cefe3e0 100644 --- a/README.md +++ b/README.md @@ -147,6 +147,25 @@ It is easy to turn arbitrary data into insightful graphs. 
PyGraphistry comes wit g2.plot() ``` +* Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)) + + Run Cypher-style graph queries natively on dataframes without going to a database or Java: + + ```python + from graphistry import n, e_undirected, is_in + + g2 = g.chain([ + n({'user': 'Biden'}), + e_undirected(), + n(name='bridge'), + e_undirected(), + n({'user': is_in(['Trump', 'Obama'])}) + ]) + + print('# bridges', len(g2._nodes[g2._nodes.bridge])) + g2.plot() + ``` + * [Spark](https://spark.apache.org/)/[Databricks](https://databricks.com/) ([ipynb demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.ipynb), [dbc demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.dbc)) ```python @@ -1218,6 +1237,10 @@ assert 'pagerank' in g2._nodes.columns #### Graph pattern matching +PyGraphistry supports a PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java + +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) + Traverse within a graph, or expand one graph against another Simple node and edge filtering via `filter_edges_by_dict()` and `filter_nodes_by_dict()`: diff --git a/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb b/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb new file mode 100644 index 0000000000..af8024a9eb --- /dev/null +++ b/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb @@ -0,0 +1,2736 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ 
+ { + "cell_type": "markdown", + "source": [ + "# Hop-and-chain: PyGraphistry Cypher-style graph pattern matching on dataframes\n", + "\n", + "PyGraphistry supports a rich subset of the popular Cypher graph query language, which you can run on dataframes without installing a database or native libraries. It is natively integrated with dataframes and thus has a Python-native syntax rather than the traditional string syntax.\n", + "\n", + "**PyGraphistry graph pattern matching shares key features with Cypher**\n", + "\n", + "* Multi-hop searching\n", + "* Predicates on node and edge attributes\n", + "* Ability to identify matching nodes and edges\n", + "\n", + "**It is different in a few key ways**\n", + "\n", + "* Pure PyData (Python/C++/Fortran): No need to install databases, Java, etc.; `pip install graphistry` is enough\n", + "* It is collection-oriented rather than path-oriented: All operations are guaranteed to translate to efficiently vectorized dataframe operations rather than the asymptotically slower per-row path operations typical of traditional graph query engines\n", + "* Advanced users can insert custom predicates as native Python dataframe code\n", + "\n", + "---\n", + "\n", + "# Tutorial:\n", + "\n", + "1. Install & configure\n", + "1. Load & enrich a US Congress Twitter interaction dataset\n", + "1. Simple graph filtering: `g.hop()` and `g.chain([...])`\n", + "1. Multi-hop and paths-between-nodes pattern mining\n", + "1. Advanced filter predicates\n", + "1. Result labeling\n", + "\n" + ], + "metadata": { + "id": "GZxoiU8sQDk_" + } + }, + { + "cell_type": "markdown", + "source": [ + "# 1. Install & configure" + ], + "metadata": { + "id": "QQpsrtwBT7sa" + } + }, + { + "cell_type": "code", + "source": [ + "#! 
pip install graphistry[igraph]" + ], + "metadata": { + "id": "cYjRbgkU9Sx8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Imports" + ], + "metadata": { + "id": "Ff6Tt9DhkePl" + } + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "import graphistry\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")" + ], + "metadata": { + "id": "S5_y0CbLkjft" + }, + "execution_count": 141, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "graphistry.register(api=3, username='...', password='...')" + ], + "metadata": { + "id": "GQ83i-sKUaw9" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# 2. Load & enrich a US congress twitter interaction dataset" + ], + "metadata": { + "id": "eU9SyauNUHtR" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Data\n", + "\n", + "* Download\n", + "* Turn json into a Pandas edges dataframe\n", + "* Turn edges dataframe into a PyGraphistry graph\n", + "* Enrich nodes and edges with some useful graph metrics\n", + "* Visualize full graph to test" + ], + "metadata": { + "id": "AM9JhnaQkRd3" + } + }, + { + "cell_type": "code", + "source": [ + "! wget -q https://snap.stanford.edu/data/congress_network.zip\n", + "! 
unzip congress_network.zip\n" + ], + "metadata": { + "id": "55xeNAyDXhAm", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "287758f0-0df2-49ff-ecdc-283313f7e07a" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "total 1.2M\n", + "drwxr-xr-x 1 root root 4.0K Dec 4 03:56 .\n", + "drwxr-xr-x 1 root root 4.0K Dec 4 03:33 ..\n", + "-rw-r--r-- 1 root root 150K May 9 2017 Attribute\n", + "-rw-r--r-- 1 root root 14K May 9 2017 Class_info\n", + "drwxr-xr-x 4 root root 4.0K Nov 30 14:24 .config\n", + "-rw-r--r-- 1 root root 190K Aug 5 05:26 congress_network.zip\n", + "-rw-r--r-- 1 root root 320K May 9 2017 edgelist\n", + "drwxr-xr-x 1 root root 4.0K Nov 30 14:27 sample_data\n", + "-rw-r--r-- 1 root root 16 May 9 2017 Statistics\n", + "-rw-r--r-- 1 root root 221K Dec 4 03:53 twitter.zip\n", + "-rw-r--r-- 1 root root 299K May 9 2017 vertex2aid\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import json\n", + "\n", + "with open('congress_network/congress_network_data.json', 'r') as file:\n", + " data = json.load(file)\n", + "\n", + "edges = []\n", + "for i, name in enumerate(data[0]['usernameList']):\n", + " for ii, j in enumerate(data[0]['outList'][i]):\n", + " edges.append({\n", + " 'from': name,\n", + " 'to': data[0]['usernameList'][j],\n", + " 'weight': data[0]['outWeight'][i][ii]\n", + " })\n", + "edges_df = pd.DataFrame(edges)\n", + "\n", + "print(edges_df.shape)\n", + "edges_df.sample(5)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + }, + "id": "6CmULn4N-8oh", + "outputId": "61a1a4cf-dfe1-4260-a427-46009f4e4aaf" + }, + "execution_count": 40, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(13289, 3)\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " from to weight\n", + "11112 RepBobbyRush janschakowsky 0.034364\n", + "3836 RepCori Ilhan 0.015936\n", + "5282 RepTedDeutch 
RepDWStweets 0.003268\n", + "12352 BennieGThompson RepStricklandWA 0.006849\n", + "9358 RepCarolMiller RepTroyNehls 0.005291" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 40 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Load dataframe as a PyGraphistry graph\n", + "\n", + "Turn into a graph and precompute some useful graph metrics\n", + "\n", + "Recall that a `g` object, underneath, is essentially just two dataframes, `g._edges` and `g._nodes`, and with many useful graph methods:" + ], + "metadata": { + "id": "XLFTgDTEDSeA" + } + }, + { + "cell_type": "code", + "source": [ + "# Shape\n", + "g = graphistry.edges(edges_df, 'from', 'to')\n", + "\n", + "# Enrich & style\n", + "# Tip: Switch from compute_igraph to compute_cugraph when GPUs are available\n", + "g2 = (g\n", + " .materialize_nodes()\n", + " .nodes(lambda g: g._nodes.assign(title=g._nodes.id))\n", + " .edges(lambda g: g._edges.assign(weight2=g._edges.weight))\n", + " .bind(point_title='title')\n", + " .compute_igraph('community_infomap')\n", + " .compute_igraph('pagerank')\n", + " .get_degrees()\n", + " .encode_point_color(\n", + " 'community_infomap',\n", + " as_categorical=True,\n", + " categorical_mapping={\n", + " 0: '#32a9a2', # vibrant teal\n", + " 1: '#ff6b6b', # soft coral\n", + " 2: '#f9d342', # muted yellow\n", + " }\n", + " )\n", + ")\n", + "\n", + "g2._nodes" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 461 + }, + "id": "aB1U7e0HXmHh", + "outputId": "53b9fa91-0caf-4866-c5a9-d9cf80e3c9ac" + }, + "execution_count": 77, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "WARNING:root:edge index g._edge not set so using edge index as ID; set g._edge via g.edges(), or change merge_if_existing to FalseWARNING:root:edge index g._edge __edge_index__ missing as attribute in ig; using ig edge order for IDsWARNING:root:edge index g._edge not set so using edge index as ID; set g._edge via g.edges(), or change merge_if_existing to FalseWARNING:root:edge index g._edge __edge_index__ missing as attribute in ig; using ig edge order for IDs" 
+ ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id title community_infomap pagerank degree_in \\\n", + "0 SenatorBaldwin SenatorBaldwin 0 0.001422 26 \n", + "1 SenJohnBarrasso SenJohnBarrasso 0 0.001179 22 \n", + "2 SenatorBennet SenatorBennet 0 0.001995 33 \n", + "3 MarshaBlackburn MarshaBlackburn 0 0.001331 18 \n", + "4 SenBlumenthal SenBlumenthal 0 0.001672 30 \n", + ".. ... ... ... ... ... \n", + "470 RepJoeWilson RepJoeWilson 1 0.001780 21 \n", + "471 RobWittman RobWittman 1 0.001017 13 \n", + "472 rep_stevewomack rep_stevewomack 1 0.002637 35 \n", + "473 RepJohnYarmuth RepJohnYarmuth 2 0.000555 5 \n", + "474 RepLeeZeldin RepLeeZeldin 1 0.000511 3 \n", + "\n", + " degree_out degree \n", + "0 20 46 \n", + "1 19 41 \n", + "2 22 55 \n", + "3 38 56 \n", + "4 35 65 \n", + ".. ... ... \n", + "470 38 59 \n", + "471 19 32 \n", + "472 19 54 \n", + "473 20 25 \n", + "474 25 28 \n", + "\n", + "[475 rows x 7 columns]" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 77 + } + ] + }, + { + "cell_type": "code", + "source": [ + "g2.plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 543 + }, + "id": "GY9Q7KyqBMq8", + "outputId": "5b4b277e-17fd-4201-9518-25168b927c6f" + }, + "execution_count": 79, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 79 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 3. Simple filtering: `g.hop()` & `g.chain([...])`\n", + "\n", + "We can filter by nodes, edges, and combinations of them\n", + "\n", + "The result is a graph where we can inspect the node and edge tables, or perform further graph operations, like visualization or further searches\n", + "\n", + "**Key concepts**\n", + "\n", + "There are 2 key methods:\n", + "* `g.hop(...)`: filter triples of source node, edge, destination node\n", + "* `g.chain([....])`: arbitrarily long sequence of node and edge predicates\n", + "\n", + "They reuse column operations core to dataframe libraries, such as comparison operators on strings, numbers, and dates\n", + "\n", + "**Sample tasks**\n", + "\n", + "This section shows how to:\n", + "\n", + "* Find SenSchumer and his immediate community (infomap metric)\n", + "* Look at his entire community\n", + "* Find everyone with high edge weight from/to SenSchumer; 2 hops either direction\n", + "* Find everyone in his community\n", + "\n" + ], + "metadata": { + "id": "2sB-Mi7qkrM0" + } + }, + { + "cell_type": "code", + "source": [ + "g2.chain([n({'title': 'SenSchumer'})])._nodes" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 81 + }, + "id": "ydS6ZUiUqx8E", + "outputId": "825b0c85-e3c9-4453-abed-1b252cf804d1" + }, + "execution_count": 80, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + 
" id title community_infomap pagerank degree_in degree_out \\\n", + "0 SenSchumer SenSchumer 2 0.001296 25 97 \n", + "\n", + " degree \n", + "0 122 " + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 80 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "You can also pass `chain()` a sequence of node and edge expressions" + ], + "metadata": { + "id": "5vGMFFveUyLK" + } + }, + { + "cell_type": "code", + "source": [ + "g_immediate_community2 = g2.chain([n({'title': 'SenSchumer'}), e_undirected(), n({'community_infomap': 2})])\n", + "\n", + "print(len(g_immediate_community2._nodes), 'senators', len(g_immediate_community2._edges), 'relns')\n", + "g_immediate_community2._edges[['from', 'to', 'weight2']].sort_values(by=['weight2']).head(10)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "id": "I1L0ZRTrF_HH", + "outputId": "9ded72c2-694b-40a4-f55a-ee86b51f290d" + }, + "execution_count": 81, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "58 senators 69 relns\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " from to weight2\n", + "22 SenSchumer JacksonLeeTX18 0.001546\n", + "46 SenSchumer RepSarbanes 0.001546\n", + "23 SenSchumer RepJayapal 0.001546\n", + "53 SenSchumer PeterWelch 0.001546\n", + "25 SenSchumer RepDaveJoyce 0.001546\n", + "26 SenSchumer RepRobinKelly 0.001546\n", + "28 SenSchumer RepAndyKimNJ 0.001546\n", + "29 SenSchumer RepBarbaraLee 0.001546\n", + "50 SenSchumer RepPaulTonko 0.001546\n", + "32 SenSchumer RepMeijer 0.001546" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 81 + } + ] + }, + { + "cell_type": "code", + "source": [ + "g_immediate_community2.plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 543 + }, + "id": "3oR6PhBAGVdX", + "outputId": "3a2e4fa7-eb73-4a16-efa2-eb5b43bd929f" + }, + "execution_count": 82, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 82 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Often, we are just filtering on a src node / edge / dst node triple, so `hop()` is a shorthand for this. All the `hop()` parameters can also be passed to edge expressions." + ], + "metadata": { + "id": "hRWJkIFAU5t_" + } + }, + { + "cell_type": "code", + "source": [ + "g_community2 = g2.hop(source_node_match={'community_infomap': 2}, destination_node_match={'community_infomap': 2})\n", + "\n", + "print(len(g_community2._nodes), 'senators', len(g_community2._edges), 'relns')\n", + "g_community2._edges.sort_values(by=['weight2']).head(10)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 380 + }, + "id": "5T5UWt8kFPuZ", + "outputId": "fd709a7d-a697-40d5-c34b-354eea84b72f" + }, + "execution_count": 83, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "214 senators 4993 relns\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " from to weight weight2\n", + "378 RepDonBeyer RepSpeier 0.000658 0.000658\n", + "354 RepDonBeyer repcleaver 0.000658 0.000658\n", + "353 RepDonBeyer RepYvetteClarke 0.000658 0.000658\n", + "352 RepDonBeyer RepCasten 0.000658 0.000658\n", + "349 RepDonBeyer RepBeatty 0.000658 0.000658\n", + "360 RepDonBeyer RepGaramendi 0.000658 0.000658\n", + "361 RepDonBeyer RepChuyGarcia 0.000658 0.000658\n", + "362 RepDonBeyer RepRaulGrijalva 0.000658 
0.000658\n", + "365 RepDonBeyer USRepKeating 0.000658 0.000658\n", + "366 RepDonBeyer RepRickLarsen 0.000658 0.000658" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 83 + } + ] + }, + { + "cell_type": "code", + "source": [ + "g_community2.encode_point_color('pagerank', ['blue', 'yellow', 'red'], as_continuous=True).plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 543 + }, + "id": "uMnOYQypG-KH", + "outputId": "a498472b-3fe5-4e1c-f9c5-b7b45515b63f" + }, + "execution_count": 86, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 86 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 4. Multi-hop and paths-between-nodes pattern mining\n", + "\n", + "Method `chain([...])` can be used for looking more than one hop out, and even finding paths between nodes." + ], + "metadata": { + "id": "o6yJ48oxKPiz" + } + }, + { + "cell_type": "markdown", + "source": [ + "Ex: All people bridging SenSchumer and SpeakerPelosi" + ], + "metadata": { + "id": "fT4y7fH4KzAr" + } + }, + { + "cell_type": "code", + "source": [ + "g_shumer_pelosi_bridges = g2.chain([\n", + " n({'title': 'SenSchumer'}),\n", + " e_undirected(),\n", + " n(),\n", + " e_undirected(),\n", + " n({'title': 'SpeakerPelosi'})\n", + "])\n", + "\n", + "print(len(g_shumer_pelosi_bridges._nodes), 'senators')\n", + "g_shumer_pelosi_bridges._edges.sort_values(by='weight').head(5)" + ], + "metadata": { + "id": "tItpDvqCBjvC", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + }, + "outputId": "5f701cb9-8e0a-4495-bfbd-baff50466ae9" + }, + "execution_count": 94, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "66 senators\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " from to weight weight2\n", + "86 RepJayapal SpeakerPelosi 0.000871 0.000871\n", + "47 SenSchumer RepMeijer 0.001546 0.001546\n", + "23 SenSchumer RepBuddyCarter 0.001546 0.001546\n", + 
"24 SenSchumer RepJudyChu 0.001546 0.001546\n", + "26 SenSchumer repcleaver 0.001546 0.001546" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 94 + } + ] + }, + { + "cell_type": "code", + "source": [ + "g_shumer_pelosi_bridges.plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 543 + }, + "id": "L_YmDjeRJzcw", + "outputId": "c1abf11a-e48c-43ce-8087-a56052f3097c" + }, + "execution_count": 92, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 92 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 5. Advanced filter predicates\n", + "\n", + "We can use a variety of predicates for filtering nodes and edges beyond attribute value equality.\n", + "\n", + "Common tasks include comparing attributes using:\n", + "* Set inclusion: `is_in([...])`\n", + "* Numeric comparisons: `gt(...)`, `lt(...)`, `ge(...)`, `le(...)`\n", + "* String comparison: `startswith(...)`, `endswith(...)`, `contains(...)`\n", + "* Regular expression matching: `matches(...)`\n", + "* Duplicate checking: `duplicated()`" + ], + "metadata": { + "id": "dbRgj3qxLU5I" + } + }, + { + "cell_type": "markdown", + "source": [ + "Graph where nodes are in the top 20 pagerank:" + ], + "metadata": { + "id": "z69kYi3wMMHK" + } + }, + { + "cell_type": "code", + "source": [ + "top_20_pr = g2._nodes.pagerank.sort_values(ascending=False, ignore_index=True)[19]\n", + "top_20_pr" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tHkjtTw-MVuA", + "outputId": "d377eef1-8a2b-484a-b190-e95491eef4c2" + }, + "execution_count": 134, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.005888600097034367" + ] + }, + "metadata": {}, + "execution_count": 134 + } + ] + }, + { + "cell_type": "code", + "source": [ + "g_high_pr = g2.chain([\n", + " n({'pagerank': ge(top_20_pr)}),\n", + " e_undirected(),\n", + " n({'pagerank': ge(top_20_pr)}),\n", + 
"])\n", + "\n", + "len(g_high_pr._nodes)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wDg9pUyJMR3V", + "outputId": "7ba923cd-5faa-431e-8f8f-8da223a28a39" + }, + "execution_count": 128, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "20" + ] + }, + "metadata": {}, + "execution_count": 128 + } + ] + }, + { + "cell_type": "code", + "source": [ + "g_high_pr.plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 543 + }, + "id": "FVc-Mou-M6TO", + "outputId": "68ea6daa-b75b-46ad-cc54-010512aab919" + }, + "execution_count": 129, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 129 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Graph where the name includes Leader" + ], + "metadata": { + "id": "MDHyapjHOY-b" + } + }, + { + "cell_type": "code", + "source": [ + "g_leaders = g2.hop(\n", + " source_node_match={'title': contains('Leader')},\n", + " destination_node_match = {'title': contains('Leader')}\n", + ")\n", + "\n", + "print(len(g_leaders._nodes), 'leaders')\n", + "\n", + "g_leaders.plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 561 + }, + "id": "mwkKlqhoO29-", + "outputId": "9f3329f1-08f5-4d8f-c203-c049e59a101a" + }, + "execution_count": 136, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "2 leaders\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 136 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Graph of leaders and senators" + ], + "metadata": { + "id": "g--y65A0PHNJ" + } + }, + { + "cell_type": "code", + "source": [ + "g_leaders_and_senators = 
g2.hop(\n", + " source_node_match={'title': match_re(r'Sen|Leader')},\n", + " destination_node_match = {'title': match_re(r'Sen|Leader')}\n", + ")\n", + "\n", + "print(len(g_leaders_and_senators._nodes), 'leaders and senators')\n", + "\n", + "g_leaders_and_senators.plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 561 + }, + "id": "mBfeLkA_Ol2i", + "outputId": "e0353e9e-34c2-4fc5-a756-b0a9260e1edb" + }, + "execution_count": 139, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "67 leaders and senators\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 139 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 6. Result labeling\n", + "\n", + "It can be useful to name nodes and edges within the path query for downstream reasoning:" + ], + "metadata": { + "id": "DuxWkZFVV9-R" + } + }, + { + "cell_type": "code", + "source": [ + "g_bridges2 = g2.chain([\n", + " n({'title': 'SenSchumer'}),\n", + " e_undirected(name='from_schumer'),\n", + " n(name='found_bridge'),\n", + " e_undirected(name='from_pelosi'),\n", + " n({'title': 'SpeakerPelosi'})\n", + "])\n", + "\n", + "print(len(g_bridges2._nodes), 'senators in full graph')\n", + "\n", + "named = g_bridges2._nodes[ g_bridges2._nodes.found_bridge ]\n", + "print(len(named), 'bridging senators')\n", + "edges = g_bridges2._edges\n", + "print(len(edges[edges.from_schumer]), 'relns from_schumer', len(edges[edges.from_pelosi]), 'relns from_pelosi')\n", + "\n", + "g_bridges2.encode_point_color(\n", + " 'found_bridge',\n", + " as_categorical=True,\n", + " categorical_mapping={\n", + " True: 'orange',\n", + " False: 'silver'\n", + " }\n", + ").plot()" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 595 + }, + "id": "6G4nnclNPpY8", + "outputId": 
"5139bef8-45ba-4ae9-cadb-9da856eb6bc8" + }, + "execution_count": 156, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "66 senators in full graph\n", + "64 bridging senators\n", + "75 relns from_schumer 83 relns from_pelosi\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " " + ] + }, + "metadata": {}, + "execution_count": 156 + } + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "w3w4RRYkWXKo" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From 238cb1daa6904ab3316ff9cf6445334f9f7890fd Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 22:17:59 -0800 Subject: [PATCH 058/104] fix(lint) --- graphistry/compute/predicates/__init__.py | 1 - 1 file changed, 1 deletion(-) diff --git a/graphistry/compute/predicates/__init__.py b/graphistry/compute/predicates/__init__.py index 8b13789179..e69de29bb2 100644 --- a/graphistry/compute/predicates/__init__.py +++ b/graphistry/compute/predicates/__init__.py @@ -1 +0,0 @@ - From 3800a7b5e5115e0aa2cd0f5c564b7dafb8620918 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 3 Dec 2023 23:26:12 -0800 Subject: [PATCH 059/104] fix(docs) --- docs/source/conf.py | 34 +++++++++++ docs/source/graphistry.compute.predicates.rst | 58 +++++++++++++++++++ docs/source/graphistry.compute.rst | 17 +++++- 3 files changed, 108 insertions(+), 1 deletion(-) create mode 100644 docs/source/graphistry.compute.predicates.rst diff --git a/docs/source/conf.py b/docs/source/conf.py index b7748c38a2..5d182ca6d0 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -15,6 +15,7 @@ # sys.path.insert(0, os.path.abspath('.')) sys.path.insert(0, os.path.abspath("../..")) +sys.path.insert(0, os.path.abspath('../../')) import graphistry # -- Project information ----------------------------------------------------- @@ -47,6 +48,39 @@ ('py:class', '3'), 
('py:class', ""), ('py:class', ""), + ('py:class', "graphistry.compute.predicates.ASTPredicate.ASTPredicate"), + ('py:class', 'graphistry.compute.predicates.categorical.Duplicated'), + ('py:class', 'graphistry.compute.predicates.is_in.IsIn'), + ('py:class', 'graphistry.compute.predicates.numeric.Between'), + ('py:class', 'graphistry.compute.predicates.numeric.EQ'), + ('py:class', 'graphistry.compute.predicates.numeric.GE'), + ('py:class', 'graphistry.compute.predicates.numeric.GT'), + ('py:class', 'graphistry.compute.predicates.numeric.IsNA'), + ('py:class', 'graphistry.compute.predicates.numeric.LE'), + ('py:class', 'graphistry.compute.predicates.numeric.LT'), + ('py:class', 'graphistry.compute.predicates.numeric.NE'), + ('py:class', 'graphistry.compute.predicates.numeric.NotNA'), + ('py:class', 'graphistry.compute.predicates.str.Contains'), + ('py:class', 'graphistry.compute.predicates.str.Endswith'), + ('py:class', 'graphistry.compute.predicates.str.IsAlnum'), + ('py:class', 'graphistry.compute.predicates.str.IsAlpha'), + ('py:class', 'graphistry.compute.predicates.str.IsDecimal'), + ('py:class', 'graphistry.compute.predicates.str.IsDigit'), + ('py:class', 'graphistry.compute.predicates.str.IsLower'), + ('py:class', 'graphistry.compute.predicates.str.IsNull'), + ('py:class', 'graphistry.compute.predicates.str.IsNumeric'), + ('py:class', 'graphistry.compute.predicates.str.IsSpace'), + ('py:class', 'graphistry.compute.predicates.str.IsTitle'), + ('py:class', 'graphistry.compute.predicates.str.IsUpper'), + ('py:class', 'graphistry.compute.predicates.str.Match'), + ('py:class', 'graphistry.compute.predicates.str.NotNull'), + ('py:class', 'graphistry.compute.predicates.str.Startswith'), + ('py:class', 'graphistry.compute.predicates.temporal.IsLeapYear'), + ('py:class', 'graphistry.compute.predicates.temporal.IsMonthEnd'), + ('py:class', 'graphistry.compute.predicates.temporal.IsMonthStart'), + ('py:class', 'graphistry.compute.predicates.temporal.IsQuarterEnd'), + 
('py:class', 'graphistry.compute.predicates.temporal.IsQuarterStart'), + ('py:class', 'graphistry.compute.predicates.temporal.IsYearStart'), ('py:class', 'graphistry.Engine.Engine'), ('py:class', 'graphistry.gremlin.CosmosMixin'), ('py:class', 'graphistry.gremlin.GremlinMixin'), diff --git a/docs/source/graphistry.compute.predicates.rst b/docs/source/graphistry.compute.predicates.rst new file mode 100644 index 0000000000..5494a24da1 --- /dev/null +++ b/docs/source/graphistry.compute.predicates.rst @@ -0,0 +1,58 @@ +predicates module +------------------------------------------------ + +.. automodule:: graphistry.compute.predicates + :members: + :undoc-members: + :show-inheritance: + :noindex: + + +ASTPredicate +--------------- + +.. automodule:: graphistry.compute.predicates.ASTPredicate + :members: + :undoc-members: + :show-inheritance: + :noindex: + +categorical +--------------- +.. automodule:: graphistry.compute.predicates.categorical + :members: + :undoc-members: + :show-inheritance: + :noindex: + +is_in +--------------- +.. automodule:: graphistry.compute.predicates.is_in + :members: + :undoc-members: + :show-inheritance: + :noindex: + +numeric +--------------- +.. automodule:: graphistry.compute.predicates.numeric + :members: + :undoc-members: + :show-inheritance: + :noindex: + +str +-------------------- +.. automodule:: graphistry.compute.predicates.str + :members: + :undoc-members: + :show-inheritance: + :noindex: + +temporal +--------------- +.. automodule:: graphistry.compute.predicates.temporal + :members: + :undoc-members: + :show-inheritance: + :noindex: diff --git a/docs/source/graphistry.compute.rst b/docs/source/graphistry.compute.rst index c610034aab..87811ed705 100644 --- a/docs/source/graphistry.compute.rst +++ b/docs/source/graphistry.compute.rst @@ -1,3 +1,11 @@ +Compute Modules +--------------- + +.. 
toctree:: + :maxdepth: 2 + + graphistry.compute.predicates + ComputeMixin module ------------------------------------------------ @@ -7,7 +15,6 @@ ComputeMixin module :show-inheritance: :noindex: - Chain --------------- @@ -56,3 +63,11 @@ Hop :undoc-members: :show-inheritance: :noindex: + +predicates +--------------- +.. automodule:: graphistry.compute.predicates + :members: + :undoc-members: + :show-inheritance: + :noindex: From baad7383bb2489cd4b5b3032ac76692055adf0aa Mon Sep 17 00:00:00 2001 From: lmeyerov Date: Mon, 4 Dec 2023 11:10:01 -0800 Subject: [PATCH 060/104] fix(docs): readme md syntax --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0f0cefe3e0..757cfcb5b5 100644 --- a/README.md +++ b/README.md @@ -1265,7 +1265,7 @@ g2b = g2.hop( edge_match={"v": 1, "type": "z"}, destination_node_match={g2._node: "b"}) g2b = g2.hop( - source_node_query='n == "a", + source_node_query='n == "a"', edge_query='v == 1 and type == "z"', destination_node_query='n == "b"') From 2ac465a636b69024bb1e05b687080755fe19760a Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Mon, 4 Dec 2023 21:54:53 -0800 Subject: [PATCH 061/104] fix(imports): setup_logger and drop unused --- graphistry/compute/ComputeMixin.py | 3 ++- graphistry/compute/ast.py | 9 +++++---- graphistry/compute/chain.py | 12 +++++------- graphistry/compute/hop.py | 4 ++++ graphistry/dgl_utils.py | 2 +- graphistry/feature_utils.py | 4 +--- graphistry/features.py | 3 +-- graphistry/tests/test_compute_chain.py | 10 +++++----- graphistry/tests/test_dgl_utils.py | 2 +- 9 files changed, 25 insertions(+), 24 deletions(-) diff --git a/graphistry/compute/ComputeMixin.py b/graphistry/compute/ComputeMixin.py index 7a9b2f71c7..a5a0431f00 100644 --- a/graphistry/compute/ComputeMixin.py +++ b/graphistry/compute/ComputeMixin.py @@ -4,6 +4,7 @@ from graphistry.Engine import Engine from graphistry.Plottable import Plottable +from graphistry.util import setup_logger from .chain 
import chain as chain_base from .collapse import collapse_by from .hop import hop as hop_base @@ -12,7 +13,7 @@ filter_nodes_by_dict as filter_nodes_by_dict_base ) -logger = logging.getLogger("compute") +logger = setup_logger(__name__) if TYPE_CHECKING: MIXIN_BASE = Plottable diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 39695edc21..942856de45 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -1,7 +1,9 @@ -from typing import Any, List, Optional, cast +import logging +from typing import Optional, cast import pandas as pd from graphistry.Plottable import Plottable +from graphistry.util import setup_logger from .predicates.ASTPredicate import ASTPredicate from .predicates.is_in import ( is_in, IsIn @@ -47,9 +49,8 @@ ) from .filter_by_dict import filter_by_dict -import logging -logger = logging.getLogger(__name__) -#logger.setLevel(logging.DEBUG) + +logger = setup_logger(__name__) ############################################################################## diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index c8be5d12c1..a3132ca624 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -1,13 +1,11 @@ -from typing import cast, List, Optional, Tuple, Union +from typing import cast, List, Tuple import pandas as pd from graphistry.Plottable import Plottable +from graphistry.util import setup_logger from .ast import ASTObject, ASTNode, ASTEdge -from .filter_by_dict import filter_by_dict -import logging -logger = logging.getLogger(__name__) -#logger.setLevel(logging.DEBUG) +logger = setup_logger(__name__) ############################################################################### @@ -191,7 +189,7 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: added_edge_index = False - logger.debug('============ FORWARDS ============') + logger.debug('======================== FORWARDS ========================') # Forwards # This computes valid path *prefixes*, where each 
g nodes/edges is the path wavefront: @@ -214,7 +212,7 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: ) g_stack.append(g_step) - logger.debug('============ BACKWARDS ============') + logger.debug('======================== BACKWARDS ========================') # Backwards # Compute reverse and thus complete paths. Dropped nodes/edges are thus the incomplete path prefixes. diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index b14a7bc08a..93ea3c6d2f 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -1,9 +1,13 @@ +import logging from typing import List, Optional import pandas as pd from graphistry.Plottable import Plottable +from graphistry.util import setup_logger from .filter_by_dict import filter_by_dict +logger = setup_logger(__name__) + def query_if_not_none(query: Optional[str], df: pd.DataFrame) -> pd.DataFrame: if query is None: diff --git a/graphistry/dgl_utils.py b/graphistry/dgl_utils.py index 0999ea7982..56b5670f33 100644 --- a/graphistry/dgl_utils.py +++ b/graphistry/dgl_utils.py @@ -54,7 +54,7 @@ def lazy_torch_import_has_dependency(): return False, e, None -logger = setup_logger(name=__name__, verbose=config.VERBOSE) +logger = setup_logger(name=__name__) diff --git a/graphistry/feature_utils.py b/graphistry/feature_utils.py index 1ca5272df0..5d80e7e5bf 100644 --- a/graphistry/feature_utils.py +++ b/graphistry/feature_utils.py @@ -8,7 +8,6 @@ from functools import partial from typing import ( - Hashable, List, Union, Dict, @@ -16,7 +15,6 @@ Optional, Tuple, TYPE_CHECKING, - Type ) # noqa from typing_extensions import Literal # Literal native to py3.8+ @@ -27,7 +25,7 @@ from .ai_utils import infer_graph, infer_self_graph # add this inside classes and have a method that can set log level -logger = setup_logger(name=__name__, verbose=config.VERBOSE) +logger = setup_logger(__name__) if TYPE_CHECKING: MIXIN_BASE = ComputeMixin diff --git a/graphistry/features.py b/graphistry/features.py index 
32e83a3a28..0ae4a49942 100644 --- a/graphistry/features.py +++ b/graphistry/features.py @@ -1,8 +1,7 @@ from .util import setup_logger -from .constants import VERBOSE, TRACE from .util import ModelDict -logger = setup_logger("graphistry.features", verbose=VERBOSE, fullpath=TRACE) +logger = setup_logger(__name__) # ############################################################### UNK = "UNK" diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py index 8d92852447..c674edf06a 100644 --- a/graphistry/tests/test_compute_chain.py +++ b/graphistry/tests/test_compute_chain.py @@ -1,15 +1,15 @@ from functools import lru_cache from typing import Dict, List +import logging import pandas as pd + from common import NoAuthTestCase +from graphistry.compute.ast import n, e_forward, e_reverse, e_undirected, is_in from graphistry.tests.test_compute import CGFull - from graphistry.tests.test_compute_hops import hops_graph -from graphistry.compute.ast import n, e_forward, e_reverse, e_undirected, is_in +from graphistry.util import setup_logger -import logging -logger = logging.getLogger() -logger.setLevel(logging.DEBUG) +logger = setup_logger(__name__) @lru_cache(maxsize=1) diff --git a/graphistry/tests/test_dgl_utils.py b/graphistry/tests/test_dgl_utils.py index bf3610885b..760045eee6 100644 --- a/graphistry/tests/test_dgl_utils.py +++ b/graphistry/tests/test_dgl_utils.py @@ -11,7 +11,7 @@ if has_dgl: import torch -logger = setup_logger("test_DGL_utils", verbose=True) +logger = setup_logger(__name__) edf = pd.read_csv( "graphistry/tests/data/malware_capture_bot.csv", index_col=0, nrows=50 From 0480b01f9e556f29959e3cbc76225b2575c53416 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Mon, 4 Dec 2023 21:57:06 -0800 Subject: [PATCH 062/104] fix(hop): handle intermediate matches --- graphistry/compute/ast.py | 23 +++-- graphistry/compute/hop.py | 132 +++++++++++++++++++++++-- graphistry/tests/test_compute_chain.py | 37 +++++++ 3 files changed, 178 
insertions(+), 14 deletions(-) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 942856de45..bc45c7fa4b 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -106,10 +106,9 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta if self._name is not None: out_g = out_g.nodes(out_g._nodes.assign(**{self._name: True})) - logger.debug(f'CALL NODE {self} ===>') - logger.debug(out_g._nodes) - logger.debug(out_g._edges) - logger.debug('----------------------------------------') + if logger.isEnabledFor(logging.DEBUG): + logger.debug('CALL NODE %s ====>\nnodes:\n%s\nedges:\n%s\n', self, out_g._nodes, out_g._edges) + logger.debug('----------------------------------------') return out_g @@ -171,6 +170,15 @@ def __repr__(self) -> str: def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: + if logger.isEnabledFor(logging.DEBUG): + logger.debug('----------------------------------------') + logger.debug('@CALL EDGE START {%s} ===>\n', self) + logger.debug('prev_node_wavefront:\n%s\n', prev_node_wavefront) + logger.debug('target_wave_front:\n%s\n', target_wave_front) + logger.debug('g._nodes:\n%s\n', g._nodes) + logger.debug('g._edges:\n%s\n', g._edges) + logger.debug('----------------------------------------') + out_g = g.hop( nodes=prev_node_wavefront, hops=self._hops, @@ -189,10 +197,9 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta if self._name is not None: out_g = out_g.edges(out_g._edges.assign(**{self._name: True})) - logger.debug(f'CALL EDGE {self} ===>') - logger.debug(out_g._nodes) - logger.debug(out_g._edges) - logger.debug('----------------------------------------') + if logger.isEnabledFor(logging.DEBUG): + logger.debug('/CALL EDGE END {%s} ===>\nnodes:\n%s\nedges:\n%s\n', self, out_g._nodes, out_g._edges) + logger.debug('----------------------------------------') return out_g 
diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 93ea3c6d2f..a94725b9bf 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -44,7 +44,7 @@ def hop(self: Plottable, destination_node_query: dataframe query to match nodes after hopping (including intermediate) edge_query: dataframe query to match edges before hopping (including intermediate) return_as_wave_front: Only return the nodes/edges reached, ignoring past ones (primarily for internal use) - target_wave_front: Only consider these nodes for reachability (primarily for internal use by reverse pass) + target_wave_front: Only consider these nodes for reachability, and for intermediate hops, also consider nodes (primarily for internal use by reverse pass) """ """ @@ -55,19 +55,45 @@ def hop(self: Plottable, """ + #TODO target_wave_front code also includes nodes for handling intermediate hops + # ... better to make an explicit param of allowed intermediates? (vs recording each intermediate hop) + + debugging_hop = True + + if debugging_hop and logger.isEnabledFor(logging.DEBUG): + logger.debug('=======================') + logger.debug('======== HOP ==========') + logger.debug('nodes:\n%s', nodes) + logger.debug('self._nodes:\n%s', self._nodes) + logger.debug('self._edges:\n%s', self._edges) + logger.debug('hops: %s', hops) + logger.debug('to_fixed_point: %s', to_fixed_point) + logger.debug('direction: %s', direction) + logger.debug('edge_match: %s', edge_match) + logger.debug('source_node_match: %s', source_node_match) + logger.debug('destination_node_match: %s', destination_node_match) + logger.debug('source_node_query: %s', source_node_query) + logger.debug('destination_node_query: %s', destination_node_query) + logger.debug('edge_query: %s', edge_query) + logger.debug('return_as_wave_front: %s', return_as_wave_front) + logger.debug('target_wave_front:\n%s', target_wave_front) + logger.debug('---------------------') + if not to_fixed_point and not isinstance(hops, int): 
         raise ValueError(f'Must provide hops int when to_fixed_point is False, received: {hops}')
 
     if direction not in ['forward', 'reverse', 'undirected']:
         raise ValueError(f'Invalid direction: "{direction}", must be one of: "forward" (default), "reverse", "undirected"')
+
+    if target_wave_front is not None and nodes is None:
+        raise ValueError('target_wave_front requires nodes to target against (for intermediate hops)')
 
     if destination_node_match == {}:
         destination_node_match = None
 
     g2 = self.materialize_nodes()
 
-    if nodes is None:
-        nodes = g2._nodes
+    starting_nodes = nodes if nodes is not None else g2._nodes
 
     if g2._edge is None:
         if 'index' in g2._edges.columns:
@@ -86,7 +112,7 @@ def hop(self: Plottable,
 
     hops_remaining = hops
 
-    wave_front = nodes[[g2._node]][:0]
+    wave_front = starting_nodes[[g2._node]][:0]
 
     matches_nodes = None
     matches_edges = edges_indexed[[EDGE_ID]][:0]
@@ -94,8 +120,25 @@ def hop(self: Plottable,
     #richly-attributed subset for dest matching & return-enriching
     base_target_nodes = target_wave_front if target_wave_front is not None else g2._nodes
 
+    if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+        logger.debug('~~~~~~~~~~ LOOP PRE ~~~~~~~~~~~')
+        logger.debug('starting_nodes:\n%s', starting_nodes)
+        logger.debug('g2._nodes:\n%s', g2._nodes)
+        logger.debug('g2._edges:\n%s', g2._edges)
+        logger.debug('edges_indexed:\n%s', edges_indexed)
+        logger.debug('=====================')
+
     first_iter = True
     while True:
+
+        if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+            logger.debug('~~~~~~~~~~ LOOP STEP BEGIN ~~~~~~~~~~~')
+            logger.debug('hops_remaining: %s', hops_remaining)
+            logger.debug('wave_front:\n%s', wave_front)
+            logger.debug('matches_nodes:\n%s', matches_nodes)
+            logger.debug('matches_edges:\n%s', matches_edges)
+            logger.debug('first_iter: %s', first_iter)
+
         if not to_fixed_point and hops_remaining is not None:
             if hops_remaining < 1:
                 break
@@ -105,12 +148,18 @@ def hop(self: Plottable,
         wave_front_iter : pd.DataFrame = query_if_not_none(
             source_node_query,
             filter_by_dict(
-                nodes if first_iter else wave_front.merge(nodes, on=g2._node, how='left'),
+                starting_nodes
+                if first_iter else
+                wave_front.merge(self._nodes, on=g2._node, how='left'),
                 source_node_match
             )
         )[[ g2._node ]]
         first_iter = False
 
+        if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+            logger.debug('~~~~~~~~~~ LOOP STEP CONTINUE ~~~~~~~~~~~')
+            logger.debug('wave_front_iter:\n%s', wave_front_iter)
+
         hop_edges_forward = None
         new_node_ids_forward = None
         if direction in ['forward', 'undirected']:
@@ -121,6 +170,21 @@ def hop(self: Plottable,
                     on=g2._node)
                 [[g2._source, g2._destination, EDGE_ID]]
             )
+            if target_wave_front is not None:
+                assert nodes is not None, "target_wave_front indicates nodes"
+                if hops_remaining:
+                    intermediate_target_wave_front = pd.concat([
+                        target_wave_front[[g2._node]],
+                        nodes[[g2._node]]
+                        ], sort=False, ignore_index=True
+                    ).drop_duplicates()
+                else:
+                    intermediate_target_wave_front = target_wave_front[[g2._node]]
+                hop_edges_forward = hop_edges_forward.merge(
+                    intermediate_target_wave_front.rename(columns={g2._node: g2._destination}),
+                    how='inner',
+                    on=g2._destination
+                )
             new_node_ids_forward = hop_edges_forward[[g2._destination]].rename(columns={g2._destination: g2._node}).drop_duplicates()
 
             if destination_node_query is not None or destination_node_match is not None:
@@ -136,10 +200,14 @@ def hop(self: Plottable,
                     on=g2._destination
                 )
 
+            if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+                logger.debug('--- direction in [forward, undirected] ---')
+                logger.debug('hop_edges_forward:\n%s', hop_edges_forward)
+                logger.debug('new_node_ids_forward:\n%s', new_node_ids_forward)
+
         hop_edges_reverse = None
         new_node_ids_reverse = None
         if direction in ['reverse', 'undirected']:
-            #TODO limit by target_wave_front if exists?
             hop_edges_reverse = (
                 wave_front_iter.merge(
                     edges_indexed[[g2._destination, g2._source, EDGE_ID]].assign(**{g2._node: edges_indexed[g2._destination]}),
@@ -147,6 +215,28 @@ def hop(self: Plottable,
                     on=g2._node)
                 [[g2._destination, g2._source, EDGE_ID]]
             )
+            if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+                logger.debug('--- direction in [reverse, undirected] ---')
+                logger.debug('hop_edges_reverse basic:\n%s', hop_edges_reverse)
+
+            if target_wave_front is not None:
+                assert nodes is not None, "target_wave_front indicates nodes"
+                if hops_remaining:
+                    intermediate_target_wave_front = pd.concat([
+                        target_wave_front[[g2._node]],
+                        nodes[[g2._node]]
+                        ], sort=False, ignore_index=True
+                    ).drop_duplicates()
+                else:
+                    intermediate_target_wave_front = target_wave_front[[g2._node]]
+                hop_edges_reverse = hop_edges_reverse.merge(
+                    intermediate_target_wave_front.rename(columns={g2._node: g2._source}),
+                    how='inner',
+                    on=g2._source
+                )
+                if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+                    logger.debug('hop_edges_reverse filtered by target_wave_front:\n%s', hop_edges_reverse)
+
             new_node_ids_reverse = hop_edges_reverse[[g2._source]].rename(columns={g2._source: g2._node}).drop_duplicates()
 
             if destination_node_query is not None or destination_node_match is not None:
@@ -161,6 +251,12 @@ def hop(self: Plottable,
                     how='inner',
                     on=g2._source
                 )
+                if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+                    logger.debug('hop_edges_reverse filtered by destination predicates:\n%s', hop_edges_reverse)
+
+            if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+                logger.debug('hop_edges_reverse:\n%s', hop_edges_reverse)
+                logger.debug('new_node_ids_reverse:\n%s', new_node_ids_reverse)
 
         mt : List[pd.DataFrame] = []  # help mypy
 
@@ -175,6 +271,12 @@ def hop(self: Plottable,
                 + ( [ new_node_ids_forward ] if new_node_ids_forward is not None else mt )  # noqa: W503
                 + ( [ new_node_ids_reverse] if new_node_ids_reverse is not None else mt ),  # noqa: W503
             ignore_index=True, sort=False).drop_duplicates()
+
+        if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+            logger.debug('~~~~~~~~~~ LOOP STEP MERGES 1 ~~~~~~~~~~~')
+            logger.debug('matches_edges:\n%s', matches_edges)
+            logger.debug('new_node_ids:\n%s', new_node_ids)
+
         # Finally include all initial root nodes matched against, now that edge triples satisfy all source/dest/edge predicates
         # Only run first iteration b/c root nodes already accounted for in subsequent
         # In wavefront mode, skip, as we only want to return reached nodes
@@ -192,6 +294,10 @@ def hop(self: Plottable,
                     else mt),
                 ignore_index=True, sort=False).drop_duplicates(subset=[g2._node])
 
+        if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+            logger.debug('~~~~~~~~~~ LOOP STEP MERGES 2 ~~~~~~~~~~~')
+            logger.debug('matches_edges:\n%s', matches_edges)
+
         combined_node_ids = pd.concat([matches_nodes, new_node_ids], ignore_index=True, sort=False).drop_duplicates()
 
         if len(combined_node_ids) == len(matches_nodes):
@@ -201,6 +307,13 @@ def hop(self: Plottable,
         wave_front = new_node_ids
         matches_nodes = combined_node_ids
 
+        if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+            logger.debug('~~~~~~~~~~ LOOP STEP POST ~~~~~~~~~~~')
+            logger.debug('matches_nodes:\n%s', matches_nodes)
+            logger.debug('combined_node_ids:\n%s', combined_node_ids)
+            logger.debug('wave_front:\n%s', wave_front)
+            logger.debug('matches_nodes:\n%s', matches_nodes)
+
     #hydrate edges
     final_edges = edges_indexed.merge(matches_edges, on=EDGE_ID, how='inner')
     if EDGE_ID not in self._edges:
@@ -219,4 +332,11 @@ def hop(self: Plottable,
         how='inner')
     g_out = g_out.nodes(final_nodes)
 
+    if debugging_hop and logger.isEnabledFor(logging.DEBUG):
+        logger.debug('~~~~~~~~~~ HOP OUTPUT ~~~~~~~~~~~')
+        logger.debug('nodes:\n%s', g_out._nodes)
+        logger.debug('edges:\n%s', g_out._edges)
+        logger.debug('======== /HOP =============')
+        logger.debug('==========================')
+
     return g_out
diff --git a/graphistry/tests/test_compute_chain.py b/graphistry/tests/test_compute_chain.py
index c674edf06a..3f98324100 100644
--- a/graphistry/tests/test_compute_chain.py
+++ b/graphistry/tests/test_compute_chain.py
@@ -403,6 +403,43 @@ def test_hop_chain_1_end_undirected(self):
         ])
         compare_graphs(g3_undirected_chain_closed, g_out_nodes, g_out_edges)
 
+    def test_tricky_topology_1(self):
+
+        nodes = pd.DataFrame({
+            'n': ['a1', 'a2', 'b1', 'b2'],
+            't': [0, 0, 1, 1]
+        })
+
+        edges = pd.DataFrame({
+            's': ['a1', 'a1' ],
+            'd': ['a2', 'b1']
+        })
+
+        n_out = pd.DataFrame({
+            'n': ['a1', 'a2'],
+            't': [0, 0]
+        })
+
+        e_out = pd.DataFrame({
+            's': ['a1'],
+            'd': ['a2']
+        })
+
+        g = CGFull().edges(edges, 's', 'd').nodes(nodes, 'n')
+
+        g2 = g.chain([
+            n({'t': 0}),
+            e_undirected(),
+            n({'t': 0})
+        ])
+
+        if logger.isEnabledFor(logging.DEBUG):
+            logger.debug('\nNODES\n')
+            logger.debug(g2._nodes.to_dict(orient='records'))
+            logger.debug('\nEDGES\n')
+            logger.debug(g2._edges.to_dict(orient='records'))
+
+        compare_graphs(g2, n_out.to_dict(orient='records'), e_out.to_dict(orient='records'))
 
 class TestComputeChainWavefront2Mixin(NoAuthTestCase):
     """
From ff5eefcf45b7fdcb3d16c835207142ada9a5680a Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Mon, 4 Dec 2023 22:07:37 -0800
Subject: [PATCH 063/104] refactor(setup_logger): handlers

---
 CHANGELOG.md       |  4 ++++
 graphistry/util.py | 41 +++++++++++++++++++++++++----------------
 2 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index af4b890248..c31b75edfd 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -26,12 +26,16 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 * chain/hop: source_node_match was being mishandled when multiple node attributes exist
 * chain: backwards validation pass was too permissive; add `target_wave_front` check`
 * hop: multi-hops with `source_node_match` specified was not checking intermediate hops
+* hop: multi-hops reverse validation was mishandling intermediate nodes
 * compute logging no longer default-overrides level to DEBUG
 
 ### Changed
 
 * refactor: move `is_in`, `IsIn` implementations to `graphistry.ast.predicates`; old imports preserved
 * `IsIn` now implements `ASTPredicate`
+* Refactor: use `setup_logger(__name__)` more consistently instead of `logging.getLogger(__name__)`
+* Refactor: drop unused imports
+* Redo `setup_logger()` to activate formatted stream handler iff verbose / LOG_LEVEL
 
 ### Docs
 
diff --git a/graphistry/util.py b/graphistry/util.py
index 025e8853b3..3498b33572 100644
--- a/graphistry/util.py
+++ b/graphistry/util.py
@@ -17,25 +17,34 @@
 # #####################################
 
 
+@lru_cache(maxsize=1)
+def get_handler(short=False):
+    if short:
+        formatter = logging.Formatter("%(filename)s:%(lineno)s %(message)s\n")
+    else:
+        formatter = logging.Formatter("\n[%(filename)s:%(lineno)s - %(funcName)20s() ] %(message)s\n")
+    handler = logging.StreamHandler()
+    handler.setFormatter(formatter)
+    return handler
 
-def global_logger():
-    logger = logging.getLogger()
-    return logger
 
+def setup_logger(name='', verbose=VERBOSE, fullpath=TRACE):
+    logger = logging.getLogger(name)
+
+    if verbose is not None:
+        if verbose:
+            logger.setLevel(logging.DEBUG)
+        else:
+            logger.setLevel(logging.ERROR)
+    elif os.environ.get('LOG_LEVEL', None) is not None:
+        if os.environ['LOG_LEVEL'] == 'TRACE':
+            logger.setLevel(logging.DEBUG)
+        else:
+            logger.setLevel(os.environ['LOG_LEVEL'])
+    if not logger.handlers and (verbose is not None or os.environ.get('LOG_LEVEL', None) is not None):
+        logger.addHandler(get_handler(short=False))
 
-def setup_logger(name, verbose=VERBOSE, fullpath=TRACE):
-    # if fullpath:
-    #     FORMAT = "[%(filename)s:%(lineno)s - %(funcName)20s() ]\n %(message)s\n"
-    # else:
-    #     FORMAT = " %(message)s\n"
-    # logging.basicConfig(format=FORMAT)
-    # logger = logging.getLogger()#f'graphistry.{name}')
-    # if verbose is None:
-    #     logger.setLevel(logging.ERROR)
-    # else:
-    #     logger.setLevel(logging.INFO if verbose else logging.DEBUG)
-    # return logger
-    return global_logger()
+    return logger
 
 
 # #####################################
From c585802474fe87f98ee0a76f1b57a2b2cf76354b Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Mon, 4 Dec 2023 22:10:44 -0800
Subject: [PATCH 064/104] infra(docker tests): propagate LOG_LEVEL

---
 CHANGELOG.md             | 4 ++++
 docker/test-cpu-local.sh | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c31b75edfd..cc9c977aad 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 * hop: multi-hops reverse validation was mishandling intermediate nodes
 * compute logging no longer default-overrides level to DEBUG
 
+### Infra
+
+* Docker tests support LOG_LEVEL
+
 ### Changed
 
 * refactor: move `is_in`, `IsIn` implementations to `graphistry.ast.predicates`; old imports preserved
diff --git a/docker/test-cpu-local.sh b/docker/test-cpu-local.sh
index 9d7392605d..7f6279495e 100755
--- a/docker/test-cpu-local.sh
+++ b/docker/test-cpu-local.sh
@@ -11,6 +11,7 @@ WITH_TYPECHECK=${WITH_TYPECHECK:-1}
 WITH_TEST=${WITH_TEST:-1}
 WITH_BUILD=${WITH_BUILD:-1}
 TEST_CPU_VERSION=${TEST_CPU_VERSION:-latest}
+LOG_LEVEL=${LOG_LEVEL:-DEBUG}
 SENTENCE_TRANSFORMER=${SENTENCE_TRANSFORMER-average_word_embeddings_komninos}
 
 NETWORK=""
@@ -46,6 +47,7 @@ docker run \
     -e WITH_TYPECHECK=$WITH_TYPECHECK \
     -e WITH_BUILD=$WITH_BUILD \
     -e WITH_TEST=$WITH_TEST \
+    -e LOG_LEVEL=$LOG_LEVEL \
     -v "`pwd`/../graphistry:/opt/pygraphistry/graphistry:ro" \
     --rm \
     ${NETWORK} \
From c12c2cefd2f7762d7b358bfec4f561fc72c8a005 Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Mon, 4 Dec 2023 23:27:10 -0800
Subject: [PATCH 065/104] docs(changelog)

---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index cc9c977aad..b59cae05d4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
+## [0.30.0 - 2023-12-04]
+
 ### Added
 
 * chain/hop: `is_in()` membership predicate, `.chain([ n({'type': is_in(['a', 'b'])}) ])`
From cd1d75fd077531b44e4f7e353b45fe4d9abe993c Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Tue, 5 Dec 2023 00:21:50 -0800
Subject: [PATCH 066/104] fix(readthedocs)

---
 .readthedocs.yml | 5 +++++
 CHANGELOG.md     | 4 ++++
 2 files changed, 9 insertions(+)

diff --git a/.readthedocs.yml b/.readthedocs.yml
index f68f0b53f7..e3932a2520 100644
--- a/.readthedocs.yml
+++ b/.readthedocs.yml
@@ -9,6 +9,11 @@ version: 2
 sphinx:
   configuration: docs/source/conf.py
 
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
+
 # Optionally build your docs in additional formats such as PDF
 formats:
   - pdf
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 28f9b01d91..56d12025c6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
+### Docs
+
+* Update readthedocs yml to work around ReadTheDocs v2 yml interpretation regressions
+
 ## [0.30.0 - 2023-12-04]
 
 ### Added
From 37a231452bb67d82a4ba274858744dcf7b355925 Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Tue, 5 Dec 2023 00:28:15 -0800
Subject: [PATCH 067/104] fix(readthedocs): changes to v2 format interp

---
 .readthedocs.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.readthedocs.yml b/.readthedocs.yml
index e3932a2520..609e875f73 100644
--- a/.readthedocs.yml
+++ b/.readthedocs.yml
@@ -21,7 +21,6 @@ formats:
   - epub
 
 python:
-  version: "3.8"
   install:
     - method: pip
       path: .
From 302083e91ea63b9ea395f987a7ef537303a25f5a Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Tue, 5 Dec 2023 00:58:26 -0800
Subject: [PATCH 068/104] fix(markdownlint)

---
 .github/workflows/ci.yml |   2 +-
 .markdownlint.yaml       | 267 +++++++++++++++++++++++++++++++++++++++
 CHANGELOG.md             |   2 +
 README.md                |  16 +--
 4 files changed, 276 insertions(+), 11 deletions(-)
 create mode 100644 .markdownlint.yaml

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 0b110f586c..15a357a183 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -303,6 +303,6 @@ jobs:
     - name: Test building docs
       continue-on-error: true
       run: |
-        docker run --rm -v "$(pwd)/README.md:/README.md:ro" avtodev/markdown-lint:v1 README.md
+        docker run --rm -v "$(pwd)/README.md:/workdir/README.md:ro" -v "$(pwd)/.markdownlint.yaml:/workdir/.markdownlint.yaml:ro" ghcr.io/igorshubovych/markdownlint-cli:v0.37.0 README.md
 
 
diff --git a/.markdownlint.yaml b/.markdownlint.yaml
new file mode 100644
index 0000000000..02794720db
--- /dev/null
+++ b/.markdownlint.yaml
@@ -0,0 +1,267 @@
+# ------------------------------------------------------------------------------
+# Based on markdownlint/schema/.markdownlint.yml
+# ------------------------------------------------------------------------------
+
+
+# Example markdownlint configuration with all properties set to their default value
+
+# Default state for all rules
+default: true
+
+# Path to configuration file to extend
+extends: null
+
+# MD001/heading-increment : Heading levels should only increment by one level at a time : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md001.md
+MD001: true
+
+# MD003/heading-style : Heading style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md003.md
+MD003:
+  # Heading style
+  style: "consistent"
+
+# MD004/ul-style : Unordered list style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md004.md
+MD004:
+  # List style
+  style: "consistent"
+
+# MD005/list-indent : Inconsistent indentation for list items at the same level : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md005.md
+MD005: true
+
+# MD007/ul-indent : Unordered list indentation : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md007.md
+MD007:
+  # Spaces for indent
+  indent: 2
+  # Whether to indent the first level of the list
+  start_indented: false
+  # Spaces for first level indent (when start_indented is set)
+  start_indent: 2
+
+# MD009/no-trailing-spaces : Trailing spaces : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md009.md
+MD009:
+  # Spaces for line break
+  br_spaces: 2
+  # Allow spaces for empty lines in list items
+  list_item_empty_lines: false
+  # Include unnecessary breaks
+  strict: false
+
+# MD010/no-hard-tabs : Hard tabs : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md010.md
+MD010:
+  # Include code blocks
+  code_blocks: true
+  # Fenced code languages to ignore
+  ignore_code_languages: []
+  # Number of spaces for each hard tab
+  spaces_per_tab: 1
+
+# MD011/no-reversed-links : Reversed link syntax : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md011.md
+MD011: true
+
+# MD012/no-multiple-blanks : Multiple consecutive blank lines : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md012.md
+MD012:
+  # Consecutive blank lines
+  maximum: 1
+
+# MD013/line-length : Line length : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md013.md
+MD013:
+  # Number of characters
+  line_length: 2000
+  # Number of characters for headings
+  heading_line_length: 2000
+  # Number of characters for code blocks
+  code_block_line_length: 2000
+  # Include code blocks
+  code_blocks: true
+  # Include tables
+  tables: false
+  # Include headings
+  headings: true
+  # Strict length checking
+  strict: false
+  # Stern length checking
+  stern: false
+
+# MD014/commands-show-output : Dollar signs used before commands without showing output : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md014.md
+MD014: true
+
+# MD018/no-missing-space-atx : No space after hash on atx style heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md018.md
+MD018: true
+
+# MD019/no-multiple-space-atx : Multiple spaces after hash on atx style heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md019.md
+MD019: true
+
+# MD020/no-missing-space-closed-atx : No space inside hashes on closed atx style heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md020.md
+MD020: true
+
+# MD021/no-multiple-space-closed-atx : Multiple spaces inside hashes on closed atx style heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md021.md
+MD021: true
+
+# MD022/blanks-around-headings : Headings should be surrounded by blank lines : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md022.md
+MD022:
+  # Blank lines above heading
+  lines_above: 1
+  # Blank lines below heading
+  lines_below: 1
+
+# MD023/heading-start-left : Headings must start at the beginning of the line : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md023.md
+MD023: true
+
+# MD024/no-duplicate-heading : Multiple headings with the same content : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md024.md
+MD024:
+  # Only check sibling headings
+  allow_different_nesting: false
+  # Only check sibling headings
+  siblings_only: false
+
+# MD025/single-title/single-h1 : Multiple top-level headings in the same document : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md025.md
+MD025:
+  # Heading level
+  level: 1
+  # RegExp for matching title in front matter
+  front_matter_title: "^\\s*title\\s*[:=]"
+
+# MD026/no-trailing-punctuation : Trailing punctuation in heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md026.md
+MD026:
+  # Punctuation characters
+  punctuation: ".,;:!。,;:!"
+
+# MD027/no-multiple-space-blockquote : Multiple spaces after blockquote symbol : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md027.md
+MD027: true
+
+# MD028/no-blanks-blockquote : Blank line inside blockquote : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md028.md
+MD028: true
+
+# MD029/ol-prefix : Ordered list item prefix : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md029.md
+MD029:
+  # List style
+  style: "one_or_ordered"
+
+# MD030/list-marker-space : Spaces after list markers : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md030.md
+MD030:
+  # Spaces for single-line unordered list items
+  ul_single: 1
+  # Spaces for single-line ordered list items
+  ol_single: 1
+  # Spaces for multi-line unordered list items
+  ul_multi: 1
+  # Spaces for multi-line ordered list items
+  ol_multi: 1
+
+# MD031/blanks-around-fences : Fenced code blocks should be surrounded by blank lines : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md031.md
+MD031:
+  # Include list items
+  list_items: true
+
+# MD032/blanks-around-lists : Lists should be surrounded by blank lines : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md032.md
+MD032: true
+
+# MD033/no-inline-html : Inline HTML : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md033.md
+MD033:
+  # Allowed elements
+  allowed_elements: [table, tr, td, img, em,br, a]
+
+# MD034/no-bare-urls : Bare URL used : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md034.md
+MD034: true
+
+# MD035/hr-style : Horizontal rule style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md035.md
+MD035:
+  # Horizontal rule style
+  style: "consistent"
+
+# MD036/no-emphasis-as-heading : Emphasis used instead of a heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md036.md
+MD036:
+  # Punctuation characters
+  punctuation: ".,;:!?。,;:!?"
+
+# MD037/no-space-in-emphasis : Spaces inside emphasis markers : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md037.md
+MD037: true
+
+# MD038/no-space-in-code : Spaces inside code span elements : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md038.md
+MD038: true
+
+# MD039/no-space-in-links : Spaces inside link text : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md039.md
+MD039: true
+
+# MD040/fenced-code-language : Fenced code blocks should have a language specified : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md040.md
+MD040:
+  # List of languages
+  allowed_languages: []
+  # Require language only
+  language_only: false
+
+# MD041/first-line-heading/first-line-h1 : First line in a file should be a top-level heading : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md041.md
+MD041:
+  # Heading level
+  level: 1
+  # RegExp for matching title in front matter
+  front_matter_title: "^\\s*title\\s*[:=]"
+
+# MD042/no-empty-links : No empty links : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md042.md
+MD042: true
+
+# MD043/required-headings : Required heading structure : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md043.md
+MD043: false
+
+# MD044/proper-names : Proper names should have the correct capitalization : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md044.md
+MD044:
+  # List of proper names
+  names: []
+  # Include code blocks
+  code_blocks: true
+  # Include HTML elements
+  html_elements: true
+
+# MD045/no-alt-text : Images should have alternate text (alt text) : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md045.md
+MD045: true
+
+# MD046/code-block-style : Code block style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md046.md
+MD046:
+  # Block style
+  style: "consistent"
+
+# MD047/single-trailing-newline : Files should end with a single newline character : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md047.md
+MD047: true
+
+# MD048/code-fence-style : Code fence style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md048.md
+MD048:
+  # Code fence style
+  style: "consistent"
+
+# MD049/emphasis-style : Emphasis style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md049.md
+MD049:
+  # Emphasis style
+  style: "consistent"
+
+# MD050/strong-style : Strong style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md050.md
+MD050:
+  # Strong style
+  style: "consistent"
+
+# MD051/link-fragments : Link fragments should be valid : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md051.md
+MD051: true
+
+# MD052/reference-links-images : Reference links and images should use a label that is defined : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md052.md
+MD052:
+  # Include shortcut syntax
+  shortcut_syntax: false
+
+# MD053/link-image-reference-definitions : Link and image reference definitions should be needed : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md053.md
+MD053:
+  # Ignored definitions
+  ignored_definitions:
+    - "//"
+
+# MD054/link-image-style : Link and image style : https://github.com/DavidAnson/markdownlint/blob/v0.32.1/doc/md054.md
+MD054:
+  # Allow autolinks
+  autolink: true
+  # Allow inline links and images
+  inline: true
+  # Allow full reference links and images
+  full: true
+  # Allow collapsed reference links and images
+  collapsed: true
+  # Allow shortcut reference links and images
+  shortcut: true
+  # Allow URLs as inline links
+  url_inline: true
\ No newline at end of file
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 56d12025c6..96fce3472d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 ### Docs
 
 * Update readthedocs yml to work around ReadTheDocs v2 yml interpretation regressions
+* Make README.md pass markdownlint
+* Switch markdownlint docker channel to official and pin
 
 ## [0.30.0 - 2023-12-04]
 
diff --git a/README.md b/README.md
index 757cfcb5b5..5818b142a2 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ You can use PyGraphistry with traditional Python data sources like CSVs, SQL, Ne
 
 
 
-## PyGraphistry is...
+## **PyGraphistry is:**
 
 * **Fast & gorgeous:** Interactively cluster, filter, inspect large amounts of data, and zip through timebars. It clusters large graphs with a descendant of the gorgeous ForceAtlas2 layout algorithm introduced in Gephi. Our data explorer connects to Graphistry's GPU cluster to layout and render hundreds of thousand of nodes+edges in your browser at unparalleled speeds.
 
@@ -410,7 +410,7 @@ Automatically and intelligently transform text, numbers, booleans, and other for
     preds = model.predict(X_new)
     ```
 
- * Encode model definitions and compare models against each other
+* Encode model definitions and compare models against each other
 
     ```python
     # graphistry
@@ -434,7 +434,6 @@ Automatically and intelligently transform text, numbers, booleans, and other for
     # compare g2 vs g3 or add to different pipelines
     ```
 
-
 See `help(g.featurize)` for more options
 
 ### [sklearn-based UMAP](https://umap-learn.readthedocs.io/en/latest/), [cuML-based UMAP](https://docs.rapids.ai/api/cuml/stable/api.html?highlight=umap#cuml.UMAP)
@@ -455,6 +454,7 @@ See `help(g.featurize)` for more options
     new_df = pd.read_csv(...)
     embeddings, X_new, _ = g.transform_umap(new_df, None, kind='nodes', return_graph=False)
     ```
+
 * Infer a new graph from new data using the old umap coordinates to run inference without having to train a new umap model.
 
     ```python
@@ -466,7 +466,6 @@ See `help(g.featurize)` for more options
     g3 = g.transform_umap(new_df, return_graph=True, merge_policy=True)
     g3.plot() # useful to see how new data connects to old -- play with `sample` and `n_neighbors` to control how much of old to include
     ```
-
 
 * UMAP supports many options, such as supervised mode, working on a subset of columns, and passing arguments to underlying `featurize()` and UMAP implementations (see `help(g.umap)`):
 
@@ -551,8 +550,7 @@ GNN support is rapidly evolving, please contact the team directly or on Slack fo
 
     ```
 
-
-* If edges are not given, `g.umap(..)` will supply them:
+* If edges are not given, `g.umap(..)` will supply them:
 
     ```python
     ndf = pd.read_csv(nodes.csv)
@@ -561,7 +559,7 @@ GNN support is rapidly evolving, please contact the team directly or on Slack fo
     g2.search_graph('my natural language query', ...).plot()
     ```
 
-
+
 See `help(g.search_graph)` for options
 
 ### Knowledge Graph Embeddings
@@ -617,7 +615,7 @@ See `help(g.search_graph)` for options
 
 See `help(g.embed)`, `help(g.predict_links)` , or `help(g.predict_links_all)` for options
 
-### DBSCAN
+### DBSCAN
 
 * Enrich UMAP embeddings or featurization dataframe with GPU or CPU DBSCAN
 
@@ -1165,8 +1163,6 @@ Both `hop()` and `chain()` match dictionary expressions support dataframe series
 * numeric: gt, lt, ge, le, eq, ne, between, isna, notna
 * string: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull
 
-
-
 #### Table to graph
 
 ```python
From cf5d3096dcff4244a551ce2f5c8a68e2fea23fea Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Tue, 5 Dec 2023 00:58:54 -0800
Subject: [PATCH 069/104] docs(changelog)

---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 96fce3472d..a05d5b6d5e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
+## [0.30.1 - 2023-12-05]
+
 ### Docs
 
 * Update readthedocs yml to work around ReadTheDocs v2 yml interpretation regressions
From ff8f7ee9f532bd9b4d67c1e37b74a9482351523e Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Tue, 5 Dec 2023 01:05:42 -0800
Subject: [PATCH 070/104] fix(docs): match tag

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a05d5b6d5e..6b1a330bf8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
-## [0.30.1 - 2023-12-05]
+## [0.31.0 - 2023-12-05]
 
 ### Docs
 
From a2992c885519d7e1f828f8659532834e0341cb12 Mon Sep 17 00:00:00 2001
From: Leo Meyerovich
Date: Tue, 5 Dec 2023 01:06:23 -0800
Subject: [PATCH 071/104] docs(version): clean as 0.31.1

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6b1a330bf8..6642a01b8c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,7 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
 
 ## [Development]
 
-## [0.31.0 - 2023-12-05]
+## [0.31.1 - 2023-12-05]
 
 ### Docs
 
From d96039e25b8aacf88402e955487329f192fc2d46 Mon Sep 17 00:00:00 2001
From: Akshat Balyan <89499072+B4K2@users.noreply.github.com>
Date: Thu, 7 Dec 2023 18:40:56 +0530
Subject: [PATCH 072/104] Update hop_and_chain_graph_pattern_mining.ipynb

---
 .../hop_and_chain_graph_pattern_mining.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb b/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb
index af8024a9eb..98cc207cdb 100644
--- a/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb
+++ b/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb
@@ -183,7 +183,7 @@
     " for ii, j in enumerate(data[0]['outList'][i]):\n",
     " edges.append({\n",
     " 'from': name,\n",
-    " 'to': names[j],\n",
+    " 'to': data[0]['usernameList'][j],\n",
     " 'weight': data[0]['outWeight'][i][ii]\n",
     " })\n",
     "edges_df = pd.DataFrame(edges)\n",
@@ -2733,4 +2733,4 @@
    "outputs": []
   }
  ]
-}
\ No newline at end of file
+}
From 5a48d95cf615c330f99469bcc26d99c6a02fa2e2 Mon Sep 17 00:00:00 2001
From: Thomas Cook
Date: Wed, 20 Dec 2023 14:30:40 -0600
Subject: [PATCH 073/104] fix: test change for the register command to use markdown instead of HTML to allow viewing for databricks users

---
 graphistry/pygraphistry.py | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py
index f4d00a1b23..2346ab1ab0 100644
--- a/graphistry/pygraphistry.py
+++ b/graphistry/pygraphistry.py
@@ -147,7 +147,7 @@ def login(username, password, org_name=None, fail_silent=False):
         """Authenticate and set token for reuse (api=3). If token_refresh_ms (default: 10min), auto-refreshes token.
By default, must be reinvoked within 24hr.""" logger.debug("@PyGraphistry login : org_name :{} vs PyGraphistry.org_name() : {}".format(org_name, PyGraphistry.org_name())) - + if not org_name: org_name = PyGraphistry.org_name() @@ -167,7 +167,7 @@ def login(username, password, org_name=None, fail_silent=False): .login(username, password, org_name) .token ) - + logger.debug("@PyGraphistry login After ArrowUploader.login: org_name :{} vs PyGraphistry.org_name() : {}".format(org_name, PyGraphistry.org_name())) PyGraphistry.api_token(token) @@ -246,12 +246,12 @@ def sso_login(org_name=None, idp_name=None, sso_timeout=SSO_GET_TOKEN_ELAPSE_SEC auth_url = arrow_uploader.sso_auth_url # print("auth_url : {}".format(auth_url)) if auth_url and not PyGraphistry.api_token(): - PyGraphistry._handle_auth_url(auth_url, sso_timeout, sso_opt_into_type) + PyGraphistry._handle_auth_url(auth_url, sso_timeout, sso_opt_into_type) return auth_url @staticmethod def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): - """Internal function to handle what to do with the auth_url + """Internal function to handle what to do with the auth_url based on the client mode python/ipython console or notebook. 
:param auth_url: SSO auth url retrieved via API @@ -270,7 +270,8 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): if in_ipython() or in_databricks() or sso_opt_into_type == 'display': # If run in notebook, just display the HTML # from IPython.core.display import HTML from IPython.display import display, HTML - display(HTML(f'Login SSO')) + display(HTML(f'old: Login SSO')) + display(Markdown(f'[new: Login SSO]({auth_url})")) print("Please click the above URL to open browser to login") print(f"If you cannot see the URL, please open browser, browse to this URL: {auth_url}") print("Please close browser tab after SSO login to back to notebook") @@ -290,7 +291,7 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): time.sleep(1) elapsed_time = 1 token = None - + while True: token, org_name = PyGraphistry._sso_get_token() try: @@ -328,7 +329,7 @@ def sso_get_token(): # set org_name to sso org PyGraphistry._config['org_name'] = org_name return token - + @staticmethod def _sso_get_token(): token = None @@ -513,7 +514,7 @@ def api_version(value=None): """Set or get the API version: 1 for 1.0 (deprecated), 3 for 2.0. Setting api=2 (protobuf) fully deprecated from the PyGraphistry client. 
Also set via environment variable GRAPHISTRY_API_VERSION.""" - + import re if value is None: #if set by env var, interpret @@ -571,7 +572,7 @@ def register( idp_name: Optional[str] = None, is_sso_login: Optional[bool] = False, sso_timeout: Optional[int] = SSO_GET_TOKEN_ELAPSE_SECONDS, - sso_opt_into_type: Optional[Literal["display", "browser"]] = None + sso_opt_into_type: Optional[Literal["display", "browser"]] = None ): """API key registration and server selection @@ -688,7 +689,7 @@ def register( PyGraphistry.set_bolt_driver(bolt) # Reset token creds PyGraphistry.__reset_token_creds_in_memory() - + if not (username is None) and not (password is None): PyGraphistry.login(username, password, org_name) PyGraphistry.api_token(token or PyGraphistry._config['api_token']) @@ -718,7 +719,7 @@ def __check_login_type_to_reset_token_creds( ): if origin_login_type != new_login_type: PyGraphistry.__reset_token_creds_in_memory() - + @staticmethod def privacy( mode: Optional[Mode] = None, @@ -1962,7 +1963,7 @@ def nodes(nodes: Union[Callable, Any], node=None, *args, **kwargs) -> Plottable: **Example** :: - + import graphistry def sample_nodes(g, n): @@ -2308,7 +2309,7 @@ def org_name(value=None): # setter, use switch_org instead if 'org_name' not in PyGraphistry._config or value is not PyGraphistry._config['org_name']: - try: + try: PyGraphistry.switch_org(value.strip()) # PyGraphistry._config['org_name'] = value.strip() except: @@ -2351,7 +2352,7 @@ def scene_settings( point_size: Optional[float] = None, edge_curvature: Optional[float] = None, edge_opacity: Optional[float] = None, - point_opacity: Optional[float] = None, + point_opacity: Optional[float] = None, ): return Plotter().scene_settings( menu, @@ -2422,7 +2423,7 @@ def _handle_api_response(response): logger.error('Error: %s', response, exc_info=True) raise Exception("Unknown Error") - + client_protocol_hostname = PyGraphistry.client_protocol_hostname From 778421b1203157951209508e001168fc8dad30b3 Mon Sep 17 00:00:00 
2001 From: Thomas Cook Date: Wed, 20 Dec 2023 14:41:18 -0600 Subject: [PATCH 074/104] fix syntax error --- graphistry/pygraphistry.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index 2346ab1ab0..d03d7afb36 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -271,7 +271,7 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): # from IPython.core.display import HTML from IPython.display import display, HTML display(HTML(f'old: Login SSO')) - display(Markdown(f'[new: Login SSO]({auth_url})")) + display(Markdown(f'[new: Login SSO]({auth_url})')) print("Please click the above URL to open browser to login") print(f"If you cannot see the URL, please open browser, browse to this URL: {auth_url}") print("Please close browser tab after SSO login to back to notebook") From b8b005453696507958f27bab1fabc2da1da1c163 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Wed, 20 Dec 2023 18:54:32 -0800 Subject: [PATCH 075/104] fix(auth url): revert broken master commits --- graphistry/pygraphistry.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/graphistry/pygraphistry.py b/graphistry/pygraphistry.py index d03d7afb36..98916f6a01 100644 --- a/graphistry/pygraphistry.py +++ b/graphistry/pygraphistry.py @@ -270,8 +270,7 @@ def _handle_auth_url(auth_url, sso_timeout, sso_opt_into_type): if in_ipython() or in_databricks() or sso_opt_into_type == 'display': # If run in notebook, just display the HTML # from IPython.core.display import HTML from IPython.display import display, HTML - display(HTML(f'old: Login SSO')) - display(Markdown(f'[new: Login SSO]({auth_url})')) + display(HTML(f'Login SSO')) print("Please click the above URL to open browser to login") print(f"If you cannot see the URL, please open browser, browse to this URL: {auth_url}") print("Please close browser tab after SSO login to back to notebook") From 
eea5eeebdf28418178e2ef22c45f120cb713f157 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Wed, 20 Dec 2023 20:33:22 -0800 Subject: [PATCH 076/104] docs(gfql) --- CHANGELOG.md | 4 ++++ README.md | 10 +++++----- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6642a01b8c..68d03aaf56 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Docs + +* GFQL in readme.md + ## [0.31.1 - 2023-12-05] ### Docs diff --git a/README.md b/README.md index 5818b142a2..c790ac7d79 100644 --- a/README.md +++ b/README.md @@ -147,9 +147,9 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit g2.plot() ``` -* Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)) +* GFQL: Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)) - Run Cypher-style graph queries natively on dataframes without going to a database or Java: + Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: ```python from graphistry import n, e_undirected, is_in @@ -1133,7 +1133,7 @@ g2.plot() # nodes are values from cols s, d, k1 destination_node_match={"k2": 2}, destination_node_query='k2 == 2 or k2 == 4', ) - .chain([ # filter to subgraph + .chain([ # filter to subgraph with Cypher-style GFQL n(), n({'k2': 0, "m": 'ok'}), #specific values n({'type': is_in(["type1", "type2"])}), #multiple valid values @@ -1156,7 +1156,7 @@ g2.plot() # nodes are values from cols s, d, k1 .collapse(node='some_id', column='some_col', attribute='some val') ``` -Both `hop()` and `chain()` match dictionary expressions support dataframe series *predicates*. The above examples show `is_in([x, y, z, ...])`. 
Additional predicates include: +Both `hop()` and `chain()` (GFQL) match dictionary expressions support dataframe series *predicates*. The above examples show `is_in([x, y, z, ...])`. Additional predicates include: * categorical: is_in, duplicated * temporal: is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end @@ -1233,7 +1233,7 @@ assert 'pagerank' in g2._nodes.columns #### Graph pattern matching -PyGraphistry supports a PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java +PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) From 8ee50de062212adac3f60e39f0be6b55875bde0c Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:27:16 -0800 Subject: [PATCH 077/104] feat(is_year_end): add missing predicate --- CHANGELOG.md | 3 +++ docs/source/conf.py | 1 + graphistry/__init__.py | 1 + graphistry/compute/__init__.py | 1 + graphistry/compute/ast.py | 1 + 5 files changed, 7 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 68d03aaf56..c3570d37e2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,9 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Added +* GFQL predicate `is_year_end` + ### Docs * GFQL in readme.md diff --git a/docs/source/conf.py b/docs/source/conf.py index 5d182ca6d0..b3f8ad6cb7 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -81,6 +81,7 @@ ('py:class', 'graphistry.compute.predicates.temporal.IsQuarterEnd'), ('py:class', 'graphistry.compute.predicates.temporal.IsQuarterStart'), ('py:class', 
'graphistry.compute.predicates.temporal.IsYearStart'), + ('py:class', 'graphistry.compute.predicates.temporal.IsYearEnd'), ('py:class', 'graphistry.Engine.Engine'), ('py:class', 'graphistry.gremlin.CosmosMixin'), ('py:class', 'graphistry.gremlin.GremlinMixin'), diff --git a/graphistry/__init__.py b/graphistry/__init__.py index 246fdf6cb7..c3e5f6610d 100644 --- a/graphistry/__init__.py +++ b/graphistry/__init__.py @@ -61,6 +61,7 @@ is_quarter_start, IsQuarterStart, is_quarter_end, IsQuarterEnd, is_year_start, IsYearStart, + is_year_end, IsYearEnd, is_leap_year, IsLeapYear, gt, GT, diff --git a/graphistry/compute/__init__.py b/graphistry/compute/__init__.py index d321b0915e..5065246bd9 100644 --- a/graphistry/compute/__init__.py +++ b/graphistry/compute/__init__.py @@ -14,6 +14,7 @@ is_quarter_start, IsQuarterStart, is_quarter_end, IsQuarterEnd, is_year_start, IsYearStart, + is_year_end, IsYearEnd, is_leap_year, IsLeapYear ) from .predicates.numeric import ( diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index bc45c7fa4b..93e3189f6b 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -17,6 +17,7 @@ is_quarter_start, IsQuarterStart, is_quarter_end, IsQuarterEnd, is_year_start, IsYearStart, + is_year_end, IsYearEnd, is_leap_year, IsLeapYear ) from .predicates.numeric import ( From f900bd154cfefdeb24e242337ced85bb7acdd0ff Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:28:40 -0800 Subject: [PATCH 078/104] refactor(repr): remove unnecessary --- graphistry/compute/ast.py | 9 --------- 1 file changed, 9 deletions(-) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 93e3189f6b..c188c26d95 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -253,9 +253,6 @@ def __init__(self, edge_query=edge_query ) - def __repr__(self) -> str: - return f'ASTEdgeForward(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, 
destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' - e_forward = ASTEdgeForward # noqa: E305 class ASTEdgeReverse(ASTEdge): @@ -285,9 +282,6 @@ def __init__(self, destination_node_query=destination_node_query, edge_query=edge_query ) - - def __repr__(self) -> str: - return f'ASTEdgeReverse(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' e_reverse = ASTEdgeReverse # noqa: E305 @@ -319,7 +313,4 @@ def __init__(self, edge_query=edge_query ) - def __repr__(self) -> str: - return f'ASTEdgeUndirected(edge_match={self._edge_match}, hops={self._hops}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, to_fixed_point={self._to_fixed_point}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' - e_undirected = ASTEdgeUndirected # noqa: E305 From 4ad0233b39e969c05556c3c5d12bbbfb68fb5d7f Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:29:28 -0800 Subject: [PATCH 079/104] refactor(gfql): e() now undirected edge --- CHANGELOG.md | 4 ++++ graphistry/compute/ast.py | 2 +- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index c3570d37e2..f4930df0b9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm * GFQL in readme.md +### Breaking 🔥 + +* GFQL `e()` now aliases `e_undirected` instead of the base class 
`ASTEdge` + ## [0.31.1 - 2023-12-05] ### Docs diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index c188c26d95..6727df437b 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -223,7 +223,6 @@ def reverse(self) -> 'ASTEdge': destination_node_query=self._source_node_query, edge_query=self._edge_query ) -e = ASTEdge # noqa: E305 class ASTEdgeForward(ASTEdge): """ @@ -314,3 +313,4 @@ def __init__(self, ) e_undirected = ASTEdgeUndirected # noqa: E305 +e = ASTEdgeUndirected # noqa: E305 From 8a943129b1ec7424817f9a6a24aacb042c197b22 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:30:41 -0800 Subject: [PATCH 080/104] refactor(gfql): literal typed Direction --- graphistry/compute/ast.py | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 6727df437b..38461899a6 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -121,10 +121,11 @@ def reverse(self) -> 'ASTNode': ############################################################################### +Direction = Literal['forward', 'reverse', 'undirected'] DEFAULT_HOPS = 1 DEFAULT_FIXED_POINT = False -DEFAULT_DIRECTION = 'forward' +DEFAULT_DIRECTION: Direction = 'forward' DEFAULT_FILTER_DICT = None class ASTEdge(ASTObject): @@ -133,7 +134,7 @@ class ASTEdge(ASTObject): """ def __init__( self, - direction: Optional[str] = DEFAULT_DIRECTION, + direction: Optional[Direction] = DEFAULT_DIRECTION, edge_match: Optional[dict] = DEFAULT_FILTER_DICT, hops: Optional[int] = DEFAULT_HOPS, to_fixed_point: bool = DEFAULT_FIXED_POINT, @@ -158,7 +159,7 @@ def __init__( self._hops = hops self._to_fixed_point = to_fixed_point - self._direction = direction + self._direction : Direction = direction self._source_node_match = source_node_match self._edge_match = edge_match self._destination_node_match = destination_node_match @@ -206,6 +207,7 @@ def __call__(self, g: Plottable, 
prev_node_wavefront: Optional[pd.DataFrame], ta def reverse(self) -> 'ASTEdge': # updates both edges and nodes + direction : Direction if self._direction == 'reverse': direction = 'forward' elif self._direction == 'forward': From 90851d6ca42a1deebc55e8c7e9353300ea3c25d3 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:30:56 -0800 Subject: [PATCH 081/104] refactor(gfql): literal typed Direction --- graphistry/compute/ast.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 38461899a6..93ad93ef40 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -1,5 +1,5 @@ import logging -from typing import Optional, cast +from typing_extensions import Literal import pandas as pd from graphistry.Plottable import Plottable From ece0924ba0dcd4620a9f4907a8309d0291a8bf76 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:31:48 -0800 Subject: [PATCH 082/104] feat(gfql): chain serialization --- CHANGELOG.md | 2 + graphistry/compute/ast.py | 144 ++++++++++++- graphistry/compute/chain.py | 19 +- graphistry/compute/predicates/ASTPredicate.py | 11 + graphistry/compute/predicates/categorical.py | 15 ++ graphistry/compute/predicates/from_json.py | 37 ++++ graphistry/compute/predicates/is_in.py | 21 ++ graphistry/compute/predicates/numeric.py | 129 +++++++++++- graphistry/compute/predicates/str.py | 191 +++++++++++++++++- graphistry/compute/predicates/temporal.py | 62 ++++++ .../compute/predicates/test_categorical.py | 15 ++ .../compute/predicates/test_from_json.py | 27 +++ .../tests/compute/predicates/test_is_in.py | 16 ++ .../tests/compute/predicates/test_numeric.py | 16 ++ .../tests/compute/predicates/test_str.py | 14 ++ .../tests/compute/predicates/test_temporal.py | 13 ++ graphistry/tests/compute/test_ast.py | 23 +++ graphistry/tests/compute/test_chain.py | 38 ++++ graphistry/tests/test_util.py | 63 ++++++ graphistry/util.py | 11 + 20 files 
changed, 856 insertions(+), 11 deletions(-) create mode 100644 graphistry/compute/predicates/from_json.py create mode 100644 graphistry/tests/compute/predicates/test_categorical.py create mode 100644 graphistry/tests/compute/predicates/test_from_json.py create mode 100644 graphistry/tests/compute/predicates/test_is_in.py create mode 100644 graphistry/tests/compute/predicates/test_numeric.py create mode 100644 graphistry/tests/compute/predicates/test_str.py create mode 100644 graphistry/tests/compute/predicates/test_temporal.py create mode 100644 graphistry/tests/compute/test_ast.py create mode 100644 graphistry/tests/compute/test_chain.py create mode 100644 graphistry/tests/test_util.py diff --git a/CHANGELOG.md b/CHANGELOG.md index f4930df0b9..e7761fba2a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] ### Added + +* GFQL query serialization: `graphistry.compute.from_json(graphistry.compute.to_json([...]))` * GFQL predicate `is_year_end` ### Docs diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 93ad93ef40..7e892959e1 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -1,9 +1,11 @@ +from abc import abstractmethod import logging +from typing import Dict, Optional, Union, cast from typing_extensions import Literal import pandas as pd from graphistry.Plottable import Plottable -from graphistry.util import setup_logger +from graphistry.util import is_json_serializable, setup_logger from .predicates.ASTPredicate import ASTPredicate from .predicates.is_in import ( is_in, IsIn @@ -66,12 +68,43 @@ def __init__(self, name: Optional[str] = None): self._name = name pass + @abstractmethod def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: raise RuntimeError('__call__ not implemented') + @abstractmethod def reverse(self) -> 'ASTObject': 
raise RuntimeError('reverse not implemented') + + @abstractmethod + def to_json(self, validate=True) -> dict: + raise NotImplementedError() + def validate(self) -> None: + pass + + +############################################################################## + + +def assert_record_match(d: Dict) -> None: + assert isinstance(d, dict) + for k, v in d.items(): + assert isinstance(k, str) + assert isinstance(v, ASTPredicate) or is_json_serializable(v) + +def maybe_filter_dict_from_json(d: Dict, key: str) -> Optional[Dict]: + if key not in d: + return None + if key in d and isinstance(d[key], dict): + return { + k: ASTPredicate.from_json(v) if isinstance(v, dict) else v + for k, v in d[key].items() + } + elif key in d and d[key] is not None: + raise ValueError('filter_dict must be a dict or None') + else: + return None ############################################################################## @@ -91,6 +124,36 @@ def __init__(self, filter_dict: Optional[dict] = None, name: Optional[str] = Non def __repr__(self) -> str: return f'ASTNode(filter_dict={self._filter_dict}, name={self._name})' + + def validate(self) -> None: + if self._filter_dict is not None: + assert_record_match(self._filter_dict) + if self._name is not None: + assert isinstance(self._name, str) + if self._query is not None: + assert isinstance(self._query, str) + + def to_json(self, validate=True) -> dict: + return { + 'type': 'Node', + 'filter_dict': { + k: v.to_json() if isinstance(v, ASTPredicate) else v + for k, v in self._filter_dict.items() + if v is not None + } if self._filter_dict is not None else {}, + **({'name': self._name} if self._name is not None else {}), + **({'query': self._query } if self._query is not None else {}) + } + + @classmethod + def from_json(cls, d: dict) -> 'ASTNode': + out = ASTNode( + filter_dict=maybe_filter_dict_from_json(d, 'filter_dict'), + name=d['name'] if 'name' in d else None, + query=d['query'] if 'query' in d else None + ) + out.validate() + return out def 
__call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: out_g = (g @@ -170,6 +233,71 @@ def __init__( def __repr__(self) -> str: return f'ASTEdge(direction={self._direction}, edge_match={self._edge_match}, hops={self._hops}, to_fixed_point={self._to_fixed_point}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' + def validate(self) -> None: + assert self._hops is None or isinstance(self._hops, int) + assert isinstance(self._to_fixed_point, bool) + assert self._direction in ['forward', 'reverse', 'undirected'] + if self._source_node_match is not None: + assert_record_match(self._source_node_match) + if self._edge_match is not None: + assert_record_match(self._edge_match) + if self._destination_node_match is not None: + assert_record_match(self._destination_node_match) + if self._name is not None: + assert isinstance(self._name, str) + if self._source_node_query is not None: + assert isinstance(self._source_node_query, str) + if self._destination_node_query is not None: + assert isinstance(self._destination_node_query, str) + if self._edge_query is not None: + assert isinstance(self._edge_query, str) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return { + 'type': 'Edge', + 'hops': self._hops, + 'to_fixed_point': self._to_fixed_point, + 'direction': self._direction, + **({'source_node_match': { + k: v.to_json() if isinstance(v, ASTPredicate) else v + for k, v in self._source_node_match.items() + if v is not None + }} if self._source_node_match is not None else {}), + **({'edge_match': { + k: v.to_json() if isinstance(v, ASTPredicate) else v + for k, v in self._edge_match.items() + if v is not None + }} if self._edge_match is not None else {}), + 
**({'destination_node_match': { + k: v.to_json() if isinstance(v, ASTPredicate) else v + for k, v in self._destination_node_match.items() + if v is not None + }} if self._destination_node_match is not None else {}), + **({'name': self._name} if self._name is not None else {}), + **({'source_node_query': self._source_node_query} if self._source_node_query is not None else {}), + **({'destination_node_query': self._destination_node_query} if self._destination_node_query is not None else {}), + **({'edge_query': self._edge_query} if self._edge_query is not None else {}) + } + + @classmethod + def from_json(cls, d: dict) -> 'ASTEdge': + out = ASTEdge( + direction=d['direction'] if 'direction' in d else None, + edge_match=maybe_filter_dict_from_json(d, 'edge_match'), + hops=d['hops'] if 'hops' in d else None, + to_fixed_point=d['to_fixed_point'] if 'to_fixed_point' in d else None, + source_node_match=maybe_filter_dict_from_json(d, 'source_node_match'), + destination_node_match=maybe_filter_dict_from_json(d, 'destination_node_match'), + source_node_query=d['source_node_query'] if 'source_node_query' in d else None, + destination_node_query=d['destination_node_query'] if 'destination_node_query' in d else None, + edge_query=d['edge_query'] if 'edge_query' in d else None, + name=d['name'] if 'name' in d else None + ) + out.validate() + return out + def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: if logger.isEnabledFor(logging.DEBUG): @@ -316,3 +444,17 @@ def __init__(self, e_undirected = ASTEdgeUndirected # noqa: E305 e = ASTEdgeUndirected # noqa: E305 + +### + +def from_json(o: Dict) -> Union[ASTNode, ASTEdge]: + assert isinstance(o, dict) + assert 'type' in o + out : Union[ASTNode, ASTEdge] + if o['type'] == 'Node': + out = ASTNode.from_json(o) + elif o['type'] == 'Edge': + out = ASTEdge.from_json(o) + else: + raise ValueError(f'Unknown type {o["type"]}') + return out diff --git 
a/graphistry/compute/chain.py b/graphistry/compute/chain.py index a3132ca624..44e0133138 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -1,9 +1,9 @@ -from typing import cast, List, Tuple +from typing import Dict, cast, List, Tuple import pandas as pd from graphistry.Plottable import Plottable from graphistry.util import setup_logger -from .ast import ASTObject, ASTNode, ASTEdge +from .ast import ASTObject, ASTNode, ASTEdge, from_json as ASTObject_from_json logger = setup_logger(__name__) @@ -253,3 +253,18 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: g_out = g.nodes(final_nodes_df).edges(final_edges_df) return g_out + +### + +def from_json(d: Dict) -> List[ASTObject]: + """ + Convert a JSON AST into a list of ASTObjects + """ + assert isinstance(d, list) + return [ASTObject_from_json(op) for op in d] + +def to_json(ops: List[ASTObject]) -> List[Dict]: + """ + Convert a list of ASTObjects into a JSON AST + """ + return [op.to_json() for op in ops] diff --git a/graphistry/compute/predicates/ASTPredicate.py b/graphistry/compute/predicates/ASTPredicate.py index 24c2d08bc8..b5621b5a07 100644 --- a/graphistry/compute/predicates/ASTPredicate.py +++ b/graphistry/compute/predicates/ASTPredicate.py @@ -10,4 +10,15 @@ class ASTPredicate(): @abstractmethod def __call__(self, s: pd.Series) -> pd.Series: + raise NotImplementedError() + + @abstractmethod + def to_json(self, validate=True) -> dict: + raise NotImplementedError() + + @classmethod + def from_json(cls, d: dict) -> 'ASTPredicate': + raise NotImplementedError() + + def validate(self) -> None: pass diff --git a/graphistry/compute/predicates/categorical.py b/graphistry/compute/predicates/categorical.py index bcc08c7c84..6b98f5cfe5 100644 --- a/graphistry/compute/predicates/categorical.py +++ b/graphistry/compute/predicates/categorical.py @@ -10,6 +10,21 @@ def __init__(self, keep: Literal['first', 'last', False] = 'first') -> None: def __call__(self, s: pd.Series) -> 
pd.Series: return s.duplicated(keep=self.keep) + def validate(self) -> None: + assert self.keep in ['first', 'last', False] + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'Duplicated', 'keep': self.keep} + + @classmethod + def from_json(cls, d: dict) -> 'Duplicated': + assert 'keep' in d + out = Duplicated(keep=d['keep']) + out.validate() + return out + def duplicated(keep: Literal['first', 'last', False] = 'first') -> Duplicated: """ Return whether a given value is duplicated diff --git a/graphistry/compute/predicates/from_json.py b/graphistry/compute/predicates/from_json.py new file mode 100644 index 0000000000..cab27531b1 --- /dev/null +++ b/graphistry/compute/predicates/from_json.py @@ -0,0 +1,37 @@ +from typing import Dict, List, Type + +from graphistry.compute.predicates.ASTPredicate import ASTPredicate +from graphistry.compute.predicates.categorical import Duplicated +from graphistry.compute.predicates.is_in import IsIn +from graphistry.compute.predicates.numeric import GT, LT, GE, LE, EQ, NE, Between, IsNA, NotNA +from graphistry.compute.predicates.str import ( + Contains, Startswith, Endswith, Match, IsNumeric, IsAlpha, IsDecimal, IsDigit, IsLower, IsUpper, + IsSpace, IsAlnum, IsTitle, IsNull, NotNull +) +from graphistry.compute.predicates.temporal import ( + IsMonthStart, IsMonthEnd, IsQuarterStart, IsQuarterEnd, + IsYearStart, IsYearEnd, IsLeapYear +) + +predicates : List[Type[ASTPredicate]] = [ + Duplicated, + IsIn, + GT, LT, GE, LE, EQ, NE, Between, IsNA, NotNA, + Contains, Startswith, Endswith, Match, IsNumeric, IsAlpha, IsDecimal, IsDigit, IsLower, IsUpper, + IsSpace, IsAlnum, IsDecimal, IsTitle, IsNull, NotNull, + IsMonthStart, IsMonthEnd, IsQuarterStart, IsQuarterEnd, + IsYearStart, IsYearEnd, IsLeapYear +] + +type_to_predicate: Dict[str, Type[ASTPredicate]] = { + cls.__name__: cls + for cls in predicates +} + +def from_json(d: Dict) -> ASTPredicate: + assert isinstance(d, dict) + assert 'type' in d 
+ assert d['type'] in type_to_predicate + out = type_to_predicate[d['type']].from_json(d) + out.validate() + return out diff --git a/graphistry/compute/predicates/is_in.py b/graphistry/compute/predicates/is_in.py index 77c9f2505a..64a2605f55 100644 --- a/graphistry/compute/predicates/is_in.py +++ b/graphistry/compute/predicates/is_in.py @@ -1,6 +1,8 @@ from typing import Any, List import pandas as pd +from graphistry.util import assert_json_serializable + from .ASTPredicate import ASTPredicate @@ -10,6 +12,25 @@ def __init__(self, options: List[Any]) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.isin(self.options) + + def validate(self) -> None: + assert isinstance(self.options, list) + assert_json_serializable(self.options) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return { + 'type': 'IsIn', + 'options': self.options + } + + @classmethod + def from_json(cls, d: dict) -> 'IsIn': + assert 'options' in d + out = IsIn(options=d['options']) + out.validate() + return out def is_in(options: List[Any]) -> IsIn: return IsIn(options) diff --git a/graphistry/compute/predicates/numeric.py b/graphistry/compute/predicates/numeric.py index d17b07bc0c..64fde67ee0 100644 --- a/graphistry/compute/predicates/numeric.py +++ b/graphistry/compute/predicates/numeric.py @@ -1,80 +1,162 @@ -from typing import Optional +from typing import Union import pandas as pd from .ASTPredicate import ASTPredicate -class GT(ASTPredicate): + +class NumericASTPredicate(ASTPredicate): + def __init__(self, val: Union[int, float]) -> None: + self.val = val + + def validate(self) -> None: + assert isinstance(self.val, (int, float)) + +### + +class GT(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val def __call__(self, s: pd.Series) -> pd.Series: return s > self.val + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'GT', 'val': self.val} + + @classmethod + def from_json(cls, d: 
dict) -> 'GT': + assert 'val' in d + out = GT(val=d['val']) + out.validate() + return out + def gt(val: float) -> GT: """ Return whether a given value is greater than a threshold """ return GT(val) -class LT(ASTPredicate): +class LT(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val def __call__(self, s: pd.Series) -> pd.Series: return s < self.val + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'LT', 'val': self.val} + + @classmethod + def from_json(cls, d: dict) -> 'LT': + assert 'val' in d + out = LT(val=d['val']) + out.validate() + return out + def lt(val: float) -> LT: """ Return whether a given value is less than a threshold """ return LT(val) -class GE(ASTPredicate): +class GE(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val def __call__(self, s: pd.Series) -> pd.Series: return s >= self.val + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'GE', 'val': self.val} + + @classmethod + def from_json(cls, d: dict) -> 'GE': + assert 'val' in d + out = GE(val=d['val']) + out.validate() + return out + def ge(val: float) -> GE: """ Return whether a given value is greater than or equal to a threshold """ return GE(val) -class LE(ASTPredicate): +class LE(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val def __call__(self, s: pd.Series) -> pd.Series: return s <= self.val + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'LE', 'val': self.val} + + @classmethod + def from_json(cls, d: dict) -> 'LE': + assert 'val' in d + out = LE(val=d['val']) + out.validate() + return out + def le(val: float) -> LE: """ Return whether a given value is less than or equal to a threshold """ return LE(val) -class EQ(ASTPredicate): +class EQ(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val def __call__(self, s: pd.Series) -> pd.Series: return s == 
self.val + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'EQ', 'val': self.val} + + @classmethod + def from_json(cls, d: dict) -> 'EQ': + assert 'val' in d + out = EQ(val=d['val']) + out.validate() + return out + def eq(val: float) -> EQ: """ Return whether a given value is equal to a threshold """ return EQ(val) -class NE(ASTPredicate): +class NE(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val def __call__(self, s: pd.Series) -> pd.Series: return s != self.val + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'NE', 'val': self.val} + + @classmethod + def from_json(cls, d: dict) -> 'NE': + assert 'val' in d + out = NE(val=d['val']) + out.validate() + return out + def ne(val: float) -> NE: """ Return whether a given value is not equal to a threshold @@ -92,6 +174,25 @@ def __call__(self, s: pd.Series) -> pd.Series: return (s >= self.lower) & (s <= self.upper) else: return (s > self.lower) & (s < self.upper) + + def validate(self) -> None: + assert isinstance(self.lower, (int, float)) + assert isinstance(self.upper, (int, float)) + assert isinstance(self.inclusive, bool) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return {'type': 'Between', 'lower': self.lower, 'upper': self.upper, 'inclusive': self.inclusive} + + @classmethod + def from_json(cls, d: dict) -> 'Between': + assert 'lower' in d + assert 'upper' in d + assert 'inclusive' in d + out = Between(lower=d['lower'], upper=d['upper'], inclusive=d['inclusive']) + out.validate() + return out def between(lower: float, upper: float, inclusive: bool = True) -> Between: """ @@ -102,6 +203,13 @@ def between(lower: float, upper: float, inclusive: bool = True) -> Between: class IsNA(ASTPredicate): def __call__(self, s: pd.Series) -> pd.Series: return s.isna() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsNA'} + + @classmethod + def 
from_json(cls, d: dict) -> 'IsNA': + return IsNA() def isna() -> IsNA: """ @@ -113,6 +221,13 @@ def isna() -> IsNA: class NotNA(ASTPredicate): def __call__(self, s: pd.Series) -> pd.Series: return s.notna() + + def to_json(self, validate=True) -> dict: + return {'type': 'NotNA'} + + @classmethod + def from_json(cls, d: dict) -> 'NotNA': + return NotNA() def notna() -> NotNA: """ diff --git a/graphistry/compute/predicates/str.py b/graphistry/compute/predicates/str.py index 14a8ae2de5..fb81cc5ddf 100644 --- a/graphistry/compute/predicates/str.py +++ b/graphistry/compute/predicates/str.py @@ -14,6 +14,41 @@ def __init__(self, pat: str, case: bool = True, flags: int = 0, na: Optional[boo def __call__(self, s: pd.Series) -> pd.Series: return s.str.contains(self.pat, self.case, self.flags, self.na, self.regex) + + def validate(self) -> None: + assert isinstance(self.pat, str) + assert isinstance(self.case, bool) + assert isinstance(self.flags, int) + assert isinstance(self.na, (bool, type(None))) + assert isinstance(self.regex, bool) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return { + 'type': 'Contains', + 'pat': self.pat, + 'case': self.case, + 'flags': self.flags, + **({'na': self.na} if self.na is not None else {}), + 'regex': self.regex + } + + @classmethod + def from_json(cls, d: dict) -> 'Contains': + assert 'pat' in d + assert 'case' in d + assert 'flags' in d + assert 'regex' in d + out = Contains( + pat=d['pat'], + case=d['case'], + flags=d['flags'], + na=d['na'] if 'na' in d else None, + regex=d['regex'] + ) + out.validate() + return out def contains(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None, regex: bool = True) -> Contains: """ @@ -30,6 +65,29 @@ def __init__(self, pat: str, na: Optional[str] = None) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.startswith(self.pat, self.na) + def validate(self) -> None: + assert isinstance(self.pat, str) + assert isinstance(self.na, 
(str, type(None))) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return { + 'type': 'Startswith', + 'pat': self.pat, + **({'na': self.na} if self.na is not None else {}) + } + + @classmethod + def from_json(cls, d: dict) -> 'Startswith': + assert 'pat' in d + out = Startswith( + pat=d['pat'], + na=d['na'] if 'na' in d else None + ) + out.validate() + return out + def startswith(pat: str, na: Optional[str] = None) -> Startswith: """ Return whether a given pattern is at the start of a string @@ -47,6 +105,29 @@ def __call__(self, s: pd.Series) -> pd.Series: """ return s.str.endswith(self.pat, self.na) + def validate(self) -> None: + assert isinstance(self.pat, str) + assert isinstance(self.na, (str, type(None))) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return { + 'type': 'Endswith', + 'pat': self.pat, + **({'na': self.na} if self.na is not None else {}) + } + + @classmethod + def from_json(cls, d: dict) -> 'Endswith': + assert 'pat' in d + out = Endswith( + pat=d['pat'], + na=d['na'] if 'na' in d else None + ) + out.validate() + return out + def endswith(pat: str, na: Optional[str] = None) -> Endswith: return Endswith(pat, na) @@ -59,6 +140,37 @@ def __init__(self, pat: str, case: bool = True, flags: int = 0, na: Optional[boo def __call__(self, s: pd.Series) -> pd.Series: return s.str.match(self.pat, self.case, self.flags, self.na) + + def validate(self) -> None: + assert isinstance(self.pat, str) + assert isinstance(self.case, bool) + assert isinstance(self.flags, int) + assert isinstance(self.na, (bool, type(None))) + + def to_json(self, validate=True) -> dict: + if validate: + self.validate() + return { + 'type': 'Match', + 'pat': self.pat, + 'case': self.case, + 'flags': self.flags, + **({'na': self.na} if self.na is not None else {}) + } + + @classmethod + def from_json(cls, d: dict) -> 'Match': + assert 'pat' in d + assert 'case' in d + assert 'flags' in d + out = Match( + pat=d['pat'], 
+ case=d['case'], + flags=d['flags'], + na=d['na'] if 'na' in d else None + ) + out.validate() + return out def match(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None) -> Match: """ @@ -72,7 +184,14 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isnumeric() - + + def to_json(self, validate=True) -> dict: + return {'type': 'IsNumeric'} + + @classmethod + def from_json(cls, d: dict) -> 'IsNumeric': + return IsNumeric() + def isnumeric() -> IsNumeric: """ Return whether a given string is numeric @@ -85,6 +204,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isalpha() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsAlpha'} + + @classmethod + def from_json(cls, d: dict) -> 'IsAlpha': + return IsAlpha() def isalpha() -> IsAlpha: """ @@ -98,6 +224,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isdigit() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsDigit'} + + @classmethod + def from_json(cls, d: dict) -> 'IsDigit': + return IsDigit() def isdigit() -> IsDigit: """ @@ -111,6 +244,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.islower() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsLower'} + + @classmethod + def from_json(cls, d: dict) -> 'IsLower': + return IsLower() def islower() -> IsLower: """ @@ -124,6 +264,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isupper() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsUpper'} + + @classmethod + def from_json(cls, d: dict) -> 'IsUpper': + return IsUpper() def isupper() -> IsUpper: """ @@ -138,6 +285,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isspace() + def to_json(self, validate=True) -> dict: + return {'type': 'IsSpace'} + + @classmethod + def 
from_json(cls, d: dict) -> 'IsSpace': + return IsSpace() + def isspace() -> IsSpace: """ Return whether a given string is whitespace @@ -151,6 +305,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isalnum() + def to_json(self, validate=True) -> dict: + return {'type': 'IsAlnum'} + + @classmethod + def from_json(cls, d: dict) -> 'IsAlnum': + return IsAlnum() + def isalnum() -> IsAlnum: """ Return whether a given string is alphanumeric @@ -163,6 +324,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.isdecimal() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsDecimal'} + + @classmethod + def from_json(cls, d: dict) -> 'IsDecimal': + return IsDecimal() def isdecimal() -> IsDecimal: """ @@ -176,6 +344,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.str.istitle() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsTitle'} + + @classmethod + def from_json(cls, d: dict) -> 'IsTitle': + return IsTitle() def istitle() -> IsTitle: """ @@ -189,6 +364,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.isnull() + + def to_json(self, validate=True) -> dict: + return {'type': 'IsNull'} + + @classmethod + def from_json(cls, d: dict) -> 'IsNull': + return IsNull() def isnull() -> IsNull: """ @@ -202,6 +384,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.notnull() + + def to_json(self, validate=True) -> dict: + return {'type': 'NotNull'} + + @classmethod + def from_json(cls, d: dict) -> 'NotNull': + return NotNull() def notnull() -> NotNull: """ diff --git a/graphistry/compute/predicates/temporal.py b/graphistry/compute/predicates/temporal.py index b18984fe97..329e95dcf2 100644 --- a/graphistry/compute/predicates/temporal.py +++ b/graphistry/compute/predicates/temporal.py @@ -9,6 +9,13 @@ def __init__(self) -> None: def __call__(self, s: 
pd.Series) -> pd.Series: return s.dt.is_month_start + + def to_json(self, validate=True) -> dict: + return {'type': 'IsMonthStart'} + + @classmethod + def from_json(cls, d: dict) -> 'IsMonthStart': + return IsMonthStart() def is_month_start() -> IsMonthStart: """ @@ -22,6 +29,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_month_end + + def to_json(self, validate=True) -> dict: + return {'type': 'IsMonthEnd'} + + @classmethod + def from_json(cls, d: dict) -> 'IsMonthEnd': + return IsMonthEnd() def is_month_end() -> IsMonthEnd: """ @@ -35,6 +49,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_quarter_start + + def to_json(self, validate=True) -> dict: + return {'type': 'IsQuarterStart'} + + @classmethod + def from_json(cls, d: dict) -> 'IsQuarterStart': + return IsQuarterStart() def is_quarter_start() -> IsQuarterStart: """ @@ -48,6 +69,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_quarter_end + + def to_json(self, validate=True) -> dict: + return {'type': 'IsQuarterEnd'} + + @classmethod + def from_json(cls, d: dict) -> 'IsQuarterEnd': + return IsQuarterEnd() def is_quarter_end() -> IsQuarterEnd: """ @@ -61,6 +89,13 @@ def __init__(self) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_year_start + + def to_json(self, validate=True) -> dict: + return {'type': 'IsYearStart'} + + @classmethod + def from_json(cls, d: dict) -> 'IsYearStart': + return IsYearStart() def is_year_start() -> IsYearStart: """ @@ -68,12 +103,39 @@ def is_year_start() -> IsYearStart: """ return IsYearStart() +class IsYearEnd(ASTPredicate): + def __init__(self) -> None: + pass + + def __call__(self, s: pd.Series) -> pd.Series: + return s.dt.is_year_end + + def to_json(self, validate=True) -> dict: + return {'type': 'IsYearEnd'} + + @classmethod + def from_json(cls, d: dict) -> 'IsYearEnd': + return IsYearEnd() + +def is_year_end() 
-> IsYearEnd: + """ + Return whether a given value is a year end + """ + return IsYearEnd() + class IsLeapYear(ASTPredicate): def __init__(self) -> None: pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_leap_year + + def to_json(self, validate=True) -> dict: + return {'type': 'IsLeapYear'} + + @classmethod + def from_json(cls, d: dict) -> 'IsLeapYear': + return IsLeapYear() def is_leap_year() -> IsLeapYear: """ diff --git a/graphistry/tests/compute/predicates/test_categorical.py b/graphistry/tests/compute/predicates/test_categorical.py new file mode 100644 index 0000000000..afdc0393b8 --- /dev/null +++ b/graphistry/tests/compute/predicates/test_categorical.py @@ -0,0 +1,15 @@ +from graphistry.compute.predicates.categorical import Duplicated, duplicated + +def test_duplicated(): + + d = duplicated('last') + assert isinstance(d, Duplicated) + assert d.keep == 'last' + + o = d.to_json() + assert isinstance(o, dict) + assert o['type'] == 'Duplicated' + + d2 = Duplicated.from_json(o) + assert isinstance(d2, Duplicated) + assert d2.keep == 'last' diff --git a/graphistry/tests/compute/predicates/test_from_json.py b/graphistry/tests/compute/predicates/test_from_json.py new file mode 100644 index 0000000000..adafef1bbe --- /dev/null +++ b/graphistry/tests/compute/predicates/test_from_json.py @@ -0,0 +1,27 @@ +from graphistry.compute.predicates.categorical import Duplicated +from graphistry.compute.predicates.from_json import from_json + + +def test_from_json_good(): + d = from_json({'type': 'Duplicated', 'keep': 'last'}) + assert isinstance(d, Duplicated) + assert d.keep == 'last' + +def test_from_json_bad(): + try: + from_json({'type': 'zzz'}) + assert False + except AssertionError: + assert True + + try: + from_json({'type': 'Duplicated', 'keep': 'zzz'}) + assert False + except AssertionError: + assert True + + try: + from_json({'type': 'Duplicated'}) + assert False + except AssertionError: + assert True diff --git 
a/graphistry/tests/compute/predicates/test_is_in.py b/graphistry/tests/compute/predicates/test_is_in.py new file mode 100644 index 0000000000..8648f3487c --- /dev/null +++ b/graphistry/tests/compute/predicates/test_is_in.py @@ -0,0 +1,16 @@ +from graphistry.compute.predicates.is_in import IsIn, is_in + + +def test_is_in(): + + d = is_in([1, 2, 3]) + assert isinstance(d, IsIn) + assert d.options == [1, 2, 3] + + o = d.to_json() + assert isinstance(o, dict) + assert o['type'] == 'IsIn' + + d2 = IsIn.from_json(o) + assert isinstance(d2, IsIn) + assert d2.options == [1, 2, 3] diff --git a/graphistry/tests/compute/predicates/test_numeric.py b/graphistry/tests/compute/predicates/test_numeric.py new file mode 100644 index 0000000000..b6ce762c60 --- /dev/null +++ b/graphistry/tests/compute/predicates/test_numeric.py @@ -0,0 +1,16 @@ +from graphistry.compute.predicates.numeric import GT, gt + +def test_gt(): + + d = gt(1) + assert isinstance(d, GT) + assert d.val == 1 + + o = d.to_json() + assert isinstance(o, dict) + assert o['type'] == 'GT' + assert o['val'] == 1 + + d2 = GT.from_json(o) + assert isinstance(d2, GT) + assert d2.val == 1 diff --git a/graphistry/tests/compute/predicates/test_str.py b/graphistry/tests/compute/predicates/test_str.py new file mode 100644 index 0000000000..da6875157b --- /dev/null +++ b/graphistry/tests/compute/predicates/test_str.py @@ -0,0 +1,14 @@ +from graphistry.compute.predicates.str import IsUpper, isupper + + +def test_is_upper(): + + d = isupper() + assert isinstance(d, IsUpper) + + o = d.to_json() + assert isinstance(o, dict) + assert o['type'] == 'IsUpper' + + d2 = IsUpper.from_json(o) + assert isinstance(d2, IsUpper) diff --git a/graphistry/tests/compute/predicates/test_temporal.py b/graphistry/tests/compute/predicates/test_temporal.py new file mode 100644 index 0000000000..fe6101edc7 --- /dev/null +++ b/graphistry/tests/compute/predicates/test_temporal.py @@ -0,0 +1,13 @@ +from graphistry.compute.predicates.temporal import 
IsLeapYear, is_leap_year + +def test_is_leap_year(): + + d = is_leap_year() + assert isinstance(d, IsLeapYear) + + o = d.to_json() + assert isinstance(o, dict) + assert o['type'] == 'IsLeapYear' + + d2 = IsLeapYear.from_json(o) + assert isinstance(d2, IsLeapYear) diff --git a/graphistry/tests/compute/test_ast.py b/graphistry/tests/compute/test_ast.py new file mode 100644 index 0000000000..f08977223b --- /dev/null +++ b/graphistry/tests/compute/test_ast.py @@ -0,0 +1,23 @@ +from graphistry.compute.ast import from_json, ASTNode, ASTEdge, n, e, e_forward, e_reverse, e_undirected + +def test_serialization_node(): + + node = n(query='zzz', name='abc') + o = node.to_json() + node2 = from_json(o) + assert isinstance(node2, ASTNode) + assert node2._query == 'zzz' + assert node2._name == 'abc' + o2 = node2.to_json() + assert o == o2 + +def test_serialization_edge(): + + edge = e(edge_query='zzz', name='abc') + o = edge.to_json() + edge2 = from_json(o) + assert isinstance(edge2, ASTEdge) + assert edge2._edge_query == 'zzz' + assert edge2._name == 'abc' + o2 = edge2.to_json() + assert o == o2 diff --git a/graphistry/tests/compute/test_chain.py b/graphistry/tests/compute/test_chain.py new file mode 100644 index 0000000000..0f3d221c82 --- /dev/null +++ b/graphistry/tests/compute/test_chain.py @@ -0,0 +1,38 @@ +from graphistry.compute.ast import ASTNode, ASTEdge, n, e +from graphistry.compute.chain import to_json as chain_to_json, from_json as chain_from_json + +def test_chain_serialization_mt(): + o = chain_to_json([]) + d = chain_from_json(o) + assert d == [] + assert o == [] + +def test_chain_serialization_node(): + o = chain_to_json([n(query='zzz', name='abc')]) + d = chain_from_json(o) + assert isinstance(d[0], ASTNode) + assert d[0]._query == 'zzz' + assert d[0]._name == 'abc' + o2 = chain_to_json(d) + assert o == o2 + +def test_chain_serialization_edge(): + o = chain_to_json([e(edge_query='zzz', name='abc')]) + d = chain_from_json(o) + assert isinstance(d[0], ASTEdge) + 
assert d[0]._edge_query == 'zzz' + assert d[0]._name == 'abc' + o2 = chain_to_json(d) + assert o == o2 + +def test_chain_serialization_multi(): + o = chain_to_json([n(query='zzz', name='abc'), e(edge_query='zzz', name='abc')]) + d = chain_from_json(o) + assert isinstance(d[0], ASTNode) + assert d[0]._query == 'zzz' + assert d[0]._name == 'abc' + assert isinstance(d[1], ASTEdge) + assert d[1]._edge_query == 'zzz' + assert d[1]._name == 'abc' + o2 = chain_to_json(d) + assert o == o2 diff --git a/graphistry/tests/test_util.py b/graphistry/tests/test_util.py new file mode 100644 index 0000000000..6e36b44f0c --- /dev/null +++ b/graphistry/tests/test_util.py @@ -0,0 +1,63 @@ +from graphistry.util import assert_json_serializable + +class TestAssertJsonSerializable(): + + def test_primitives(self): + assert_json_serializable(1) + assert_json_serializable(1.0) + assert_json_serializable('a') + assert_json_serializable(True) + assert_json_serializable(None) + + def test_list(self): + assert_json_serializable([]) + assert_json_serializable([1]) + assert_json_serializable([1, 2]) + assert_json_serializable([1, 'a', True, None]) + + def test_dict(self): + assert_json_serializable({}) + assert_json_serializable({'a': 1}) + assert_json_serializable({'a': 1, 'b': 2}) + assert_json_serializable({'a': 1, 'b': 'b', 'c': True, 'd': None}) + + def test_nested(self): + assert_json_serializable({'a': [1]}) + assert_json_serializable({'a': {'b': 1}}) + assert_json_serializable({'a': [{'b': 1}]}) + assert_json_serializable({'a': [{'b': 1}, {'c': 2}]}) + + def test_unserializable(self): + + try: + assert_json_serializable(set()) + assert False, 'Expected exception on set' + except AssertionError: + pass + + try: + assert_json_serializable({'a': set()}) + assert False, 'Expected exception on nested set' + except AssertionError: + pass + + try: + assert_json_serializable({'a': [set()]}) + assert False, 'Expected exception on nested set' + except AssertionError: + pass + + try: + 
assert_json_serializable({'a': [{'b': set()}]}) + assert False, 'Expected exception on nested set' + except AssertionError: + pass + + class Unserializable: + pass + + try: + assert_json_serializable(Unserializable()) + assert False, 'Expected exception on class' + except AssertionError: + pass diff --git a/graphistry/util.py b/graphistry/util.py index 3498b33572..4bad6740d0 100644 --- a/graphistry/util.py +++ b/graphistry/util.py @@ -1,4 +1,5 @@ import hashlib +import json import logging import os import pandas as pd @@ -418,3 +419,13 @@ def printmd(string, color=None, size=20): # # # matches name of inner function # return wrapper + +def is_json_serializable(data): + try: + json.dumps(data) + return True + except TypeError: + return False + +def assert_json_serializable(data): + assert is_json_serializable(data), f"Data is not JSON-serializable: {data}" From eb3825c3fb80b614da7c77bd9913987482c9feb6 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:55:51 -0800 Subject: [PATCH 083/104] docs(chain serialization) --- README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.md b/README.md index c790ac7d79..8e6808c9dc 100644 --- a/README.md +++ b/README.md @@ -1316,6 +1316,17 @@ print('# end edges: ', len(g3._edges[ g3._edges.final_edge ])) See table above for more predicates like `is_in()` and `gt()` +Queries can be serialized and deserialized, such as for saving and remote execution: + +```python +from graphistry.compute.chain import from_json, to_json + +pattern = [n(), e(), n()] +pattern_json = to_json(pattern) +pattern2 = from_json(pattern_json) +g.chain(pattern2).plot() +``` + #### Pipelining ```python From ae5d7117979055875c5fd0d208226a4298c8a3dc Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Thu, 21 Dec 2023 01:56:01 -0800 Subject: [PATCH 084/104] fix(docs): new class --- docs/source/conf.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/conf.py b/docs/source/conf.py index b3f8ad6cb7..c055149334 
100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -60,6 +60,7 @@ ('py:class', 'graphistry.compute.predicates.numeric.LT'), ('py:class', 'graphistry.compute.predicates.numeric.NE'), ('py:class', 'graphistry.compute.predicates.numeric.NotNA'), + ('py:class', 'graphistry.compute.predicates.numeric.NumericASTPredicate'), ('py:class', 'graphistry.compute.predicates.str.Contains'), ('py:class', 'graphistry.compute.predicates.str.Endswith'), ('py:class', 'graphistry.compute.predicates.str.IsAlnum'), From 78a8b3cbca3f059f48ba2a99939adbf2a8db05c4 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 22 Dec 2023 14:25:43 -0800 Subject: [PATCH 085/104] feat(json): explicit JSONVal type and helpers --- graphistry/compute/ast.py | 3 ++- graphistry/compute/predicates/is_in.py | 2 +- .../{test_util.py => utils/test_json.py} | 2 +- graphistry/util.py | 10 ------- graphistry/utils/json.py | 27 +++++++++++++++++++ 5 files changed, 31 insertions(+), 13 deletions(-) rename graphistry/tests/{test_util.py => utils/test_json.py} (96%) create mode 100644 graphistry/utils/json.py diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 7e892959e1..bbdc23933a 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -5,7 +5,8 @@ import pandas as pd from graphistry.Plottable import Plottable -from graphistry.util import is_json_serializable, setup_logger +from graphistry.util import setup_logger +from graphistry.utils.json import is_json_serializable from .predicates.ASTPredicate import ASTPredicate from .predicates.is_in import ( is_in, IsIn diff --git a/graphistry/compute/predicates/is_in.py b/graphistry/compute/predicates/is_in.py index 64a2605f55..698a13e7cd 100644 --- a/graphistry/compute/predicates/is_in.py +++ b/graphistry/compute/predicates/is_in.py @@ -1,7 +1,7 @@ from typing import Any, List import pandas as pd -from graphistry.util import assert_json_serializable +from graphistry.utils.json import assert_json_serializable from 
.ASTPredicate import ASTPredicate diff --git a/graphistry/tests/test_util.py b/graphistry/tests/utils/test_json.py similarity index 96% rename from graphistry/tests/test_util.py rename to graphistry/tests/utils/test_json.py index 6e36b44f0c..cbf8280187 100644 --- a/graphistry/tests/test_util.py +++ b/graphistry/tests/utils/test_json.py @@ -1,4 +1,4 @@ -from graphistry.util import assert_json_serializable +from graphistry.utils.json import assert_json_serializable class TestAssertJsonSerializable(): diff --git a/graphistry/util.py b/graphistry/util.py index 4bad6740d0..c2c47996f1 100644 --- a/graphistry/util.py +++ b/graphistry/util.py @@ -419,13 +419,3 @@ def printmd(string, color=None, size=20): # # # matches name of inner function # return wrapper - -def is_json_serializable(data): - try: - json.dumps(data) - return True - except TypeError: - return False - -def assert_json_serializable(data): - assert is_json_serializable(data), f"Data is not JSON-serializable: {data}" diff --git a/graphistry/utils/json.py b/graphistry/utils/json.py new file mode 100644 index 0000000000..9ddf068443 --- /dev/null +++ b/graphistry/utils/json.py @@ -0,0 +1,27 @@ + +import json +from typing import Any, Dict, List, Union + + +JSONVal = Union[None, bool, str, float, int, List['JSONVal'], Dict[str, 'JSONVal']] + + +def is_json_serializable(data): + try: + json.dumps(data) + return True + except TypeError: + return False + +def assert_json_serializable(data): + assert is_json_serializable(data), f"Data is not JSON-serializable: {data}" + +def serialize_to_json_val(obj: Any) -> JSONVal: + if isinstance(obj, (str, int, float, bool, type(None))): + return obj + elif isinstance(obj, list): + return [serialize_to_json_val(item) for item in obj] + elif isinstance(obj, dict): + return {key: serialize_to_json_val(value) for key, value in obj.items()} + else: + raise TypeError(f"Unsupported type for to_json: {type(obj)}") From 8323935b309f9a6c41d5d304a0b868a270e49837 Mon Sep 17 00:00:00 2001 
From: Leo Meyerovich Date: Fri, 22 Dec 2023 14:26:46 -0800 Subject: [PATCH 086/104] refactor(GFQL expr serialization): factor out json methods --- graphistry/compute/predicates/ASTPredicate.py | 40 +++- graphistry/compute/predicates/categorical.py | 12 -- graphistry/compute/predicates/from_json.py | 8 +- graphistry/compute/predicates/is_in.py | 15 -- graphistry/compute/predicates/numeric.py | 100 --------- graphistry/compute/predicates/str.py | 190 ------------------ graphistry/compute/predicates/temporal.py | 63 ------ 7 files changed, 36 insertions(+), 392 deletions(-) diff --git a/graphistry/compute/predicates/ASTPredicate.py b/graphistry/compute/predicates/ASTPredicate.py index b5621b5a07..cc6e25b966 100644 --- a/graphistry/compute/predicates/ASTPredicate.py +++ b/graphistry/compute/predicates/ASTPredicate.py @@ -1,24 +1,44 @@ -from abc import abstractmethod +from abc import ABC, abstractmethod +from typing import Any, Dict import pandas as pd +from graphistry.utils.json import JSONVal, serialize_to_json_val -class ASTPredicate(): + +class ASTPredicate(ABC): """ Internal, not intended for use outside of this module. 
These are fancy columnar predicates used in {k: v, ...} node/edge df matching when going beyond primitive equality """ - @abstractmethod - def __call__(self, s: pd.Series) -> pd.Series: - raise NotImplementedError() + reserved_fields = ['type'] @abstractmethod - def to_json(self, validate=True) -> dict: - raise NotImplementedError() - - @classmethod - def from_json(cls, d: dict) -> 'ASTPredicate': + def __call__(self, s: pd.Series) -> pd.Series: raise NotImplementedError() def validate(self) -> None: pass + + def to_json(self, validate=True) -> Dict[str, JSONVal]: + """ + Returns JSON-compatible dictionary {"type": "ClassName", "arg1": val1, ...} + Emits all non-reserved instance fields + """ + if validate: + self.validate() + data: Dict[str, JSONVal] = {'type': self.__class__.__name__} + for key, value in self.__dict__.items(): + if key not in self.reserved_fields: + data[key] = serialize_to_json_val(value) + return data + + @classmethod + def from_json(cls, d: Dict[str, JSONVal]) -> 'ASTPredicate': + """ + Given c.to_json(), hydrate back c + + Corresponding c.__class__.__init__ must accept all non-reserved instance fields + """ + constructor_args = {k: v for k, v in d.items() if k not in cls.reserved_fields} + return cls(**constructor_args) diff --git a/graphistry/compute/predicates/categorical.py b/graphistry/compute/predicates/categorical.py index 6b98f5cfe5..9d0d0ccb9d 100644 --- a/graphistry/compute/predicates/categorical.py +++ b/graphistry/compute/predicates/categorical.py @@ -13,18 +13,6 @@ def __call__(self, s: pd.Series) -> pd.Series: def validate(self) -> None: assert self.keep in ['first', 'last', False] - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'Duplicated', 'keep': self.keep} - - @classmethod - def from_json(cls, d: dict) -> 'Duplicated': - assert 'keep' in d - out = Duplicated(keep=d['keep']) - out.validate() - return out - def duplicated(keep: Literal['first', 'last', False] = 'first') -> 
Duplicated: """ Return whether a given value is duplicated diff --git a/graphistry/compute/predicates/from_json.py b/graphistry/compute/predicates/from_json.py index cab27531b1..544cc87cda 100644 --- a/graphistry/compute/predicates/from_json.py +++ b/graphistry/compute/predicates/from_json.py @@ -12,6 +12,8 @@ IsMonthStart, IsMonthEnd, IsQuarterStart, IsQuarterEnd, IsYearStart, IsYearEnd, IsLeapYear ) +from graphistry.utils.json import JSONVal + predicates : List[Type[ASTPredicate]] = [ Duplicated, @@ -28,10 +30,12 @@ for cls in predicates } -def from_json(d: Dict) -> ASTPredicate: +def from_json(d: Dict[str, JSONVal]) -> ASTPredicate: assert isinstance(d, dict) assert 'type' in d assert d['type'] in type_to_predicate - out = type_to_predicate[d['type']].from_json(d) + assert isinstance(d['type'], str) + pred = type_to_predicate[d['type']] + out = pred.from_json(d) out.validate() return out diff --git a/graphistry/compute/predicates/is_in.py b/graphistry/compute/predicates/is_in.py index 698a13e7cd..4803124d78 100644 --- a/graphistry/compute/predicates/is_in.py +++ b/graphistry/compute/predicates/is_in.py @@ -16,21 +16,6 @@ def __call__(self, s: pd.Series) -> pd.Series: def validate(self) -> None: assert isinstance(self.options, list) assert_json_serializable(self.options) - - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return { - 'type': 'IsIn', - 'options': self.options - } - - @classmethod - def from_json(cls, d: dict) -> 'IsIn': - assert 'options' in d - out = IsIn(options=d['options']) - out.validate() - return out def is_in(options: List[Any]) -> IsIn: return IsIn(options) diff --git a/graphistry/compute/predicates/numeric.py b/graphistry/compute/predicates/numeric.py index 64fde67ee0..826996214f 100644 --- a/graphistry/compute/predicates/numeric.py +++ b/graphistry/compute/predicates/numeric.py @@ -20,18 +20,6 @@ def __init__(self, val: float) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s > self.val - def 
to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'GT', 'val': self.val} - - @classmethod - def from_json(cls, d: dict) -> 'GT': - assert 'val' in d - out = GT(val=d['val']) - out.validate() - return out - def gt(val: float) -> GT: """ Return whether a given value is greater than a threshold @@ -45,18 +33,6 @@ def __init__(self, val: float) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s < self.val - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'LT', 'val': self.val} - - @classmethod - def from_json(cls, d: dict) -> 'LT': - assert 'val' in d - out = LT(val=d['val']) - out.validate() - return out - def lt(val: float) -> LT: """ Return whether a given value is less than a threshold @@ -70,18 +46,6 @@ def __init__(self, val: float) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s >= self.val - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'GE', 'val': self.val} - - @classmethod - def from_json(cls, d: dict) -> 'GE': - assert 'val' in d - out = GE(val=d['val']) - out.validate() - return out - def ge(val: float) -> GE: """ Return whether a given value is greater than or equal to a threshold @@ -95,18 +59,6 @@ def __init__(self, val: float) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s <= self.val - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'LE', 'val': self.val} - - @classmethod - def from_json(cls, d: dict) -> 'LE': - assert 'val' in d - out = LE(val=d['val']) - out.validate() - return out - def le(val: float) -> LE: """ Return whether a given value is less than or equal to a threshold @@ -120,18 +72,6 @@ def __init__(self, val: float) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s == self.val - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'EQ', 'val': self.val} - - 
@classmethod - def from_json(cls, d: dict) -> 'EQ': - assert 'val' in d - out = EQ(val=d['val']) - out.validate() - return out - def eq(val: float) -> EQ: """ Return whether a given value is equal to a threshold @@ -145,18 +85,6 @@ def __init__(self, val: float) -> None: def __call__(self, s: pd.Series) -> pd.Series: return s != self.val - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'NE', 'val': self.val} - - @classmethod - def from_json(cls, d: dict) -> 'NE': - assert 'val' in d - out = NE(val=d['val']) - out.validate() - return out - def ne(val: float) -> NE: """ Return whether a given value is not equal to a threshold @@ -180,20 +108,6 @@ def validate(self) -> None: assert isinstance(self.upper, (int, float)) assert isinstance(self.inclusive, bool) - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return {'type': 'Between', 'lower': self.lower, 'upper': self.upper, 'inclusive': self.inclusive} - - @classmethod - def from_json(cls, d: dict) -> 'Between': - assert 'lower' in d - assert 'upper' in d - assert 'inclusive' in d - out = Between(lower=d['lower'], upper=d['upper'], inclusive=d['inclusive']) - out.validate() - return out - def between(lower: float, upper: float, inclusive: bool = True) -> Between: """ Return whether a given value is between a lower and upper threshold @@ -203,13 +117,6 @@ def between(lower: float, upper: float, inclusive: bool = True) -> Between: class IsNA(ASTPredicate): def __call__(self, s: pd.Series) -> pd.Series: return s.isna() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsNA'} - - @classmethod - def from_json(cls, d: dict) -> 'IsNA': - return IsNA() def isna() -> IsNA: """ @@ -221,13 +128,6 @@ def isna() -> IsNA: class NotNA(ASTPredicate): def __call__(self, s: pd.Series) -> pd.Series: return s.notna() - - def to_json(self, validate=True) -> dict: - return {'type': 'NotNA'} - - @classmethod - def from_json(cls, d: dict) -> 'NotNA': - 
return NotNA() def notna() -> NotNA: """ diff --git a/graphistry/compute/predicates/str.py b/graphistry/compute/predicates/str.py index fb81cc5ddf..43091d630a 100644 --- a/graphistry/compute/predicates/str.py +++ b/graphistry/compute/predicates/str.py @@ -21,34 +21,6 @@ def validate(self) -> None: assert isinstance(self.flags, int) assert isinstance(self.na, (bool, type(None))) assert isinstance(self.regex, bool) - - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return { - 'type': 'Contains', - 'pat': self.pat, - 'case': self.case, - 'flags': self.flags, - **({'na': self.na} if self.na is not None else {}), - 'regex': self.regex - } - - @classmethod - def from_json(cls, d: dict) -> 'Contains': - assert 'pat' in d - assert 'case' in d - assert 'flags' in d - assert 'regex' in d - out = Contains( - pat=d['pat'], - case=d['case'], - flags=d['flags'], - na=d['na'] if 'na' in d else None, - regex=d['regex'] - ) - out.validate() - return out def contains(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None, regex: bool = True) -> Contains: """ @@ -68,25 +40,6 @@ def __call__(self, s: pd.Series) -> pd.Series: def validate(self) -> None: assert isinstance(self.pat, str) assert isinstance(self.na, (str, type(None))) - - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return { - 'type': 'Startswith', - 'pat': self.pat, - **({'na': self.na} if self.na is not None else {}) - } - - @classmethod - def from_json(cls, d: dict) -> 'Startswith': - assert 'pat' in d - out = Startswith( - pat=d['pat'], - na=d['na'] if 'na' in d else None - ) - out.validate() - return out def startswith(pat: str, na: Optional[str] = None) -> Startswith: """ @@ -108,25 +61,6 @@ def __call__(self, s: pd.Series) -> pd.Series: def validate(self) -> None: assert isinstance(self.pat, str) assert isinstance(self.na, (str, type(None))) - - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return { - 'type': 
'Endswith', - 'pat': self.pat, - **({'na': self.na} if self.na is not None else {}) - } - - @classmethod - def from_json(cls, d: dict) -> 'Endswith': - assert 'pat' in d - out = Endswith( - pat=d['pat'], - na=d['na'] if 'na' in d else None - ) - out.validate() - return out def endswith(pat: str, na: Optional[str] = None) -> Endswith: return Endswith(pat, na) @@ -146,31 +80,6 @@ def validate(self) -> None: assert isinstance(self.case, bool) assert isinstance(self.flags, int) assert isinstance(self.na, (bool, type(None))) - - def to_json(self, validate=True) -> dict: - if validate: - self.validate() - return { - 'type': 'Match', - 'pat': self.pat, - 'case': self.case, - 'flags': self.flags, - **({'na': self.na} if self.na is not None else {}) - } - - @classmethod - def from_json(cls, d: dict) -> 'Match': - assert 'pat' in d - assert 'case' in d - assert 'flags' in d - out = Match( - pat=d['pat'], - case=d['case'], - flags=d['flags'], - na=d['na'] if 'na' in d else None - ) - out.validate() - return out def match(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None) -> Match: """ @@ -179,19 +88,10 @@ def match(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None return Match(pat, case, flags, na) class IsNumeric(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isnumeric() - def to_json(self, validate=True) -> dict: - return {'type': 'IsNumeric'} - - @classmethod - def from_json(cls, d: dict) -> 'IsNumeric': - return IsNumeric() - def isnumeric() -> IsNumeric: """ Return whether a given string is numeric @@ -199,18 +99,9 @@ def isnumeric() -> IsNumeric: return IsNumeric() class IsAlpha(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isalpha() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsAlpha'} - - @classmethod - def from_json(cls, d: dict) -> 'IsAlpha': - return IsAlpha() def isalpha() 
-> IsAlpha: """ @@ -219,18 +110,9 @@ def isalpha() -> IsAlpha: return IsAlpha() class IsDigit(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isdigit() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsDigit'} - - @classmethod - def from_json(cls, d: dict) -> 'IsDigit': - return IsDigit() def isdigit() -> IsDigit: """ @@ -239,18 +121,9 @@ def isdigit() -> IsDigit: return IsDigit() class IsLower(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.islower() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsLower'} - - @classmethod - def from_json(cls, d: dict) -> 'IsLower': - return IsLower() def islower() -> IsLower: """ @@ -259,18 +132,9 @@ def islower() -> IsLower: return IsLower() class IsUpper(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isupper() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsUpper'} - - @classmethod - def from_json(cls, d: dict) -> 'IsUpper': - return IsUpper() def isupper() -> IsUpper: """ @@ -279,19 +143,10 @@ def isupper() -> IsUpper: return IsUpper() class IsSpace(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isspace() - def to_json(self, validate=True) -> dict: - return {'type': 'IsSpace'} - - @classmethod - def from_json(cls, d: dict) -> 'IsSpace': - return IsSpace() - def isspace() -> IsSpace: """ Return whether a given string is whitespace @@ -299,19 +154,10 @@ def isspace() -> IsSpace: return IsSpace() class IsAlnum(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isalnum() - def to_json(self, validate=True) -> dict: - return {'type': 'IsAlnum'} - - @classmethod - def from_json(cls, d: dict) -> 'IsAlnum': - return IsAlnum() - def isalnum() -> IsAlnum: """ Return 
whether a given string is alphanumeric @@ -319,18 +165,9 @@ def isalnum() -> IsAlnum: return IsAlnum() class IsDecimal(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.isdecimal() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsDecimal'} - - @classmethod - def from_json(cls, d: dict) -> 'IsDecimal': - return IsDecimal() def isdecimal() -> IsDecimal: """ @@ -339,18 +176,9 @@ def isdecimal() -> IsDecimal: return IsDecimal() class IsTitle(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.str.istitle() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsTitle'} - - @classmethod - def from_json(cls, d: dict) -> 'IsTitle': - return IsTitle() def istitle() -> IsTitle: """ @@ -359,18 +187,9 @@ def istitle() -> IsTitle: return IsTitle() class IsNull(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.isnull() - - def to_json(self, validate=True) -> dict: - return {'type': 'IsNull'} - - @classmethod - def from_json(cls, d: dict) -> 'IsNull': - return IsNull() def isnull() -> IsNull: """ @@ -379,18 +198,9 @@ def isnull() -> IsNull: return IsNull() class NotNull(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.notnull() - - def to_json(self, validate=True) -> dict: - return {'type': 'NotNull'} - - @classmethod - def from_json(cls, d: dict) -> 'NotNull': - return NotNull() def notnull() -> NotNull: """ diff --git a/graphistry/compute/predicates/temporal.py b/graphistry/compute/predicates/temporal.py index 329e95dcf2..3858478ede 100644 --- a/graphistry/compute/predicates/temporal.py +++ b/graphistry/compute/predicates/temporal.py @@ -4,18 +4,9 @@ from .ASTPredicate import ASTPredicate class IsMonthStart(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return 
s.dt.is_month_start - - def to_json(self, validate=True) -> dict: - return {'type': 'IsMonthStart'} - - @classmethod - def from_json(cls, d: dict) -> 'IsMonthStart': - return IsMonthStart() def is_month_start() -> IsMonthStart: """ @@ -24,18 +15,9 @@ def is_month_start() -> IsMonthStart: return IsMonthStart() class IsMonthEnd(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_month_end - - def to_json(self, validate=True) -> dict: - return {'type': 'IsMonthEnd'} - - @classmethod - def from_json(cls, d: dict) -> 'IsMonthEnd': - return IsMonthEnd() def is_month_end() -> IsMonthEnd: """ @@ -44,18 +26,9 @@ def is_month_end() -> IsMonthEnd: return IsMonthEnd() class IsQuarterStart(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_quarter_start - - def to_json(self, validate=True) -> dict: - return {'type': 'IsQuarterStart'} - - @classmethod - def from_json(cls, d: dict) -> 'IsQuarterStart': - return IsQuarterStart() def is_quarter_start() -> IsQuarterStart: """ @@ -64,18 +37,9 @@ def is_quarter_start() -> IsQuarterStart: return IsQuarterStart() class IsQuarterEnd(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_quarter_end - - def to_json(self, validate=True) -> dict: - return {'type': 'IsQuarterEnd'} - - @classmethod - def from_json(cls, d: dict) -> 'IsQuarterEnd': - return IsQuarterEnd() def is_quarter_end() -> IsQuarterEnd: """ @@ -84,18 +48,9 @@ def is_quarter_end() -> IsQuarterEnd: return IsQuarterEnd() class IsYearStart(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_year_start - - def to_json(self, validate=True) -> dict: - return {'type': 'IsYearStart'} - - @classmethod - def from_json(cls, d: dict) -> 'IsYearStart': - return IsYearStart() def is_year_start() -> IsYearStart: """ @@ -104,18 +59,9 @@ def 
is_year_start() -> IsYearStart: return IsYearStart() class IsYearEnd(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_year_end - - def to_json(self, validate=True) -> dict: - return {'type': 'IsYearEnd'} - - @classmethod - def from_json(cls, d: dict) -> 'IsYearEnd': - return IsYearEnd() def is_year_end() -> IsYearEnd: """ @@ -124,18 +70,9 @@ def is_year_end() -> IsYearEnd: return IsYearEnd() class IsLeapYear(ASTPredicate): - def __init__(self) -> None: - pass def __call__(self, s: pd.Series) -> pd.Series: return s.dt.is_leap_year - - def to_json(self, validate=True) -> dict: - return {'type': 'IsLeapYear'} - - @classmethod - def from_json(cls, d: dict) -> 'IsLeapYear': - return IsLeapYear() def is_leap_year() -> IsLeapYear: """ From b1541fa1654cee7e8bc4437b08892b9a9d1b7521 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 22 Dec 2023 16:03:51 -0800 Subject: [PATCH 087/104] refactor(gfql): chain json use, node/edge field name convention --- CHANGELOG.md | 8 +- README.md | 8 +- graphistry/compute/ASTSerializable.py | 40 +++++ graphistry/compute/__init__.py | 1 + graphistry/compute/ast.py | 152 +++++++++--------- graphistry/compute/chain.py | 66 +++++--- graphistry/compute/predicates/ASTPredicate.py | 35 +--- graphistry/compute/predicates/from_json.py | 1 + graphistry/tests/compute/test_ast.py | 4 +- graphistry/tests/compute/test_chain.py | 52 +++--- 10 files changed, 206 insertions(+), 161 deletions(-) create mode 100644 graphistry/compute/ASTSerializable.py diff --git a/CHANGELOG.md b/CHANGELOG.md index e7761fba2a..b58dc37994 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,13 +9,19 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ### Added -* GFQL query serialization: `graphistry.compute.from_json(graphistry.compute.to_json([...]))` +* GFQL `Chain` AST object +* GFQL query serialization - `Chain`, `ASTObject`, and `ASTPredict` implement `ASTSerializable` + - 
Ex:`Chain.from_json(Chain([n(), e(), n()]).to_json())` * GFQL predicate `is_year_end` ### Docs * GFQL in readme.md +### Changes + +* Refactor `ASTEdge`, `ASTNode` field naming convention to match other `ASTSerializable`s + ### Breaking 🔥 * GFQL `e()` now aliases `e_undirected` instead of the base class `ASTEdge` diff --git a/README.md b/README.md index 8e6808c9dc..be64eccaa9 100644 --- a/README.md +++ b/README.md @@ -1319,11 +1319,11 @@ See table above for more predicates like `is_in()` and `gt()` Queries can be serialized and deserialized, such as for saving and remote execution: ```python -from graphistry.compute.chain import from_json, to_json +from graphistry.compute.chain import Chain -pattern = [n(), e(), n()] -pattern_json = to_json(pattern) -pattern2 = from_json(pattern_json) +pattern = Chain([n(), e(), n()]) +pattern_json = pattern.to_json() +pattern2 = Chain.from_json(pattern_json) g.chain(pattern2).plot() ``` diff --git a/graphistry/compute/ASTSerializable.py b/graphistry/compute/ASTSerializable.py new file mode 100644 index 0000000000..4122f189b0 --- /dev/null +++ b/graphistry/compute/ASTSerializable.py @@ -0,0 +1,40 @@ +from abc import ABC, abstractmethod +from typing import Dict +import pandas as pd + +from graphistry.utils.json import JSONVal, serialize_to_json_val + + +class ASTSerializable(ABC): + """ + Internal, not intended for use outside of this module. 
+ Class name becomes o['type'], and all non reserved_fields become JSON-typed key + """ + + reserved_fields = ['type'] + + def validate(self) -> None: + pass + + def to_json(self, validate=True) -> Dict[str, JSONVal]: + """ + Returns JSON-compatible dictionry {"type": "ClassName", "arg1": val1, ...} + Emits all non-reserved instance fields + """ + if validate: + self.validate() + data: Dict[str, JSONVal] = {'type': self.__class__.__name__} + for key, value in self.__dict__.items(): + if key not in self.reserved_fields: + data[key] = serialize_to_json_val(value) + return data + + @classmethod + def from_json(cls, d: Dict[str, JSONVal]) -> 'ASTSerializable': + """ + Given c.to_json(), hydrate back c + + Corresponding c.__class__.__init__ must accept all non-reserved instance fields + """ + constructor_args = {k: v for k, v in d.items() if k not in cls.reserved_fields} + return cls(**constructor_args) diff --git a/graphistry/compute/__init__.py b/graphistry/compute/__init__.py index 5065246bd9..0bed507004 100644 --- a/graphistry/compute/__init__.py +++ b/graphistry/compute/__init__.py @@ -2,6 +2,7 @@ from .ast import ( n, e_forward, e_reverse, e_undirected ) +from .chain import Chain from .predicates.is_in import ( is_in, IsIn ) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index bbdc23933a..06359e0c8d 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -5,8 +5,9 @@ import pandas as pd from graphistry.Plottable import Plottable +from graphistry.compute.ASTSerializable import ASTSerializable from graphistry.util import setup_logger -from graphistry.utils.json import is_json_serializable +from graphistry.utils.json import JSONVal, is_json_serializable from .predicates.ASTPredicate import ASTPredicate from .predicates.is_in import ( is_in, IsIn @@ -60,7 +61,7 @@ ############################################################################## -class ASTObject(object): +class ASTObject(ASTSerializable): """ Internal, not intended 
for use outside of this module. These are operator-level expressions used as g.chain(List) @@ -76,13 +77,6 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta @abstractmethod def reverse(self) -> 'ASTObject': raise RuntimeError('reverse not implemented') - - @abstractmethod - def to_json(self, validate=True) -> dict: - raise NotImplementedError() - - def validate(self) -> None: - pass ############################################################################## @@ -120,30 +114,32 @@ def __init__(self, filter_dict: Optional[dict] = None, name: Optional[str] = Non if filter_dict == {}: filter_dict = None - self._filter_dict = filter_dict - self._query = query + self.filter_dict = filter_dict + self.query = query def __repr__(self) -> str: - return f'ASTNode(filter_dict={self._filter_dict}, name={self._name})' + return f'ASTNode(filter_dict={self.filter_dict}, name={self._name})' def validate(self) -> None: - if self._filter_dict is not None: - assert_record_match(self._filter_dict) + if self.filter_dict is not None: + assert_record_match(self.filter_dict) if self._name is not None: assert isinstance(self._name, str) - if self._query is not None: - assert isinstance(self._query, str) + if self.query is not None: + assert isinstance(self.query, str) def to_json(self, validate=True) -> dict: + if validate: + self.validate() return { 'type': 'Node', 'filter_dict': { k: v.to_json() if isinstance(v, ASTPredicate) else v - for k, v in self._filter_dict.items() + for k, v in self.filter_dict.items() if v is not None - } if self._filter_dict is not None else {}, + } if self.filter_dict is not None else {}, **({'name': self._name} if self._name is not None else {}), - **({'query': self._query } if self._query is not None else {}) + **({'query': self.query } if self.query is not None else {}) } @classmethod @@ -159,8 +155,8 @@ def from_json(cls, d: dict) -> 'ASTNode': def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], 
target_wave_front: Optional[pd.DataFrame]) -> Plottable: out_g = (g .nodes(prev_node_wavefront if prev_node_wavefront is not None else g._nodes) - .filter_nodes_by_dict(self._filter_dict) - .nodes(lambda g_dynamic: g_dynamic._nodes.query(self._query) if self._query is not None else g_dynamic._nodes) + .filter_nodes_by_dict(self.filter_dict) + .nodes(lambda g_dynamic: g_dynamic._nodes.query(self.query) if self.query is not None else g_dynamic._nodes) .edges(g._edges[:0]) ) if target_wave_front is not None: @@ -221,65 +217,65 @@ def __init__( if destination_node_match == {}: destination_node_match = None - self._hops = hops - self._to_fixed_point = to_fixed_point - self._direction : Direction = direction - self._source_node_match = source_node_match - self._edge_match = edge_match - self._destination_node_match = destination_node_match - self._source_node_query = source_node_query - self._destination_node_query = destination_node_query - self._edge_query = edge_query + self.hops = hops + self.to_fixed_point = to_fixed_point + self.direction : Direction = direction + self.source_node_match = source_node_match + self.edge_match = edge_match + self.destination_node_match = destination_node_match + self.source_node_query = source_node_query + self.destination_node_query = destination_node_query + self.edge_query = edge_query def __repr__(self) -> str: - return f'ASTEdge(direction={self._direction}, edge_match={self._edge_match}, hops={self._hops}, to_fixed_point={self._to_fixed_point}, source_node_match={self._source_node_match}, destination_node_match={self._destination_node_match}, name={self._name}, source_node_query={self._source_node_query}, destination_node_query={self._destination_node_query}, edge_query={self._edge_query})' + return f'ASTEdge(direction={self.direction}, edge_match={self.edge_match}, hops={self.hops}, to_fixed_point={self.to_fixed_point}, source_node_match={self.source_node_match}, destination_node_match={self.destination_node_match}, 
name={self._name}, source_node_query={self.source_node_query}, destination_node_query={self.destination_node_query}, edge_query={self.edge_query})' def validate(self) -> None: - assert self._hops is None or isinstance(self._hops, int) - assert isinstance(self._to_fixed_point, bool) - assert self._direction in ['forward', 'reverse', 'undirected'] - if self._source_node_match is not None: - assert_record_match(self._source_node_match) - if self._edge_match is not None: - assert_record_match(self._edge_match) - if self._destination_node_match is not None: - assert_record_match(self._destination_node_match) + assert self.hops is None or isinstance(self.hops, int) + assert isinstance(self.to_fixed_point, bool) + assert self.direction in ['forward', 'reverse', 'undirected'] + if self.source_node_match is not None: + assert_record_match(self.source_node_match) + if self.edge_match is not None: + assert_record_match(self.edge_match) + if self.destination_node_match is not None: + assert_record_match(self.destination_node_match) if self._name is not None: assert isinstance(self._name, str) - if self._source_node_query is not None: - assert isinstance(self._source_node_query, str) - if self._destination_node_query is not None: - assert isinstance(self._destination_node_query, str) - if self._edge_query is not None: - assert isinstance(self._edge_query, str) + if self.source_node_query is not None: + assert isinstance(self.source_node_query, str) + if self.destination_node_query is not None: + assert isinstance(self.destination_node_query, str) + if self.edge_query is not None: + assert isinstance(self.edge_query, str) def to_json(self, validate=True) -> dict: if validate: self.validate() return { 'type': 'Edge', - 'hops': self._hops, - 'to_fixed_point': self._to_fixed_point, - 'direction': self._direction, + 'hops': self.hops, + 'to_fixed_point': self.to_fixed_point, + 'direction': self.direction, **({'source_node_match': { k: v.to_json() if isinstance(v, ASTPredicate) else 
v - for k, v in self._source_node_match.items() + for k, v in self.source_node_match.items() if v is not None - }} if self._source_node_match is not None else {}), + }} if self.source_node_match is not None else {}), **({'edge_match': { k: v.to_json() if isinstance(v, ASTPredicate) else v - for k, v in self._edge_match.items() + for k, v in self.edge_match.items() if v is not None - }} if self._edge_match is not None else {}), + }} if self.edge_match is not None else {}), **({'destination_node_match': { k: v.to_json() if isinstance(v, ASTPredicate) else v - for k, v in self._destination_node_match.items() + for k, v in self.destination_node_match.items() if v is not None - }} if self._destination_node_match is not None else {}), + }} if self.destination_node_match is not None else {}), **({'name': self._name} if self._name is not None else {}), - **({'source_node_query': self._source_node_query} if self._source_node_query is not None else {}), - **({'destination_node_query': self._destination_node_query} if self._destination_node_query is not None else {}), - **({'edge_query': self._edge_query} if self._edge_query is not None else {}) + **({'source_node_query': self.source_node_query} if self.source_node_query is not None else {}), + **({'destination_node_query': self.destination_node_query} if self.destination_node_query is not None else {}), + **({'edge_query': self.edge_query} if self.edge_query is not None else {}) } @classmethod @@ -312,17 +308,17 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta out_g = g.hop( nodes=prev_node_wavefront, - hops=self._hops, - to_fixed_point=self._to_fixed_point, - direction=self._direction, - source_node_match=self._source_node_match, - edge_match=self._edge_match, - destination_node_match=self._destination_node_match, + hops=self.hops, + to_fixed_point=self.to_fixed_point, + direction=self.direction, + source_node_match=self.source_node_match, + edge_match=self.edge_match, + 
destination_node_match=self.destination_node_match, return_as_wave_front=True, target_wave_front=target_wave_front, - source_node_query=self._source_node_query, - destination_node_query=self._destination_node_query, - edge_query=self._edge_query + source_node_query=self.source_node_query, + destination_node_query=self.destination_node_query, + edge_query=self.edge_query ) if self._name is not None: @@ -337,22 +333,22 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta def reverse(self) -> 'ASTEdge': # updates both edges and nodes direction : Direction - if self._direction == 'reverse': + if self.direction == 'reverse': direction = 'forward' - elif self._direction == 'forward': + elif self.direction == 'forward': direction = 'reverse' else: direction = 'undirected' return ASTEdge( direction=direction, - edge_match=self._edge_match, - hops=self._hops, - to_fixed_point=self._to_fixed_point, - source_node_match=self._destination_node_match, - destination_node_match=self._source_node_match, - source_node_query=self._destination_node_query, - destination_node_query=self._source_node_query, - edge_query=self._edge_query + edge_match=self.edge_match, + hops=self.hops, + to_fixed_point=self.to_fixed_point, + source_node_match=self.destination_node_match, + destination_node_match=self.source_node_match, + source_node_query=self.destination_node_query, + destination_node_query=self.source_node_query, + edge_query=self.edge_query ) class ASTEdgeForward(ASTEdge): @@ -448,7 +444,7 @@ def __init__(self, ### -def from_json(o: Dict) -> Union[ASTNode, ASTEdge]: +def from_json(o: JSONVal) -> Union[ASTNode, ASTEdge]: assert isinstance(o, dict) assert 'type' in o out : Union[ASTNode, ASTEdge] diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 44e0133138..d8536ea1df 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -1,8 +1,10 @@ -from typing import Dict, cast, List, Tuple +from typing import Dict, 
Union, cast, List, Tuple import pandas as pd from graphistry.Plottable import Plottable +from graphistry.compute.ASTSerializable import ASTSerializable from graphistry.util import setup_logger +from graphistry.utils.json import JSONVal from .ast import ASTObject, ASTNode, ASTEdge, from_json as ASTObject_from_json logger = setup_logger(__name__) @@ -11,6 +13,44 @@ ############################################################################### +class Chain(ASTSerializable): + + def __init__(self, chain: List[ASTObject]) -> None: + self.chain = chain + + def validate(self) -> None: + assert isinstance(self.chain, list) + for op in self.chain: + assert isinstance(op, ASTObject) + op.validate() + + @classmethod + def from_json(cls, d: Dict[str, JSONVal]) -> 'Chain': + """ + Convert a JSON AST into a list of ASTObjects + """ + assert isinstance(d, dict) + assert 'chain' in d + assert isinstance(d['chain'], list) + out = cls([ASTObject_from_json(op) for op in d['chain']]) + out.validate() + return out + + def to_json(self, validate=True) -> Dict[str, JSONVal]: + """ + Convert a list of ASTObjects into a JSON AST + """ + if validate: + self.validate() + return { + 'type': self.__class__.__name__, + 'chain': [op.to_json() for op in self.chain] + } + + +############################################################################### + + def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable]]) -> pd.DataFrame: """ Collect nodes and edges, taking care to deduplicate and tag any names @@ -92,13 +132,15 @@ def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable # ############################################################################### -def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: +def chain(self: Plottable, ops: Union[List[ASTObject], Chain]) -> Plottable: """ - Experimental: Chain a list of operations + Chain a list of ASTObject (node/edge) traversal operations Return subgraph of matches according to 
the list of node & edge matchers If any matchers are named, add a correspondingly named boolean-valued column to the output + For direct calls, exposes convenience `List[ASTObject]`. Internal operational should prefer `Chain`. + :param ops: List[ASTObject] Various node and edge matchers :returns: Plotter @@ -162,6 +204,9 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: """ + if isinstance(ops, Chain): + ops = ops.chain + if len(ops) == 0: return self @@ -253,18 +298,3 @@ def chain(self: Plottable, ops: List[ASTObject]) -> Plottable: g_out = g.nodes(final_nodes_df).edges(final_edges_df) return g_out - -### - -def from_json(d: Dict) -> List[ASTObject]: - """ - Convert a JSON AST into a list of ASTObjects - """ - assert isinstance(d, list) - return [ASTObject_from_json(op) for op in d] - -def to_json(ops: List[ASTObject]) -> List[Dict]: - """ - Convert a list of ASTObjects into a JSON AST - """ - return [op.to_json() for op in ops] diff --git a/graphistry/compute/predicates/ASTPredicate.py b/graphistry/compute/predicates/ASTPredicate.py index cc6e25b966..0d9b300223 100644 --- a/graphistry/compute/predicates/ASTPredicate.py +++ b/graphistry/compute/predicates/ASTPredicate.py @@ -1,44 +1,15 @@ -from abc import ABC, abstractmethod -from typing import Any, Dict +from abc import abstractmethod import pandas as pd -from graphistry.utils.json import JSONVal, serialize_to_json_val +from graphistry.compute.ASTSerializable import ASTSerializable -class ASTPredicate(ABC): +class ASTPredicate(ASTSerializable): """ Internal, not intended for use outside of this module. 
These are fancy columnar predicates used in {k: v, ...} node/edge df matching when going beyond primitive equality """ - reserved_fields = ['type'] - @abstractmethod def __call__(self, s: pd.Series) -> pd.Series: raise NotImplementedError() - - def validate(self) -> None: - pass - - def to_json(self, validate=True) -> Dict[str, JSONVal]: - """ - Returns JSON-compatible dictionry {"type": "ClassName", "arg1": val1, ...} - Emits all non-reserved instance fields - """ - if validate: - self.validate() - data: Dict[str, JSONVal] = {'type': self.__class__.__name__} - for key, value in self.__dict__.items(): - if key not in self.reserved_fields: - data[key] = serialize_to_json_val(value) - return data - - @classmethod - def from_json(cls, d: Dict[str, JSONVal]) -> 'ASTPredicate': - """ - Given c.to_json(), hydrate back c - - Corresponding c.__class__.__init__ must accept all non-reserved instance fields - """ - constructor_args = {k: v for k, v in d.items() if k not in cls.reserved_fields} - return cls(**constructor_args) diff --git a/graphistry/compute/predicates/from_json.py b/graphistry/compute/predicates/from_json.py index 544cc87cda..fd248ec73e 100644 --- a/graphistry/compute/predicates/from_json.py +++ b/graphistry/compute/predicates/from_json.py @@ -37,5 +37,6 @@ def from_json(d: Dict[str, JSONVal]) -> ASTPredicate: assert isinstance(d['type'], str) pred = type_to_predicate[d['type']] out = pred.from_json(d) + assert isinstance(out, ASTPredicate) out.validate() return out diff --git a/graphistry/tests/compute/test_ast.py b/graphistry/tests/compute/test_ast.py index f08977223b..61f082d21c 100644 --- a/graphistry/tests/compute/test_ast.py +++ b/graphistry/tests/compute/test_ast.py @@ -6,7 +6,7 @@ def test_serialization_node(): o = node.to_json() node2 = from_json(o) assert isinstance(node2, ASTNode) - assert node2._query == 'zzz' + assert node2.query == 'zzz' assert node2._name == 'abc' o2 = node2.to_json() assert o == o2 @@ -17,7 +17,7 @@ def 
test_serialization_edge(): o = edge.to_json() edge2 = from_json(o) assert isinstance(edge2, ASTEdge) - assert edge2._edge_query == 'zzz' + assert edge2.edge_query == 'zzz' assert edge2._name == 'abc' o2 = edge2.to_json() assert o == o2 diff --git a/graphistry/tests/compute/test_chain.py b/graphistry/tests/compute/test_chain.py index 0f3d221c82..30760d6b29 100644 --- a/graphistry/tests/compute/test_chain.py +++ b/graphistry/tests/compute/test_chain.py @@ -1,38 +1,38 @@ from graphistry.compute.ast import ASTNode, ASTEdge, n, e -from graphistry.compute.chain import to_json as chain_to_json, from_json as chain_from_json +from graphistry.compute.chain import Chain def test_chain_serialization_mt(): - o = chain_to_json([]) - d = chain_from_json(o) - assert d == [] - assert o == [] + o = Chain([]).to_json() + d = Chain.from_json(o) + assert d.chain == [] + assert o['chain'] == [] def test_chain_serialization_node(): - o = chain_to_json([n(query='zzz', name='abc')]) - d = chain_from_json(o) - assert isinstance(d[0], ASTNode) - assert d[0]._query == 'zzz' - assert d[0]._name == 'abc' - o2 = chain_to_json(d) + o = Chain([n(query='zzz', name='abc')]).to_json() + d = Chain.from_json(o) + assert isinstance(d.chain[0], ASTNode) + assert d.chain[0].query == 'zzz' + assert d.chain[0]._name == 'abc' + o2 = d.to_json() assert o == o2 def test_chain_serialization_edge(): - o = chain_to_json([e(edge_query='zzz', name='abc')]) - d = chain_from_json(o) - assert isinstance(d[0], ASTEdge) - assert d[0]._edge_query == 'zzz' - assert d[0]._name == 'abc' - o2 = chain_to_json(d) + o = Chain([e(edge_query='zzz', name='abc')]).to_json() + d = Chain.from_json(o) + assert isinstance(d.chain[0], ASTEdge) + assert d.chain[0].edge_query == 'zzz' + assert d.chain[0]._name == 'abc' + o2 = d.to_json() assert o == o2 def test_chain_serialization_multi(): - o = chain_to_json([n(query='zzz', name='abc'), e(edge_query='zzz', name='abc')]) - d = chain_from_json(o) - assert isinstance(d[0], ASTNode) - assert 
d[0]._query == 'zzz' - assert d[0]._name == 'abc' - assert isinstance(d[1], ASTEdge) - assert d[1]._edge_query == 'zzz' - assert d[1]._name == 'abc' - o2 = chain_to_json(d) + o = Chain([n(query='zzz', name='abc'), e(edge_query='zzz', name='abc')]).to_json() + d = Chain.from_json(o) + assert isinstance(d.chain[0], ASTNode) + assert d.chain[0].query == 'zzz' + assert d.chain[0]._name == 'abc' + assert isinstance(d.chain[1], ASTEdge) + assert d.chain[1].edge_query == 'zzz' + assert d.chain[1]._name == 'abc' + o2 = d.to_json() assert o == o2 From 060c0e87540fca59951b1e1a4f7d8035389a608f Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 22 Dec 2023 16:07:16 -0800 Subject: [PATCH 088/104] refactor(gfql): export Chain --- graphistry/__init__.py | 1 + 1 file changed, 1 insertion(+) diff --git a/graphistry/__init__.py b/graphistry/__init__.py index c3e5f6610d..43bcc8660e 100644 --- a/graphistry/__init__.py +++ b/graphistry/__init__.py @@ -51,6 +51,7 @@ from graphistry.compute import ( n, e_forward, e_reverse, e_undirected, + Chain, is_in, IsIn, From fe2e342afb759bcbe3e3c9dfdc8c4246fadaa905 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 22 Dec 2023 16:09:12 -0800 Subject: [PATCH 089/104] fix(docs): new types --- docs/source/conf.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/conf.py b/docs/source/conf.py index c055149334..50ff684da7 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -48,6 +48,8 @@ ('py:class', '3'), ('py:class', ""), ('py:class', ""), + ('py:class', "graphistry.compute.ASTSerializable.ASTSerializable"), + ('py:class', "graphistry.compute.chain.Chain"), ('py:class', "graphistry.compute.predicates.ASTPredicate.ASTPredicate"), ('py:class', 'graphistry.compute.predicates.categorical.Duplicated'), ('py:class', 'graphistry.compute.predicates.is_in.IsIn'), From 87cb5c7ed996e37fc0ff998320749048a4727e28 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Fri, 22 Dec 2023 16:15:04 -0800 Subject: [PATCH 090/104] 
docs(changelog): 0.32.0 --- CHANGELOG.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index b58dc37994..c4a9c837d5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,8 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +## [0.32.0 - 2023-12-22] + ### Added * GFQL `Chain` AST object From a18800cb50bb8155bc0312c5eb4a6c90764fb173 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 19:20:50 -0800 Subject: [PATCH 091/104] infra(gpu tester): thread LOG_LEVEL --- docker/test-gpu-local.sh | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docker/test-gpu-local.sh b/docker/test-gpu-local.sh index d0d239d023..14d4c27791 100755 --- a/docker/test-gpu-local.sh +++ b/docker/test-gpu-local.sh @@ -10,6 +10,7 @@ WITH_TYPECHECK=${WITH_TYPECHECK:-1} WITH_TEST=${WITH_TEST:-1} WITH_BUILD=${WITH_BUILD:-1} TEST_CPU_VERSION=${TEST_CPU_VERSION:-latest} +LOG_LEVEL=${LOG_LEVEL:-DEBUG} NETWORK="" if [ "$WITH_NEO4J" == "1" ] @@ -39,6 +40,7 @@ docker run \ -e WITH_TYPECHECK=$WITH_TYPECHECK \ -e WITH_TEST=$WITH_TEST \ -e WITH_BUILD=$WITH_BUILD \ + -e LOG_LEVEL=$LOG_LEVEL \ -v "`pwd`/../graphistry:/opt/pygraphistry/graphistry:ro" \ --security-opt seccomp=unconfined \ --rm \ From 5f29bde66d3551fd238add25f722aaa4963c46aa Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 19:21:33 -0800 Subject: [PATCH 092/104] refactor(Engine): Add EngineAbstract --- graphistry/Engine.py | 67 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 64 insertions(+), 3 deletions(-) diff --git a/graphistry/Engine.py b/graphistry/Engine.py index 5b0eb7e2ff..8753aa71f7 100644 --- a/graphistry/Engine.py +++ b/graphistry/Engine.py @@ -1,5 +1,5 @@ import pandas as pd -from typing import Any +from typing import Any, Optional from enum import Enum @@ -9,11 +9,63 @@ class Engine(Enum): DASK : str = 'dask' DASK_CUDF : str = 'dask_cudf' +class EngineAbstract(Enum): + PANDAS = Engine.PANDAS.value + 
CUDF = Engine.CUDF.value + DASK = Engine.DASK.value + DASK_CUDF = Engine.DASK_CUDF.value + AUTO = 'auto' + DataframeLike = Any # pdf, cudf, ddf, dgdf DataframeLocalLike = Any # pdf, cudf GraphistryLke = Any +#TODO use new importer when it lands (this is copied from umap_utils) +def lazy_cudf_import_has_dependancy(): + try: + import warnings + + warnings.filterwarnings("ignore") + import cudf # type: ignore + + return True, "ok", cudf + except ModuleNotFoundError as e: + return False, e, None + +def resolve_engine( + engine: EngineAbstract, + g_or_df: Optional[Any] = None, +) -> Engine: + # if an Engine (concrete), just use that + if engine != EngineAbstract.AUTO: + return Engine(engine.value) + + if g_or_df is not None: + # work around circular dependency + from graphistry.Plottable import Plottable + if isinstance(g_or_df, Plottable): + if g_or_df._nodes is not None and g_or_df._edges is not None: + if not isinstance(g_or_df._nodes, type(g_or_df._edges)): + raise ValueError(f'Edges and nodes must be same type for auto engine selection, got: {type(g_or_df._edges)} and {type(g_or_df._nodes)}') + g_or_df = g_or_df._edges if g_or_df._edges is not None else g_or_df._nodes + + if g_or_df is not None: + if isinstance(g_or_df, pd.DataFrame): + return Engine.PANDAS + + has_cudf_dependancy_, _, _ = lazy_cudf_import_has_dependancy() + if has_cudf_dependancy_: + import cudf + if isinstance(g_or_df, cudf.DataFrame): + return Engine.CUDF + raise ValueError(f'Expected cudf dataframe, got: {type(g_or_df)}') + + has_cudf_dependancy_, _, _ = lazy_cudf_import_has_dependancy() + if has_cudf_dependancy_: + return Engine.CUDF + return Engine.PANDAS + def df_to_pdf(df, engine: Engine): if engine == Engine.PANDAS: return df @@ -35,8 +87,7 @@ def df_to_engine(df, engine: Engine): return df else: return cudf.DataFrame.from_pandas(df) - else: - raise ValueError('Only engines pandas/cudf supported') + raise ValueError('Only engines pandas/cudf supported') def df_concat(engine: Engine): if 
engine == Engine.PANDAS: @@ -44,6 +95,7 @@ def df_concat(engine: Engine): elif engine == Engine.CUDF: import cudf return cudf.concat + raise NotImplementedError("Only pandas/cudf supported") def df_cons(engine: Engine): if engine == Engine.PANDAS: @@ -51,3 +103,12 @@ def df_cons(engine: Engine): elif engine == Engine.CUDF: import cudf return cudf.DataFrame + raise NotImplementedError("Only pandas/cudf supported") + +def s_cons(engine: Engine): + if engine == Engine.PANDAS: + return pd.Series + elif engine == Engine.CUDF: + import cudf + return cudf.Series + raise NotImplementedError("Only pandas/cudf supported") From 25aab669e55d6dfcd4bce727acd8f4e8e7f12a7d Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 19:25:12 -0800 Subject: [PATCH 093/104] refactor(pd): abstract out --- graphistry/Plottable.py | 4 +- graphistry/compute/ComputeMixin.py | 24 +++++----- graphistry/compute/ast.py | 35 ++++++++++++--- graphistry/compute/chain.py | 36 ++++++++++----- graphistry/compute/filter_by_dict.py | 29 +++++++++--- graphistry/compute/hop.py | 45 +++++++++++++------ graphistry/compute/predicates/ASTPredicate.py | 9 +++- graphistry/compute/predicates/categorical.py | 9 +++- graphistry/compute/predicates/is_in.py | 11 +++-- graphistry/compute/predicates/numeric.py | 26 ++++++----- graphistry/compute/predicates/str.py | 38 +++++++++------- graphistry/compute/predicates/temporal.py | 23 ++++++---- graphistry/plugins/cugraph.py | 6 +-- 13 files changed, 206 insertions(+), 89 deletions(-) diff --git a/graphistry/Plottable.py b/graphistry/Plottable.py index 56cb22cc9b..e7589e92ea 100644 --- a/graphistry/Plottable.py +++ b/graphistry/Plottable.py @@ -3,7 +3,7 @@ import pandas as pd from graphistry.plugins_types.cugraph_types import CuGraphKind -from graphistry.Engine import Engine +from graphistry.Engine import Engine, EngineAbstract if TYPE_CHECKING: @@ -149,7 +149,7 @@ def get_degrees( raise RuntimeError('should not happen') return self - def 
materialize_nodes(self, reuse: bool = True, engine: Union[Engine, Literal['auto']] = 'auto') -> 'Plottable': + def materialize_nodes(self, reuse: bool = True, engine:EngineAbstract = EngineAbstract.AUTO) -> 'Plottable': if 1 + 1: raise RuntimeError('should not happen') return self diff --git a/graphistry/compute/ComputeMixin.py b/graphistry/compute/ComputeMixin.py index a5a0431f00..ea00ad34c8 100644 --- a/graphistry/compute/ComputeMixin.py +++ b/graphistry/compute/ComputeMixin.py @@ -2,7 +2,7 @@ from typing import Any, List, Union, TYPE_CHECKING from typing_extensions import Literal -from graphistry.Engine import Engine +from graphistry.Engine import Engine, EngineAbstract from graphistry.Plottable import Plottable from graphistry.util import setup_logger from .chain import chain as chain_base @@ -28,7 +28,7 @@ def __init__(self, *args, **kwargs): def materialize_nodes( self, reuse: bool = True, - engine: Union[Engine, Literal['auto']] = "auto" + engine: EngineAbstract = EngineAbstract.AUTO ) -> "Plottable": """ Generate g._nodes based on g._edges @@ -72,21 +72,25 @@ def materialize_nodes( return g node_id = g._node if g._node is not None else "id" - if engine == 'auto': + engine_concrete : Engine + if engine == EngineAbstract.AUTO: if isinstance(g._edges, pd.DataFrame): - engine = Engine.PANDAS + engine_concrete = Engine.PANDAS else: try: import cudf if isinstance(g._edges, cudf.DataFrame): - engine = Engine.CUDF + engine_concrete = Engine.CUDF except ImportError: pass - if engine == 'auto': - raise ValueError('Could not determine engine for edges, expected pandas or cudf dataframe, got: {}'.format(type(g._edges))) - if engine == Engine.PANDAS: + if engine == EngineAbstract.AUTO: + raise ValueError('Could not determine engine for edges, expected pandas or cudf dataframe, got: {}'.format(type(g._edges))) + else: + engine_concrete = Engine(engine.value) + + if engine_concrete == Engine.PANDAS: concat_df = pd.concat([g._edges[g._source], g._edges[g._destination]]) - 
elif engine == Engine.CUDF: + elif engine_concrete == Engine.CUDF: import cudf if isinstance(g._edges, cudf.DataFrame): edges_gdf = g._edges @@ -96,7 +100,7 @@ def materialize_nodes( raise ValueError('Unexpected edges type; convert edges to cudf.DataFrame') concat_df = cudf.concat([edges_gdf[g._source].rename(node_id), edges_gdf[g._destination].rename(node_id)]) else: - raise ValueError('Expected engine to be pandas or cudf, got: {}'.format(engine)) + raise ValueError('Expected engine to be pandas or cudf, got: {}'.format(engine_concrete)) nodes_df = concat_df.rename(node_id).drop_duplicates().to_frame().reset_index(drop=True) return g.nodes(nodes_df, node_id) diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 06359e0c8d..83c1d9020a 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -1,8 +1,9 @@ from abc import abstractmethod import logging -from typing import Dict, Optional, Union, cast +from typing import Any, TYPE_CHECKING, Dict, Optional, Union, cast from typing_extensions import Literal import pandas as pd +from graphistry.Engine import Engine from graphistry.Plottable import Plottable from graphistry.compute.ASTSerializable import ASTSerializable @@ -58,6 +59,12 @@ logger = setup_logger(__name__) +if TYPE_CHECKING: + DataFrameT = pd.DataFrame +else: + DataFrameT = Any + + ############################################################################## @@ -71,7 +78,13 @@ def __init__(self, name: Optional[str] = None): pass @abstractmethod - def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: + def __call__( + self, + g: Plottable, + prev_node_wavefront: Optional[DataFrameT], + target_wave_front: Optional[DataFrameT], + engine: Engine + ) -> Plottable: raise RuntimeError('__call__ not implemented') @abstractmethod @@ -152,7 +165,13 @@ def from_json(cls, d: dict) -> 'ASTNode': out.validate() return out - def __call__(self, g: Plottable, 
prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: + def __call__( + self, + g: Plottable, + prev_node_wavefront: Optional[DataFrameT], + target_wave_front: Optional[DataFrameT], + engine: Engine + ) -> Plottable: out_g = (g .nodes(prev_node_wavefront if prev_node_wavefront is not None else g._nodes) .filter_nodes_by_dict(self.filter_dict) @@ -161,7 +180,7 @@ def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], ta ) if target_wave_front is not None: assert g._node is not None - reduced_nodes = cast(pd.DataFrame, out_g._nodes).merge(target_wave_front[[g._node]], on=g._node, how='inner') + reduced_nodes = cast(DataFrameT, out_g._nodes).merge(target_wave_front[[g._node]], on=g._node, how='inner') out_g = out_g.nodes(reduced_nodes) if self._name is not None: @@ -295,7 +314,13 @@ def from_json(cls, d: dict) -> 'ASTEdge': out.validate() return out - def __call__(self, g: Plottable, prev_node_wavefront: Optional[pd.DataFrame], target_wave_front: Optional[pd.DataFrame]) -> Plottable: + def __call__( + self, + g: Plottable, + prev_node_wavefront: Optional[DataFrameT], + target_wave_front: Optional[DataFrameT], + engine: Engine + ) -> Plottable: if logger.isEnabledFor(logging.DEBUG): logger.debug('----------------------------------------') diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index d8536ea1df..de55955212 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -1,5 +1,6 @@ -from typing import Dict, Union, cast, List, Tuple +from typing import Any, Dict, Union, cast, List, Tuple, TYPE_CHECKING import pandas as pd +from graphistry.Engine import Engine, EngineAbstract, df_concat, resolve_engine from graphistry.Plottable import Plottable from graphistry.compute.ASTSerializable import ASTSerializable @@ -13,6 +14,12 @@ ############################################################################### +if TYPE_CHECKING: + DataFrameT = pd.DataFrame +else: 
+ DataFrameT = Any + + class Chain(ASTSerializable): def __init__(self, chain: List[ASTObject]) -> None: @@ -51,7 +58,7 @@ def to_json(self, validate=True) -> Dict[str, JSONVal]: ############################################################################### -def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable]]) -> pd.DataFrame: +def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable]], engine: Engine) -> DataFrameT: """ Collect nodes and edges, taking care to deduplicate and tag any names """ @@ -74,14 +81,17 @@ def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable prev_node_wavefront=g_step._nodes, # start from where backwards step says is reachable #target_wave_front=steps[i+1][1]._nodes # end at where next backwards step says is reachable - target_wave_front=None # ^^^ optimization: valid transitions already limit to known-good ones + target_wave_front=None, # ^^^ optimization: valid transitions already limit to known-good ones + engine=engine ) ) for (op, g_step) in steps ] + concat = df_concat(engine) + # df[[id]] - out_df = pd.concat([ + out_df = concat([ getattr(g_step, df_fld)[[id]] for (_, g_step) in steps ]).drop_duplicates(subset=[id]) @@ -132,7 +142,7 @@ def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable # ############################################################################### -def chain(self: Plottable, ops: Union[List[ASTObject], Chain]) -> Plottable: +def chain(self: Plottable, ops: Union[List[ASTObject], Chain], engine: EngineAbstract = EngineAbstract.AUTO) -> Plottable: """ Chain a list of ASTObject (node/edge) traversal operations @@ -212,6 +222,9 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain]) -> Plottable: logger.debug('orig chain >> %s', ops) + engine_concrete = resolve_engine(engine, self) + logger.debug('chain engine: %s => %s', engine, engine_concrete) + if isinstance(ops[0], ASTEdge): 
logger.debug('adding initial node to ensure initial link has needed reversals') ops = cast(List[ASTObject], [ ASTNode() ]) + ops @@ -222,7 +235,7 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain]) -> Plottable: logger.debug('final chain >> %s', ops) - g = self.materialize_nodes() + g = self.materialize_nodes(engine=EngineAbstract(engine_concrete.value)) if g._edge is None: if 'index' in g._edges.columns: @@ -252,7 +265,8 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain]) -> Plottable: op( g=g, # transition via any original edge prev_node_wavefront=prev_step_nodes, - target_wave_front=None # implicit any + target_wave_front=None, # implicit any + engine=engine_concrete ) ) g_stack.append(g_step) @@ -282,16 +296,18 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain]) -> Plottable: prev_node_wavefront=prev_loop_step._nodes, # only allow transitions to these nodes (vs prev_node_wavefront) - target_wave_front=prev_orig_step._nodes if prev_orig_step is not None else None + target_wave_front=prev_orig_step._nodes if prev_orig_step is not None else None, + + engine=engine_concrete ) ) g_stack_reverse.append(g_step_reverse) logger.debug('============ COMBINE NODES ============') - final_nodes_df = combine_steps(g, 'nodes', list(zip(ops, reversed(g_stack_reverse)))) + final_nodes_df = combine_steps(g, 'nodes', list(zip(ops, reversed(g_stack_reverse))), engine_concrete) logger.debug('============ COMBINE EDGES ============') - final_edges_df = combine_steps(g, 'edges', list(zip(ops, reversed(g_stack_reverse)))) + final_edges_df = combine_steps(g, 'edges', list(zip(ops, reversed(g_stack_reverse))), engine_concrete) if added_edge_index: final_edges_df = final_edges_df.drop(columns=['index']) diff --git a/graphistry/compute/filter_by_dict.py b/graphistry/compute/filter_by_dict.py index db59d2605d..ebcda6bf51 100644 --- a/graphistry/compute/filter_by_dict.py +++ b/graphistry/compute/filter_by_dict.py @@ -1,17 +1,31 @@ -from typing import 
Dict, Optional +from typing import Any, Dict, Optional, TYPE_CHECKING import pandas as pd +from graphistry.Engine import EngineAbstract, df_to_engine, resolve_engine, s_cons +from graphistry.util import setup_logger from graphistry.Plottable import Plottable from .predicates.ASTPredicate import ASTPredicate -def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: +logger = setup_logger(__name__) + + +if TYPE_CHECKING: + DataFrameT = pd.DataFrame +else: + DataFrameT = Any + +def filter_by_dict(df: DataFrameT, filter_dict: Optional[dict] = None, engine: EngineAbstract = EngineAbstract.AUTO) -> DataFrameT: """ return df where rows match all values in filter_dict """ if filter_dict is None or filter_dict == {}: return df + + engine_concrete = resolve_engine(engine, df) + df = df_to_engine(df, engine_concrete) + logger.debug('filter_by_dict engine: %s => %s', engine, engine_concrete) predicates: Dict[str, ASTPredicate] = {} for col, val in filter_dict.items(): @@ -26,7 +40,8 @@ def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: } if filter_dict_concrete: - hits = (df[list(filter_dict_concrete)] == pd.Series(filter_dict_concrete)).all(axis=1) + S = s_cons(engine_concrete) + hits = (df[list(filter_dict_concrete)] == S(filter_dict_concrete)).all(axis=1) else: hits = df[[]].assign(x=True).x if predicates: @@ -35,17 +50,17 @@ def filter_by_dict(df, filter_dict: Optional[dict] = None) -> pd.DataFrame: return df[hits] -def filter_nodes_by_dict(self: Plottable, filter_dict: dict) -> Plottable: +def filter_nodes_by_dict(self: Plottable, filter_dict: dict, engine: EngineAbstract = EngineAbstract.AUTO) -> Plottable: """ filter nodes to those that match all values in filter_dict """ - nodes2 = filter_by_dict(self._nodes, filter_dict) + nodes2 = filter_by_dict(self._nodes, filter_dict, engine) return self.nodes(nodes2) -def filter_edges_by_dict(self: Plottable, filter_dict: dict) -> Plottable: +def filter_edges_by_dict(self: Plottable, 
filter_dict: dict, engine: EngineAbstract = EngineAbstract.AUTO) -> Plottable: """ filter edges to those that match all values in filter_dict """ - edges2 = filter_by_dict(self._edges, filter_dict) + edges2 = filter_by_dict(self._edges, filter_dict, engine) return self.edges(edges2) diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index a94725b9bf..4289fda6cc 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -1,22 +1,28 @@ import logging -from typing import List, Optional +from typing import Any, List, Optional, TYPE_CHECKING import pandas as pd +from graphistry.Engine import Engine, EngineAbstract, df_concat, df_cons, df_to_engine, resolve_engine from graphistry.Plottable import Plottable from graphistry.util import setup_logger from .filter_by_dict import filter_by_dict + logger = setup_logger(__name__) +if TYPE_CHECKING: + DataFrameT = pd.DataFrame +else: + DataFrameT = Any -def query_if_not_none(query: Optional[str], df: pd.DataFrame) -> pd.DataFrame: +def query_if_not_none(query: Optional[str], df: DataFrameT) -> DataFrameT: if query is None: return df return df.query(query) def hop(self: Plottable, - nodes: Optional[pd.DataFrame] = None, # chain: incoming wavefront + nodes: Optional[DataFrameT] = None, # chain: incoming wavefront hops: Optional[int] = 1, to_fixed_point: bool = False, direction: str = 'forward', @@ -27,7 +33,8 @@ def hop(self: Plottable, destination_node_query: Optional[str] = None, edge_query: Optional[str] = None, return_as_wave_front = False, - target_wave_front: Optional[pd.DataFrame] = None # chain: limit hits to these for reverse pass + target_wave_front: Optional[DataFrameT] = None, # chain: limit hits to these for reverse pass + engine: EngineAbstract = EngineAbstract.AUTO ) -> Plottable: """ Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources @@ -45,6 +52,7 @@ def hop(self: Plottable, edge_query: dataframe query to match edges before hopping 
(including intermediate) return_as_wave_front: Only return the nodes/edges reached, ignoring past ones (primarily for internal use) target_wave_front: Only consider these nodes for reachability, and for intermediate hops, also consider nodes (primarily for internal use by reverse pass) + engine: 'auto', 'pandas', 'cudf' """ """ @@ -55,6 +63,14 @@ def hop(self: Plottable, """ + engine_concrete = resolve_engine(engine, self) + if not TYPE_CHECKING: + DataFrameT = df_cons(engine_concrete) + concat = df_concat(engine_concrete) + + nodes = df_to_engine(nodes, engine_concrete) if nodes is not None else None + target_wave_front = df_to_engine(target_wave_front, engine_concrete) if target_wave_front is not None else None + #TODO target_wave_front code also includes nodes for handling intermediate hops # ... better to make an explicit param of allowed intermediates? (vs recording each intermediate hop) @@ -77,6 +93,8 @@ def hop(self: Plottable, logger.debug('edge_query: %s', edge_query) logger.debug('return_as_wave_front: %s', return_as_wave_front) logger.debug('target_wave_front:\n%s', target_wave_front) + logger.debug('engine: %s', engine) + logger.debug('engine_concrete: %s', engine_concrete) logger.debug('---------------------') if not to_fixed_point and not isinstance(hops, int): @@ -91,7 +109,8 @@ def hop(self: Plottable, if destination_node_match == {}: destination_node_match = None - g2 = self.materialize_nodes() + g2 = self.materialize_nodes(engine=EngineAbstract(engine_concrete.value)) + logger.debug('materialized node/eddge types: %s, %s', type(g2._nodes), type(g2._edges)) starting_nodes = nodes if nodes is not None else g2._nodes @@ -145,7 +164,7 @@ def hop(self: Plottable, hops_remaining = hops_remaining - 1 assert len(wave_front.columns) == 1, "just indexes" - wave_front_iter : pd.DataFrame = query_if_not_none( + wave_front_iter : DataFrameT = query_if_not_none( source_node_query, filter_by_dict( starting_nodes @@ -173,7 +192,7 @@ def hop(self: Plottable, if 
target_wave_front is not None: assert nodes is not None, "target_wave_front indicates nodes" if hops_remaining: - intermediate_target_wave_front = pd.concat([ + intermediate_target_wave_front = concat([ target_wave_front[[g2._node]], nodes[[g2._node]] ], sort=False, ignore_index=True @@ -222,7 +241,7 @@ def hop(self: Plottable, if target_wave_front is not None: assert nodes is not None, "target_wave_front indicates nodes" if hops_remaining: - intermediate_target_wave_front = pd.concat([ + intermediate_target_wave_front = concat([ target_wave_front[[g2._node]], nodes[[g2._node]] ], sort=False, ignore_index=True @@ -258,15 +277,15 @@ def hop(self: Plottable, logger.debug('hop_edges_reverse:\n%s', hop_edges_reverse) logger.debug('new_node_ids_reverse:\n%s', new_node_ids_reverse) - mt : List[pd.DataFrame] = [] # help mypy + mt : List[DataFrameT] = [] # help mypy - matches_edges = pd.concat( + matches_edges = concat( [ matches_edges ] + ([ hop_edges_forward[[ EDGE_ID ]] ] if hop_edges_forward is not None else mt) # noqa: W503 + ([ hop_edges_reverse[[ EDGE_ID ]] ] if hop_edges_reverse is not None else mt), # noqa: W503 ignore_index=True, sort=False).drop_duplicates(subset=[EDGE_ID]) - new_node_ids = pd.concat( + new_node_ids = concat( mt + ( [ new_node_ids_forward ] if new_node_ids_forward is not None else mt ) # noqa: W503 + ( [ new_node_ids_reverse] if new_node_ids_reverse is not None else mt ), # noqa: W503 @@ -284,7 +303,7 @@ def hop(self: Plottable, if return_as_wave_front: matches_nodes = new_node_ids[:0] else: - matches_nodes = pd.concat( + matches_nodes = concat( mt + ( [hop_edges_forward[[g2._source]].rename(columns={g2._source: g2._node}).drop_duplicates()] # noqa: W503 if hop_edges_forward is not None @@ -298,7 +317,7 @@ def hop(self: Plottable, logger.debug('~~~~~~~~~~ LOOP STEP MERGES 2 ~~~~~~~~~~~') logger.debug('matches_edges:\n%s', matches_edges) - combined_node_ids = pd.concat([matches_nodes, new_node_ids], ignore_index=True, 
sort=False).drop_duplicates() + combined_node_ids = concat([matches_nodes, new_node_ids], ignore_index=True, sort=False).drop_duplicates() if len(combined_node_ids) == len(matches_nodes): #fixedpoint, exit early: future will come to same spot! diff --git a/graphistry/compute/predicates/ASTPredicate.py b/graphistry/compute/predicates/ASTPredicate.py index 0d9b300223..5c03121150 100644 --- a/graphistry/compute/predicates/ASTPredicate.py +++ b/graphistry/compute/predicates/ASTPredicate.py @@ -1,9 +1,16 @@ from abc import abstractmethod import pandas as pd +from typing import Any, TYPE_CHECKING from graphistry.compute.ASTSerializable import ASTSerializable +if TYPE_CHECKING: + SeriesT = pd.Series +else: + SeriesT = Any + + class ASTPredicate(ASTSerializable): """ Internal, not intended for use outside of this module. @@ -11,5 +18,5 @@ class ASTPredicate(ASTSerializable): """ @abstractmethod - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: raise NotImplementedError() diff --git a/graphistry/compute/predicates/categorical.py b/graphistry/compute/predicates/categorical.py index 9d0d0ccb9d..840daa0d9a 100644 --- a/graphistry/compute/predicates/categorical.py +++ b/graphistry/compute/predicates/categorical.py @@ -1,13 +1,20 @@ +from typing import Any, TYPE_CHECKING from typing_extensions import Literal import pandas as pd from .ASTPredicate import ASTPredicate + +if TYPE_CHECKING: + SeriesT = pd.Series +else: + SeriesT = Any + class Duplicated(ASTPredicate): def __init__(self, keep: Literal['first', 'last', False] = 'first') -> None: self.keep = keep - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.duplicated(keep=self.keep) def validate(self) -> None: diff --git a/graphistry/compute/predicates/is_in.py b/graphistry/compute/predicates/is_in.py index 4803124d78..9735a11dfe 100644 --- a/graphistry/compute/predicates/is_in.py +++ b/graphistry/compute/predicates/is_in.py @@ -1,16 
+1,21 @@ -from typing import Any, List +from typing import TYPE_CHECKING, Any, List import pandas as pd from graphistry.utils.json import assert_json_serializable - from .ASTPredicate import ASTPredicate +if TYPE_CHECKING: + SeriesT = pd.Series +else: + SeriesT = Any + + class IsIn(ASTPredicate): def __init__(self, options: List[Any]) -> None: self.options = options - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.isin(self.options) def validate(self) -> None: diff --git a/graphistry/compute/predicates/numeric.py b/graphistry/compute/predicates/numeric.py index 826996214f..e558e5111a 100644 --- a/graphistry/compute/predicates/numeric.py +++ b/graphistry/compute/predicates/numeric.py @@ -1,9 +1,15 @@ -from typing import Union +from typing import Any, TYPE_CHECKING, Union import pandas as pd from .ASTPredicate import ASTPredicate +if TYPE_CHECKING: + SeriesT = pd.Series +else: + SeriesT = Any + + class NumericASTPredicate(ASTPredicate): def __init__(self, val: Union[int, float]) -> None: self.val = val @@ -17,7 +23,7 @@ class GT(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s > self.val def gt(val: float) -> GT: @@ -30,7 +36,7 @@ class LT(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s < self.val def lt(val: float) -> LT: @@ -43,7 +49,7 @@ class GE(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s >= self.val def ge(val: float) -> GE: @@ -56,7 +62,7 @@ class LE(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> 
SeriesT: return s <= self.val def le(val: float) -> LE: @@ -69,7 +75,7 @@ class EQ(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s == self.val def eq(val: float) -> EQ: @@ -82,7 +88,7 @@ class NE(NumericASTPredicate): def __init__(self, val: float) -> None: self.val = val - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s != self.val def ne(val: float) -> NE: @@ -97,7 +103,7 @@ def __init__(self, lower: float, upper: float, inclusive: bool = True) -> None: self.upper = upper self.inclusive = inclusive - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: if self.inclusive: return (s >= self.lower) & (s <= self.upper) else: @@ -115,7 +121,7 @@ def between(lower: float, upper: float, inclusive: bool = True) -> Between: return Between(lower, upper, inclusive) class IsNA(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.isna() def isna() -> IsNA: @@ -126,7 +132,7 @@ def isna() -> IsNA: class NotNA(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.notna() def notna() -> NotNA: diff --git a/graphistry/compute/predicates/str.py b/graphistry/compute/predicates/str.py index 43091d630a..af5e0501d3 100644 --- a/graphistry/compute/predicates/str.py +++ b/graphistry/compute/predicates/str.py @@ -1,9 +1,15 @@ -from typing import Optional +from typing import Any, TYPE_CHECKING, Optional import pandas as pd from .ASTPredicate import ASTPredicate +if TYPE_CHECKING: + SeriesT = pd.Series +else: + SeriesT = Any + + class Contains(ASTPredicate): def __init__(self, pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None, regex: bool = True) -> None: self.pat = pat @@ -12,7 +18,7 @@ def __init__(self, pat: str, case: 
bool = True, flags: int = 0, na: Optional[boo self.na = na self.regex = regex - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.contains(self.pat, self.case, self.flags, self.na, self.regex) def validate(self) -> None: @@ -34,7 +40,7 @@ def __init__(self, pat: str, na: Optional[str] = None) -> None: self.pat = pat self.na = na - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.startswith(self.pat, self.na) def validate(self) -> None: @@ -52,7 +58,7 @@ def __init__(self, pat: str, na: Optional[str] = None) -> None: self.pat = pat self.na = na - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: """ Return whether a given pattern is at the end of a string """ @@ -72,7 +78,7 @@ def __init__(self, pat: str, case: bool = True, flags: int = 0, na: Optional[boo self.flags = flags self.na = na - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.match(self.pat, self.case, self.flags, self.na) def validate(self) -> None: @@ -89,7 +95,7 @@ def match(pat: str, case: bool = True, flags: int = 0, na: Optional[bool] = None class IsNumeric(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isnumeric() def isnumeric() -> IsNumeric: @@ -100,7 +106,7 @@ def isnumeric() -> IsNumeric: class IsAlpha(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isalpha() def isalpha() -> IsAlpha: @@ -111,7 +117,7 @@ def isalpha() -> IsAlpha: class IsDigit(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isdigit() def isdigit() -> IsDigit: @@ -122,7 +128,7 @@ def isdigit() -> IsDigit: class IsLower(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, 
s: SeriesT) -> SeriesT: return s.str.islower() def islower() -> IsLower: @@ -133,7 +139,7 @@ def islower() -> IsLower: class IsUpper(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isupper() def isupper() -> IsUpper: @@ -144,7 +150,7 @@ def isupper() -> IsUpper: class IsSpace(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isspace() def isspace() -> IsSpace: @@ -155,7 +161,7 @@ def isspace() -> IsSpace: class IsAlnum(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isalnum() def isalnum() -> IsAlnum: @@ -166,7 +172,7 @@ def isalnum() -> IsAlnum: class IsDecimal(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.isdecimal() def isdecimal() -> IsDecimal: @@ -177,7 +183,7 @@ def isdecimal() -> IsDecimal: class IsTitle(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.str.istitle() def istitle() -> IsTitle: @@ -188,7 +194,7 @@ def istitle() -> IsTitle: class IsNull(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.isnull() def isnull() -> IsNull: @@ -199,7 +205,7 @@ def isnull() -> IsNull: class NotNull(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.notnull() def notnull() -> NotNull: diff --git a/graphistry/compute/predicates/temporal.py b/graphistry/compute/predicates/temporal.py index 3858478ede..740baf2e27 100644 --- a/graphistry/compute/predicates/temporal.py +++ b/graphistry/compute/predicates/temporal.py @@ -1,11 +1,18 @@ -from typing import Optional +from typing import Any, TYPE_CHECKING, Optional import pandas as pd from .ASTPredicate import ASTPredicate + +if 
TYPE_CHECKING: + SeriesT = pd.Series +else: + SeriesT = Any + + class IsMonthStart(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_month_start def is_month_start() -> IsMonthStart: @@ -16,7 +23,7 @@ def is_month_start() -> IsMonthStart: class IsMonthEnd(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_month_end def is_month_end() -> IsMonthEnd: @@ -27,7 +34,7 @@ def is_month_end() -> IsMonthEnd: class IsQuarterStart(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_quarter_start def is_quarter_start() -> IsQuarterStart: @@ -38,7 +45,7 @@ def is_quarter_start() -> IsQuarterStart: class IsQuarterEnd(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_quarter_end def is_quarter_end() -> IsQuarterEnd: @@ -49,7 +56,7 @@ def is_quarter_end() -> IsQuarterEnd: class IsYearStart(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_year_start def is_year_start() -> IsYearStart: @@ -60,7 +67,7 @@ def is_year_start() -> IsYearStart: class IsYearEnd(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_year_end def is_year_end() -> IsYearEnd: @@ -71,7 +78,7 @@ def is_year_end() -> IsYearEnd: class IsLeapYear(ASTPredicate): - def __call__(self, s: pd.Series) -> pd.Series: + def __call__(self, s: SeriesT) -> SeriesT: return s.dt.is_leap_year def is_leap_year() -> IsLeapYear: diff --git a/graphistry/plugins/cugraph.py b/graphistry/plugins/cugraph.py index f1e3857fb2..ed9bb08515 100644 --- a/graphistry/plugins/cugraph.py +++ b/graphistry/plugins/cugraph.py @@ -1,7 +1,7 @@ import pandas as pd from typing import Any, Dict, List, Optional, Union from 
graphistry.constants import NODE -from graphistry.Engine import Engine +from graphistry.Engine import EngineAbstract from graphistry.Plottable import Plottable from graphistry.plugins_types import CuGraphKind from graphistry.util import setup_logger @@ -270,7 +270,7 @@ def compute_cugraph( out = getattr(cugraph, alg)(G, **params) if isinstance(out, tuple): out = out[0] - g = self.materialize_nodes(engine=Engine.CUDF) + g = self.materialize_nodes(engine=EngineAbstract.CUDF) if g._node != 'vertex': out = out.rename(columns={'vertex': g._node}) expected_cols = node_compute_algs_to_attr[alg] @@ -396,7 +396,7 @@ def layout_cugraph( import cugraph - g = self.materialize_nodes(engine=Engine.CUDF) + g = self.materialize_nodes(engine=EngineAbstract.CUDF) if layout not in layout_algs: raise ValueError('Unsupported algorithm: %s', layout) From 34aaacfea4ed0d782b8aed9d1bf443439e5abc1e Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 19:25:30 -0800 Subject: [PATCH 094/104] test(gfql cudf) --- graphistry/tests/compute/test_chain.py | 89 +++++++++++++++++++++++++ graphistry/tests/compute/test_hop.py | 90 ++++++++++++++++++++++++++ graphistry/tests/test_compute.py | 2 +- 3 files changed, 180 insertions(+), 1 deletion(-) create mode 100644 graphistry/tests/compute/test_hop.py diff --git a/graphistry/tests/compute/test_chain.py b/graphistry/tests/compute/test_chain.py index 30760d6b29..c685edb84c 100644 --- a/graphistry/tests/compute/test_chain.py +++ b/graphistry/tests/compute/test_chain.py @@ -1,5 +1,12 @@ +import os +import pandas as pd +from graphistry.compute.predicates.is_in import is_in +import pytest + from graphistry.compute.ast import ASTNode, ASTEdge, n, e from graphistry.compute.chain import Chain +from graphistry.tests.test_compute import CGFull + def test_chain_serialization_mt(): o = Chain([]).to_json() @@ -36,3 +43,85 @@ def test_chain_serialization_multi(): assert d.chain[1]._name == 'abc' o2 = d.to_json() assert o == o2 + +def 
test_chain_simple_cudf_pd(): + nodes_df = pd.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_df = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst') + #g_nodes = g.chain([n()]) + #assert isinstance(g_nodes._nodes, pd.DataFrame) + #assert len(g_nodes._nodes) == 3 + g_edges = g.chain([e()]) + assert isinstance(g_edges._edges, pd.DataFrame) + assert len(g_edges._edges) == 3 + + +@pytest.mark.skipif( + not ("TEST_CUDF" in os.environ and os.environ["TEST_CUDF"] == "1"), + reason="cudf tests need TEST_CUDF=1", +) +def test_chain_simple_cudf(): + import cudf + nodes_gdf = cudf.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_gdf = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_gdf, 'id').edges(edges_gdf, 'src', 'dst') + g_nodes = g.chain([n()]) + assert isinstance(g_nodes._nodes, cudf.DataFrame) + assert len(g_nodes._nodes) == 3 + g_edges = g.chain([e()]) + assert isinstance(g_edges._edges, cudf.DataFrame) + assert len(g_edges._edges) == 3 + +def test_chain_kv_cudf_pd(): + nodes_df = pd.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_df = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst') + g_nodes = g.chain([n({'id': 0})]) + assert isinstance(g_nodes._nodes, pd.DataFrame) + assert len(g_nodes._nodes) == 1 + g_edges = g.chain([e({'src': 0})]) + assert isinstance(g_edges._edges, pd.DataFrame) + assert len(g_edges._edges) == 1 + +@pytest.mark.skipif( + not ("TEST_CUDF" in os.environ and os.environ["TEST_CUDF"] == "1"), + reason="cudf tests need TEST_CUDF=1", +) +def test_chain_kv_cudf(): + import cudf + nodes_gdf = cudf.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_gdf = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_gdf, 'id').edges(edges_gdf, 'src', 'dst') + g_nodes = g.chain([n({'id': 0})]) + assert 
isinstance(g_nodes._nodes, cudf.DataFrame) + assert len(g_nodes._nodes) == 1 + g_edges = g.chain([e({'src': 0})]) + assert isinstance(g_edges._edges, cudf.DataFrame) + assert len(g_edges._edges) == 1 + +def test_chain_pred_cudf_pd(): + nodes_df = pd.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_df = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst') + g_nodes = g.chain([n({'id': is_in([0])})]) + assert isinstance(g_nodes._nodes, pd.DataFrame) + assert len(g_nodes._nodes) == 1 + g_edges = g.chain([e({'src': is_in([0])})]) + assert isinstance(g_edges._edges, pd.DataFrame) + assert len(g_edges._edges) == 1 + +@pytest.mark.skipif( + not ("TEST_CUDF" in os.environ and os.environ["TEST_CUDF"] == "1"), + reason="cudf tests need TEST_CUDF=1", +) +def test_chain_pred_cudf(): + import cudf + nodes_gdf = cudf.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_gdf = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_gdf, 'id').edges(edges_gdf, 'src', 'dst') + g_nodes = g.chain([n({'id': is_in([0])})]) + assert isinstance(g_nodes._nodes, cudf.DataFrame) + assert len(g_nodes._nodes) == 1 + g_edges = g.chain([e({'src': is_in([0])})]) + assert isinstance(g_edges._edges, cudf.DataFrame) + assert len(g_edges._edges) == 1 diff --git a/graphistry/tests/compute/test_hop.py b/graphistry/tests/compute/test_hop.py new file mode 100644 index 0000000000..2960c58325 --- /dev/null +++ b/graphistry/tests/compute/test_hop.py @@ -0,0 +1,90 @@ +import os +import pandas as pd +from graphistry.compute.predicates.is_in import is_in +import pytest + +from graphistry.compute.ast import ASTNode, ASTEdge, n, e +from graphistry.tests.test_compute import CGFull + + +def test_hop_simple_cudf_pd(): + nodes_df = pd.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_df = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_df, 'id').edges(edges_df, 
'src', 'dst') + g_nodes = g.hop() + assert isinstance(g_nodes._nodes, pd.DataFrame) + assert len(g_nodes._nodes) == 3 + g_edges = g.hop() + assert isinstance(g_edges._edges, pd.DataFrame) + assert len(g_edges._edges) == 3 + + +@pytest.mark.skipif( + not ("TEST_CUDF" in os.environ and os.environ["TEST_CUDF"] == "1"), + reason="cudf tests need TEST_CUDF=1", +) +def test_hop_simple_cudf(): + import cudf + nodes_gdf = cudf.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_gdf = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_gdf, 'id').edges(edges_gdf, 'src', 'dst') + g_nodes = g.hop() + assert isinstance(g_nodes._nodes, cudf.DataFrame) + assert len(g_nodes._nodes) == 3 + g_edges = g.hop() + assert isinstance(g_edges._edges, cudf.DataFrame) + assert len(g_edges._edges) == 3 + +def test_hop_kv_cudf_pd(): + nodes_df = pd.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_df = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst') + g_nodes = g.hop(source_node_match=({'id': 0})) + assert isinstance(g_nodes._nodes, pd.DataFrame) + assert len(g_nodes._nodes) == 2 + g_edges = g.hop(edge_match={'src': 0}) + assert isinstance(g_edges._edges, pd.DataFrame) + assert len(g_edges._edges) == 1 + +@pytest.mark.skipif( + not ("TEST_CUDF" in os.environ and os.environ["TEST_CUDF"] == "1"), + reason="cudf tests need TEST_CUDF=1", +) +def test_hop_kv_cudf(): + import cudf + nodes_gdf = cudf.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_gdf = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_gdf, 'id').edges(edges_gdf, 'src', 'dst') + g_nodes = g.hop(source_node_match={'id': 0}) + assert isinstance(g_nodes._nodes, cudf.DataFrame) + assert len(g_nodes._nodes) == 2 + g_edges = g.hop(edge_match={'src': 0}) + assert isinstance(g_edges._edges, cudf.DataFrame) + assert len(g_edges._edges) == 1 + +def test_hop_pred_cudf_pd(): + 
nodes_df = pd.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_df = pd.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_df, 'id').edges(edges_df, 'src', 'dst') + g_nodes = g.hop(source_node_match={'id': is_in([0])}) + assert isinstance(g_nodes._nodes, pd.DataFrame) + assert len(g_nodes._nodes) == 2 + g_edges = g.hop(edge_match={'src': is_in([0])}) + assert isinstance(g_edges._edges, pd.DataFrame) + assert len(g_edges._edges) == 1 + +@pytest.mark.skipif( + not ("TEST_CUDF" in os.environ and os.environ["TEST_CUDF"] == "1"), + reason="cudf tests need TEST_CUDF=1", +) +def test_hop_pred_cudf(): + import cudf + nodes_gdf = cudf.DataFrame({'id': [0, 1, 2], 'label': ['a', 'b', 'c']}) + edges_gdf = cudf.DataFrame({'src': [0, 1, 2], 'dst': [1, 2, 0]}) + g = CGFull().nodes(nodes_gdf, 'id').edges(edges_gdf, 'src', 'dst') + g_nodes = g.hop(source_node_match={'id': is_in([0])}) + assert isinstance(g_nodes._nodes, cudf.DataFrame) + assert len(g_nodes._nodes) == 2 + g_edges = g.hop(edge_match={'src': is_in([0])}) + assert isinstance(g_edges._edges, cudf.DataFrame) + assert len(g_edges._edges) == 1 diff --git a/graphistry/tests/test_compute.py b/graphistry/tests/test_compute.py index 735f0c68a9..49d72d3b61 100644 --- a/graphistry/tests/test_compute.py +++ b/graphistry/tests/test_compute.py @@ -1,9 +1,9 @@ # -*- coding: utf-8 -*- import os, pandas as pd, pytest, unittest -from common import NoAuthTestCase from graphistry.compute import ComputeMixin from graphistry.plotter import PlotterBase +from .common import NoAuthTestCase class CG(ComputeMixin): From 0a5fb728ad48cc08102f4abbe9ac117d1166fa78 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 19:32:28 -0800 Subject: [PATCH 095/104] fix(test): paths --- graphistry/tests/test_compute.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/graphistry/tests/test_compute.py b/graphistry/tests/test_compute.py index 49d72d3b61..b328804a46 100644 --- 
a/graphistry/tests/test_compute.py +++ b/graphistry/tests/test_compute.py @@ -3,7 +3,7 @@ from graphistry.compute import ComputeMixin from graphistry.plotter import PlotterBase -from .common import NoAuthTestCase +from graphistry.tests.common import NoAuthTestCase class CG(ComputeMixin): From ead3316ea87ba5a6af01d4341f9ae209b5a02151 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 19:50:30 -0800 Subject: [PATCH 096/104] fix(graphistry): utils as module --- graphistry/utils/__init__.py | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 graphistry/utils/__init__.py diff --git a/graphistry/utils/__init__.py b/graphistry/utils/__init__.py new file mode 100644 index 0000000000..e69de29bb2 From 42a727aa2055c72d4661d1407991ea0b82273b13 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 21:43:59 -0800 Subject: [PATCH 097/104] fix(engineabstract): support str specs --- graphistry/Engine.py | 8 ++++++-- graphistry/Plottable.py | 2 +- graphistry/compute/ComputeMixin.py | 6 +++++- graphistry/compute/chain.py | 5 ++++- graphistry/compute/filter_by_dict.py | 11 +++++++---- graphistry/compute/hop.py | 7 +++++-- 6 files changed, 28 insertions(+), 11 deletions(-) diff --git a/graphistry/Engine.py b/graphistry/Engine.py index 8753aa71f7..8bc2bc2b1d 100644 --- a/graphistry/Engine.py +++ b/graphistry/Engine.py @@ -1,5 +1,5 @@ import pandas as pd -from typing import Any, Optional +from typing import Any, Optional, Union from enum import Enum @@ -34,9 +34,13 @@ def lazy_cudf_import_has_dependancy(): return False, e, None def resolve_engine( - engine: EngineAbstract, + engine: Union[EngineAbstract, str], g_or_df: Optional[Any] = None, ) -> Engine: + + if isinstance(engine, str): + engine = EngineAbstract(engine) + # if an Engine (concrete), just use that if engine != EngineAbstract.AUTO: return Engine(engine.value) diff --git a/graphistry/Plottable.py b/graphistry/Plottable.py index e7589e92ea..72ac7d8831 100644 --- 
a/graphistry/Plottable.py +++ b/graphistry/Plottable.py @@ -149,7 +149,7 @@ def get_degrees( raise RuntimeError('should not happen') return self - def materialize_nodes(self, reuse: bool = True, engine:EngineAbstract = EngineAbstract.AUTO) -> 'Plottable': + def materialize_nodes(self, reuse: bool = True, engine: Union[EngineAbstract, str] = EngineAbstract.AUTO) -> 'Plottable': if 1 + 1: raise RuntimeError('should not happen') return self diff --git a/graphistry/compute/ComputeMixin.py b/graphistry/compute/ComputeMixin.py index ea00ad34c8..6148b66c27 100644 --- a/graphistry/compute/ComputeMixin.py +++ b/graphistry/compute/ComputeMixin.py @@ -28,7 +28,7 @@ def __init__(self, *args, **kwargs): def materialize_nodes( self, reuse: bool = True, - engine: EngineAbstract = EngineAbstract.AUTO + engine: Union[EngineAbstract, str] = EngineAbstract.AUTO ) -> "Plottable": """ Generate g._nodes based on g._edges @@ -50,6 +50,10 @@ def materialize_nodes( print(g2._nodes) # pd.DataFrame """ + + if isinstance(engine, str): + engine = EngineAbstract(engine) + g = self if g._edges is None: raise ValueError("Missing edges") diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index de55955212..52a86957a3 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -142,7 +142,7 @@ def combine_steps(g: Plottable, kind: str, steps: List[Tuple[ASTObject,Plottable # ############################################################################### -def chain(self: Plottable, ops: Union[List[ASTObject], Chain], engine: EngineAbstract = EngineAbstract.AUTO) -> Plottable: +def chain(self: Plottable, ops: Union[List[ASTObject], Chain], engine: Union[EngineAbstract, str] = EngineAbstract.AUTO) -> Plottable: """ Chain a list of ASTObject (node/edge) traversal operations @@ -214,6 +214,9 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain], engine: EngineAbs """ + if isinstance(engine, str): + engine = EngineAbstract(engine) + if isinstance(ops, 
Chain): ops = ops.chain diff --git a/graphistry/compute/filter_by_dict.py b/graphistry/compute/filter_by_dict.py index ebcda6bf51..1fbbbf9477 100644 --- a/graphistry/compute/filter_by_dict.py +++ b/graphistry/compute/filter_by_dict.py @@ -1,4 +1,4 @@ -from typing import Any, Dict, Optional, TYPE_CHECKING +from typing import Any, Dict, Optional, TYPE_CHECKING, Union import pandas as pd from graphistry.Engine import EngineAbstract, df_to_engine, resolve_engine, s_cons from graphistry.util import setup_logger @@ -15,11 +15,14 @@ else: DataFrameT = Any -def filter_by_dict(df: DataFrameT, filter_dict: Optional[dict] = None, engine: EngineAbstract = EngineAbstract.AUTO) -> DataFrameT: +def filter_by_dict(df: DataFrameT, filter_dict: Optional[dict] = None, engine: Union[EngineAbstract, str] = EngineAbstract.AUTO) -> DataFrameT: """ return df where rows match all values in filter_dict """ + if isinstance(engine, str): + engine = EngineAbstract(engine) + if filter_dict is None or filter_dict == {}: return df @@ -50,7 +53,7 @@ def filter_by_dict(df: DataFrameT, filter_dict: Optional[dict] = None, engine: E return df[hits] -def filter_nodes_by_dict(self: Plottable, filter_dict: dict, engine: EngineAbstract = EngineAbstract.AUTO) -> Plottable: +def filter_nodes_by_dict(self: Plottable, filter_dict: dict, engine: Union[EngineAbstract, str] = EngineAbstract.AUTO) -> Plottable: """ filter nodes to those that match all values in filter_dict """ @@ -58,7 +61,7 @@ def filter_nodes_by_dict(self: Plottable, filter_dict: dict, engine: EngineAbstr return self.nodes(nodes2) -def filter_edges_by_dict(self: Plottable, filter_dict: dict, engine: EngineAbstract = EngineAbstract.AUTO) -> Plottable: +def filter_edges_by_dict(self: Plottable, filter_dict: dict, engine: Union[EngineAbstract, str] = EngineAbstract.AUTO) -> Plottable: """ filter edges to those that match all values in filter_dict """ diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 4289fda6cc..66c5414270 
100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -1,5 +1,5 @@ import logging -from typing import Any, List, Optional, TYPE_CHECKING +from typing import Any, List, Optional, TYPE_CHECKING, Union import pandas as pd from graphistry.Engine import Engine, EngineAbstract, df_concat, df_cons, df_to_engine, resolve_engine @@ -34,7 +34,7 @@ def hop(self: Plottable, edge_query: Optional[str] = None, return_as_wave_front = False, target_wave_front: Optional[DataFrameT] = None, # chain: limit hits to these for reverse pass - engine: EngineAbstract = EngineAbstract.AUTO + engine: Union[EngineAbstract, str] = EngineAbstract.AUTO ) -> Plottable: """ Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources @@ -63,6 +63,9 @@ def hop(self: Plottable, """ + if isinstance(engine, str): + engine = EngineAbstract(engine) + engine_concrete = resolve_engine(engine, self) if not TYPE_CHECKING: DataFrameT = df_cons(engine_concrete) From dcdda9210e696722e7c2f53efd286154c52450ff Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 23:31:51 -0800 Subject: [PATCH 098/104] fix(docs) --- docs/source/conf.py | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/conf.py b/docs/source/conf.py index 50ff684da7..c166e8ccee 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -86,6 +86,7 @@ ('py:class', 'graphistry.compute.predicates.temporal.IsYearStart'), ('py:class', 'graphistry.compute.predicates.temporal.IsYearEnd'), ('py:class', 'graphistry.Engine.Engine'), + ('py:class', 'graphistry.Engine.EngineAbstract'), ('py:class', 'graphistry.gremlin.CosmosMixin'), ('py:class', 'graphistry.gremlin.GremlinMixin'), ('py:class', 'graphistry.gremlin.NeptuneMixin'), From be7fe97d27ac51d66e0a80c39534892e1a13d200 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Sun, 24 Dec 2023 23:39:59 -0800 Subject: [PATCH 099/104] fix(docs): missing module --- docs/source/graphistry.rst | 9 ++++++++- 
docs/source/graphistry.utils.rst | 8 ++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) create mode 100644 docs/source/graphistry.utils.rst diff --git a/docs/source/graphistry.rst b/docs/source/graphistry.rst index 2fd55094a2..a6f87bc9db 100644 --- a/docs/source/graphistry.rst +++ b/docs/source/graphistry.rst @@ -15,7 +15,6 @@ Plugins graphistry.plugins - Compute ================== .. toctree:: @@ -33,6 +32,14 @@ Layouts graphistry.layout +Utilities +================== +.. toctree:: + :maxdepth: 3 + + graphistry.utils + + Featurize ================== .. automodule:: graphistry.feature_utils diff --git a/docs/source/graphistry.utils.rst b/docs/source/graphistry.utils.rst new file mode 100644 index 0000000000..037dd78676 --- /dev/null +++ b/docs/source/graphistry.utils.rst @@ -0,0 +1,8 @@ +Module contents +--------------- + +.. automodule:: graphistry.utils + :members: + :undoc-members: + :show-inheritance: + :noindex: From d409f5587a450626ea4a5a839e773ca65f8f02d4 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Mon, 25 Dec 2023 12:55:43 -0800 Subject: [PATCH 100/104] refactor(gfql): use DataFrameT --- graphistry/compute/__init__.py | 1 + graphistry/compute/ast.py | 8 +------- graphistry/compute/chain.py | 7 +------ graphistry/compute/filter_by_dict.py | 6 +----- graphistry/compute/hop.py | 5 +---- graphistry/compute/typing.py | 8 ++++++++ 6 files changed, 13 insertions(+), 22 deletions(-) create mode 100644 graphistry/compute/typing.py diff --git a/graphistry/compute/__init__.py b/graphistry/compute/__init__.py index 0bed507004..360b038992 100644 --- a/graphistry/compute/__init__.py +++ b/graphistry/compute/__init__.py @@ -46,3 +46,4 @@ isnull, IsNull, notnull, NotNull, ) +from .typing import DataFrameT diff --git a/graphistry/compute/ast.py b/graphistry/compute/ast.py index 83c1d9020a..c1e7b4e046 100644 --- a/graphistry/compute/ast.py +++ b/graphistry/compute/ast.py @@ -54,17 +54,11 @@ notnull, NotNull ) from .filter_by_dict import filter_by_dict +from 
.typing import DataFrameT logger = setup_logger(__name__) - -if TYPE_CHECKING: - DataFrameT = pd.DataFrame -else: - DataFrameT = Any - - ############################################################################## diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index 52a86957a3..ad5fb13bab 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -7,6 +7,7 @@ from graphistry.util import setup_logger from graphistry.utils.json import JSONVal from .ast import ASTObject, ASTNode, ASTEdge, from_json as ASTObject_from_json +from .typing import DataFrameT logger = setup_logger(__name__) @@ -14,12 +15,6 @@ ############################################################################### -if TYPE_CHECKING: - DataFrameT = pd.DataFrame -else: - DataFrameT = Any - - class Chain(ASTSerializable): def __init__(self, chain: List[ASTObject]) -> None: diff --git a/graphistry/compute/filter_by_dict.py b/graphistry/compute/filter_by_dict.py index 1fbbbf9477..ea6350d086 100644 --- a/graphistry/compute/filter_by_dict.py +++ b/graphistry/compute/filter_by_dict.py @@ -5,16 +5,12 @@ from graphistry.Plottable import Plottable from .predicates.ASTPredicate import ASTPredicate +from .typing import DataFrameT logger = setup_logger(__name__) -if TYPE_CHECKING: - DataFrameT = pd.DataFrame -else: - DataFrameT = Any - def filter_by_dict(df: DataFrameT, filter_dict: Optional[dict] = None, engine: Union[EngineAbstract, str] = EngineAbstract.AUTO) -> DataFrameT: """ return df where rows match all values in filter_dict diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index 66c5414270..a171dcca1b 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -6,14 +6,11 @@ from graphistry.Plottable import Plottable from graphistry.util import setup_logger from .filter_by_dict import filter_by_dict +from .typing import DataFrameT logger = setup_logger(__name__) -if TYPE_CHECKING: - DataFrameT = pd.DataFrame -else: - DataFrameT = 
Any def query_if_not_none(query: Optional[str], df: DataFrameT) -> DataFrameT: if query is None: diff --git a/graphistry/compute/typing.py b/graphistry/compute/typing.py new file mode 100644 index 0000000000..8ad1a80969 --- /dev/null +++ b/graphistry/compute/typing.py @@ -0,0 +1,8 @@ +import pandas as pd +from typing import Any, TYPE_CHECKING + +# TODO stubs for Union[cudf.DataFrame, dask.DataFrame, ..] at checking time +if TYPE_CHECKING: + DataFrameT = pd.DataFrame +else: + DataFrameT = Any From 547b968ddd1cad7bc896bb92ed9c4ae9c9c19423 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Mon, 25 Dec 2023 12:55:53 -0800 Subject: [PATCH 101/104] docs(changelog) --- CHANGELOG.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index c4a9c837d5..fd1fcab40b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,20 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +### Added + +* `EngineAbstract` to `Engine.py::Engine` enum +* `compute.typing.DataFrameT` to centralize df-lib-agnostic type checking +* `chain`, `hop`, `filter_by_dict` variants support GPU execution + +### Refactor + +* GFQL and more of compute uses generic dataframe methods and threads through engine + +### Infra + +* GPU tester threads through LOG_LEVEL + ## [0.32.0 - 2023-12-22] ### Added From 72e778c5be6ac8a7b9edd02d6df89215db6e5b95 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Mon, 25 Dec 2023 13:14:01 -0800 Subject: [PATCH 102/104] docs(gfql): GPU --- README.md | 32 ++++++++++++++++++++++++++++---- graphistry/compute/chain.py | 24 ++++++++++++++++++++++++ graphistry/compute/hop.py | 6 +++++- 3 files changed, 57 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index be64eccaa9..eaca3805b5 100644 --- a/README.md +++ b/README.md @@ -11,9 +11,9 @@ [![Uptime Robot
status](https://img.shields.io/uptimerobot/status/m787548531-e9c7b7508fc76fea927e2313?label=hub.graphistry.com)](https://status.graphistry.com/) [](https://join.slack.com/t/graphistry-community/shared_invite/zt-53ik36w2-fpP0Ibjbk7IJuVFIRSnr6g) [![Twitter Follow](https://img.shields.io/twitter/follow/graphistry)](https://twitter.com/graphistry) -**PyGraphistry is a Python visual graph AI library to extract, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** Installing with optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your `time to graph` for going from raw data to visualizations and AI models down to three lines of code. +**PyGraphistry is a dataframe-native Python visual graph AI library to extract, query, transform, analyze, model, and visualize big graphs, and especially alongside [Graphistry](https://www.graphistry.com) end-to-end GPU server sessions.** The GFQL query language supports running a large subset of the Cypher property graph query language without requiring external software and adds optional GPU acceleration. Installing PyGraphistry with the optional `graphistry[ai]` dependencies adds **graph autoML**, including automatic feature engineering, UMAP, and graph neural net support. Combined, PyGraphistry reduces your **time to graph** for going from raw data to visualizations and AI models down to three lines of code. -Graphistry gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. 
The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors. +The optional visual engine, Graphistry, gets used on problems like visually mapping the behavior of devices and users, investigating fraud, analyzing machine learning results, and starting in graph AI. It provides point-and-click features like timebars, search, filtering, clustering, coloring, sharing, and more. Graphistry is the only tool built ground-up for large graphs. The client's custom WebGL rendering engine renders up to 8MM nodes + edges at a time, and most older client GPUs smoothly support somewhere between 100K and 2MM elements. The serverside GPU analytics engine supports even bigger graphs. It smoothes graph workflows over the PyData ecosystem including Pandas/Spark/Dask dataframes, Nvidia RAPIDS GPU dataframes & GPU graphs, DGL/PyTorch graph neural networks, and various data connectors. The PyGraphistry Python client helps several kinds of usage modes: @@ -147,14 +147,14 @@ It is easy to turn arbitrary data into insightful graphs. 
PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: ```python from graphistry import n, e_undirected, is_in - g2 = g.chain([ + g2 = g1.chain([ n({'user': 'Biden'}), e_undirected(), n(name='bridge'), @@ -166,6 +166,17 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit g2.plot() ``` + Enable GFQL's optional automatic GPU acceleration for 43X+ speedups: + + ```python + # Switch from Pandas CPU dataframes to RAPIDS GPU dataframes + import cudf + g2 = g1.edges(lambda g: cudf.DataFrame(g._edges)) + # GFQL will automatically run on a GPU + g3 = g2.chain([n(), e(hops=3), n()]) + g3.plot() + ``` + * [Spark](https://spark.apache.org/)/[Databricks](https://databricks.com/) ([ipynb demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.ipynb), [dbc demo](demos/demos_databases_apis/databricks_pyspark/graphistry-notebook-dashboard.dbc)) ```python @@ -1163,6 +1174,8 @@ Both `hop()` and `chain()` (GFQL) match dictionary expressions support dataframe * numeric: gt, lt, ge, le, eq, ne, between, isna, notna * string: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull +Both `hop()` and `chain()` will run on GPUs when passing in RAPIDS dataframes. Specify parameter `engine='cudf'` to be sure.
+ #### Table to graph ```python @@ -1327,6 +1340,17 @@ pattern2 = Chain.from_json(pattern_json) g.chain(pattern2).plot() ``` +Benefit from automatic GPU acceleration by passing in GPU dataframes: + +```python +import cudf + +g1 = graphistry.edges(cudf.read_csv('data.csv'), 's', 'd') +g2 = g1.chain(..., engine='cudf') +``` + +The parameter `engine` is optional, defaulting to `'auto'`. + #### Pipelining ```python diff --git a/graphistry/compute/chain.py b/graphistry/compute/chain.py index ad5fb13bab..20de5e83f7 100644 --- a/graphistry/compute/chain.py +++ b/graphistry/compute/chain.py @@ -146,6 +146,8 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain], engine: Union[Eng For direct calls, exposes convenience `List[ASTObject]`. Internal operational should prefer `Chain`. + Use `engine='cudf'` to force automatic GPU acceleration mode + :param ops: List[ASTObject] Various node and edge matchers :returns: Plotter @@ -206,6 +208,28 @@ def chain(self: Plottable, ops: Union[List[ASTObject], Chain], engine: Union[Eng n({"risk2": True}) ]) print('# hits:', len(g_risky._nodes[ g_risky._nodes.hit ])) + + **Example: Run with automatic GPU acceleration** + + :: + + import cudf + import graphistry + + e_gdf = cudf.from_pandas(df) + g1 = graphistry.edges(e_gdf, 's', 'd') + g2 = g1.chain([ ... ]) + + **Example: Run with automatic GPU acceleration, and force GPU mode** + + :: + + import cudf + import graphistry + + e_gdf = cudf.from_pandas(df) + g1 = graphistry.edges(e_gdf, 's', 'd') + g2 = g1.chain([ ... 
], engine='cudf') """ diff --git a/graphistry/compute/hop.py b/graphistry/compute/hop.py index a171dcca1b..7d8425a690 100644 --- a/graphistry/compute/hop.py +++ b/graphistry/compute/hop.py @@ -36,6 +36,10 @@ def hop(self: Plottable, """ Given a graph and some source nodes, return subgraph of all paths within k-hops from the sources + This can be faster than the equivalent chain([...]) call that wraps it with additional steps + + See chain() examples for examples of many of the parameters + g: Plotter nodes: dataframe with id column matching g._node. None signifies all nodes (default). hops: consider paths of length 1 to 'hops' steps, if any (default 1). @@ -49,7 +53,7 @@ def hop(self: Plottable, edge_query: dataframe query to match edges before hopping (including intermediate) return_as_wave_front: Only return the nodes/edges reached, ignoring past ones (primarily for internal use) target_wave_front: Only consider these nodes for reachability, and for intermediate hops, also consider nodes (primarily for internal use by reverse pass) - engine: 'auto', 'pandas', 'cudf' + engine: 'auto', 'pandas', 'cudf' (GPU) """ """ From 5e818246d2ff24ad33e271e6fbdae2c1856464a6 Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Tue, 26 Dec 2023 12:56:20 -0800 Subject: [PATCH 103/104] docs(benchmark) --- README.md | 4 +- demos/gfql/benchmark_hops_cpu_gpu.ipynb | 4825 +++++++++++++++++++++++ 2 files changed, 4827 insertions(+), 2 deletions(-) create mode 100644 demos/gfql/benchmark_hops_cpu_gpu.ipynb diff --git a/README.md b/README.md index eaca3805b5..b4dcae3167 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. 
PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: @@ -1248,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java -See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb) Traverse within a graph, or expand one graph against another diff --git a/demos/gfql/benchmark_hops_cpu_gpu.ipynb b/demos/gfql/benchmark_hops_cpu_gpu.ipynb new file mode 100644 index 0000000000..bf17b630e7 --- /dev/null +++ b/demos/gfql/benchmark_hops_cpu_gpu.ipynb @@ -0,0 +1,4825 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "gpuType": "T4" + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# GFQL CPU, GPU Benchmark\n", + "\n", + "This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs.
The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", + "\n", + "The benchmark does not examine bigger-than-memory and distributed scenarios. The results provided here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).\n", + "\n", + "## Data\n", + "From [SNAP](https://snap.stanford.edu/data/)\n", + "\n", + "| Network | Nodes | Edges |\n", + "|-------------|-----------|--------------|\n", + "| **Facebook**| 4,039 | 88,234 |\n", + "| **Twitter** | 81,306 | 2,420,766 |\n", + "| **GPlus** | 107,614 | 30,494,866 |\n", + "| **Orkut** | 3,072,441 | 117,185,082 |\n", + "\n", + "## Results\n", + "\n", + "Definitions:\n", + "\n", + "* GTEPS: Giga (billion) edges traversed per second\n", + "\n", + "* T edges / \\$: Estimated trillion edges traversed for \\$1 USD based on observed GTEPS and a 3yr AWS reservation (as of 12/2023)\n", + "\n", + "Tasks:\n", + "\n", + "1. `chain()` - includes complex pre/post processing\n", + "\n", + " **Task**: `g.chain([n({'id': some_id}), e_forward(hops=some_n)])`\n", + "\n", + "\n", + "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", + "|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|\n", + "| **Facebook**| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", + "| **Twitter** | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n", + "| **GPlus** | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", + "| **Orkut** | N/A | N/A | 12.15 | N/A | 208.3 |\n", + "| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0 |\n", + "| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3 |\n", + "\n", + "\n", + "2.
`hop()` - core property search primitive similar to BFS\n", + "\n", + " **Task**: `g.hop(nodes=[some_id], direction='forward', hops=some_n)`\n", + "\n", + "\n", + "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", + "|-------------|-------------|-----------|-----------|--------------------|--------------------------------|\n", + "| **Facebook**| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", + "| **Twitter** | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", + "| **GPlus** | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", + "| **Orkut** | N/A | N/A | 41.50 | N/A | 711.4 |\n", + "| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8\n", + "| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4\n" + ], + "metadata": { + "id": "GZxoiU8sQDk_" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Optional: GPU setup - Google Colab" + ], + "metadata": { + "id": "SAj8lhREEOwS" + } + }, + { + "cell_type": "markdown", + "source": [], + "metadata": { + "id": "4hrEEAAm7DTO" + } + }, + { + "cell_type": "code", + "source": [ + "# Report GPU used when GPU benchmarking\n", + "! nvidia-smi" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "W2MF6ZsjDv3B", + "outputId": "46088cbc-2db9-4529-f724-dc57ed85dfb7" + }, + "execution_count": 1, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:50:30 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 54C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "| No running processes found |\n", + "+---------------------------------------------------------------------------------------+\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "# if in google colab\n", + "!git clone https://github.com/rapidsai/rapidsai-csp-utils.git\n", + "!python rapidsai-csp-utils/colab/pip-install.py" + ], + "metadata": { + "id": "Aikh0x4ID_wK" + }, + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "import cudf\n", + "cudf.__version__" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "Lwekdei1dH3N", + "outputId": "71f5b01d-7917-4283-8338-969167d6e1e8" + }, + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'23.12.01'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 1. Install & configure" + ], + "metadata": { + "id": "QQpsrtwBT7sa" + } + }, + { + "cell_type": "code", + "source": [ + "#! 
pip install graphistry[igraph]\n", + "\n", + "!pip install -q igraph\n", + "#!pip install -q git+https://github.com/graphistry/pygraphistry.git@dev/cugfql\n", + "!pip install -q graphistry\n" + ], + "metadata": { + "id": "cYjRbgkU9Sx8", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2cf25531-9b8b-4715-ccc7-e79094d84ebd" + }, + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Imports" + ], + "metadata": { + "id": "Ff6Tt9DhkePl" + } + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "import graphistry\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "graphistry.__version__" + ], + "metadata": { + "id": "S5_y0CbLkjft", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "a68a9c4b-c9c5-4b8b-ea4f-7bf1e4ddf315" + }, + "execution_count": 3, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'0.32.0+12.g72e778c'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 3 + } + ] + }, + { + "cell_type": "code", + "source": [ + "import cudf" + ], + "metadata": { + "id": "I7Fg75jsG4co" + }, + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "#work around google colab shell encoding bugs\n", + "\n", + "import locale\n", + "locale.getpreferredencoding = lambda: \"UTF-8\"" + ], + "metadata": { + "id": "uLZKph2-a5M4" + }, + "execution_count": 7, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# 2. 
Perf benchmarks" + ], + "metadata": { + "id": "eU9SyauNUHtR" + } + }, + { + "cell_type": "markdown", + "source": [ + "### Facebook: 88K edges" + ], + "metadata": { + "id": "NA0Ym11fkB8j" + } + }, + { + "cell_type": "code", + "source": [ + "df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])\n", + "print(df.shape)\n", + "df.head(5)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + }, + "id": "vXuQogHekClJ", + "outputId": "64db92c0-2704-438b-d0e4-25865acbb5e9" + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(88234, 2)\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " s d\n", + "0 0 1\n", + "1 0 2\n", + "2 0 3\n", + "3 0 4\n", + "4 0 5" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ] + }, + { + "cell_type": "code", + "source": [ + "fg = graphistry.edges(df, 's', 'd').materialize_nodes()\n", + "print(fg._nodes.shape, fg._edges.shape)\n", + "fg._nodes.head(5)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + }, + "id": "jEma7hvvkzkN", + "outputId": "dbf21342-6b80-429c-bd3f-b1494c6854c7" + }, + "execution_count": 11, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(4039, 1) (88234, 2)\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 0\n", + "1 1\n", + "2 2\n", + "3 3\n", + "4 4" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "for i in range(100):\n", + " fg2 = fg.chain([n({'id': 0}), e_forward(hops=2)])" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5lEdCBw9lzd7", + "outputId": "ed7451e0-401e-4edc-c8de-79c5afd0c95b" + }, + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 13.6 s, sys: 2.08 s, total: 15.7 s\n", + "Wall time: 18 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "fg_gdf = fg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(100):\n", + " fg2 = fg_gdf.chain([n({'id': 0}), e_forward(hops=2)])\n", + "print(fg._nodes.shape, fg._edges.shape)\n", + "print(fg2._nodes.shape, fg2._edges.shape)\n", + "del fg_gdf\n", + "del fg2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JFKIBa8mJCvJ", + "outputId": "c22022f0-b33d-483a-db64-29992c5161e8" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(4039, 1) (88234, 2)\n", + "(1519, 1) (4060, 2)\n", + "CPU times: user 11.8 s, sys: 28.1 ms, total: 11.8 s\n", + "Wall time: 11.9 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "for i in range(50):\n", + " fg2 = fg.chain([n({'id': 0}), e_forward(hops=5)])\n", + "print(fg._nodes.shape, fg._edges.shape)\n", + "print(fg2._nodes.shape, fg2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-KBGLexek5tS", + "outputId": "2f462e6c-578a-4fa1-ec29-91bae753f4c5" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(4039, 1) (88234, 2)\n", + "(3829, 1) (86074, 2)\n", + "CPU times: user 15.4 s, sys: 808 ms, total: 16.2 s\n", + "Wall time: 16.2 s\n" 
+ ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "fg_gdf = fg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(50):\n", + " fg2 = fg_gdf.chain([n({'id': 0}), e_forward(hops=5)])\n", + "print(fg._nodes.shape, fg._edges.shape)\n", + "print(fg2._nodes.shape, fg2._edges.shape)\n", + "del fg_gdf\n", + "del fg2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CVpcbhpdHFEF", + "outputId": "aba04ee1-781e-4226-b593-b42415a55fc4" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(4039, 1) (88234, 2)\n", + "(3829, 1) (86074, 2)\n", + "CPU times: user 9.82 s, sys: 133 ms, total: 9.95 s\n", + "Wall time: 10.1 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "for i in range(100):\n", + " fg2 = fg.chain([e_forward(source_node_match={'id': 0}, hops=5)])" + ], + "metadata": { + "id": "1cFIyJF9pLjE", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "107329af-8e4b-428c-8b03-77ed00bdf5bf" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 11.8 s, sys: 377 ms, total: 12.1 s\n", + "Wall time: 13.1 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "fg_gdf = fg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(100):\n", + " fg2 = fg_gdf.chain([e_forward(source_node_match={'id': 0}, hops=5)])\n", + "print(fg._nodes.shape, fg._edges.shape)\n", + "print(fg2._nodes.shape, fg2._edges.shape)\n", + "del fg_gdf\n", + "del fg2\n", + "\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "M5uRiD6uJVNW", + "outputId": "5e938a19-2992-4280-80c2-784382d40113" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + 
"(4039, 1) (88234, 2)\n", + "(348, 1) (347, 2)\n", + "CPU times: user 14.1 s, sys: 48.5 ms, total: 14.2 s\n", + "Wall time: 14.2 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({fg._node: [0]})\n", + "for i in range(100):\n", + " fg2 = fg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "print(fg2._nodes.shape, fg2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Y9vgzfT69x41", + "outputId": "6882c1ce-0df8-4087-dda4-0a105a8617e1" + }, + "execution_count": 17, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(1519, 1) (4060, 2)\n", + "CPU times: user 4.5 s, sys: 1.35 s, total: 5.85 s\n", + "Wall time: 6.09 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({fg._node: [0]})\n", + "fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))\n", + "for i in range(100):\n", + " fg2 = fg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "print(fg2._nodes.shape, fg2._edges.shape)\n", + "del start_nodes\n", + "del fg_gdf\n", + "del fg2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "c7ybJqjc-T31", + "outputId": "37ccc1fb-6460-4193-8aa7-22837ff06d0a" + }, + "execution_count": 18, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(1519, 1) (4060, 2)\n", + "CPU times: user 2.58 s, sys: 6.75 ms, total: 2.59 s\n", + "Wall time: 2.58 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({fg._node: [0]})\n", + "for i in range(100):\n", + " fg2 = fg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + "print(fg2._nodes.shape, fg2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": 
"Dy7a4zDZ-7_G", + "outputId": "077b5d9c-c9ae-411a-8228-3c026b07a910" + }, + "execution_count": 19, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(3829, 1) (86074, 2)\n", + "CPU times: user 13.2 s, sys: 2 s, total: 15.2 s\n", + "Wall time: 18.3 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({fg._node: [0]})\n", + "fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))\n", + "for i in range(100):\n", + " fg2 = fg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + "print(fg2._nodes.shape, fg2._edges.shape)\n", + "del start_nodes\n", + "del fg_gdf\n", + "del fg2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "N5aUtF1a--ML", + "outputId": "0c2b67b8-fac6-45b3-dfbe-8002b5506e91" + }, + "execution_count": 20, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(3829, 1) (86074, 2)\n", + "CPU times: user 5.72 s, sys: 159 ms, total: 5.88 s\n", + "Wall time: 5.86 s\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Twitter\n", + "\n", + "- edges: 2420766\n", + "- nodes: 81306" + ], + "metadata": { + "id": "KrJKjXy2KLos" + } + }, + { + "cell_type": "code", + "source": [ + "! wget 'https://snap.stanford.edu/data/twitter_combined.txt.gz'\n", + "#! curl -L 'https://snap.stanford.edu/data/twitter_combined.txt.gz' -o twitter_combined.txt.gz" + ], + "metadata": { + "id": "fO2qasGqpubr", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d41a110e-9f7c-4710-9ce3-3f4906ab02ae" + }, + "execution_count": 21, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2023-12-25 21:58:27-- https://snap.stanford.edu/data/twitter_combined.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... 
connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 10621918 (10M) [application/x-gzip]\n", + "Saving to: ‘twitter_combined.txt.gz’\n", + "\n", + "twitter_combined.tx 100%[===================>] 10.13M 3.00MB/s in 4.0s \n", + "\n", + "2023-12-25 21:58:32 (2.52 MB/s) - ‘twitter_combined.txt.gz’ saved [10621918/10621918]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "! gunzip twitter_combined.txt.gz" + ], + "metadata": { + "id": "fn7zeA3SGlEo" + }, + "execution_count": 22, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "! head -n 5 twitter_combined.txt" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "68TAZkhLGz9g", + "outputId": "8ba7c23d-267f-4b59-d6c6-b3f66caec9cf" + }, + "execution_count": 24, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "214328887 34428380\n", + "17116707 28465635\n", + "380580781 18996905\n", + "221036078 153460275\n", + "107830991 17868918\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "te_df = pd.read_csv('twitter_combined.txt', sep=' ', names=['s', 'd'])\n", + "te_df.shape" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QU2wNeGXG2GC", + "outputId": "349ac9c0-6f6c-4ce6-fec0-8bae75fca635" + }, + "execution_count": 25, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 474 ms, sys: 61.9 ms, total: 536 ms\n", + "Wall time: 534 ms\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(2420766, 2)" + ] + }, + "metadata": {}, + "execution_count": 25 + } + ] + }, + { + "cell_type": "code", + "source": [ + "import graphistry" + ], + "metadata": { + "id": "EK5gQH2iG5UU" + }, + "execution_count": 26, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()\n", + "g._nodes.shape" + ], + 
"metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZtIW-eFGG_R4", + "outputId": "0686e9b3-b684-4b93-da03-289244394338" + }, + "execution_count": 27, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 86.4 ms, sys: 106 ms, total: 193 ms\n", + "Wall time: 191 ms\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(81306, 1)" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "for i in range(10):\n", + " g2 = g.chain([n({'id': 17116707}), e_forward(hops=1)])\n", + "g2._nodes.shape, g2._edges.shape" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "yUaRfw4FHGMb", + "outputId": "3945cc5a-c36c-451b-ac95-8af992a3546f" + }, + "execution_count": 29, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 11.8 s, sys: 8.4 s, total: 20.2 s\n", + "Wall time: 23 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "((140, 1), (615, 2))" + ] + }, + "metadata": {}, + "execution_count": 29 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=1)])._nodes\n", + "print(out.shape)\n", + "del g_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5hM4NBu2_eks", + "outputId": "54505262-4871-44ee-e5e4-ad7ab32c13c2" + }, + "execution_count": 30, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(140, 1)\n", + "CPU times: user 1.33 s, sys: 46.6 ms, total: 1.38 s\n", + "Wall time: 1.63 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "for i in range(10):\n", + " out = g.chain([n({'id': 
17116707}), e_forward(hops=2)])\n", + "print(out._nodes.shape, out._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "m2-MxD5lHX6u", + "outputId": "e89b9d4b-6c04-45c7-9e7f-cbdbbe0a4730" + }, + "execution_count": 31, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(2345, 1) (68536, 2)\n", + "CPU times: user 13.3 s, sys: 8.05 s, total: 21.4 s\n", + "Wall time: 21.6 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=2)])._nodes\n", + "print(out.shape)\n", + "del g_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7EQSRbIqLaGw", + "outputId": "60c00a03-9e7b-46b5-fce3-f4f567a09430" + }, + "execution_count": 36, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(2345, 1)\n", + "CPU times: user 1.67 s, sys: 55.8 ms, total: 1.72 s\n", + "Wall time: 1.75 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "for i in range(10):\n", + " out = g.chain([n({'id': 17116707}), e_forward(hops=8)])\n", + "print(out._nodes.shape, out._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hh6WnjI3ITpB", + "outputId": "33138efe-a581-49ed-b2b4-247f8e9bdc09" + }, + "execution_count": 37, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(81304, 1) (2417796, 2)\n", + "CPU times: user 1min 56s, sys: 17.1 s, total: 2min 13s\n", + "Wall time: 2min 22s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), 
e_forward(hops=8)])._nodes\n", + "print(out.shape)\n", + "del g_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7jFFVUenM87j", + "outputId": "2cceb720-9de3-488e-8b74-b820fd06e6c1" + }, + "execution_count": 38, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(81304, 1)\n", + "CPU times: user 5.3 s, sys: 1.48 s, total: 6.78 s\n", + "Wall time: 7.89 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({g._node: [17116707]})\n", + "for i in range(10):\n", + " g2 = g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=1)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_5LD0bZB_lU4", + "outputId": "bc31bd03-e79f-46d2-ea8f-3b01d9ef39a2" + }, + "execution_count": 39, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(0, 1) (0, 2)\n", + "CPU times: user 2.58 s, sys: 1.59 s, total: 4.17 s\n", + "Wall time: 6.02 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({g._node: [17116707]})\n", + "g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", + "for i in range(10):\n", + " g2 = g_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del g_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "M_rHjqtvACQw", + "outputId": "8d3e308e-b1e2-452b-f402-573be0dd5b58" + }, + "execution_count": 44, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(61827, 1) (1473599, 2)\n", + "CPU times: user 822 ms, sys: 179 ms, total: 1 s\n", + "Wall time: 997 ms\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + 
"start_nodes = pd.DataFrame({g._node: [17116707]})\n", + "for i in range(10):\n", + " g2 = g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0zEIucaCAbj_", + "outputId": "83e64b0f-2b3a-4e4b-d189-3e6a8ef78f53" + }, + "execution_count": 40, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(2345, 1) (68536, 2)\n", + "CPU times: user 8.93 s, sys: 5.92 s, total: 14.9 s\n", + "Wall time: 15.8 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({g._node: [17116707]})\n", + "g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", + "for i in range(10):\n", + " g2 = g_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del g_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "LKJh5gRtAdIj", + "outputId": "e3c7883d-74c0-4d55-b238-88457296c6bc" + }, + "execution_count": 41, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(2345, 1) (68536, 2)\n", + "CPU times: user 374 ms, sys: 6.92 ms, total: 381 ms\n", + "Wall time: 379 ms\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({g._node: [17116707]})\n", + "for i in range(10):\n", + " g2 = g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=8)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JZwxdofNAfmb", + "outputId": "2731be4c-75d9-47f4-8602-4f2d6cb2ddac" + }, + "execution_count": 42, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(81304, 1) (2417796, 2)\n", + "CPU 
times: user 38.8 s, sys: 8.7 s, total: 47.5 s\n", + "Wall time: 48.2 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({g._node: [17116707]})\n", + "g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", + "for i in range(10):\n", + " g2 = g_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=8)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del g_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9o_og8bSAhe3", + "outputId": "dd3e4f8f-f426-4705-98c4-60f1912ba28a" + }, + "execution_count": 43, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(81304, 1) (2417796, 2)\n", + "CPU times: user 1.8 s, sys: 506 ms, total: 2.3 s\n", + "Wall time: 2.3 s\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### GPlus\n", + "\n", + "- edges: 30494866\n", + "- nodes: 107614" + ], + "metadata": { + "id": "9dZzAAVONCD2" + } + }, + { + "cell_type": "code", + "source": [ + "! wget https://snap.stanford.edu/data/gplus_combined.txt.gz" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-nhWGNekKpcZ", + "outputId": "e2175290-337c-4faa-e5d8-4bc401583326" + }, + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2023-12-26 18:36:29-- https://snap.stanford.edu/data/gplus_combined.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 398930514 (380M) [application/x-gzip]\n", + "Saving to: ‘gplus_combined.txt.gz’\n", + "\n", + "gplus_combined.txt. 
100%[===================>] 380.45M 34.7MB/s in 9.6s \n", + "\n", + "2023-12-26 18:36:39 (39.7 MB/s) - ‘gplus_combined.txt.gz’ saved [398930514/398930514]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "! gunzip gplus_combined.txt.gz" + ], + "metadata": { + "id": "g5wgA_c2KqwJ" + }, + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])\n", + "print(ge_df.shape)\n", + "ge_df.head(5)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 258 + }, + "id": "52hgDbr0Kti6", + "outputId": "217203fc-7095-4784-c4c4-d46ee9c78808" + }, + "execution_count": 6, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(30494866, 2)\n", + "CPU times: user 16 s, sys: 1.45 s, total: 17.5 s\n", + "Wall time: 22.5 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " s d\n", + "0 116374117927631468606 101765416973555767821\n", + "1 112188647432305746617 107727150903234299458\n", + "2 116719211656774388392 100432456209427807893\n", + "3 117421021456205115327 101096322838605097368\n", + "4 116407635616074189669 113556266482860931616" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 6 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", + "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", + "print(gg._edges.shape, gg._nodes.shape)\n", + "gg._nodes.head(5)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 258 + }, + "id": "w5YkN-nLK6UV", + "outputId": "dc98380d-54c2-4b36-c56e-5e8401c4ffa4" + }, + "execution_count": 7, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(30494866, 2) (107614, 1)\n", + "CPU times: user 4.49 s, sys: 1.25 s, total: 5.74 s\n", + "Wall time: 5.97 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 116374117927631468606\n", + "1 112188647432305746617\n", + "2 116719211656774388392\n", + "3 117421021456205115327\n", + "4 116407635616074189669" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 7 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg.chain([ n({'id': '116374117927631468606'})])._nodes" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 115 + }, + "id": "NKtz54uELX-8", + "outputId": "5d8f3eef-893d-47cc-e7a9-c5cbfec8270c" + }, + "execution_count": 49, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 534 ms, sys: 598 ms, total: 1.13 s\n", + "Wall time: 1.65 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 116374117927631468606" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 49 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=1)])._nodes\n", + "out.shape" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "iNWdi00VLmZG", + "outputId": "ecfb56a6-c564-4bf6-f43f-2c95a103f4be" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 27.5 s, sys: 11.1 s, total: 38.5 s\n", + "Wall time: 39.5 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(1473, 1)" + ] + }, + "metadata": {}, + "execution_count": 75 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=1)])\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del gg_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Q6p3h6uCOABh", + "outputId": "817fc80f-ef5d-4070-eb48-a12344be709c" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(1473, 1) (13375, 2)\n", + "CPU times: user 4.57 s, sys: 2.11 s, total: 6.68 s\n", + "Wall time: 7.63 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=2)])._nodes\n", + "out.shape" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "6UdCcMdqLw-P", + "outputId": "70742c79-b22b-4db2-c548-cb1e25d572eb" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 45.8 s, sys: 17 s, total: 1min 2s\n", + "Wall time: 1min 5s\n" + ] + }, + { + "output_type": "execute_result", + 
"data": { + "text/plain": [ + "(44073, 1)" + ] + }, + "metadata": {}, + "execution_count": 77 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=2)])\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del gg_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QElqatDyNYCS", + "outputId": "0e15bd3e-d2d9-4965-df7d-c8856d036680" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(44073, 1) (2069325, 2)\n", + "CPU times: user 4.97 s, sys: 2.36 s, total: 7.34 s\n", + "Wall time: 10.6 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=3)])._nodes\n", + "out.shape" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3HJOItZ4MQMG", + "outputId": "f5be7bb4-7f09-4f80-c549-e703e99f5067" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 3min 45s, sys: 1min 5s, total: 4min 50s\n", + "Wall time: 4min 52s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(102414, 1)" + ] + }, + "metadata": {}, + "execution_count": 79 + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=3)])\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del gg_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "G32t_xthOUle", + "outputId": "7721741f-9c86-41aa-eb0b-2c8f0db2ed54" + }, + "execution_count": 
null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(102414, 1) (24851333, 2)\n", + "CPU times: user 6.95 s, sys: 2.63 s, total: 9.57 s\n", + "Wall time: 9.84 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=4)])\n", + "print(out._nodes.shape, out._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bXy2yyJsMsEG", + "outputId": "911f2680-067c-44f2-9ba2-7f27d3c9bc6b" + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105479, 1) (30450354, 2)\n", + "CPU times: user 4min 36s, sys: 1min 25s, total: 6min 2s\n", + "Wall time: 6min 4s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=4)])\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del gg_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Vt8hhjWDP_W_", + "outputId": "824ae644-e1cf-4239-bda9-84aecde52ad8" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105479, 1) (30450354, 2)\n", + "CPU times: user 7.44 s, sys: 2.45 s, total: 9.88 s\n", + "Wall time: 9.9 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=5)])\n", + "print(out._nodes.shape, out._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_z4KpNZaOH8t", + "outputId": "2417f78b-e1b7-452d-8e26-7df259620c88" + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105604, 1) (30468335, 2)\n", + "CPU 
times: user 5min 36s, sys: 1min 39s, total: 7min 16s\n", + "Wall time: 7min 15s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=5)])\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del gg_gdf\n", + "del out" + ], + "metadata": { + "id": "spUBH9EHSz2O", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "22340ce3-e8d4-4a72-b485-9839c667b965" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105604, 1) (30468335, 2)\n", + "CPU times: user 8.82 s, sys: 2.71 s, total: 11.5 s\n", + "Wall time: 11.9 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + "for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=1)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "vCsdmc62A7OM", + "outputId": "adc05d29-c628-49ed-cd6d-8921c6dcd206" + }, + "execution_count": 50, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(1473, 1) (13375, 2)\n", + "CPU times: user 19.9 s, sys: 9.36 s, total: 29.2 s\n", + "Wall time: 41.8 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + "for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=1)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del gg_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": 
"https://localhost:8080/" + }, + "id": "J3kV8NBYBQdW", + "outputId": "76073248-43e1-4c3c-c004-67324cc1d312" + }, + "execution_count": 52, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(1473, 1) (13375, 2)\n", + "CPU times: user 3.71 s, sys: 2.09 s, total: 5.8 s\n", + "Wall time: 6.05 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + "for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ONv1RQeWBeeK", + "outputId": "58d57fa4-be72-45bc-abfa-5de9d1102f55" + }, + "execution_count": 53, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(44073, 1) (2069325, 2)\n", + "CPU times: user 27.8 s, sys: 13.2 s, total: 41 s\n", + "Wall time: 43.9 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + "for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del gg_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ke5SZZ01BgqR", + "outputId": "4173fd28-a11b-4300-d28b-6fdb87e8e9f3" + }, + "execution_count": 54, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(44073, 1) (2069325, 2)\n", + "CPU times: user 4.26 s, sys: 2.37 s, total: 6.63 s\n", + "Wall time: 7.91 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + "for i in 
range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=3)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "U795pIBUBiZV", + "outputId": "d499433c-cc0c-4bbf-c69f-36b5d55402d9" + }, + "execution_count": 55, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(102414, 1) (24851333, 2)\n", + "CPU times: user 1min 3s, sys: 22.7 s, total: 1min 26s\n", + "Wall time: 1min 35s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + "for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=3)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del gg_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "kIZYwSe1Bj2e", + "outputId": "b7e1ed9f-47d1-412e-9593-ecc436ac1486" + }, + "execution_count": 56, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(102414, 1) (24851333, 2)\n", + "CPU times: user 3.96 s, sys: 2.11 s, total: 6.07 s\n", + "Wall time: 6.05 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + "for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=4)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "YTI5sD6YBpYL", + "outputId": "b37bf2df-07dc-404c-8a83-a83f28e38bf6" + }, + "execution_count": 57, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105479, 1) (30450354, 2)\n", + "CPU times: user 
1min 34s, sys: 30.6 s, total: 2min 5s\n", + "Wall time: 2min 5s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + "for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=4)\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del gg_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "d5WBazICBrSz", + "outputId": "ef95e893-3a0f-4d47-ede4-bd8a6faebf98" + }, + "execution_count": 58, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105479, 1) (30450354, 2)\n", + "CPU times: user 5.25 s, sys: 2.41 s, total: 7.67 s\n", + "Wall time: 7.69 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + "for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + "print(g2._nodes.shape, g2._edges.shape)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ozQlRPaFBtPD", + "outputId": "4f1655c4-38fd-47f9-942d-836585e0d866" + }, + "execution_count": 59, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105604, 1) (30468335, 2)\n", + "CPU times: user 2min 16s, sys: 39.1 s, total: 2min 55s\n", + "Wall time: 2min 58s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + "for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + "print(g2._nodes.shape, 
g2._edges.shape)\n", + "del start_nodes\n", + "del gg_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-ACkMG20B6HM", + "outputId": "f26c03a9-9f25-4f93-c7d3-0e8676694040" + }, + "execution_count": 60, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(105604, 1) (30468335, 2)\n", + "CPU times: user 5.79 s, sys: 2.51 s, total: 8.3 s\n", + "Wall time: 8.29 s\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Orkut\n", + "- 117M edges\n", + "- 3M nodes" + ], + "metadata": { + "id": "R03M_swxarKC" + } + }, + { + "cell_type": "code", + "source": [ + "! wget https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QoabYR2maxPo", + "outputId": "2bb6275d-46bb-42da-ec05-d0e5a58b1f77" + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2023-12-26 00:55:52-- https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 447251958 (427M) [application/x-gzip]\n", + "Saving to: ‘com-orkut.ungraph.txt.gz’\n", + "\n", + "com-orkut.ungraph.t 100%[===================>] 426.53M 45.1MB/s in 9.7s \n", + "\n", + "2023-12-26 00:56:02 (44.0 MB/s) - ‘com-orkut.ungraph.txt.gz’ saved [447251958/447251958]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "! gunzip com-orkut.ungraph.txt.gz" + ], + "metadata": { + "id": "BvvfFPKWbAVJ" + }, + "execution_count": 9, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "! 
head -n 7 com-orkut.ungraph.txt" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "YsWwRoPqbPIb", + "outputId": "2eb4f862-b4e1-42bf-ff5d-eec10b27cedc" + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "# Undirected graph: ../../data/output/orkut.txt\n", + "# Orkut\n", + "# Nodes: 3072441 Edges: 117185083\n", + "# FromNodeId\tToNodeId\n", + "1\t2\n", + "1\t3\n", + "1\t4\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "import pandas as pd\n", + "\n", + "import graphistry\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "\n", + "import cudf\n", + "\n", + "# Work around Google Colab shell encoding bugs\n", + "import locale\n", + "locale.getpreferredencoding = lambda: \"UTF-8\"\n", + "\n", + "cudf.__version__, graphistry.__version__" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cbMC8r2ldjbW", + "outputId": "82688d53-7d56-4563-d65e-7c5cd32ac14e" + }, + "execution_count": 11, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "('23.12.01', '0.32.0+12.g72e778c')" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ] + }, + { + "cell_type": "code", + "source": [ + "! 
nvidia-smi" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "TopFxAvnh_Cv", + "outputId": "cc9d9dc9-e594-4190-fe84-3f1b6dce8a1a" + }, + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:27 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 47C P0 27W / 70W | 103MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "# Note: the file has 4 header lines, so skiprows=5 also drops the first edge (1, 2);\n", + "# the loaded row count is therefore 117185082 rather than the 117185083 in the header.\n", + "co_df = cudf.read_csv('com-orkut.ungraph.txt', sep='\\t', names=['s', 'd'], skiprows=5).to_pandas()\n", + "print(co_df.shape)\n", + "print(co_df.head(5))\n", + "print(co_df.dtypes)\n", + "#del co_df" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Oczs87ITbJgw", + "outputId": "ac203ddd-e684-4eb9-a586-f6a49fd1625d" + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": 
"stdout", + "text": [ + "(117185082, 2)\n", + " s d\n", + "0 1 3\n", + "1 1 4\n", + "2 1 5\n", + "3 1 6\n", + "4 1 7\n", + "s int64\n", + "d int64\n", + "dtype: object\n", + "CPU times: user 2.56 s, sys: 4.2 s, total: 6.76 s\n", + "Wall time: 6.76 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_g = graphistry.edges(cudf.DataFrame(co_df), 's', 'd').materialize_nodes(engine='cudf')\n", + "co_g = co_g.nodes(lambda g: g._nodes.to_pandas()).edges(lambda g: g._edges.to_pandas())\n", + "print(co_g._nodes.shape, co_g._edges.shape)\n", + "co_g._nodes.head(5)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 258 + }, + "id": "gGSDjTtveFAT", + "outputId": "e7b38f4f-dc07-4f35-9bab-9c80a80bbf0b" + }, + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(3072441, 1) (117185082, 2)\n", + "CPU times: user 1.96 s, sys: 2.95 s, total: 4.91 s\n", + "Wall time: 4.92 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 1\n", + "1 2\n", + "2 3\n", + "3 4\n", + "4 5" + ], + "text/html": [ + "\n", + "
\n" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ] + }, + { + "cell_type": "code", + "source": [ + "! nvidia-smi" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "V5qL8K7-dqIZ", + "outputId": "e08319fc-74d3-4f33-df0f-f98950dc8c99" + }, + "execution_count": 15, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:39 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 49C P0 27W / 70W | 2819MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "# crashes\n", + "if False:\n", + " out = co_g.chain([ n({'id': 1}), e_forward(hops=1)])._nodes\n", + " print(out.shape)\n", + " del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hCbxZ8UmhRLp", + "outputId": "519aed6c-733d-41f4-d462-e57f5e32b131" + }, + "execution_count": null, + 
"outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 4 µs, sys: 1 µs, total: 5 µs\n", + "Wall time: 47.7 µs\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=1)])\n", + "! nvidia-smi\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del co_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Q682scC_eC-S", + "outputId": "7ff5f829-0de7-4a6c-a77d-e2857896a8a5" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Dec 25 06:23:46 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 63C P0 30W / 70W | 1925MiB / 15360MiB | 35% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Mon Dec 25 06:23:49 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 66C P0 72W / 70W | 2845MiB / 15360MiB | 84% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(12, 1) (11, 2)\n", + "CPU times: user 4.42 s, sys: 131 ms, total: 4.55 s\n", + "Wall time: 4.42 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=2)])\n", + "! nvidia-smi\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del co_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "i0AXhfqVbVsm", + "outputId": "8271f469-a73f-48e3-e1a9-3077026ab8ec" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Dec 25 06:24:52 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 61C P0 29W / 70W | 1925MiB / 15360MiB | 22% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Mon Dec 25 06:24:58 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 64C P0 71W / 70W | 2845MiB / 15360MiB | 57% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(391, 1) (461, 2)\n", + "CPU times: user 5.34 s, sys: 132 ms, total: 5.47 s\n", + "Wall time: 6.13 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=3)])\n", + "! nvidia-smi\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del co_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Hid0-iPKhpOd", + "outputId": "ecaeb534-d4d7-48fa-d4e1-c80b22626afe" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Dec 25 06:25:25 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 61C P0 29W / 70W | 1925MiB / 15360MiB | 31% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Mon Dec 25 06:25:31 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 65C P0 71W / 70W | 2849MiB / 15360MiB | 58% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(21767, 1) (28480, 2)\n", + "CPU times: user 6.25 s, sys: 100 ms, total: 6.35 s\n", + "Wall time: 6.37 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=4)])\n", + "! nvidia-smi\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del co_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "buutj-ZjhrEe", + "outputId": "ae11addd-6bea-44e9-81c0-b431e1db8089" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Dec 25 06:26:04 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 61C P0 29W / 70W | 1927MiB / 15360MiB | 36% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Mon Dec 25 06:26:13 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 65C P0 71W / 70W | 2931MiB / 15360MiB | 90% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(718640, 1) (2210961, 2)\n", + "CPU times: user 9.01 s, sys: 1.03 s, total: 10 s\n", + "Wall time: 9.84 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=5)])\n", + "! nvidia-smi\n", + "print(out._nodes.shape, out._edges.shape)\n", + "del co_gdf\n", + "del out" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bK4C9Ly0hso-", + "outputId": "8a9a32ab-03e2-42b4-8b71-2bcf797b31b1" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Dec 25 06:27:18 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 60C P0 29W / 70W | 1927MiB / 15360MiB | 28% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Mon Dec 25 06:27:57 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 72C P0 43W / 70W | 4351MiB / 15360MiB | 100% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(3041556, 1) (47622917, 2)\n", + "CPU times: user 34.9 s, sys: 4.76 s, total: 39.6 s\n", + "Wall time: 39.2 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "out = co_gdf.chain([ n({'id': 1}), e_forward(hops=6)])._nodes\n", + "print(out.shape)\n", + "del co_gdf\n", + "del out" + ], + "metadata": { + "id": "qrga-la0hwhh" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "!lscpu\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "eiXFImxF-rzw", + "outputId": "b807cc3d-ed1a-4bef-c6e0-bfc2df7356ff" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Architecture: x86_64\n", + " CPU op-mode(s): 32-bit, 64-bit\n", + " Address sizes: 46 bits physical, 48 bits virtual\n", + " Byte Order: Little Endian\n", + "CPU(s): 2\n", + " On-line CPU(s) list: 0,1\n", + "Vendor ID: GenuineIntel\n", + " Model name: Intel(R) Xeon(R) CPU @ 2.20GHz\n", + " CPU family: 6\n", + " Model: 79\n", + " Thread(s) per core: 2\n", + " Core(s) per socket: 1\n", + " Socket(s): 1\n", + " Stepping: 
0\n", + " BogoMIPS: 4399.99\n", + " Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf\n", + " lush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_\n", + " good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fm\n", + " a cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyp\n", + " ervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsb\n", + " ase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsa\n", + " veopt arat md_clear arch_capabilities\n", + "Virtualization features: \n", + " Hypervisor vendor: KVM\n", + " Virtualization type: full\n", + "Caches (sum of all): \n", + " L1d: 32 KiB (1 instance)\n", + " L1i: 32 KiB (1 instance)\n", + " L2: 256 KiB (1 instance)\n", + " L3: 55 MiB (1 instance)\n", + "NUMA: \n", + " NUMA node(s): 1\n", + " NUMA node0 CPU(s): 0,1\n", + "Vulnerabilities: \n", + " Gather data sampling: Not affected\n", + " Itlb multihit: Not affected\n", + " L1tf: Mitigation; PTE Inversion\n", + " Mds: Vulnerable; SMT Host state unknown\n", + " Meltdown: Vulnerable\n", + " Mmio stale data: Vulnerable\n", + " Retbleed: Vulnerable\n", + " Spec rstack overflow: Not affected\n", + " Spec store bypass: Vulnerable\n", + " Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swap\n", + " gs barriers\n", + " Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected\n", + " Srbds: Not affected\n", + " Tsx async abort: Vulnerable\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "!free -h\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wJohLi58-sN5", + "outputId": "c3e144f6-c19a-4c68-e867-f5e7fa2e9df4" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " total used free shared buff/cache available\n", + "Mem: 12Gi 717Mi 8.0Gi 1.0Mi 3.9Gi 11Gi\n", + "Swap: 
0B 0B 0B\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = pd.DataFrame({'id': [1]})\n", + "! nvidia-smi\n", + "for i in range(1):\n", + " g2 = co_g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=1)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "#del start_nodes\n", + "#del co_gdf\n", + "#del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "zak4Inhco5il", + "outputId": "30bcf2bc-853e-4e5e-8c57-ba0cd9429554" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 01:01:43 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 64C P0 30W / 70W | 2821MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({'id': [1]})\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=1)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del co_gdf\n", + "del g2" + ], + "metadata": { + "id": "-SmFlCBS_Bgx", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "d2326cf7-3ea6-4f99-9548-f2e98ece59a4" + }, + "execution_count": 16, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:45 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 49C P0 28W / 70W | 1923MiB / 15360MiB | 37% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Tue Dec 26 00:56:47 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 52C P0 70W / 70W | 2819MiB / 15360MiB | 79% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(12, 1) (11, 2)\n", + "CPU times: user 1.6 s, sys: 37.3 ms, total: 1.64 s\n", + "Wall time: 1.84 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({'id': [1]})\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=2)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del co_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fjjt3YnYnabv", + "outputId": "05762f50-bfe1-4d23-9153-31431418c8e5" + }, + "execution_count": 17, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:47 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 51C P0 35W / 70W | 1923MiB / 15360MiB | 59% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Tue Dec 26 00:56:49 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 53C P0 59W / 70W | 2821MiB / 15360MiB | 86% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(391, 1) (461, 2)\n", + "CPU times: user 2.32 s, sys: 58.5 ms, total: 2.38 s\n", + "Wall time: 2.51 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({'id': [1]})\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=3)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del co_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oIouuORgnbcY", + "outputId": "f07abe4c-5137-4ee3-935a-afbb2c5eaa1e" + }, + "execution_count": 18, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:50 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 52C P0 36W / 70W | 1925MiB / 15360MiB | 55% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Tue Dec 26 00:56:53 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 54C P0 75W / 70W | 2825MiB / 15360MiB | 74% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(21767, 1) (28480, 2)\n", + "CPU times: user 3.04 s, sys: 63.6 ms, total: 3.1 s\n", + "Wall time: 3.25 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({'id': [1]})\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=4)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del co_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oNLZGjwInc85", + "outputId": "534097cf-4022-48cc-9419-a00c135f69e1" + }, + "execution_count": 19, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:53 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 54C P0 36W / 70W | 1927MiB / 15360MiB | 54% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Tue Dec 26 00:56:58 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 56C P0 38W / 70W | 2907MiB / 15360MiB | 89% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(718640, 1) (2210961, 2)\n", + "CPU times: user 4.58 s, sys: 309 ms, total: 4.89 s\n", + "Wall time: 5.02 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({'id': [1]})\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=5)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del co_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ePqaeujMneX8", + "outputId": "ffd88fff-016e-4ac0-ecb9-fa06baca60f8" + }, + "execution_count": 20, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:56:58 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 55C P0 37W / 70W | 1925MiB / 15360MiB | 59% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Tue Dec 26 00:57:10 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 60C P0 48W / 70W | 4325MiB / 15360MiB | 99% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(3041556, 1) (47622917, 2)\n", + "CPU times: user 10.8 s, sys: 1.29 s, total: 12.1 s\n", + "Wall time: 12 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%time\n", + "start_nodes = cudf.DataFrame({'id': [1]})\n", + "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "! nvidia-smi\n", + "for i in range(10):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=6)\n", + "! nvidia-smi\n", + "print(g2._nodes.shape, g2._edges.shape)\n", + "del start_nodes\n", + "del co_gdf\n", + "del g2" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "PTBkoIVHnfzK", + "outputId": "5615ecd7-47ea-46ab-fd36-13bce4b3c787" + }, + "execution_count": 21, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Tue Dec 26 00:57:10 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 59C P0 38W / 70W | 1925MiB / 15360MiB | 44% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "Tue Dec 26 00:57:38 2023 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. 
|\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 68C P0 55W / 70W | 6445MiB / 15360MiB | 95% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n", + "(3071927, 1) (117032738, 2)\n", + "CPU times: user 23.5 s, sys: 2.68 s, total: 26.2 s\n", + "Wall time: 28.2 s\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "Ygc2nrkznlCu" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file From 9a9de515a725e1d173f9f3e2197249e34349833e Mon Sep 17 00:00:00 2001 From: Leo Meyerovich Date: Tue, 26 Dec 2023 23:52:50 -0800 Subject: [PATCH 104/104] docs(changelog) --- CHANGELOG.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fd1fcab40b..84348f3c9d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,11 +7,13 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm ## [Development] +## [0.33.0 - 2023-12-26] + ### Added +* GFQL: GPU acceleration of `chain`, `hop`, `filter_by_dict` * `AbstractEngine` to `engine.py::Engine` enum * `compute.typing.DataFrameT` to centralize df-lib-agnostic type checking -* `chain`, `hop`, `filter_by_dict` variants support GPU execution ### Refactor