Merge pull request #21 from lsst-sqre/u/afausti/review-notebooks
Update Chunked queries notebook
afausti authored Nov 16, 2023
2 parents 70eea17 + 9c93bac commit 6f6d783
Showing 1 changed file with 7 additions and 37 deletions.
docs/user-guide/notebooks/ChunkedQueries.ipynb (44 changes: 7 additions & 37 deletions)
@@ -11,15 +11,14 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "When dealing with large result sets, fetching the entire data at once can lead to excessive memory usage and slower performance. \n",
-   "Fortunately, there is a solution called \"chunked queries\" that allows us to retrieve data in smaller, manageable chunks. \n",
-   "By employing this technique, we can optimize memory usage and significantly improve query performance.\n",
+   "When dealing with large result sets, fetching all the data at once can lead to excessive memory usage and slower performance. \n",
+   "Fortunately, there is a solution called \"chunked queries\" to retrieve data in smaller, manageable chunks. \n",
    "\n",
-   "Chunked queries are particularly useful when working with datasets that contain millions of data points. \n",
+   "Chunked queries are handy when working with millions of data points. \n",
    "Rather than requesting the entire result set in one go, we can specify a maximum chunk size to split the data into smaller portions. \n",
    "\n",
    "It's important to note that the optimal chunk size may vary depending on the specific query.\n",
-   "While it may seem intuitive that a smaller chunk size would result in faster query execution, that's not always the case. In fact, setting the chunk size too small can introduce overhead by generating a large number of requests to the database. \n"
+   "While it may seem intuitive that a smaller chunk size would result in faster query execution, that's not always the case. In fact, setting the chunk size too small can introduce overhead by generating many requests to the database. \n"
   ]
  },
  {
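For readers skimming this diff without the rendered notebook, here is a minimal sketch of the chunked-query pattern the cell above describes, assuming the aioinflux client that lsst-efd-client builds on; the host, database name, and chunk size are illustrative placeholders, not values from this notebook.

```python
# Minimal chunked-query sketch. Assumptions: aioinflux against an
# EFD-like InfluxDB instance; host, db, and chunk_size are placeholders.
import asyncio

from aioinflux import InfluxDBClient


async def main() -> None:
    async with InfluxDBClient(host="influxdb.example.com", db="efd") as client:
        query = ('SELECT /xForce/ FROM "lsst.sal.MTM1M3.forceActuatorData" '
                 'WHERE time > now() - 6h')
        # chunked=True streams the response in pieces; chunk_size caps the
        # number of points per chunk. As the notebook warns, too small a
        # chunk_size means many round trips to the database.
        chunks = await client.query(query, chunked=True, chunk_size=50_000)
        n_points = 0
        async for chunk in chunks:
            for series in chunk["results"][0].get("series", []):
                n_points += len(series["values"])
        print(f"retrieved {n_points:,} points")


asyncio.run(main())
```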
@@ -102,18 +101,7 @@
   },
   "outputs": [],
   "source": [
-   "fields = \", \".join([f\"xForce{i}\" for i in range(156)])"
-  ]
- },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {
-   "tags": []
-  },
-  "outputs": [],
-  "source": [
-   "query = f'''SELECT {fields} FROM \"lsst.sal.MTM1M3.forceActuatorData\" WHERE time > now() - 1d '''\n",
+   "query = f'''SELECT /xForce/ FROM \"lsst.sal.MTM1M3.forceActuatorData\" WHERE time > now()-6h'''\n",
   "query"
  ]
 },
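The hunk above swaps a Python-generated 156-field list for InfluxQL's regex field selector (and narrows the time window from 1d to 6h). For comparison, the two query forms side by side; the field count is taken from the removed cell:

```python
# Old approach (removed): enumerate the 156 xForce fields client-side.
fields = ", ".join(f"xForce{i}" for i in range(156))
explicit = (f'SELECT {fields} FROM "lsst.sal.MTM1M3.forceActuatorData" '
            'WHERE time > now() - 1d')

# New approach: /xForce/ is an InfluxQL regex that matches every field
# whose name contains "xForce", so no client-side field list is needed.
regex = ('SELECT /xForce/ FROM "lsst.sal.MTM1M3.forceActuatorData" '
         'WHERE time > now() - 6h')
```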
@@ -130,7 +118,7 @@
   "tags": []
  },
  "source": [
-   "By implementing chunked queries with the appropriate configuration, we can retrieve a dataframe with hundreds of millions of data points in a few minutes."
+   "By implementing chunked queries with the appropriate configuration, we can retrieve a dataframe with millions of data points in less than a minute."
  ]
 },
 {
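The dataframe mentioned above has to be assembled from the individual chunks. A hedged sketch of one way to do that, reusing `client` and `query` from the earlier sketch with notebook-style top-level await; `iterpoints` is aioinflux's helper that yields one row per data point:

```python
# Sketch: concatenate chunked results into a single dataframe.
# Assumes `client` and `query` from the earlier sketch, and a single
# series per chunk (the case for a plain SELECT on one measurement).
import pandas as pd
from aioinflux import iterpoints

frames = []
async for chunk in await client.query(query, chunked=True, chunk_size=50_000):
    # Column names live in the chunk metadata; iterpoints yields the rows.
    columns = chunk["results"][0]["series"][0]["columns"]
    frames.append(pd.DataFrame(iterpoints(chunk), columns=columns))
df = pd.concat(frames, ignore_index=True)
df.size
```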
@@ -166,24 +154,6 @@
   "source": [
    "df.size"
   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "After retrieving the data, it is recommended to save a local copy and utilize it for analysis, as this helps prevent overloading the database."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df.to_parquet('df.parquet')"
-   ]
  }
],
"metadata": {
@@ -202,7 +172,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.10"
+   "version": "3.11.4"
   }
  },
 "nbformat": 4,
