Describe the bug
I am using the Athena OpenSearch Lambda connector to query OpenSearch index data with SQL. The Lambda logs show that the connector is unable to evaluate predicates (filters) on nested object fields, so it scans (scrolls through) all data in the OpenSearch index.
My OpenSearch index schema looks like this:
creationTime : string
usage : struct -- the struct contains top-level (first-class) attributes as well as further nested attributes
The struct looks like this: struct<date:string,version:bigint,revenueList:array<struct<country:string,unit:string,cost:double,quantity:bigint,eventList:array<struct<count:bigint,type:string>>,state:string>>>
Problematic Athena query: "select * from index where usage.version = 1"
When I filter on a field of the nested object (here, usage), the Lambda does not evaluate any predicates and starts scrolling through the full OpenSearch index. As a result, the Athena query times out after 15 minutes (the maximum Lambda execution time).
Lambda logs for this query:
2024-01-09 19:35:52 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO ElasticsearchQueryUtils:114 - Predicates are NOT formed.
2024-01-09 19:35:52 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO GeneratedRowWriter:129 - recompile: Detected a new block, rebuilding field writers so they point to the correct Arrow vectors.
2024-01-09 19:36:03 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO S3BlockSpiller:208 - writeRow: Spilling block with 33625 rows and 16000220 bytes and config 16000000 bytes
(These log lines keep appearing until the Lambda times out, i.e. the connector scrolls through all index data and writes 16000000-byte blocks to the Athena spill bucket.)
Expected behavior
Ideally, the Lambda should build the right set of predicates for nested object fields as well. In our case, predicates are essential because they determine the filter clauses of the query sent to OpenSearch. If the correct filters are not passed to OpenSearch, the Lambda scans all index data, which takes far longer.
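For reference, a pushed-down form of `usage.version = 1` could look roughly like the OpenSearch query DSL built below. This is a minimal sketch, not the connector's implementation: `equality_filter` is a hypothetical helper, and whether the `term` clause must be wrapped in a `nested` query depends on whether `usage` is mapped as an `object` or a `nested` field in the index mapping, which this report does not state.

```python
import json
from typing import Any, Optional


def equality_filter(field_path: str, value: Any, nested_path: Optional[str] = None) -> dict:
    """Hypothetical helper: build an OpenSearch bool/filter query for `field_path = value`.

    If `nested_path` is given (field mapped as type `nested`), the term clause is
    wrapped in a `nested` query; for a plain `object` mapping the dotted path is
    queried directly.
    """
    clause: dict = {"term": {field_path: value}}
    if nested_path is not None:
        clause = {"nested": {"path": nested_path, "query": clause}}
    # Filter context: no scoring, only restricts which documents are scrolled.
    return {"query": {"bool": {"filter": [clause]}}}


if __name__ == "__main__":
    # `usage` mapped as a plain object:
    print(json.dumps(equality_filter("usage.version", 1), indent=2))
    # `usage` mapped as a nested field:
    print(json.dumps(equality_filter("usage.version", 1, nested_path="usage"), indent=2))
```

With either query, OpenSearch would return only matching documents, instead of the connector scrolling the whole index as seen in the logs.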