[BUG] Unable to push down predicates for NESTED data type from Athena to Lambda connector #1693

shubhA941 · 2024-01-09T19:41:43Z

Describe the bug
I am using the Athena OpenSearch Lambda connector to query OpenSearch index data with SQL. The Lambda logs show that the connector does not evaluate predicates (filters) on nested objects, so it scans/scrolls through all of the data in the OpenSearch index.

My OpenSearch index schema looks like this:
creationTime : string
usage : struct -- the struct contains top-level attributes as well as other nested attributes

The struct looks like this: struct<date:string,version:bigint,revenueList:array<struct<country:string,unit:string,cost:double,quantity:bigint,eventList:array<struct<count:bigint,type:string>>,state:string>>>
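To make it easier to see which dotted field paths a predicate could target, here is a minimal sketch that walks a Hive-style struct type string like the one above and lists every leaf path. It assumes the inner `eventList` element is `struct<count:bigint,type:string>`. It also flags whether a path passes through an array: if that array is mapped as an OpenSearch `nested` field, a filter on it would need a `nested` query. The helpers `split_top_level` and `leaf_paths` are hypothetical and not part of the connector.

```python
def split_top_level(s):
    """Split s on commas that are not inside <...> brackets."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def leaf_paths(type_str, prefix):
    """Return (dotted_path, leaf_type, crosses_array) for every scalar leaf."""
    if type_str.startswith("struct<") and type_str.endswith(">"):
        out = []
        for field in split_top_level(type_str[len("struct<"):-1]):
            name, sub = field.split(":", 1)  # field names contain no ':'
            out.extend(leaf_paths(sub, f"{prefix}.{name}"))
        return out
    if type_str.startswith("array<") and type_str.endswith(">"):
        # Everything below an array element is marked as crossing an array.
        inner = leaf_paths(type_str[len("array<"):-1], prefix)
        return [(path, t, True) for path, t, _ in inner]
    return [(prefix, type_str, False)]


USAGE_TYPE = (
    "struct<date:string,version:bigint,revenueList:array<struct<country:string,"
    "unit:string,cost:double,quantity:bigint,eventList:array<struct<count:bigint,"
    "type:string>>,state:string>>>"
)
PATHS = leaf_paths(USAGE_TYPE, "usage")
for path, leaf_type, crosses_array in PATHS:
    print(path, leaf_type, "(inside array)" if crosses_array else "")
```

The field used in the failing query, `usage.version`, is a plain scalar that is not under any array, so pushing it down should only require a filter on the dotted path.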

Query that triggers the issue: "select * from index where usage.version = 1"
When I filter on a field of the nested object (here, usage), the Lambda function does not push down any predicates and starts to scroll/scan the full OpenSearch index. As a result, the Athena query times out after 15 minutes (the maximum Lambda execution time).
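For reference, here is a minimal sketch of the kind of filter we would expect to reach OpenSearch for this query. It assumes `usage` is mapped as a regular OpenSearch `object` field, so `usage.version` can be addressed by its dotted path. `equality_filter_body` is a hypothetical helper for illustration; the connector may build its query differently.

```python
import json


def equality_filter_body(field_path, value):
    """Build a search body with a single term filter on a dotted field path."""
    return {
        "query": {
            "bool": {
                # filter context: no scoring, only documents that match are returned
                "filter": [{"term": {field_path: value}}]
            }
        }
    }


body = equality_filter_body("usage.version", 1)
print(json.dumps(body))
# {"query": {"bool": {"filter": [{"term": {"usage.version": 1}}]}}}
```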

Lambda logs for this query:
2024-01-09 19:35:52 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO ElasticsearchQueryUtils:114 - Predicates are NOT formed.
2024-01-09 19:35:52 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO GeneratedRowWriter:129 - recompile: Detected a new block, rebuilding field writers so they point to the correct Arrow vectors.
2024-01-09 19:36:03 595425a9-f70d-4ff2-85f9-415a2f4e075f INFO S3BlockSpiller:208 - writeRow: Spilling block with 33625 rows and 16000220 bytes and config 16000000 bytes

(These log lines keep repeating until the Lambda function times out, i.e., the connector scrolls through all of the index data and writes ~16000000-byte blocks to the Athena spill bucket.)

Expected behavior
Ideally, the Lambda function should push down the right set of predicates on nested objects as well. In our case predicates are important because they determine which filtering clauses are executed in the OpenSearch query. If the right set of filters is not passed to OpenSearch, the Lambda function scans all of the index data, which takes far longer.
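As an illustration of the expected behavior, here is a minimal sketch that turns equality predicates on dotted paths into an OpenSearch `bool` filter. It wraps a clause in `nested` queries (outermost path first) when the field lives under paths that the caller declares as OpenSearch `nested` fields. The input shape, the `build_filter` name, and the `nested_paths` parameter are all hypothetical. It also assumes string fields are mapped as `keyword` so that `term` matches exact values. This is not the connector's actual implementation.

```python
def build_filter(predicates, nested_paths=()):
    """Translate {dotted_path: value} equality predicates into a bool filter.

    nested_paths lists field paths assumed to be mapped as OpenSearch `nested`.
    """
    clauses = []
    for path, value in predicates.items():
        clause = {"term": {path: value}}
        # Every declared nested path that is a prefix of this field, shortest first.
        owners = sorted((p for p in nested_paths if path.startswith(p + ".")), key=len)
        # Wrap from the innermost nested path outward.
        for owner in reversed(owners):
            clause = {"nested": {"path": owner, "query": clause}}
        clauses.append(clause)
    return {"query": {"bool": {"filter": clauses}}}


body = build_filter(
    {"usage.version": 1, "usage.revenueList.country": "US"},
    nested_paths=["usage.revenueList"],
)
print(body)
```

With this shape, `usage.version = 1` stays a plain `term` filter, while a predicate under an array that is mapped as `nested` gets the `nested` wrapper that OpenSearch requires.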

shubhA941 added the bug Something isn't working label Jan 9, 2024


Comments

shubhA941 commented Jan 9, 2024