
Update/delete operation on acid table sometimes fetches wrong bucket ID to write to #92

Closed
sourabh912 opened this issue Aug 6, 2020 · 1 comment · Fixed by #93

@sourabh912 (Contributor)

One of the problems reported in issue #70 was that the update in a merge command was failing because multiple Spark tasks were trying to write to the same bucket file. Ideally this should never happen: in the spark-acid writer, we repartition the DataFrame by the rowId.bucketID column so that all rows with the same bucketID go to the same task. However, there is a bug in how the bucket ID is fetched from each InternalRow of the table during an update/delete operation. While fetching the bucket ID from the UnsafeRow, we pass the table schema, whereas we should pass the rowID schema (a struct type containing bucketID, rowID, and writeID). As a result, the UnsafeRow returns a wrong integer value for the bucketID column.
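The failure mode can be illustrated outside Spark. An UnsafeRow-style read resolves a field's ordinal against a schema and then does a positional read; resolving "bucketID" against the full table schema instead of the rowID struct's own schema yields the wrong ordinal, so the read returns a different field's value. A minimal Python sketch (field names, positions, and values here are hypothetical, chosen only to show the mismatch; the real code reads Spark's UnsafeRow):

```python
# Hypothetical rowID struct stored per row, with the field order described
# in the issue: (bucketID, rowID, writeID).
row_id_struct = (536870912, 7, 42)

ROW_ID_SCHEMA = ["bucketID", "rowID", "writeID"]               # correct schema
TABLE_SCHEMA = ["col1", "col2", "bucketID", "rowID", "writeID"]  # wrong schema

def get_int(values, schema, field):
    """Positional read: resolve the field's ordinal via the given schema,
    then index into the flat value tuple (as an UnsafeRow getter would)."""
    return values[schema.index(field)]

# Correct: ordinal resolved against the struct's own schema -> 536870912.
get_int(row_id_struct, ROW_ID_SCHEMA, "bucketID")

# Buggy: ordinal resolved against the table schema gives ordinal 2, which
# lands on writeID inside the 3-field struct -> 42, a wrong bucket ID.
get_int(row_id_struct, TABLE_SCHEMA, "bucketID")
```

Because the wrong integer is then used as the repartitioning key, rows belonging to one bucket can be scattered across tasks, producing the concurrent writes to the same bucket file seen in issue #70.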

@sourabh912 (Contributor, Author) commented Aug 6, 2020

PR link: https://github.com/qubole/spark-acid/pull/93/files

@amoghmargoor : Please review.
