
Update/delete operation on acid table sometimes fetches wrong bucket ID to write to #92

Closed
sourabh912 opened this issue Aug 6, 2020 · 1 comment · Fixed by #93

@sourabh912 (Contributor)

One of the problems reported in issue #70 was that the update in a merge command was failing because multiple Spark tasks were trying to write to the same bucket file. Ideally this should never happen: in the spark-acid writer, we repartition the DataFrame by the rowId.bucketID column so that all rows with the same bucketID go to the same task. However, there is a bug in how the bucket ID is fetched from each InternalRow of the table during an update/delete operation. While fetching the bucket ID from the UnsafeRow, we pass the table schema, whereas we should pass the rowID schema (a struct type containing bucketID, rowID, and writeID). As a result, the UnsafeRow returns a wrong integer value for the bucketID column.
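The failure mode can be illustrated outside Spark. An UnsafeRow-style read resolves a field's ordinal against a schema and then does a positional read; resolving "bucketID" against the full table schema instead of the rowID struct's own schema yields the wrong ordinal, so the read returns a different field's value. A minimal Python sketch (field names, positions, and values here are hypothetical, chosen only to show the mismatch; the real code reads Spark's UnsafeRow):

```python
# Hypothetical rowID struct stored per row, with the field order described
# in the issue: (bucketID, rowID, writeID).
row_id_struct = (536870912, 7, 42)

ROW_ID_SCHEMA = ["bucketID", "rowID", "writeID"]               # correct schema
TABLE_SCHEMA = ["col1", "col2", "bucketID", "rowID", "writeID"]  # wrong schema

def get_int(values, schema, field):
    """Positional read: resolve the field's ordinal via the given schema,
    then index into the flat value tuple (as an UnsafeRow getter would)."""
    return values[schema.index(field)]

# Correct: ordinal resolved against the struct's own schema -> 536870912.
get_int(row_id_struct, ROW_ID_SCHEMA, "bucketID")

# Buggy: ordinal resolved against the table schema gives ordinal 2, which
# lands on writeID inside the 3-field struct -> 42, a wrong bucket ID.
get_int(row_id_struct, TABLE_SCHEMA, "bucketID")
```

Because the wrong integer is then used as the repartitioning key, rows belonging to one bucket can be scattered across tasks, producing the concurrent writes to the same bucket file seen in issue #70.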

@sourabh912 (Contributor, Author) commented Aug 6, 2020

PR link: https://github.com/qubole/spark-acid/pull/93/files

@amoghmargoor : Please review.
