One of the problems reported in issue #70 was that the update clause of the merge command was failing because multiple Spark tasks were trying to write to the same bucket file. Ideally this should never happen: in the Spark ACID writer we repartition the data frame on the rowId.bucketId column, so all rows with the same bucketId land in the same task. However, there is a bug in how the bucket ID is read from each InternalRow of the table during update/delete operations. When fetching the bucket ID from the UnsafeRow we pass the table schema, whereas we should pass the rowId schema (a struct type containing bucketId, rowId, and writeId). As a result, the UnsafeRow returns a wrong integer value when reading the bucketId from the rowId column.
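To make the failure mode concrete, here is a minimal sketch against Spark's internal catalyst API. The schemas, field names, field order, and values below are illustrative assumptions, not the actual spark-acid code; the point is that reading an int out of an UnsafeRow with ordinals taken from the wrong schema does not raise an error, it just decodes the wrong bytes:

```scala
import org.apache.spark.sql.catalyst.expressions.{GenericInternalRow, UnsafeProjection}
import org.apache.spark.sql.types._

object BucketIdRepro {
  def main(args: Array[String]): Unit = {
    // Hypothetical shapes for illustration: a rowId struct carrying
    // (writeId, bucketId, rowId), and a table row carrying that struct
    // plus a data column.
    val rowIdType = StructType(Seq(
      StructField("writeId",  LongType),
      StructField("bucketId", IntegerType),
      StructField("rowId",    LongType)))

    val tableType = StructType(Seq(
      StructField("rowId", rowIdType),
      StructField("value", LongType)))

    // Build an UnsafeRow for one table row, with bucketId = 3.
    val toUnsafe = UnsafeProjection.create(tableType)
    val rowId    = new GenericInternalRow(Array[Any](17L, 3, 42L))
    val row      = toUnsafe(new GenericInternalRow(Array[Any](rowId, 100L)))

    // Correct: descend into the rowId struct using the struct's own
    // schema (its field count and ordinals), then read bucketId there.
    val good = row.getStruct(0, rowIdType.length)
                  .getInt(rowIdType.fieldIndex("bucketId"))   // 3

    // Buggy pattern: addressing the row with an ordinal from the
    // *table* schema. UnsafeRow does no type checking, so it silently
    // decodes whatever bytes sit at that offset instead of bucketId.
    val bad = row.getInt(1)                                   // not 3

    println(s"good=$good bad=$bad")
  }
}
```

Because an UnsafeRow is just an offset-addressed byte buffer with no per-field type validation, the only guard against this class of bug is passing the schema that actually matches the row being read; a mismatched schema yields plausible-looking garbage rather than an exception, which is why the repartitioning appeared to work while rows still scattered across tasks.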