Memory usage increased by more than 4x until OOM(140G) when upgrading from v1.5.4 to the latest commit #581
(UPDATE): |
(Update): Adding some information obtained from previous debugs, the The below couple logs are the memory usage(
Currently I am using |
Hello there, as the title says.
For some use cases, I need to switch from s3 to the gocloud/blob implementation to support other storage providers.
I initially used the github.com/xitongsys/parquet-go v1.5.4 release to test gocloud/blob, but found that gocloud/blob sometimes failed to write/read data in this version, so I later switched to the latest release (github.com/xitongsys/parquet-go v1.6.2) and updated the parquet tags to match the new release (e.g., changing the tag from "type=UTF8" to "type=BYTE_ARRAY, convertedtype=UTF8").
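To make the tag change concrete, here is a minimal sketch of what such a migration might look like. The struct names (EventV154, EventV162), the Name field, and the parquetTag helper are hypothetical and only for illustration; the two tag strings are the ones mentioned above. The sketch uses only the standard library and simply prints the raw tags, so it does not exercise parquet-go itself.

```go
package main

import (
	"fmt"
	"reflect"
)

// EventV154 is a hypothetical row type using the older tag style
// (type=UTF8) that was used with v1.5.4.
type EventV154 struct {
	Name string `parquet:"name=name, type=UTF8"`
}

// EventV162 is the same hypothetical row type with the tag style
// used after the upgrade (physical type plus converted type).
type EventV162 struct {
	Name string `parquet:"name=name, type=BYTE_ARRAY, convertedtype=UTF8"`
}

// parquetTag returns the raw `parquet` struct tag of the named field,
// or an empty string if the field does not exist.
func parquetTag(v interface{}, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("parquet")
}

func main() {
	fmt.Println("old:", parquetTag(EventV154{}, "Name"))
	fmt.Println("new:", parquetTag(EventV162{}, "Name"))
}
```

Only the tag string differs between the two structs; the Go field type stays string in both.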
The read/write tests were all fine at first, but when I needed to write dozens of gigabytes (over 100G) of data, I found that the updated version caused OOM problems.
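For anyone trying to reproduce this, one way to watch memory growth from inside the process is to sample the Go runtime's heap statistics during the write loop. The following is only a sketch under stated assumptions: writeRows and its callbacks are hypothetical stand-ins for the real write loop (they do not call parquet-go), and heap-in-use is just one of several figures that runtime.MemStats exposes.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapInUseMiB returns the bytes held in in-use heap spans, in MiB.
func heapInUseMiB() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapInuse / (1024 * 1024)
}

// writeRows is a hypothetical stand-in for the real write loop.
// It calls write for each row and report after every `every` rows,
// and returns how many samples were reported.
func writeRows(total, every int, write func(i int), report func(rows int, heapMiB uint64)) int {
	if every <= 0 {
		every = 1 // avoid division by zero in the modulo below
	}
	samples := 0
	for i := 1; i <= total; i++ {
		write(i)
		if i%every == 0 {
			report(i, heapInUseMiB())
			samples++
		}
	}
	return samples
}

func main() {
	var sink [][]byte // stands in for row data buffered by a writer
	n := writeRows(1000, 250,
		func(i int) { sink = append(sink, make([]byte, 4096)) },
		func(rows int, heapMiB uint64) {
			fmt.Printf("rows=%d heap_inuse=%dMiB\n", rows, heapMiB)
		})
	fmt.Println("samples:", n, "buffered rows:", len(sink))
}
```

runtime.ReadMemStats briefly stops the world, so sampling every N rows rather than on every row keeps the overhead low. Comparing the sampled curve across library versions would show whether growth is steady or happens in jumps.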
At first, I thought the issue might be in the gocloud/blob implementation, so I switched back to s3 for testing, but the problem persisted.
I saw that several seemingly memory-related fixes had landed after the latest release, so I upgraded to the latest commit (github.com/xitongsys/parquet-go v1.6.3-0.20231102094431-8ca067b2bd32), but the OOM problem is still not solved. 😭 😔
==============================
So I'm here, raising my hand.
Does anyone know what could cause memory usage to grow by 4x (or more) after upgrading from v1.5.4 to v1.6.2 / v1.6.3-0.20231102094431-8ca067b2bd32?
When writing dozens of gigabytes of data, memory usage grows as follows:
v1.5.4 with S3 - 35G (works nicely)
v1.6.0 with S3 - 35G (works nicely)
v1.6.0 with gocloud/blob - 35G (works nicely)
v1.6.2 with S3 - 140G (OOM, so possibly more than 140G)
v1.6.2 with gocloud/blob - 140G (OOM, so possibly more than 140G)
v1.6.3-0.20231102094431-8ca067b2bd32 with S3 - 140G (OOM, so possibly more than 140G)
My writer parameters (no code changes here since v1.5.4):