1BRC in KDB/Q #208
-
I think the issue for KDB/Q is efficiently reading the file (or rather, how to parallelise the file read). Just reading in the file alone takes about 128 seconds:
\ts .Q.fsn[{};`:measurements.txt;536870912] // 512MB chunks
127813 4050501232
I had a go at figuring out the right places to split the file and read it in, but it's still awfully slow.
findLineFeed:{[filePath;offset;searchLength]
  offset + first where 0x0a = read1 (filePath;offset;searchLength)
  }
getChunks:{[filePath;numberOfRows]
  fileSize: hcount filePath;
  // approximately 13 chars per line on average in test data
  approxChunkLength: numberOfRows * 13;
  numberOfChunks: ceiling fileSize % approxChunkLength;
  approxOffsets: 1 _ til[numberOfChunks] * approxChunkLength;
  // max row length: 100 + count ";-99.9\n"
  offsets: 0,1 + findLineFeed[filePath;;107] peach approxOffsets;
  lengths: (1 _ deltas offsets),fileSize - last offsets;
  // returns list of (filePath;offset;length)
  :filePath,/:offsets,'lengths;
  };
f:{[rows]
  0!select Min:min Measurement, Max:max Measurement, Count:count i, Sum:sum Measurement by Station from flip`Station`Measurement!("SF";";") 0: rows
  };
filename: `:measurements.txt
\ts t:raze f peach getChunks[filename;10000000] // approximately 10 million rows per chunk -> 267646 54368656
select min Min, max Max, sum[Sum] % sum[Count] by Station from t
Edit:
q)\ts t:raze f peach getChunks[filename;10000000]
189610 18400
... hopefully a Q God can swoop in and come up with a decent implementation.
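The final select above works because each chunk keeps Count and Sum rather than a per-chunk mean. A minimal sketch of that merge step on toy data, assuming two partial results shaped like the output of f (the tables p1 and p2 and their values are made up for illustration):

```q
/ hypothetical partial aggregates from two chunks, same columns as f returns
p1:([] Station:`Lima`Oslo; Min:12.0 -3.1; Max:25.5 8.4; Count:3 2; Sum:60.0 5.3)
p2:([] Station:`Lima`Oslo; Min:14.2 -7.0; Max:14.2 6.0; Count:1 4; Sum:14.2 -2.0)

/ stack the partials, as raze does for the peach results
t:p1,p2

/ min of mins, max of maxes, and a mean weighted by row count
res:select Min:min Min, Max:max Max, Mean:sum[Sum] % sum Count by Station from t
```

For Oslo this gives a mean of 3.3 % 6 = 0.55, whereas averaging the two chunk means (2.65 and -0.5) would give 1.075, which is why the chunks carry Sum and Count separately.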
-
With enough RAM the calculation can be done as 1 line, but breaking it into the load vs the aggregation:
\ts t:flip`Station`Measurement!("SF";";") 0: `:measurements.txt
304032 25769821840 // (milliseconds;bytes) i.e. ~5 minutes, 24GB RAM used
\ts select Min:min Measurement, Max:max Measurement, Mean:avg Measurement by Station from t
2847 8589992928 // ~2.8 seconds, 8GB RAM
I haven't managed to get any improvement by leveraging the above code:
\ts t:raze f peach filename,/:getChunks[filename;100000000] / 11 chunk(s)
383154 331920
\ts t:raze f peach filename,/:getChunks[filename;10000000] / 107 chunks
316436 2631952
\ts t:raze f peach filename,/:getChunks[filename;1000000] / 1062 chunks
311177 21089552
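The 24GB figure above comes from materialising the whole table before aggregating. One way to keep memory bounded, sketched here under assumptions, is to fold each batch of lines from .Q.fsn into a running keyed table. The names acc, step and run are hypothetical, the sketch is single-threaded, and it makes no claim about being faster than the timings in this thread:

```q
/ running per-station aggregates, keyed by Station (hypothetical name acc)
acc:([Station:`symbol$()] Min:`float$(); Max:`float$(); Count:`long$(); Sum:`float$())

/ parse one batch of complete lines, aggregate it, then fold it into acc
step:{[lines]
  raw:flip `Station`Measurement!("SF";";") 0: lines;
  part:select Min:min Measurement, Max:max Measurement, Count:count i, Sum:sum Measurement by Station from raw;
  acc::select Min:min Min, Max:max Max, Count:sum Count, Sum:sum Sum by Station from (0!acc),0!part;
  }

/ stream the file in 512MB chunks, the same chunk size as the read test earlier in the thread
run:{[file] .Q.fsn[step;file;536870912]}

/ final report once run has finished
report:{select Station, Min, Max, Mean:Sum % Count from 0!acc}
```

Usage would be run[`:measurements.txt] followed by report[]. Only one chunk plus the small acc table is held at a time, at the cost of doing the parse serially.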
-
Testing out the new 4.1t release with multithreaded