1BRC is a challenge, originally in Java, to process a text file of weather station names and temperatures and, for each weather station, print the minimum, mean, and maximum temperature. It sounds simple, but the catch is that the file contains one billion rows. Fun!
A talk related to this was presented at Code Beam Europe 2024. The slides are available here.
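To make the task concrete, here is a minimal, unoptimised sketch of the aggregation in Elixir. The module `NaiveBrc` and its function names are hypothetical and are not taken from this repo's versions; the sketch assumes every line has the form `station_name;temperature` and that station names contain no `;`.

```elixir
defmodule NaiveBrc do
  # Hypothetical module for illustration only, not one of the repo's versions.

  # Folds "station;temperature" lines into
  # %{station => {lowest, highest, sum, count}}.
  def aggregate(lines) do
    Enum.reduce(lines, %{}, fn line, acc ->
      [station, temp] = line |> String.trim() |> String.split(";", parts: 2)
      {t, ""} = Float.parse(temp)

      Map.update(acc, station, {t, t, t, 1}, fn {lo, hi, sum, count} ->
        {min(lo, t), max(hi, t), sum + t, count + 1}
      end)
    end)
  end

  # Turns the accumulated stats into {station, min, mean, max} tuples,
  # sorted by station name. Rounding the mean to one decimal is an assumption.
  def summarize(stats) do
    stats
    |> Enum.sort()
    |> Enum.map(fn {station, {lo, hi, sum, count}} ->
      {station, lo, Float.round(sum / count, 1), hi}
    end)
  end

  # Streams the file line by line so it is never loaded into memory at once.
  def from_file(path) do
    path
    |> File.stream!()
    |> aggregate()
    |> summarize()
  end
end
```

A single-process fold like this is correct but slow at one billion rows, which is where the different versions in this repo come in.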
- This repo uses Nix flakes to manage dependencies. To get started, run `direnv allow` in the root of the repo to activate the Nix environment.
- Execute `run deps` to install dependencies.
- To create a measurements file, execute `run create_measurements 1_000_000`. This will create a file `data/measurements.1000000.txt` with 1 million measurements.
- To process the measurements, execute `run process_measurements 1_000_000`. This will process the file created in the previous step.
- That's it, you can follow the codepaths, starting from `bin/run`, to explore further ✌️
- `run deps` - installs mix dependencies
- `run iex` - spawns an iex session
- `run format` - runs mix format
- `run process_measurements --count=x --version=y` - runs the code for version y with the measurement file that has x number of measurements. Also verifies that the results are correct by comparing with baseline results
- `run create_measurements <count>` - creates a measurements file with specified count
- `run create_measurements.profile <count>` - creates measurements file with profiling
- `run create_baseline_results <count>` - generates baseline results with a known correct way
- `run process_measurements.repeat --count=x --version=y` - runs process_measurements 5 times with given count and version
- `run_with_cpu_profiling <command>` - runs a command with CPU profiling
- `run livebook.setup` - sets up Livebook
- `run livebook.server` - starts Livebook server
- `run livebook` - starts both iex and Livebook server concurrently
- `run process_measurements.profile.eprof --count=x --version=y` - profiles with eprof
- `run process_measurements.profile.cprof --count=x --version=y` - profiles with cprof
- `run process_measurements.profile.eflambe --count=x --version=y` - profiles with eflambe
- `run process_measurements.profile.benchee` - runs script for using benchee to compare different versions
- `run all_versions --count=x` - runs all versions with x measurements
- The goal here is to create a file with one billion rows, each row containing a weather station name and a temperature.
- There are baseline temperatures for each station given, and the temperature for each row is the baseline temperature for that station plus a random number between -10 and 10.
- The file is a text file with each line of the format `station_name;temperature`.
- The fastest solution I've implemented so far creates the file with 1 billion measurements in 205 seconds. The code is at `lib/one_brc/measurements_generator.ex`.
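The generation rule above can be sketched roughly as follows. This is not the code in `lib/one_brc/measurements_generator.ex`. The module `MeasurementSketch`, the station names, and the baseline values are all hypothetical, and the uniform distribution and one-decimal rounding are assumptions.

```elixir
defmodule MeasurementSketch do
  # Hypothetical stations and baselines, for illustration only.
  @baselines [{"Alpha", 10.0}, {"Beta", 20.0}, {"Gamma", 0.0}]

  # Builds one "station_name;temperature" line: the station's baseline plus
  # a random offset in [-10, 10). Uniform sampling and rounding to one
  # decimal place are assumptions.
  def line do
    {station, baseline} = Enum.random(@baselines)
    offset = :rand.uniform() * 20 - 10
    temp = Float.round(baseline + offset, 1)
    "#{station};#{temp}\n"
  end

  # Streams `count` lines into the file at `path`, so the lines are never
  # all held in memory at once.
  def write(path, count) do
    Stream.repeatedly(&line/0)
    |> Stream.take(count)
    |> Stream.into(File.stream!(path))
    |> Stream.run()
  end
end
```

Calling `MeasurementSketch.write("measurements.txt", 1000)` would write 1000 lines. A lazy stream keeps memory use flat, but a single writer process like this is unlikely to come close to the timing quoted above.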
- `run create_measurements 1000` - creates a file `./data/measurements.1000.txt` with 1000 measurements. You can change 1000 to the number of measurements you want.
- `run create_measurements.profile 1000` - creates a file `./data/measurements.1000.txt` with 1000 measurements and profiles the execution using `eprof`.
- Check `./bin/run` to see what the above commands do.
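For readers who have not used `eprof`, this is roughly what profiling a function with it looks like. The sketch only shows the general pattern with Erlang's `:eprof` module; it is not what `bin/run` does. The `work` function is a hypothetical stand-in for measurement generation, and the sketch assumes the OTP `tools` application is available.

```elixir
# Hypothetical workload standing in for measurement generation.
work = fn -> Enum.reduce(1..10_000, 0, &(&1 + &2)) end

# Start the profiler server, run the function under it, then print
# per-function call counts and time to stdout.
:eprof.start()
{:ok, result} = :eprof.profile(work)
:eprof.analyze()
:eprof.stop()

IO.inspect(result, label: "workload result")
```

`eprof` traces every function call, so the profiled run is much slower than a normal one. That is why a small count such as 1000 is a sensible size to profile with.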
- I've used eflambe to get performance traces of different versions. The following are links to the traces, viewed in Speedscope:
- Speedscope: Version 1, 1000 measurements, Link to bggg file
- Speedscope: Version 2, 1000 measurements, Link to bggg file
- Speedscope: Version 3, 1000 measurements, Link to bggg file
- Speedscope: Version 4, 1000 measurements, Link to bggg file
- Speedscope: Version 5, 1000 measurements, Link to bggg file
- Speedscope: Version 6, 1000 measurements, Link to bggg file
- Speedscope: Version 7, 1000 measurements, Link to bggg file