The One Billion Row Challenge (in Elixir)

1BRC is a challenge, originally in Java, to process a text file of weather station names and temperatures and, for each weather station, print the minimum, mean, and maximum temperature. It sounds simple, but the catch is that the file contains one billion rows. Fun!
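
As a rough illustration of the task (not one of the versions in this repo), here is a minimal single-process sketch. The module name OneBrcSketch and the output format station=min/mean/max are assumptions made for this example. It folds station;temperature lines into a map of running statistics and formats them sorted by station name.

```elixir
defmodule OneBrcSketch do
  # Hypothetical module, for illustration only; not taken from this repo.

  # Folds "station;temperature" lines into %{station => {min, max, sum, count}}.
  def aggregate(lines) do
    Enum.reduce(lines, %{}, fn line, acc ->
      [station, temp] = line |> String.trim() |> String.split(";", parts: 2)
      {t, ""} = Float.parse(temp)

      Map.update(acc, station, {t, t, t, 1}, fn {lo, hi, sum, count} ->
        {min(lo, t), max(hi, t), sum + t, count + 1}
      end)
    end)
  end

  # Turns the statistics into "station=min/mean/max" strings, sorted by station.
  def format(stats) do
    stats
    |> Enum.sort_by(fn {station, _} -> station end)
    |> Enum.map(fn {station, {lo, hi, sum, count}} ->
      mean = Float.round(sum / count, 1)
      "#{station}=#{lo}/#{mean}/#{hi}"
    end)
  end
end

# Tiny in-memory example; a real run would pass File.stream!(path) instead.
["Hamburg;12.0", "Bulawayo;8.9", "Hamburg;-3.0"]
|> OneBrcSketch.aggregate()
|> OneBrcSketch.format()
|> Enum.each(&IO.puts/1)
```

A single reduce like this is easy to check for correctness, but it is nowhere near fast enough for a billion rows.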

A talk related to this was presented at Code Beam Europe 2024. The slides are available here.

Setting up this repo

This repo uses Nix flakes to manage dependencies. To get started, run direnv allow in the root of the repo to activate the Nix environment.
Execute run deps to install dependencies.
To create a measurements file, execute run create_measurements 1_000_000. This will create a file data/measurements.1000000.txt with 1 million measurements.
To process the measurements, execute run process_measurements 1_000_000. This will process the file created in the previous step.
That's it! You can follow the code paths, starting from bin/run, to explore further ✌️

Available commands

run deps - installs mix dependencies
run iex - spawns an iex session
run format - runs mix format
run process_measurements --count=x --version=y - runs the code for version y against the measurements file that has x measurements, and verifies the results by comparing them with the baseline results
run create_measurements <count> - creates a measurements file with specified count
run create_measurements.profile <count> - creates measurements file with profiling
run create_baseline_results <count> - generates baseline results with a known correct way
run process_measurements.repeat --count=x --version=y - runs process_measurements 5 times with given count and version
run_with_cpu_profiling <command> - runs a command with CPU profiling
run livebook.setup - sets up Livebook
run livebook.server - starts Livebook server
run livebook - starts both iex and Livebook server concurrently
run process_measurements.profile.eprof --count=x --version=y - profiles with eprof
run process_measurements.profile.cprof --count=x --version=y - profiles with cprof
run process_measurements.profile.eflambe --count=x --version=y - profiles with eflambe
run process_measurements.profile.benchee - runs script for using benchee to compare different versions
run all_versions --count=x - runs all versions with x measurements

Writing measurements to a file

The goal here is to create a file with one billion rows, each row containing a weather station name and a temperature.
There are baseline temperatures for each station given, and the temperature for each row is the baseline temperature for that station plus a random number between -10 and 10.
The file is a text file with each line of the format station_name;temperature.
The fastest solution I've implemented so far creates the file with 1 billion measurements in 205 seconds. The code is at lib/one_brc/measurements_generator.ex.
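
The generation rule described above can be sketched roughly as follows. This is not the code in lib/one_brc/measurements_generator.ex. The module name MeasurementsSketch, the three baseline values, the uniform distribution for the offset, and the rounding to one decimal place are all assumptions made for illustration.

```elixir
defmodule MeasurementsSketch do
  # Hypothetical module and illustrative baseline values; not taken from this repo.
  @baselines [{"Hamburg", 9.7}, {"Bulawayo", 18.9}, {"Palembang", 27.3}]

  # Builds one "station;temperature" line for a randomly chosen station.
  def line do
    {station, baseline} = Enum.random(@baselines)
    # :rand.uniform/0 returns a float in [0.0, 1.0), so the offset lies in [-10.0, 10.0).
    offset = :rand.uniform() * 20.0 - 10.0
    temp = Float.round(baseline + offset, 1)
    "#{station};#{temp}\n"
  end

  # Lazily streams `count` lines into the file at `path`.
  # The parent directory of `path` must already exist.
  def write(path, count) do
    Stream.repeatedly(&line/0)
    |> Stream.take(count)
    |> Stream.into(File.stream!(path))
    |> Stream.run()
  end
end
```

Calling MeasurementsSketch.write(path, 1000) with a path of your choice would write 1000 lines. Streaming keeps memory use flat, but generating one line at a time in a single process will be far slower than the timing quoted above.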

Commands for creating measurements:

run create_measurements 1000 - creates a file ./data/measurements.1000.txt with 1000 measurements. You can change 1000 to the number of measurements you want.
run create_measurements.profile 1000 - creates a file ./data/measurements.1000.txt with 1000 measurements and profiles the execution using eprof.
Check ./bin/run to see what the above commands do.

Performance Traces of different versions

I've used eflambe to get performance traces of the different versions. The following links open the traces in Speedscope:
Speedscope: Version 1, 1000 measurements, Link to bggg file
Speedscope: Version 2, 1000 measurements, Link to bggg file
Speedscope: Version 3, 1000 measurements, Link to bggg file
Speedscope: Version 4, 1000 measurements, Link to bggg file
Speedscope: Version 5, 1000 measurements, Link to bggg file
Speedscope: Version 6, 1000 measurements, Link to bggg file
Speedscope: Version 7, 1000 measurements, Link to bggg file
