The One Billion Row Challenge (in Elixir)

1BRC is a challenge, originally posed in Java, to process a text file of weather station names and temperatures and, for each weather station, print the minimum, mean, and maximum temperature. It sounds simple, but the catch is that the file contains one billion rows. Fun!
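To make the task concrete, here is a minimal single-process sketch of the aggregation. It is not one of the versions in this repo; the module name `OneBrcSketch` and the `station=min/mean/max` output format are assumptions made for illustration.

```elixir
defmodule OneBrcSketch do
  # Hypothetical module for illustration; not part of this repo.

  # Folds "station;temperature" lines into
  # %{station => {lowest, highest, sum, count}}.
  def aggregate(lines) do
    Enum.reduce(lines, %{}, fn line, acc ->
      [station, raw] = line |> String.trim() |> String.split(";", parts: 2)
      {temp, ""} = Float.parse(raw)

      Map.update(acc, station, {temp, temp, temp, 1}, fn {lo, hi, sum, count} ->
        {min(lo, temp), max(hi, temp), sum + temp, count + 1}
      end)
    end)
  end

  # Renders one "station=min/mean/max" string per station, sorted by name.
  # Rounding the mean to one decimal place is an assumption.
  def format(stats) do
    stats
    |> Enum.sort_by(fn {station, _} -> station end)
    |> Enum.map(fn {station, {lo, hi, sum, count}} ->
      "#{station}=#{lo}/#{Float.round(sum / count, 1)}/#{hi}"
    end)
  end
end
```

Because `aggregate/1` accepts any enumerable of lines, it can be fed lazily, for example with `File.stream!("data/measurements.1000.txt") |> OneBrcSketch.aggregate() |> OneBrcSketch.format()`. A single reduce like this is easy to check against, but it uses one core only, which is where the real challenge begins.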

1BRC Title Slide

A talk related to this was presented at Code Beam Europe 2024. The slides are available here.

Setting up this repo

  • This repo uses Nix flakes to manage dependencies. To get started, run direnv allow in the root of the repo to activate the Nix environment.
  • Execute run deps to install dependencies.
  • To create a measurements file, execute run create_measurements 1_000_000. This will create a file data/measurements.1000000.txt with 1 million measurements.
  • To process the measurements, execute run process_measurements 1_000_000. This will process the file created in the previous step.
  • That's it! You can follow the code paths, starting from bin/run, to explore further ✌️

Available commands

  • run deps - installs mix dependencies
  • run iex - spawns an iex session
  • run format - runs mix format
  • run process_measurements --count=x --version=y - runs the code for version y against the measurements file containing x measurements. Also verifies that the results are correct by comparing them with the baseline results
  • run create_measurements <count> - creates a measurements file with specified count
  • run create_measurements.profile <count> - creates measurements file with profiling
  • run create_baseline_results <count> - generates baseline results using a known-correct implementation
  • run process_measurements.repeat --count=x --version=y - runs process_measurements 5 times with given count and version
  • run_with_cpu_profiling <command> - runs a command with CPU profiling
  • run livebook.setup - sets up Livebook
  • run livebook.server - starts Livebook server
  • run livebook - starts both iex and Livebook server concurrently
  • run process_measurements.profile.eprof --count=x --version=y - profiles with eprof
  • run process_measurements.profile.cprof --count=x --version=y - profiles with cprof
  • run process_measurements.profile.eflambe --count=x --version=y - profiles with eflambe
  • run process_measurements.profile.benchee - runs script for using benchee to compare different versions
  • run all_versions --count=x - runs all versions with x measurements

Writing measurements to a file

  • The goal here is to create a file with one billion rows, each row containing a weather station name and a temperature.
  • Each station has a given baseline temperature, and the temperature in each row is that station's baseline plus a random number between -10 and 10.
  • The file is plain text, with each line in the format station_name;temperature.
  • The fastest solution I've implemented so far creates the file with 1 billion measurements in 205 seconds. The code is at lib/one_brc/measurements_generator.ex.
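The generation rule above can be sketched as follows. This is a simplified illustration, not the code in lib/one_brc/measurements_generator.ex; the module name `MeasurementsSketch`, the three baseline values, and rounding to one decimal place are all assumptions.

```elixir
defmodule MeasurementsSketch do
  # Hypothetical baseline table; the repo's real station list is not shown here.
  @baselines [{"Hamburg", 9.7}, {"Nairobi", 17.8}, {"Oslo", 5.7}]

  # Builds one "station_name;temperature" line:
  # a random station's baseline plus uniform noise in [-10, 10).
  def line do
    {station, baseline} = Enum.random(@baselines)
    offset = :rand.uniform() * 20.0 - 10.0
    # One decimal place is an assumption, not stated in this README.
    temperature = Float.round(baseline + offset, 1)
    "#{station};#{temperature}\n"
  end

  # Streams `count` lines into the file at `path`, so the rows are
  # never all held in memory at once.
  def write(path, count) do
    Stream.repeatedly(&line/0)
    |> Stream.take(count)
    |> Stream.into(File.stream!(path))
    |> Stream.run()
  end
end
```

Calling `MeasurementsSketch.write("data/measurements.1000.txt", 1000)` would produce a small test file in the same format. A one-line-at-a-time stream like this is simple but unlikely to come close to the timing quoted above; batching writes and generating rows across several processes are the usual next steps.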

Commands for creating measurements:

  • run create_measurements 1000 - creates a file ./data/measurements.1000.txt with 1000 measurements. You can change 1000 to the number of measurements you want.
  • run create_measurements.profile 1000 - creates a file ./data/measurements.1000.txt with 1000 measurements and profiles the execution using eprof.
  • Check ./bin/run to see what the above commands do.

Performance Traces of different versions