Performance regresssion with (&'static str, &'static str) label set. #248

MrCroxx · 2024-11-26T09:02:45Z

Hi, thanks for building the Prometheus client in Rust. Really love it.

When doing the benchmark, I added another test case with (&'static str, &'static str) label set as follows:

    c.bench_function(
        "counter family with (&'static str, &'static str) label set",
        |b| {
            let family = Family::<(&'static str, &'static str), Counter>::default();

            b.iter(|| {
                family.get_or_create(&("method", "GET")).inc();
            })
        },
    );

Considering there should be no dynamic allocation when accessing the metric with (&'static str, &'static str) label set, I suppose the performance would be the same as the custom type label set.

But the benchmark turned out oppositely. (27ns vs 15ns on avg)

     Running benches/family.rs (target/release/deps/family-f5815167dcf70f4a)
Gnuplot not found, using plotters backend
counter family with [(&'static str, &'static str)] label set
                        time:   [41.655 ns 42.005 ns 42.483 ns]
                        change: [-1.0231% +1.7906% +7.1726%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

counter family with Vec<(&'static str, &'static str)> label set
                        time:   [62.579 ns 63.896 ns 65.827 ns]
                        change: [-1.8609% +0.9648% +4.0991%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

counter family with (&'static str, &'static str) label set
                        time:   [21.716 ns 21.771 ns 21.833 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

counter family with Vec<(String, String)> label set
                        time:   [204.69 ns 208.56 ns 214.56 ns]
                        change: [-5.1373% -3.8646% -2.4135%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

counter family with custom type label set
                        time:   [14.648 ns 15.786 ns 17.877 ns]
                        change: [-8.7872% -2.4706% +6.7788%] (p = 0.61 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

I'm not sure where the regression comes from. Is it as expected or did I miss something?

Thanks again for your help. 🙏

The text was updated successfully, but these errors were encountered:

MrCroxx · 2024-11-26T09:05:35Z

~~Does the overhead come from constructing the tuple? (e.g. &("method", "GET"))~~

~~IIUC, "method" and "GET" are &'static str, which does not need allocation, but the tuple is not?~~

mxinden · 2024-11-28T19:56:09Z

Appreciate your curiosity!

On a high level, I don't think it matters for a user whether an operation takes 40 or 20 nano seconds here.

That said, indeed, this is counterintuitive. But I am not able to reproduce it:

     Running benches/family.rs (/home/mxinden/code/github.com/prometheus/client_rust/target/release/deps/family-95f4365191b2b975)
Gnuplot not found, using plotters backend
counter family with [(&'static str, &'static str)] label set
                        time:   [57.968 ns 57.987 ns 58.008 ns]
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild
  5 (5.00%) high mild
  8 (8.00%) high severe

counter family with (&'static str, &'static str) label set
                        time:   [34.901 ns 34.903 ns 34.905 ns]
Found 15 outliers among 100 measurements (15.00%)
  13 (13.00%) high mild
  2 (2.00%) high severe

Are you able to reproduce (&'static str, &'static str) being slower on consecutive runs?

MrCroxx · 2024-12-03T03:17:28Z

Are you able to reproduce (&'static str, &'static str) being slower on consecutive runs?

Yep. Unfortunately, it is a stable regression with my machine. I'm running the benchmark with cargo bench on my MacBook Air with M2 chip. Let me try it on my linux server to see if the regression still exists.

MrCroxx · 2024-12-03T03:22:50Z

On a high level, I don't think it matters for a user whether an operation takes 40 or 20 nano seconds here.

I agree that 20+ ns vs 15+ ns looks close digitally, but the overhead is 33% more in ratio. 🥹

MrCroxx · 2024-12-03T03:24:01Z

BTW, can I open a PR to add the test case?

mxinden · 2024-12-03T08:48:49Z

Maybe use https://play.rust-lang.org/ to take a look at the ASM and LLVM IR code. The reason might be obvious from there?

BTW, can I open a PR to add the test case?

I don't think the test itself, without an explanation for the regression would be helpful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regresssion with (&'static str, &'static str) label set. #248

Performance regresssion with (&'static str, &'static str) label set. #248

MrCroxx commented Nov 26, 2024 •

edited

Loading

MrCroxx commented Nov 26, 2024 •

edited

Loading

mxinden commented Nov 28, 2024

MrCroxx commented Dec 3, 2024 •

edited

Loading

MrCroxx commented Dec 3, 2024

MrCroxx commented Dec 3, 2024

mxinden commented Dec 3, 2024

Performance regresssion with (&'static str, &'static str) label set. #248

Performance regresssion with (&'static str, &'static str) label set. #248

Comments

MrCroxx commented Nov 26, 2024 • edited Loading

MrCroxx commented Nov 26, 2024 • edited Loading

mxinden commented Nov 28, 2024

MrCroxx commented Dec 3, 2024 • edited Loading

MrCroxx commented Dec 3, 2024

MrCroxx commented Dec 3, 2024

mxinden commented Dec 3, 2024

MrCroxx commented Nov 26, 2024 •

edited

Loading

MrCroxx commented Nov 26, 2024 •

edited

Loading

MrCroxx commented Dec 3, 2024 •

edited

Loading