replace awk for perl in the bash history widget #3313

Merged (4 commits) on Sep 22, 2023

step-
Contributor

@step- step- commented May 30, 2023

While awk is POSIX, perl isn't pre-installed on all *nix flavors. This commit replaces perl with awk to eliminate the dependency on perl.

Related: #3295, #3309, #3310.

Test suite passed:

  • make error all test sections 'PASS'
  • make docker-test 215 runs, 1884 assertions, 0 failures, 0 errors, 0 skips.

Manually tested in the following environments:

  • Linux amd64 with bash 3.2, 4.4, 5.2; gawk -P, one true awk, mawk, busybox awk.
  • macOS Catalina, bash 3.2, macOS awk 20070501.

Performance comparison:

Mawk turned out the fastest, then perl.
One true awk's implementation should be the closest to macOS awk.
Test data: 230 KB history, 15102 entries, including multi-line and duplicates.
Linux, bash 4.4. Times in milliseconds.

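| Command                 | Mean (ms) | Min (ms) | Max (ms) | Relative |
| :---                    | ---: | ---: | ---: | -------: |
| `mawk 1.3.4`            | 22.9 | 22.3 | 25.6 | 1.00 |
| `perl 5.26.1`           | 34.3 | 33.6 | 35.1 | 1.49 |
| `one true awk 20221215` | 41.9 | 40.6 | 46.3 | 1.83 |
| `gawk 5.1.0`            | 46.1 | 44.4 | 50.3 | 2.01 |
| `busybox awk 1.27.0`    | 64.8 | 63.2 | 70.0 | 2.82 |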

Other Notes

A bash bug prevents a saved multi-line history entry from being restored as a single entry. The bug was fixed in version 5.0 (see footnote 1).

While developing this PR I discovered two unreported issues affecting the current perl script. First, the output stream ends with $'\n\0000' instead of $'\0000'. Second, because of this, the script does not deduplicate a duplicated entry located at the end of the history list, so fzf displays two identical (not necessarily adjacent) entries. A minor side effect of the first issue is that the top fzf entry ends with a dangling line feed symbol, which is visible in the terminal.

Footnotes

  1. https://github.com/bminor/bash/blob/ec8113b9861375e4e17b3307372569d429dec814/CHANGES#L1511
    To enable: shopt -s cmdhist lithist; HISTTIMEFORMAT='%F %T '.
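
The deduplication that both the perl and the awk scripts perform can be illustrated with a deliberately simplified sketch. It assumes one history entry per line, which the real script cannot assume because entries may span several lines, and the function name `dedup_first_seen` is made up for illustration.

```bash
# Keep the first occurrence of every line and drop later repeats.
# Simplifying assumption: one history entry per line.
dedup_first_seen() {
  awk '!seen[$0]++'
}

printf '%s\n' 'ls -l' 'cd /tmp' 'ls -l' 'make test' | dedup_first_seen
```

Every distinct line becomes an array key, so memory use grows with the number of unique entries rather than with the total size of the history.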

@step-
Contributor Author

step- commented May 30, 2023

Alright, take 3... action!

@junegunn
Owner

Do you get the same number of entries after de-duplication when you switch perl to awk? Because I'm getting a vastly different result.

master

image

awk

image

Okay, I added --cycle to $FZF_CTRL_R_OPTS and looked at the earliest entries and it seems that the awk version is missing earlier ones.

master

image

awk

image

@step-
Contributor Author

step- commented May 31, 2023

Do you get the same number of entries after de-duplication when you switch perl to awk? Because I'm getting a vastly different result.

I do get the same number of entries. Below, left is perl, right is awk:

fzf-20230531

Okay, I added --cycle to $FZF_CTRL_R_OPTS and looked at the earliest entries and it seems that the awk version is missing earlier ones.

Here both versions show the same set of numbers and entries.

Your bash version? MacOS? I'm testing on Linux mainly using bash 4.4 but after your comment I tested again with bash 5.2.

What's your history setup in .bashrc? Mine is

shopt -sq histverify histreedit
shopt -s cmdhist lithist
HISTTIMEFORMAT='%F %T '
HISTSIZE=9999999999 HISTFILESIZE=9999999999

At the moment, I don't have a hypothesis about what's going on. Do you?
I do see that your history size is 10x mine, but the awk script is only buffering the current record, so size shouldn't be an issue. Perhaps your character set? The version of macOS awk that I'm using is older than yours, and shows many limitations, maybe charset is one of them.

Anyway, in summary, the two scripts appear to deduplicate differently, so I will scrutinize and stress that part with a larger data set.


Earlier I wrote:

I do see that your history size is 10x mine, but the awk script is only buffering the current record, so size shouldn't be an issue.

I'll take that back. The awk script is also storing all unique records as array keys! That could conceivably hit a size limit. I'm going to look at One True Awk's source code...


No, array keys don't seem to be a limiting factor. MacOS awk can create an array of 10 million 32-byte keys without errors.
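
How that array test was run is not shown in the thread. One possible way to probe the limit of a given awk, as a sketch with a made-up function name and a small default size, is:

```bash
# Count how many distinct 32-byte keys awk can hold in one array.
# n is kept small here; raise it (e.g. to 10000000) to stress a given awk.
count_unique_keys() {
  awk -v n="$1" 'BEGIN {
    for (i = 0; i < n; i++) seen[sprintf("%032d", i)] = 1
    count = 0
    for (k in seen) count++
    print count
  }'
}

count_unique_keys 100000
```

If the printed count equals the requested number, the awk under test stored every key without silently dropping any.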


Later...

I still can't reproduce the difference you're seeing. I have tested with 300K entries.

fzf bottom: latest entries - perl (left) - one true awk (right)

fzf-20230531-002

fzf top: earliest entries - perl (left) - one true awk (right)

fzf-20230531-001

There's only one difference: entry number 1. That's expected, as explained in the Other Notes section of my first message.

@junegunn
Owner

junegunn commented Jun 1, 2023

Okay, here's what I've found.

I've noticed awk is printing an error message because of a multi-byte character in an entry it cannot process.

awk: towc: multibyte conversion failure on: '5ᆭ;TZ=*********************************************'

 input record number 74030, file
 source line number 1

(redacted the command)

And fzf only receives commands after the one with the error.

95533  �5�;TZ**********************************************
95534  TZ=******************************************************

95534 is the first command listed.

@step-
Contributor Author

step- commented Jun 1, 2023

Good, at least now I've got a concrete case I can work on.

@step-
Contributor Author

step- commented Jun 1, 2023

Google found several references to that error message. Please try this change; add

if command -v iconv > /dev/null; then iconv -c -t utf-8; else cat; fi  |

immediately after line 61 builtin fc -lnr -2147483648 so that the added line pipes into awk.
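
Written as a standalone filter, the suggested change could look like the sketch below. The function name `sanitize_history_stream` is hypothetical; the `iconv` invocation is the one quoted above.

```bash
# Drop byte sequences that cannot be converted to UTF-8, if iconv exists;
# otherwise pass the stream through unchanged.
sanitize_history_stream() {
  if command -v iconv > /dev/null; then
    iconv -c -t utf-8
  else
    cat
  fi
}

printf 'echo hello\n' | sanitize_history_stream
```

The `cat` fallback keeps the pipeline intact on systems without iconv, at the cost of leaving any problematic bytes in place.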

@junegunn
Owner

junegunn commented Jun 1, 2023

Yep, that fixes the problem.

I hope this was faster though as the difference is quite noticeable.

master branch (with perl)

real    0m0.111s
user    0m0.150s
sys     0m0.045s

real    0m0.113s
user    0m0.151s
sys     0m0.049s

real    0m0.112s
user    0m0.154s
sys     0m0.048s

real    0m0.112s
user    0m0.154s
sys     0m0.047s

real    0m0.106s
user    0m0.147s
sys     0m0.045s

real    0m0.109s
user    0m0.150s
sys     0m0.047s

real    0m0.103s
user    0m0.143s
sys     0m0.044s

real    0m0.111s
user    0m0.151s
sys     0m0.047s

with awk

real    0m0.476s
user    0m0.443s
sys     0m0.186s

real    0m0.426s
user    0m0.464s
sys     0m0.113s

real    0m0.402s
user    0m0.444s
sys     0m0.111s

real    0m0.415s
user    0m0.442s
sys     0m0.125s

real    0m0.446s
user    0m0.445s
sys     0m0.153s

real    0m0.427s
user    0m0.448s
sys     0m0.131s

real    0m0.409s
user    0m0.444s
sys     0m0.118s

real    0m0.435s
user    0m0.440s
sys     0m0.146s

real    0m0.506s
user    0m0.448s
sys     0m0.209s

@step-
Contributor Author

step- commented Jun 1, 2023

Yep, that fixes the problem.

Good. Thanks.

I hope this was faster though as the difference is quite noticeable.

Yes, it is. I wonder which is the bottleneck, iconv or awk. In my performance test on Linux, one true awk was "only" about 25% slower than perl.


As I mentioned, the net shows several reports of that same error message. This one (see footnote 1) in particular was very informative and led to the suggested change.

Footnotes

  1. https://github.com/xwmx/nb/issues/248, April 2023.

@step-
Contributor Author

step- commented Jun 1, 2023

Updated benchmarks, now including iconv and a large test data set. Here perl is 34-37% faster than one true awk, on Linux.
Let me know if there's anything else I could do.

300,000 entries deduplicated down to 3413

Linux amd64, milliseconds, all awk tasks also include iconv.
Mawk task is the fastest. Perl task is 37% faster than One True Awk.

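| Command                   | Mean (ms) | Min (ms) | Max (ms) | Relative |
| :---                      | ---:  | ---:  | ---:  | ---: |
| `perl 5.26.1`             | 194.0 | 190.9 | 200.7 | 1.69 |
| `mawk 1.3.4`              | 114.8 | 113.9 | 117.2 | 1.00 |
| `gawk 5.1.0`              | 293.3 | 288.3 | 306.9 | 2.55 |
| `awk (one true) 20221215` | 266.0 | 263.8 | 271.5 | 2.32 |
| `busybox awk 1.27.0`      | 511.3 | 502.8 | 526.0 | 4.45 |
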
Benchmark 1: perl 5.26.1
  Time (mean ± σ):     194.0 ms ±   2.4 ms    [User: 190.9 ms, System: 6.8 ms]
  Range (min … max):   190.9 ms … 200.7 ms    15 runs

Benchmark 2: mawk 1.3.4
  Time (mean ± σ):     114.8 ms ±   0.9 ms    [User: 125.5 ms, System: 10.1 ms]
  Range (min … max):   113.9 ms … 117.2 ms    25 runs

Benchmark 3: gawk 5.1.0
  Time (mean ± σ):     293.3 ms ±   5.2 ms    [User: 303.1 ms, System: 11.9 ms]
  Range (min … max):   288.3 ms … 306.9 ms    10 runs

Benchmark 4: awk (one true) 20221215
  Time (mean ± σ):     266.0 ms ±   2.1 ms    [User: 276.0 ms, System: 10.6 ms]
  Range (min … max):   263.8 ms … 271.5 ms    11 runs

Benchmark 5: busybox awk 1.27.0
  Time (mean ± σ):     511.3 ms ±   6.7 ms    [User: 519.5 ms, System: 16.6 ms]
  Range (min … max):   502.8 ms … 526.0 ms    10 runs

Summary
  'mawk 1.3.4' ran
    1.69 ± 0.02 times faster than 'perl 5.26.1'
    2.32 ± 0.03 times faster than 'awk (one true) 20221215'
    2.55 ± 0.05 times faster than 'gawk 5.1.0'
    4.45 ± 0.07 times faster than 'busybox awk 1.27.0'
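
The Relative column is each mean divided by the smallest mean. As a sanity check, the figures of the first run can be recomputed with a few lines of awk; the function name `relative_times` and the tab-separated input layout are choices made for this sketch only.

```bash
# Recompute the "Relative" column: each mean divided by the smallest mean.
# Input lines are "name<TAB>mean_ms"; the values are copied from the first run.
relative_times() {
  awk -F '\t' '
    { name[NR] = $1; mean[NR] = $2 + 0; if (NR == 1 || mean[NR] < min) min = mean[NR] }
    END { for (i = 1; i <= NR; i++) printf "%s\t%.2f\n", name[i], mean[i] / min }
  '
}

printf '%s\t%s\n' \
  'perl 5.26.1' 194.0 \
  'mawk 1.3.4' 114.8 \
  'gawk 5.1.0' 293.3 \
  'awk (one true) 20221215' 266.0 \
  'busybox awk 1.27.0' 511.3 | relative_times
```

The ratios should match the ones reported in the Summary, with mawk as the 1.00 baseline.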

300,000 unique entries (no deduplication)

Linux amd64, milliseconds, all awk tasks also include iconv.
Mawk task is the fastest. Perl task is 34% faster than One True Awk.

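| Command                   | Mean (ms) | Min (ms) | Max (ms) | Relative |
| :---                      | ---:  | ---:  | ---:  | ---: |
| `perl 5.26.1`             | 406.6 | 402.0 | 412.8 | 1.16 |
| `mawk 1.3.4`              | 349.4 | 343.8 | 354.4 | 1.00 |
| `gawk 5.1.0`              | 625.1 | 609.6 | 665.4 | 1.79 |
| `awk (one true) 20221215` | 540.1 | 534.0 | 545.0 | 1.55 |
| `busybox awk 1.27.0`      | 909.0 | 896.8 | 928.6 | 2.60 |
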
Benchmark 1: perl 5.26.1
  Time (mean ± σ):     406.6 ms ±   3.1 ms    [User: 389.9 ms, System: 21.9 ms]
  Range (min … max):   402.0 ms … 412.8 ms    10 runs

Benchmark 2: mawk 1.3.4
  Time (mean ± σ):     349.4 ms ±   3.7 ms    [User: 359.0 ms, System: 21.5 ms]
  Range (min … max):   343.8 ms … 354.4 ms    10 runs

Benchmark 3: gawk 5.1.0
  Time (mean ± σ):     625.1 ms ±  18.0 ms    [User: 618.8 ms, System: 38.9 ms]
  Range (min … max):   609.6 ms … 665.4 ms    10 runs

Benchmark 4: awk (one true) 20221215
  Time (mean ± σ):     540.1 ms ±   3.5 ms    [User: 548.3 ms, System: 23.3 ms]
  Range (min … max):   534.0 ms … 545.0 ms    10 runs

Benchmark 5: busybox awk 1.27.0
  Time (mean ± σ):     909.0 ms ±   9.0 ms    [User: 892.4 ms, System: 54.0 ms]
  Range (min … max):   896.8 ms … 928.6 ms    10 runs

Summary
  'mawk 1.3.4' ran
    1.16 ± 0.01 times faster than 'perl 5.26.1'
    1.55 ± 0.02 times faster than 'awk (one true) 20221215'
    1.79 ± 0.05 times faster than 'gawk 5.1.0'
    2.60 ± 0.04 times faster than 'busybox awk 1.27.0'

@junegunn
Owner

junegunn commented Jun 2, 2023

Thanks for the report. Unfortunately, as mentioned above, I'm observing a much larger performance difference on macOS (about 4 times slower; 100 ms is barely noticeable, but 400 ms clearly is).

So the problem is that Mac users, who already have Perl installed by default, will gain nothing from this change and will only experience a noticeable lag every time they use the CTRL-R binding. A small number of users on minimal systems will benefit from this, but the question is: can't they just install Perl if they really want this? Is it impossible?

Also, the code is a bit longer, which is a minus from the maintainer's standpoint.

Even if we're going to do this, we should keep the original Perl version.

if command -v perl > /dev/null; then
  __fzf_history__() {
    ...
  }
else
  ...
fi
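
Filled in with placeholder bodies, the structure suggested here might look like the following sketch. The function name `__fzf_history_demo__` and the `echo` bodies are stand-ins for illustration, not the real widget code.

```bash
# Define the widget function once, choosing the implementation at load time.
# The function name and the echo bodies are placeholders for illustration.
if command -v perl > /dev/null; then
  __fzf_history_demo__() {
    echo 'perl implementation'
  }
else
  __fzf_history_demo__() {
    echo 'awk implementation'
  }
fi

__fzf_history_demo__
```

Because the test runs once when the script is sourced, each later call pays no extra cost for the choice.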

@P1n3appl3

P1n3appl3 commented Jun 2, 2023

Turning perl into an optional dependency like your example would be helpful for me. When I install fzf through my package manager it has to download a 300MB+ perl package even though I don't need it since my shell is zsh (and key-bindings.zsh already uses awk).

I currently work around this by removing the perl package manually, but it'd be nice if perl could be an optional dep for packaging's sake.

@step-
Contributor Author

step- commented Jun 2, 2023

@junegunn, I understand your viewpoint: macOS users will not gain anything from this PR because perl is native in macOS, and it is noticeably faster than awk to assist fzf bash C-R.

I will add that you positively represent the segment of fzf users who will feel the pain: macOS user, bash¹ user, fast¹ machine, huge history file. You have already noticed, and will keep noticing, a significant slowdown every time you press C-R.
[¹: I think.]

Other segments, like bash users with a medium size history file, will probably not notice a difference, more so on Linux.
I'm one of the maintainers of the Fatdog64 Linux distribution, in which perl is not native but fzf is. Fzf is installed as a scripted search engine rather than for its interactive bash C-R. Indeed, right now fzf shell add-ons are not installed, due to their perl dependency. So I wrote this PR exactly for our use case.

If this PR won't be merged, I still can package it for my distribution and fix my use case. (I will have to deal with users who upgrade fzf from this repo - overwriting the patched script - but I guess those will be power users who already installed perl to begin with).

@jlebon

jlebon commented Sep 8, 2023

Another approach (which might have been discussed in one of these threads) is to code this up in fzf directly. E.g. it could be a separate hidden option that gets piped into the user-facing fzf invocation. I.e. conceptually we'd have builtin fc | fzf --parse-history-stream | fzf. We'd probably get even better performance than Perl there.

@step-
Contributor Author

step- commented Sep 13, 2023

Replied #3077 (comment)

@junegunn
Owner

Another approach (which might have been discussed in one of these threads) is to code this up in fzf directly.

No. Making CTRL-R binding which is already pretty fast even faster by increasing the complexity of the core program can never be a goal. I'm suggesting we shouldn't make it noticeably slower.

Regardless of Perl or awk, the implementation of CTRL-R binding should be just a few lines of code. Why can't you just copy the code to a system that doesn't have Perl installed?

Having said that, I commented above that I'm willing to accept this version if we keep the original version and keep using it on systems that have Perl installed.

@step-
Contributor Author

step- commented Sep 13, 2023

Having said that, I commented above that I'm willing to accept this version if we keep the original version and keep using it on systems that have Perl installed.

Oh, I had interpreted your comment differently and therefore froze this PR. I can certainly look into keeping the existing perl script and invoking the proposed awk script only if perl isn't installed.

- [bash] Disable pipefail in command substitution 1894304
- [shell] Use --scheme=path when appropriate 2bed7d3
@step-
Contributor Author

step- commented Sep 21, 2023

@junegunn, I'm working on updating this PR. Given that mawk is the fastest awk for this use case, and that mawk isn't POSIX, I'm thinking of changing awk "$script" to ${FZF_AWK:-awk} "$script" to allow advanced users to set a preferred awk command. Would you accept this change, too? If so, should I also document the environment variable FZF_AWK in this PR, and in which files?

@junegunn
Owner

junegunn commented Sep 21, 2023

@step- Not a fan of adding configuration knobs. Why don't we just check if it's on the system when loading the script?

if command -v perl > /dev/null; then
  # Use the current version
else
  # __ prefix for private variables
  local __fzf_awk=awk
  command -v mawk > /dev/null && __fzf_awk=mawk
  # Use awk version referring to $__fzf_awk
fi

Is there any reason one would prefer awk over mawk when both are available?

@step-
Contributor Author

step- commented Sep 21, 2023

I didn't realize that the load script sets some shell variables prefixed by __fzf. So, something like this could work.

if command -v perl > /dev/null; then
  __fzf_history__() {
  ...
  }
else # awk
  local __fzf_awk=awk
  command -v mawk > /dev/null && __fzf_awk=mawk
  __fzf_history__() {
  ...
  $__fzf_awk "$script"
  ...
  }
fi

Is there any reason one would prefer awk over mawk when both are available?

Yes, if their mawk is old, which I have often encountered in other projects, and they're unwilling to upgrade it for whatever reason. That's why I proposed a configuration variable ${FZF_AWK:-awk}, which allows changing the command without editing the script. I prefer mawk as an opt-in.
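
For readers unfamiliar with the `${VAR:-default}` expansion behind that proposal, here is a small sketch. `FZF_AWK` is only the variable name proposed in this comment, and `pick_awk` is a made-up helper.

```bash
# ${FZF_AWK:-awk} expands to $FZF_AWK when it is set and non-empty,
# and to the literal word "awk" otherwise.
pick_awk() {
  printf '%s\n' "${FZF_AWK:-awk}"
}

unset FZF_AWK
pick_awk            # falls back to awk
FZF_AWK=mawk
pick_awk            # uses the user's choice
unset FZF_AWK
```

The default applies at expansion time, so a user could change the preferred command without editing the script.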

@junegunn
Owner

Can we suggest alias awk=mawk?

@step-
Contributor Author

step- commented Sep 21, 2023

Can we suggest alias awk=mawk?

For the whole shell session? No. mawk scripts and command line options aren't compatible with POSIX awk, busybox awk and gawk.

@junegunn
Owner

junegunn commented Sep 21, 2023

I see. What about checking the version of mawk and see if it meets the criteria? If this is getting too complicated, I would just support awk only and tell the users to install Perl if they're concerned about the performance.

@step-
Contributor Author

step- commented Sep 21, 2023

I checked the mawk source files to see how it prints the version number. VERSION_STRING uses the standard x.y.z semantic versioning format; the patch level and the rest don't matter.

version.c:

/* print VERSION and exit */
void
print_version(FILE *fp)
{
    fprintf(fp, VERSION_STRING, PATCH_BASE, PATCH_LEVEL, PATCH_STRING, DATE_STRING);
    fflush(fp);

# mawk --version
mawk 1.3.4 20230322
Copyright 2008-2022,2023, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       arc4random_stir/arc4random
regex-funcs:        internal

compiled limits:
sprintf buffer      8192
maximum-integer     9223372036854775808

The following command verifies that at least version 1.3.4 is installed

mawk --version | awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }'

All together:

if command -v perl > /dev/null; then
  __fzf_history__() {
  ...
  }
else # awk
  local __fzf_awk=awk

  # if available, mawk is faster
  command -v mawk > /dev/null &&    # at least version 1.3.4
    mawk --version | awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }' &&
    __fzf_awk=mawk

  __fzf_history__() {
  ...
  $__fzf_awk "$script"
  ...
  }
fi
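
The arithmetic in the version check packs the three version components into one integer, so 1.3.4 becomes 1003004. The sketch below wraps the same awk program in a hypothetical function that reads the banner from stdin, which makes it possible to try the check on sample strings without mawk installed.

```bash
# Succeed when the first line of a mawk version banner reports >= 1.3.4.
# The banner is read from stdin so the check can be tried without mawk.
version_at_least_1_3_4() {
  awk 'NR == 1 {
    split($2, a, ".")
    v = a[1] * 1000000 + a[2] * 1000 + a[3]
    exit !(v >= 1003004)
  }'
}

# Sample banners; the second one is a made-up older version string.
echo 'mawk 1.3.4 20230322' | version_at_least_1_3_4 && echo 'new enough'
echo 'mawk 1.3.3 sample' | version_at_least_1_3_4 || echo 'too old'
```

Each component gets three decimal digits, which is enough as long as the minor and patch numbers stay below 1000.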

@junegunn
Owner

SGTM.

Since this involves running two extra commands during loading, we can do it later when the user first runs the awk version of CTRL-R.

 __fzf_history__() {
  if [[ -z $__fzf_awk ]]; then
    __fzf_awk=awk
    # Check for mawk
  fi
  ...
}
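
A runnable version of this lazy check might look like the sketch below. The names ending in `_demo` are placeholders rather than the names used by fzf, and the mawk version test is left out for brevity.

```bash
# Resolve the awk command on first use and remember it for later calls.
# Names ending in _demo are placeholders, not the names used by fzf.
__fzf_history_lazy_demo() {
  if [ -z "${__fzf_awk_demo:-}" ]; then
    __fzf_awk_demo=awk
    command -v mawk > /dev/null && __fzf_awk_demo=mawk
  fi
  printf 'using %s\n' "$__fzf_awk_demo"
}

__fzf_history_lazy_demo
```

Since the lookup only runs while the variable is empty, a value that is already set when the function is first called is left untouched.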

@step-
Contributor Author

step- commented Sep 22, 2023

Good. Then I'll do the changes and push the update today.

Since this involves running two extra commands during loading, we can do it later ...

  if [[ -z $__fzf_awk ]]; then
    __fzf_awk=awk
  ...

This also allows a savvy user to preset __fzf_awk with any specific awk version they want. Bonus points!

Side note: I'm about to conclude a journey of pull requests on this topic that started back in the Spring. I feel like an airplane pilot who flies over the ocean mostly by himself then, for the final landing, proceeds to a safe landing in the watchful hands of the Control tower :)

@junegunn
Owner

Thank you for your efforts and patience. I have installed mawk and tested the awk version and I can confirm that it's fast enough for me.

@step-
Contributor Author

step- commented Sep 22, 2023

I have installed mawk and tested the awk version and I can confirm that it's fast enough for me.

Thank you.

@junegunn junegunn merged commit 9f7684f into junegunn:master Sep 22, 2023
@junegunn
Owner

Merged, thanks again!

@calestyo
Contributor

@step- Nice work, but is the version check:
https://github.com/step-/fzf/blob/341c13e99c8f3f371643c80ddb7c1b8217ded407/shell/key-bindings.bash#L76-L77
really necessary?

IMO that belongs into package management (i.e. the fzf package) depending on a recent enough version of mawk… and if someone installs/compiles manually from source it's up to him to read some README with the dependencies.

The dynamic check at runtime just costs an additional mawk process to be spawned for no good reason... and many such small performance penalties do pile up.

Apart from that, 1.3.4 is from 2009... I mean except for museums, is anyone even still shipping previous versions?

Looking at https://repology.org/project/mawk/versions, there seem to be only very few that have <= 1.3.3, most notably Debian buster (which is out of regular support, and whose LTS ends in 2024), but its version of fzf wouldn't contain your code anyway. The same goes for the *buntus listed there: their versions of fzf don't contain the new mawk code.

@step-
Contributor Author

step- commented Sep 25, 2023

@step- Nice work,

@calestyo Thank you.

but is the version check: https://github.com/step-/fzf/blob/341c13e99c8f3f371643c80ddb7c1b8217ded407/shell/key-bindings.bash#L76-L77 really necessary?

Yes.

IMO that belongs into package management (i.e. the fzf package) depending on a recent enough version of mawk… and if someone installs/compiles manually from source it's up to him to read some README with the dependencies.

The dynamic check at runtime just costs an additional mawk process to be spawned for no good reason... and many such small performance penalties do pile up.

The dynamic check only happens once, when .bashrc loads.

Apart from that, 1.3.4 is from 2009... I mean except for museums, is anyone even still shipping previous versions?

My distro is. I don't know about all other distributions. I think there are too many to be able to tell.

Looking at https://repology.org/project/mawk/versions, there seem to be only very few that have <= 1.3.3, most notably Debian buster (which is out of regular support, and whose LTS ends in 2024), but its version of fzf wouldn't contain your code anyway. The same goes for the *buntus listed there: their versions of fzf don't contain the new mawk code.

@calestyo
Contributor

calestyo commented Oct 5, 2023

@step- One thing I've just noted:

On Debian (mawk 1.3.4.20200120-3.1):

$ mawk --version
mawk: not an option: --version

Instead:

$ mawk 
Usage: mawk [Options] [Program] [file ...]

Program:
    The -f option value is the name of a file containing program text.
    If no -f option is given, a "--" ends option processing; the following
    parameters are the program text.

Options:
    -f program-file  Program  text is read from file instead of from the
                     command-line.  Multiple -f options are accepted.
    -F value         sets the field separator, FS, to value.
    -v var=value     assigns value to program variable var.
    --               unambiguous end of options.

    Implementation-specific options are prefixed with "-W".  They can be
    abbreviated:

    -W version       show version information and exit.
    -W dump          show assembler-like listing of program and exit.
    -W help          show this message and exit.
    -W interactive   set unbuffered output, line-buffered input.
    -W exec file     use file as program as well as last option.
    -W random=number set initial random seed.
    -W sprintf=number adjust size of sprintf buffer.
    -W posix_space   do not consider "\n" a space.
    -W usage         show this message and exit.

with:

$ mawk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647

@calestyo
Contributor

calestyo commented Oct 5, 2023

And it's apparently not enough to just 2>/dev/null ... the check gives always true.
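
A likely explanation, offered here as an assumption rather than something stated in the comment, is that the awk test only sets its exit status inside the `NR == 1` rule. When mawk rejects `--version` and prints nothing to stdout, that rule never runs and awk exits with status 0. The sketch below reproduces this with empty input and shows one alternative that decides in an `END` rule; both function names are made up.

```bash
# Original shape: the exit status is only set inside the NR == 1 rule,
# so with no input lines awk exits 0 and the check "passes".
check_in_rule() {
  awk 'NR == 1 { split($2, a, "."); v = a[1] * 1000000 + a[2] * 1000 + a[3]; exit !(v >= 1003004) }'
}

# Alternative shape: decide in END, so empty input fails the check.
check_in_end() {
  awk 'NR == 1 { split($2, a, "."); v = a[1] * 1000000 + a[2] * 1000 + a[3] }
       END { exit !(v >= 1003004) }'
}

if printf '' | check_in_rule; then echo 'in-rule check: passes on empty input'; fi
if ! printf '' | check_in_end; then echo 'END check: fails on empty input'; fi
```

With a real banner on stdin both variants agree; they differ only when the banner is missing.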

@step-
Contributor Author

step- commented Oct 5, 2023

@step- One thing I've just noted:

Good catch, thank you. This can be fixed, with some attention for details:

  1. replace --version with -W version, which is the original option syntax in mawk and should be supported by all mawk versions and builds
  2. expand the awk mini-script that tests the version to also consider the value of DATE_STRING, see my previous comment
  3. ensure not to exceed maximum-integer in the comparison, 9223372036854775808 vs. 2147483647, see the same link above
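
One way to honor the third point is to compare the version components and the date one field at a time instead of packing them into a single large number. The sketch below is only an illustration of that idea, not the check that was eventually committed; the function name, the `min_date` threshold and the sample dates are invented.

```bash
# Compare a mawk banner such as "mawk 1.3.4 20200120" field by field,
# so no intermediate value comes near a 32-bit maximum-integer.
# min_date is an illustrative threshold, not one taken from the thread.
mawk_banner_ok() {
  awk -v min_date="$1" 'NR == 1 {
    split($2, a, ".")
    major = a[1] + 0; minor = a[2] + 0; patch = a[3] + 0; date = $3 + 0
    if (major != 1) { ok = (major > 1) }
    else if (minor != 3) { ok = (minor > 3) }
    else if (patch != 4) { ok = (patch > 4) }
    else { ok = (date >= min_date + 0) }
  }
  END { exit !ok }'
}

echo 'mawk 1.3.4 20200120' | mawk_banner_ok 20200101 && echo 'accepted'
echo 'mawk 1.3.4 20090721' | mawk_banner_ok 20200101 || echo 'rejected'
printf '' | mawk_banner_ok 20200101 || echo 'rejected (no banner)'
```

In real use the banner would come from `mawk -W version`, the option spelling named in the first point.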

@junegunn do you want me to push a commit to this branch or do you prefer a new PR referencing this comment?
@junegunn please, what's the output of mawk -W version on your rig?
@calestyo for my information, which Debian and .deb are you reporting?
@calestyo after by-passing the mawk version check, does the overall perl-to-mawk replacement work on such Debian, can you please test?

And it's apparently not enough to just 2>/dev/null ... the check gives always true.

I'm sorry, in which context? I can't see a 2>/dev/null somewhere near a check.


calestyo reported:

On Debian (mawk 1.3.4.20200120-3.1):

$ mawk --version
mawk: not an option: --version
$ mawk -W version
mawk 1.3.4 20200120
Copyright 2008-2019,2020, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       srandom/random
regex-funcs:        internal
compiled limits:
sprintf buffer      8192
maximum-integer     2147483647

@junegunn
Owner

junegunn commented Oct 5, 2023

@junegunn do you want me to push a commit to this branch or do you prefer a new PR referencing this comment?

A new one.

@junegunn please, what's the output of mawk -W version on your rig?

$ mawk -W version
mawk 1.3.4 20230808
Copyright 2008-2022,2023, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       arc4random_stir/arc4random
regex-funcs:        internal

compiled limits:
sprintf buffer      8192
maximum-integer     9223372036854775808


$ mawk --version
mawk 1.3.4 20230808
Copyright 2008-2022,2023, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       arc4random_stir/arc4random
regex-funcs:        internal

compiled limits:
sprintf buffer      8192
maximum-integer     9223372036854775808

@calestyo
Contributor

calestyo commented Oct 5, 2023

Well, as said previously... I anyway think it's a wrong approach to do software version checks at runtime in such a case.

@calestyo for my information, which Debian and .deb are you reporting?

That was with Debian stable (i.e. 12, bookworm) and the package version is 1.3.4.20200120-3.1.

As of the version in Debian unstable (1.3.4.20230808-1) --version seems already supported:

$ mawk --version
mawk 1.3.4 20230808
Copyright 2008-2022,2023, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan

random-funcs:       arc4random_stir/arc4random
regex-funcs:        internal

compiled limits:
sprintf buffer      8192
maximum-integer     9223372036854775808

btw: If you adapt the code, can you make it so that at:

command -v mawk > /dev/null &&
mawk --version | # at least 1.3.4
awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }' &&
__fzf_awk=mawk

any empty or better said non-matching input also causes the test to fail?

The reason is that as part of #3458 I'm preparing a commit that would change the check about like so:

      command -v mawk > /dev/null &&
        command mawk --version |          # at least 1.3.4
          command awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }' &&
          __fzf_awk='command mawk'

That has the following consequences:

  • If mawk doesn't exist as a program but only as a function, command -v mawk will still return mawk, but command mawk --version will fail.

That's, by the way, also what I meant with 2>/dev/null, i.e. silencing the warning about the non-existent --version: the check would still select mawk, AFAICS, even though it's not clear whether the version is actually recent enough or, in my use case, whether mawk is even a program.
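
The difference between `command -v NAME` and `command NAME` can be seen in a short experiment. `fzf_demo_tool` is a made-up name that is assumed not to exist as a program on PATH.

```bash
# A shell function that happens to share the name of a missing program.
# fzf_demo_tool is a made-up name, assumed not to exist on PATH.
fzf_demo_tool() { echo 'I am only a function'; }

# command -v reports functions too, so this succeeds.
command -v fzf_demo_tool > /dev/null && echo 'command -v: found'

# "command NAME" skips functions and looks for a program; here it fails.
command fzf_demo_tool 2> /dev/null || echo 'command: no such program'
```

So a successful `command -v` alone does not prove that an external program of that name can be run.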

@calestyo
Contributor

calestyo commented Oct 5, 2023

after by-passing the mawk version check, does the overall perl-to-mawk replacement work on such Debian, can you please test?

What exactly do you mean?

@step-
Contributor Author

step- commented Oct 5, 2023

What exactly do you mean?

I mean I thought you could please confirm that on Debian stable (i.e. 12, bookworm) and package version 1.3.4.20200120-3.1 the mawk script works with fzf bash history. I don't have access to bookworm. If perl is installed you will need to comment it out temporarily.

As regards your request about command -v - and not to steal your thunder - what do you think if I took care of that concern directly in my PR for this piece of code? Here's a proof of concept:

when mawk doesn't exist:

bash -c "command mawk-not-exist -W version 2> /dev/null | awk 'NR==1{         } END{exit !(v>1)}'"; echo $?

1

when mawk exists:

bash -c "command mawk           -W version 2> /dev/null | awk 'NR==1{v=2; exit} END{exit !(v>1)}'"; echo $?

0

@calestyo
Contributor

calestyo commented Oct 5, 2023

I mean I thought you could please confirm that on Debian stable (i.e. 12, bookworm) and package version 1.3.4.20200120-3.1 the mawk script works with fzf bash history. I don't have access to bookworm. If perl is installed you will need to comment it out temporarily.

AFAICS, it works (apart from printing the mawk: not an option: --version error).

Is there anything specific you wanted me to test with it? I tested now Ctrl-R from an empty readline, which gives the whole history in fzf, and from a readline that already contained a word, which gives the history with fuzzy selection on that word. Inserting the selected item into the readline also works.

btw, one further idea:
I always think we should try to avoid modifying the user's shell environment as good as we can.

Wouldn't it be possible to get rid of __fzf_awk? E.g. by first checking whether __fzf_history__() is already defined (declare -F …), if not, doing the mawk-version-check, and depending on the outcome, defining either one or the other version of the function (i.e. with awk or with mawk)?
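
A rough sketch of that idea, with a placeholder function name and without the mawk version check, could be:

```bash
# Define the function only if nothing has defined it yet (bash-specific).
# __fzf_history_guard_demo is a placeholder name for illustration.
if ! declare -F __fzf_history_guard_demo > /dev/null; then
  if command -v mawk > /dev/null; then
    __fzf_history_guard_demo() { echo 'mawk variant'; }
  else
    __fzf_history_guard_demo() { echo 'awk variant'; }
  fi
fi

declare -F __fzf_history_guard_demo
```

The choice of awk is baked into the function body, so no helper variable remains in the user's shell environment.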

As regards your request about command -v - and not to steal your thunder - what do you think if I took care of that concern directly in my PR for this piece of code? Here's a proof of concept:

That seems to work (I'm not really good at awk ... at all ^^). You may want to have a look at my #3462.

Though - just to keep the commits focused on one specific change each - I'd perhaps suggest that you only adapt the check, and my commit adds the command? But it's not really important to me, if you'd like to include it in your commit already.

step- added a commit to step-/fzf that referenced this pull request Oct 6, 2023
* Use the all-compatible mawk `-W version` option.
  junegunn#3313 (comment).
* Do not remap the history key if none of the required commands is installed
  (perl, awk or mawk in this order).
* Run the command and not a function consistently with junegunn#3462.

The version check bash code relies on the following mawk source code,
extracted from mawk 1.3.4 20230322.

```
version.c:
18-  #include "init.h"
19-  #include "patchlev.h"
20-
21:  #define	 VERSION_STRING	 \
22-    "mawk %d.%d%s %s\n\
23-  Copyright 2008-2022,2023, Thomas E. Dickey\n\
24-  Copyright 1991-1996,2014, Michael D. Brennan\n\n"
....
30-  void
31-  print_version(FILE *fp)
32-  {
33:      fprintf(fp, VERSION_STRING, PATCH_BASE, PATCH_LEVEL, PATCH_STRING, DATE_STRING);
34-      fflush(fp);
35-
36-  #define SHOW_RANDOM "random-funcs:"

patchlev.h:
13-  /*
14-   * $MawkId: patchlev.h,v 1.128 2023/03/23 00:23:57 tom Exp $
15-   */
16:  #define  PATCH_BASE	1
17-  #define  PATCH_LEVEL	3
18-  #define  PATCH_STRING	".4"
19-  #define  DATE_STRING    "20230322"
```
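
Given the format string above, the first line of `mawk -W version` should look like `mawk 1.3.4 20230322`: name, dotted version, build date. A minimal sketch of how a bash check could split and compare those fields follows. `mawk_version_ok` and its minimum-date parameter are hypothetical, and the 1.3.4 floor is only illustrative:

```bash
# Hypothetical helper, not the code from the commit.
# $1: first line of the version output, $2: minimum build date (YYYYMMDD).
mawk_version_ok() {
  local name major minor patch date rest field version
  # Split "mawk 1.3.4 20230322" on spaces and dots.
  IFS=' .' read -r name major minor patch date rest <<< "$1"
  [[ $name == mawk ]] || return 1
  # Reject non-numeric fields, e.g. a month name in place of the date.
  for field in "$major" "$minor" "$patch" "$date" "$2"; do
    [[ $field =~ ^[0-9]+$ ]] || return 1
  done
  # 10# forces base 10 so a leading zero is not read as octal.
  version=$(( (10#$major * 1000 + 10#$minor) * 1000 + 10#$patch ))
  (( version >= 1003004 && 10#$date >= 10#$2 ))
}

mawk_version_ok "mawk 1.3.4 20230322" 20200120 && echo "new enough"
```

Passing the version line as an argument keeps the comparison testable on machines where mawk is not installed.
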
@step-
Contributor Author

step- commented Oct 6, 2023

@calestyo thank you for confirming that the mawk script works on Debian stable (i.e. 12, bookworm) and package version 1.3.4.20200120-3.1, AFAYCS.

btw, one further idea: I always think we should try to avoid modifying the user's shell environment as good as we can. Wouldn't it be possible to get rid of __fzf_awk?

Hmm, no, not really. We want __fzf_awk also to enable an expert user to preset it and by-pass the tests altogether. Besides, other __fzf variables are already used so adding another one isn't concerning to me.
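
The preset behaviour described here can be sketched as follows: a non-empty value set by the user wins, and detection only runs otherwise. The names `__fzf_awk_demo` and `choose_awk_demo` are hypothetical, and the detection is reduced to a `command -v` test for brevity:

```bash
# Hypothetical sketch, not the fzf implementation.
choose_awk_demo() {
  # Respect a value the user set before sourcing the script.
  if [[ -z "${__fzf_awk_demo:-}" ]]; then
    __fzf_awk_demo=awk
    # Placeholder for the real version test.
    if command -v mawk > /dev/null 2>&1; then
      __fzf_awk_demo=mawk
    fi
  fi
}

__fzf_awk_demo=/opt/local/bin/gawk   # preset by an expert user
choose_awk_demo
echo "$__fzf_awk_demo"               # the preset is left untouched
```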

I opened #3463 with my changes. It also addresses, in a slightly different way, the concern of #3462 for this piece of code.

@calestyo
Contributor

calestyo commented Oct 6, 2023

Hmm, no, not really. We want __fzf_awk also to enable an expert user to preset it and by-pass the tests altogether.

If someone really wants that level of control, shouldn't it be okay to assume that they can just override __fzf_history__() or simply edit the source file? Besides, there is no way to tell it not to use perl, which I'd strongly assume is available on most systems (at least Linux). So the variable could only be used to select the flavour of awk, and I doubt many people would want to do that if the results are the same.

Besides, other __fzf variables are already used so adding another one isn't concerning to me.

Well sure, but the more we have the bigger the "pollution" gets. ;-) I mean in principle most of them should simply be in one __FZF_CONFIG associative array.

@junegunn
Owner

junegunn commented Oct 7, 2023

I mean in principle most of them should simply be in one __FZF_CONFIG associative array.

Please note that associative arrays are not available on bash 3.
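
Associative arrays arrived in bash 4.0, so a script that must also run on bash 3 (for example the bash 3.2 tested on macOS in this PR) would need a guard before using one. A minimal sketch, with `__DEMO_CONFIG` and `__demo_config_awk` as hypothetical names:

```bash
# Hypothetical sketch: use an associative array only where bash supports it.
if (( BASH_VERSINFO[0] >= 4 )); then
  declare -A __DEMO_CONFIG=([awk]=mawk)
  echo "awk flavour: ${__DEMO_CONFIG[awk]}"
else
  # bash 3 fallback: one plain variable per setting.
  __demo_config_awk=mawk
  echo "awk flavour: $__demo_config_awk"
fi
```

The fallback branch shows why a single config array does not remove the per-setting variables as long as bash 3 is supported.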

@calestyo
Contributor

calestyo commented Oct 7, 2023

Please note that associative arrays are not available on bash 3.

sigh...

junegunn added a commit that referenced this pull request Oct 7, 2023
* Use the all-compatible mawk `-W version` option.
  #3313 (comment).
* Run the command and not a function consistently with #3462.

The version check bash code relies on the following mawk source code,
extracted from mawk 1.3.4 20230322.

```
version.c:
18-  #include "init.h"
19-  #include "patchlev.h"
20-
21:  #define	 VERSION_STRING	 \
22-    "mawk %d.%d%s %s\n\
23-  Copyright 2008-2022,2023, Thomas E. Dickey\n\
24-  Copyright 1991-1996,2014, Michael D. Brennan\n\n"
....
30-  void
31-  print_version(FILE *fp)
32-  {
33:      fprintf(fp, VERSION_STRING, PATCH_BASE, PATCH_LEVEL, PATCH_STRING, DATE_STRING);
34-      fflush(fp);
35-
36-  #define SHOW_RANDOM "random-funcs:"

patchlev.h:
13-  /*
14-   * $MawkId: patchlev.h,v 1.128 2023/03/23 00:23:57 tom Exp $
15-   */
16:  #define  PATCH_BASE	1
17-  #define  PATCH_LEVEL	3
18-  #define  PATCH_STRING	".4"
19-  #define  DATE_STRING    "20230322"
```

Co-authored-by: Junegunn Choi <[email protected]>
@glensc

glensc commented Sep 28, 2024

If speed is an issue, why not write the logic in C, or Go/Rust/...?

@step-
Contributor Author

step- commented Sep 28, 2024

@glensc, TL;DR: #3313 (comment)
