Yard rake task is missing require #1597

jasonkarns · 2024-12-20T21:27:12Z

Steps to reproduce

The yard rake task is not self-contained and is missing requires to yard itself.

add yard to rakefile:

require "yard/rake/yardoc_task"
YARD::Rake::YardocTask.new

verify yard rake task:

$ rake -T
# snip
rake yard                     # Generate YARD Documentation

run rake yard task

Actual Output

```
$ rake yard
rake aborted!
NameError: uninitialized constant YARD::CLI

          yardoc = YARD::CLI::Yardoc.new
                       ^^^^^

Tasks: TOP => yard
(See full trace by running task with --trace)
```

Expected Output

I expected roughly the same output as yard

It seems the yard task is missing requires.

Environment details:

OS: macos 15.2
Ruby version (ruby -v): ruby 3.1.6p260 (2024-05-29 revision a777087be6) [arm64-darwin23]
YARD version (yard -v): yard 0.9.37
Relevant software dependency/versions: rake, version 13.2.1

I have read the Contributing Guide.

The text was updated successfully, but these errors were encountered:

lsegal · 2024-12-20T22:30:11Z

This is by design. You should not be reaching into and requiring any files from the YARD package other than require 'yard' itself:

require 'yard'
YARD::Rake::YardocTask.new

as documented in the README.

jasonkarns · 2024-12-23T15:57:30Z

Understood. I saw that when reading up on install, but assumed it was the "simple, but malperformant" setup that naive rake users typically fall into.

It's common to see many gems document the "simple" setup that requires their entire lib in the rakefile, only to later discover the performance impact that has on shell completion (rake <tab>) or when gem A negatively impacts gem B's completely unrelated rake task (because gem A is being fully loaded even when not run).

The common convention is to have a rake task entrypoint that only defines the task, without requiring the library. (gems that do this include: rdoc, bundler, rspec, and rubocop)

(I'm overly stating things, just to be clear on expectations and assumptions.)

The way I ended up defining the task was:

require "yard/rake/yardoc_task"
YARD::Rake::YardocTask.new :yard do |t|
  t.before = -> { require "yard" }
end

Comparing the two approaches (the above vs: require "yard"; YARD::Rake::YardocTask.new)

yielded (collected with profile gem):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  0.02     0.02      0.00        1     0.01    16.56  Object#full_yard
  0.04     0.02      0.00        2     0.00     0.21  Object#just_task

16ms isn't a lot, but it's a lot more than 0.2ms

Also tested with ruby-prof (Measure Mode: wall_time):

just_task total: 0.000478
full_yard total: 0.018249

Still about 2 orders of magnitude.

Would you consider a request to have an "official" rake entrypoint that didn't set up all of yard? (I realize it's only configuring autoloads and a few things, but it's quite a bit more than the tasklib needs.)

I would find this to also be helpful for consumers who look for patterns in Rakefiles and, seeing require "yard" could mistakenly conclude that require _gem_ is normal or good practice.

lsegal · 2024-12-23T17:53:39Z

Still about 2 orders of magnitude.

How are you generating this benchmark? This isn't something I can reproduce. Something looks very off about those numbers given that I've never seen a sub-ms require call. Are you sure your benchmark isn't requiring an already-required file? This would be the case if you ran a bench for require "yard"; require "yard/rake/yardoc_task" within the same process.

Here's what I see:

❯ ruby -rrake -e "t=Time.now;require 'yard/rake/yardoc_task';p Time.now - t"
0.0276671

❯ ruby -rrake -e "t=Time.now;require 'yard';p Time.now - t"
0.0639603

Running this a number of times is fairly consistent-- requiring the task file directly takes anywhere between 2.5ms - 4ms, while requiring yard fairly consistently is ~6.5ms. In the "worst case" performance it's about 2-2.5x faster, but keep in mind we're talking about differences of ~2-3ms in a use case that is absolutely not going to be a tight loop.

There is probably a risk of disk cache miss where loading YARD can take a little longer, but usually this only happens once and accounts for a very small total cost of your total runtime. If the concern is that YARD will slow down the running of other tasks, then this one time possible disk cache miss is unlikely to be the primary contributor-- an extra ~20ms every 100 or so runs doesn't seem significant for the benefit of significantly improved backwards compatibility guarantees.

tl;dr you're free to use the "unsupported" path of directly requiring into YARD if you really need to tightly optimize your Rakefile load time. I think the approach you're using is perfectly reasonable if the load time is that stark on your machine. I don't see enough of a performance gap in the general case to justify increasing YARD's maintenance surface area for the whole project, though.

jasonkarns mentioned this issue Dec 20, 2024

Ensure YardocTask requires yard itself #1598

Closed

5 tasks

lsegal added the Works as Intended label Dec 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Yard rake task is missing require #1597

Yard rake task is missing require #1597

jasonkarns commented Dec 20, 2024

lsegal commented Dec 20, 2024

jasonkarns commented Dec 23, 2024 •

edited

Loading

lsegal commented Dec 23, 2024 •

edited

Loading

Yard rake task is missing require #1597

Yard rake task is missing require #1597

Comments

jasonkarns commented Dec 20, 2024

Steps to reproduce

Actual Output

Expected Output

Environment details:

lsegal commented Dec 20, 2024

jasonkarns commented Dec 23, 2024 • edited Loading

lsegal commented Dec 23, 2024 • edited Loading

jasonkarns commented Dec 23, 2024 •

edited

Loading

lsegal commented Dec 23, 2024 •

edited

Loading