Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix stack overflow for large projects #1484

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Skipants
Copy link

@Skipants Skipants commented Feb 27, 2023

Description

When large projects have a lot of missing objects in the first pass of parsing Ruby files we would run into stack overflows. This happens because, when there was a missing object in a parsed Ruby file, we would recursively call #parse_remaining_files. When this happens a lot the stack would get huge.

We fix this by instead keeping a list of files that we want to retry and re-parse them in another pass. When we can no longer resolve any more files we break the loop.

Fixes #1375

Completed Tasks

  • I have read the Contributing Guide.
  • The pull request is complete (implemented / written).
  • Git commits have been cleaned up (squash WIP / revert commits).
  • I wrote tests and ran bundle exec rake locally (if code is attached to PR).

It's difficult to write a test for this because it depended on a lot of files until it would break.

I tested this by running yard doc on my company's internal monolith and having it working, whereas the current release of yard would SystemStackError

parser.parse_remaining_files
retries += 1
if globals.ordered_parser
retryable_file = parser.file == "(stdin)" ? StringIO.new("void Init_Foo() { #{statement.source} }") : parser.file
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive my sin here. I wrote it this way because of tests that fail when we only try and use parser.file like in lib/yard/handlers/base.rb.

An example test that fails is

it "resolves namespace variable names across multiple files" do

Here's what happens/reasoning behind this weird hack:

  1. Foo::Bar is not resolved and is instead retried in the first string of that test
  2. Because this comes from stdin and not a file, parser.file is (stdin).
  3. If we push the string (stdin) onto the globals.ordered_parser.files_to_retry it's not able to be parsed by OrderedParser#parse. It should instead be a StringIO with the C code contents.
  4. The C code contents from statement.source are missing the wrapper void Init_Foo() { ... } and needs to be manually added.

An alternative to this was to save the contents in globals but that felt hacky as well for a couple reasons:

  1. I was afraid pushing contents of files on globals could result in runaway mem usage (though now that I think about it, each new file probably clobbers the last one)
  2. This is only useful in the niche case of C code from STDIN, so having a global for that felt like overkill and potentially confusing code

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is definitely not mergeable with a hack like this, especially one that artificially causes a failing test to pass targeted specifically for that test.

Notably, (stdin) parsing is entirely common and not specific to C. What you're suggesting here is this PR does not support this use case. That would be a problem and highlights a possible breaking change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent feedback, thank you. I'll let this one simmer a bit and see if I can come up with a less-hacky patch.

Copy link
Owner

@lsegal lsegal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to get around to this so late. To be honest, the use of the specific workaround to C parsing is definitely concerning and makes me wonder if this will cause a breaking change, and thus I have not looked too deeply at merging.

Unrolling the recursive loop might be useful here, but it would have to be done in a way that respects the existing order of operations and API capabilities. Supporting StringIOs is definitely one of those API capabilities.

I think this PR would need another pass to make it compatible.

parser.parse_remaining_files
retries += 1
if globals.ordered_parser
retryable_file = parser.file == "(stdin)" ? StringIO.new("void Init_Foo() { #{statement.source} }") : parser.file
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is definitely not mergeable with a hack like this, especially one that artificially causes a failing test to pass targeted specifically for that test.

Notably, (stdin) parsing is entirely common and not specific to C. What you're suggesting here is this PR does not support this use case. That would be a problem and highlights a possible breaking change.

When large projects have a lot of missing objects in the first pass of parsing Ruby files we would run into stack overflows. This happens because, when there was a missing object in a parsed Ruby file, we would recursively call `#parse_remaining_files`. When this happens a lot the stack would get huge.

We fix this by instead keeping a list of files that we want to retry and re-parse them in another pass. When we can no longer resolve any more files we break the loop.
@Skipants Skipants force-pushed the fix-stack-overflow branch from ec5be9e to 8a14e13 Compare July 26, 2023 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File traversal is recursive, which could result in SystemStackError
2 participants