-
-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: improve textDocument/definition ordering #2792
Conversation
2a0adc6
to
58cd479
Compare
5925607
to
aace084
Compare
PleaseUseCamelCaseNamingConvention (<- Translate by ChatGPT, I tried a lot) |
d57f537
to
444848c
Compare
444848c
to
7f7c592
Compare
--- @param a string[] | ||
--- @param b string[] | ||
--- @return number | ||
local function levenshteinDistance(a, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, your requirement is to find the length of the common prefix of two strings. Why did you use such a complex algorithm?
Also, considering the application scenario, there's a good chance that two strings might be entirely identical, so a fast handling branch for this case should be implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not perfect, but is a lot of better than u1 < u2
.
I'm happy to optimise this more, but keep in mind results <= 2
, so there is little benefit in optimizing this too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, your requirement is to find the length of the common prefix of two strings. Why did you use such a complex algorithm?
This is a standard algorithm and implementation for finding the differences between two strings. I adapted it slightly to work on lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your requirement is to find the length of the common prefix of two strings
This just wakes me up 😮. Although this is a standard algorithm for finding diff between 2 strings, this is not the same as finding common prefix, and I think this algorithm is not suitable for the current situation. Lets me use some test codes:
local currentFile = "/path/to/some/project/current/file/directroy/test.lua"
local uris = {
"/path/to/some/project/current/file/directroy/test.lua",
"/path/to/some/project/current/file/directroy/file.lua",
"/path/to/some/project/current/file/dir2/test.lua",
"/path/to/some/project/current/file/other/file.lua",
"/path2/to/some/project/current/file/directroy/test.lua",
}
for i, uri in ipairs(uris) do
print(i, ("%.6f"):format(pathSimilarityRatio(currentFile, uri)), uri)
end
- this prints
1 0.000000 /path/to/some/project/current/file/directroy/test.lua
2 0.250000 /path/to/some/project/current/file/directroy/file.lua
3 0.250000 /path/to/some/project/current/file/dir2/test.lua
4 0.375000 /path/to/some/project/current/file/other/file.lua
5 0.250000 /path2/to/some/project/current/file/directroy/test.lua
1
has a distance of0.0
=> expected ✅- but
2
,3
,5
all have the same distance0.25
=> I think this is not we want? ❌ - the example
5
is the most problematic, it's difference starts with the root path, but still has the same score as2
/3
=> I don't think this is a desirable behavior either ❌ - definitely we prefer
2
more, because it has greater length of common prefixes
This is not about whether the algorithm is standard or not, it's just unsuitable for this situation. And if all we want is to find common prefix, we need not to use this algorithm with high complexity (just unnecessary, even though #results
is 95% time <= 2
).
One more edge case, when the distance score (or the common prefix length) is the same, I guess they should be ordered alphabetically? Otherwise they may be in random orders each time when we check for definitions. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common prefix isn't always suitable.
The aim wasn't to provide something that's perfect. It was to provide something that was an improvement over <
which isn't suitable in any way.
Please let's stop trying to find 0.0001% edge cases, it just isn't worth the effort.
Before the ordering for me was always 0% correct as it always favoured the builtin, and now it's 99% correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you carefully consider whether you want to prioritize files in the nearby directory or files with similar paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 😅 😅
The common prefix isn't always suitable.
Can you provide some examples where common prefix is not suitable?
By sorting with greatest common prefix length, it can solve the examples you provided in your initial description as well.
And if common prefix isn't always suitable, then I think the Levenshtein distance
isn't always suitable either. And I think it is worse than common prefix in a few major aspects:
- the code complexity for future maintenance
- the time complexity of the algorithm
- cannot handle the cases that I provided (this can be countered by providing examples that common prefix cannot handle)
As you have said, the aim wasn't to provide something that's perfect. I agree. But then when both common prefix and Levenshtein distance
are not perfect, why not use the simpler one instead of a complex one? This is just "using a sledgehammer to crack a nut" (殺雞用牛刀). And even then the nut is not cracked perfectly 😂 .
Please let's stop trying to find 0.0001% edge cases, it just isn't worth the effort.
I don't agree that edge case I mentioned is a 0.0001%
edge case. It's very often that, a value's definition is just one folder level above. The is related to the completeness of the logic. For any comparator function that are sorting records, most of the time we want to compare those major columns, and fallback to compare the id
field as a last resort. In this case it is the u1 < u2
=> this fallback comparison is always deterministic.
simularity_cache[u1] = simularity_cache[u1] or pathSimilarityRatio(uri, u1)
simularity_cache[u2] = simularity_cache[u2] or pathSimilarityRatio(uri, u2)
if simularity_cache[u1] ~= simularity_cache[u2] then
return simularity_cache[u1] < simularity_cache[u2]
end
return u1 < u2
Of course I'm no maintainer and you may just ignore my suggestion if you disagree with me. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use the simpler one instead of a complex one
Time. Which I have no more of for this pull request.
Spending more time on this, is unlikely to result in observable better functionality.
I'll just fix this on the client side.
I don't have time to test, but I believe that this pull request has been completed. EDIT: |
While working on another PR, I mistakenly merged this PR. I'm now attempting to rebuild this PR. |
It looks like there's no need to attempt to rebuild the PR. Please note that this PR was actually rejected. |
Would you accept a version of this that simply adds a score+boost for any entry in results that matches uri, to ensure a local definition always comes first while otherwise preserving the current ordering? I think this would solve the main issue/use case and the implementation would be simpler |
That wouldn't work for definitions in different files. The main problem is the builtins getting ordered first. |
I see. How about flipping it around and add a penalty for the builtins to boost project local definitions in general? |
E.g. if the current document has path
c/file.lua
and the results have paths:c/file.lua
a/file.lua
Prefer
c/file.lua
since that is most similar to the current document.Real world example used these paths:
From
/Users/lewis/projects/neovim/runtime/lua/vim/lsp/handlers.lua
with results (in order):/Users/lewis/.cache/lua-language-server/meta/LuaJIT en-us utf8/builtin.lua
/Users/lewis/projects/neovim/runtime/lua/vim/lsp/handlers.lua