-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lint for unused input parameters #12724
base: dev
Are you sure you want to change the base?
Lint for unused input parameters #12724
Conversation
So easy? Super cool, this will help a lot! I was thinking we need to go through cheetah and hook into the templates but if a simple grep works - awesome! |
Yes, but can be improved. I started of with a regexp ( |
fd35f7e
to
34be7a6
Compare
301e6ba
to
5be1139
Compare
I guess the validator should be sufficient, but maybe having this additional if block can't hurt will make the linter happy for galaxyproject/galaxy#12724 alternatively we could extend the linter such that parameters that do not appear on the command line bu have a (expression) validator? are OK??
I guess the validator should be sufficient, but maybe having this additional if block can't hurt will make the linter happy for galaxyproject/galaxy#12724 alternatively we could extend the linter such that parameters that do not appear on the command line bu have a (expression) validator? are OK??
Spend some time to improve this: Instead of a simple regex search for the parameter name the code searches now in the Compiled template. The starting motivation was to avoid the problem of placeholders listed in comments (and removing comments seems not trivial), but also very short variable names (eg single letter) had false negatives. The main advantage for searching in the compiled templates is that cheetah adds some code ( I developed a series of regular expressions for discovering all syntax that I could think of. Unfortunatelly some cheetah expressions require quite "unspecific" (a bit less specific than I would find really good). So at the moment most of the specific ones are commented (if covered by the less specific ones). I also checked the tool repos for which I had local clones and opened issues:
For completeness: found none in (W4M)[https://github.com/workflow4metabolomics/tools-metabolomics.git] |
eeca2f2
to
ee0e5c4
Compare
Will this work for tools that use galaxy.json or the configfile to dump parameters ( |
continue | ||
|
||
if change_format: | ||
lint_ctx.error(f"Param input [{param_name}] is only used in a change_format tag", line=param.sourceline, xpath=tool_xml.getpath(param)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this be an error ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding the change_format
tag tells the tool: depending on the parameter param_name
the generated command line creates output of a different format .. which is then set for the output dataset.
I can't think of a scenario in which the generated command line would create output of a different format without using the parameter param_name
at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Will make this a warning. Found a use case https://github.com/ARTbio/tools-artbio/blob/667afd5a5408321a31cdebab75b378e694deab07/tools/justgzip/gzip.xml#L20 .. even if this might be better over there: ARTbio/tools-artbio#580
if change_format: | ||
lint_ctx.error(f"Param input [{param_name}] is only used in a change_format tag", line=param.sourceline, xpath=tool_xml.getpath(param)) | ||
else: | ||
lint_ctx.error(f"Param input [{param_name}] not found in command or configfiles.", line=param.sourceline, xpath=tool_xml.getpath(param)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be an info or maybe a warning, I think there are legitimate cases where that's desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we can maybe use an action to annotate the source in a PR so this doesn't get lost completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion this is clearly an error .. a really bad one :). There is a parameter in the tool form that is supposed modify the behavior of the tool in a certain way. In the cases that are covered here this is not the case and users will become undesired results.
I manually checked all cases in the linked issues and found only a single case where I had to think twice if this might be OK (this was a boolean to agree on the license .. but it had no effect .. so also an error in my opinion).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we can maybe use an action to annotate the source in a PR so this doesn't get lost completely
This would be a good thing anyway. Do you have any specific ideas or inspiration how this could be implemented.
Yes. The
This could be even stricter: actually we might/should check also if the names of all configfiles appears in the command (or other configfiles). Let me just add this to the new configfile linter :) The galaxy/lib/galaxy/tool_util/linters/inputs.py Line 394 in ee0e5c4
We do the following checks:
|
0918e57
to
4d9d054
Compare
How can input parameters be used via |
@@ -1,5 +1,56 @@ | |||
import re | |||
|
|||
from Cheetah.Compiler import Compiler | |||
from Cheetah.Template import Template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/galaxyproject/galaxy/blob/dev/packages/tool_util/setup.cfg#L34 this needs to be updated to require Cheetah3
or galaxy-util[template]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint @jmchilton added Cheetah3
if change_format: | ||
lint_ctx.error(f"Param input [{param_name}] is only used in a change_format tag", node=param) | ||
else: | ||
lint_ctx.error(f"Param input [{param_name}] not found in command or configfiles.", node=param) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error should only be used if the tool is broken with that change IMO. I don't think unused variables really indicate a tool is broken. I would make these warns or infos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular the last case (the else
branch) is clearly an error in my opinion. The parameter is used nowhere in the tool, i.e. the functionality of this parameter is broken (even if the tool still runs...). Just assume it would be the light switch in a car .. if it does not switch the lights on/off it's broken even if the car is still driving, or?
For the if
branch I just can not imagine how the tool may produce a different format without using the parameter anywhere else except for the change_format
tag.
But I'm fine with downgrading errors and warnings to warnings and infos (resp) if its for the possibility of FP due to getVar
and varExists
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For getVar
and varExists
we could just bail out like for the case when a param file is present
I would love to see this applied to tools-iuc and see if there are any false negatives and if it catches in any actual positives. I also wonder if just compiling the Cheetah is actually going to be the biggest win here - does you get_code throw an exception on invalid cheetah and can we make that a linting we do also? |
Already did this in January, see #12724 (comment) many have been fixed already. Have not found a single FP so far (except for copy paste mistakes .. because I did the check semi automatic .. ). But this is only because tool developers do not use FN are more difficult to answer.
That is an excellent idea. Maybe at least some errors can be caught, e.g. syntax errors. But some errors will only become apparent during execution. Lets see if I can add a test. |
) | ||
if action: | ||
lint_ctx.warn( | ||
f"Param input [{param_name}] {only}found in output actions, better use extended metadata.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was expecting that you "complain" about this one .. :)
considering command, configfiles, filters, ... add linting for configfiles
for finding variables
by searching in compiled cheetah templates
Also note that this is implemented in a separate linter which can be enabled/disabled completely via planemo |
4d9d054
to
81d2ec4
Compare
b2949d8
to
d9655f5
Compare
So I did it again:
For my own reference planemo ci_find_repos --exclude packages --exclude deprecated --exclude_from .tt_skip --output repository_list.txt
while read line
do
>&2 echo $line; planemo lint $line --skip help,tests,inputs,output,general,command,citations,tool_xsd,xml_order,stdio
done < repository_list.txt > lint.tx
cat lint.txt | grep -v Applying | egrep "ERROR|WARNING" -B 1 | grep -v '\-\-'| sed 's@Linting tool /home/berntm/projects/tools-iuc/@- [ ] @; s/^\.\./ - [ ]/' |
This is first attempt to lint for input parameters that are not used:
command
orconfigfile
output/filter
/outputs//change_format/when
Edit: The following text is outdated:
Basic idea is to grep for
"[^\w_]" + param_name + r"([^\w_]|$)"
, i.e. the placeholder delimited by something that can't belong to the placeholder.Certainly this is very (maybe to) simplistic .. but it seems to work quite well :) .. But certainly it might be useful to check via some cheetah python lib if parameters are used...
I also experimented with stricter regular expressions (i.e. more specific delimiters), but there are to many cases to consider. In config files, for instance, the placeholders could be joined by any (non-placeholder) character anyway.
It might help
${param}
syntax (instead of just$param
) in more casesStill there are cases like:
${ ",".join( [ str( platform.platform_entry ) ] ) }
$getVar($sect + "excl_ival_type.excl_ival_type_sel")
TODO:
code
variable for linting@token@
)Additionally: added linting of configfiles
xref galaxyproject/tools-iuc#4085
TODO:
How to test the changes?
(Select all options that apply)
And manually ran it on the IUC repo.
License