Add option to exclude test cases #26
Note: I took a look at the code coverage report, and it seems the coverage decreased primarily because of code like this:
For some reason, the coverage report counts only lines 573 and 577 (the ones with the closure) as hit, not the others. I don't think this change meaningfully reduces code coverage.
Hi Allan, thank you very much for this pull request. This is awesome. :) I almost thought the algorithm would be too complicated for anyone to comprehend. I haven't thought about how to add negative test cases myself so far, so I'm curious to see how you accomplished that. I will carefully examine your changes in the next few days and then give you more detailed feedback. Could you perhaps extend the command-line interface so that negative test cases can alternatively be passed directly on the command line, instead of users being forced to put them in a file first? I think this will be more convenient for users with short lists of test cases. Again, thanks a lot. 👍
Unfortunately I don't have a better reference offhand than this Stack Overflow answer, but the algorithm is essentially as listed there, implemented in a recursive-descent fashion. The two DFAs are combined into a new DFA whose states are the Cartesian product of the two input DFAs' states. A product state accepts whenever its component from the "positive" test case DFA accepts, unless its component from the "negative" test case DFA is also an accept state.
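The acceptance rule of the product DFA can be illustrated without building it: running both DFAs on the same input is equivalent to running the product DFA, so a string is accepted exactly when the positive DFA accepts it and the negative DFA does not. Below is a minimal std-only sketch, assuming each DFA is a transition table over `char` in which a missing transition means rejection; the `Dfa` type, `dfa_from_words`, and `accepts_difference` are hypothetical illustrations, not grex's API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical DFA: states are indices, a missing transition means "reject".
struct Dfa {
    start: usize,
    transitions: HashMap<(usize, char), usize>,
    accepting: HashSet<usize>,
}

impl Dfa {
    /// Returns the state reached after consuming `input`, or None if the DFA gets stuck.
    fn run(&self, input: &str) -> Option<usize> {
        let mut state = self.start;
        for c in input.chars() {
            state = *self.transitions.get(&(state, c))?;
        }
        Some(state)
    }

    fn accepts(&self, input: &str) -> bool {
        self.run(input).map_or(false, |s| self.accepting.contains(&s))
    }
}

/// Membership in L(positive) minus L(negative): the product state (p, n) accepts
/// exactly when p accepts and n does not (a stuck negative DFA counts as non-accepting).
fn accepts_difference(positive: &Dfa, negative: &Dfa, input: &str) -> bool {
    positive.accepts(input) && !negative.accepts(input)
}

/// Builds a trie-shaped DFA that accepts exactly the given words.
fn dfa_from_words(words: &[&str]) -> Dfa {
    let mut transitions: HashMap<(usize, char), usize> = HashMap::new();
    let mut accepting: HashSet<usize> = HashSet::new();
    let mut next_state = 1; // state 0 is the start state
    for word in words {
        let mut state = 0;
        for c in word.chars() {
            state = *transitions.entry((state, c)).or_insert_with(|| {
                let fresh = next_state;
                next_state += 1;
                fresh
            });
        }
        accepting.insert(state);
    }
    Dfa { start: 0, transitions, accepting }
}
```

For example, with positive cases `abc`, `abd`, `xyz` and the negative case `abd`, only `abc` and `xyz` pass `accepts_difference`.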
Sure thing. Happy to respond to feedback, etc.
Actually, I'm not sure how to do this, because the current command-line interface takes the positive test cases as a vector. For negative test cases, either there would need to be some in-band signaling (e.g. prefixing negative test cases with …) … Exposing … Another option is to have a flag for … Thoughts?
Pushed a new version. It resolves the test case issue noted before by merging graphemes that have repetition in the DFA in a separate pass during the minimization step, instead of attempting to do it during construction. You'll note that two other test cases had to be changed, but in both cases the new regexes accept equivalent strings. I also added two new property tests that check "not matching other strings" with more restrictive alphabets, to hopefully catch similar regressions.
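This is pretty easy, actually. I would do it like this:

```rust
use std::path::PathBuf;
use structopt::StructOpt;

// Enclosing struct shown for context; its name here is illustrative.
#[derive(StructOpt)]
struct Cli {
    // Test cases to exclude, passed directly on the command line.
    #[structopt(
        name = "exclusions",
        value_name = "INPUT",
        long,
        conflicts_with = "exclusions-file",
        help = "One or more test cases separated by blank space"
    )]
    exclusions: Vec<String>,

    // Alternatively, test cases to exclude, read from a file (one per line).
    #[structopt(
        name = "exclusions-file",
        value_name = "FILE",
        long,
        parse(from_os_str),
        help = "Reads test cases to be excluded on separate lines from a file"
    )]
    exclusions_file_path: Option<PathBuf>,
}
```

What do you think?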
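Because `conflicts_with` makes the two options mutually exclusive, the code that consumes them only has to pick whichever source was given. A minimal std-only sketch of that step; `resolve_exclusions` is a hypothetical helper, not part of grex or structopt, and it assumes the file holds one test case per line, as the `--exclusions-file` help text says.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper: returns the test cases to exclude from whichever
/// source the user supplied. `conflicts_with` ensures at most one is set.
fn resolve_exclusions(
    exclusions: Vec<String>,
    exclusions_file_path: Option<PathBuf>,
) -> io::Result<Vec<String>> {
    match exclusions_file_path {
        // One test case per line; a trailing newline does not add an empty case.
        Some(path) => Ok(fs::read_to_string(path)?
            .lines()
            .map(str::to_owned)
            .collect()),
        // Otherwise use the values given directly after `--exclusions`.
        None => Ok(exclusions),
    }
}
```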
Okay, you were correct. My reading of the structopt documentation implied to me that you couldn't have a flag that took … I pushed a new revision with the requested changes. This means you can now run:
…
which will output … I also removed the short option setting and renamed the option as requested.
You did a very good job building the DFA combination algorithm, but I hope you don't mind the following question. Looking at the current functionality of your algorithm, instead of creating two DFAs and subtracting one from the other, wouldn't it be much easier to first subtract the set of negative test cases from the set of positive test cases and then apply the original algorithm, which creates only one DFA? I think all of the new integration tests you added would return the same results with set subtraction instead of DFA subtraction. This would simplify the algorithm a lot. Am I correct, or have I missed something? I'd very much like to merge your PR, but first I want to know what its benefit is compared to set subtraction. If your algorithm lays the groundwork for features that are not yet implemented, could you please tell me what applications it could have beyond the exclusion of test cases? Thanks a lot!
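For comparison, the set-subtraction approach proposed above fits in a few lines: drop every negative test case from the positive ones, then build a single DFA from what remains. A std-only sketch; `subtract_test_cases` is a hypothetical name, not part of grex.

```rust
use std::collections::HashSet;

/// Hypothetical helper: keeps the positive test cases that are not excluded,
/// preserving their original order.
fn subtract_test_cases(positive: &[&str], negative: &[&str]) -> Vec<String> {
    let excluded: HashSet<&str> = negative.iter().copied().collect();
    positive
        .iter()
        .filter(|case| !excluded.contains(*case))
        .map(|case| case.to_string())
        .collect()
}
```

The two approaches presumably diverge once conversion features generalize beyond the inputs: with digit conversion enabled, positive cases `1` and `2` minus `2` leave just `1`, which becomes a regex along the lines of `^\d$`, and that regex still matches the excluded `2`. Set subtraction only sees the input strings, while DFA subtraction works on what the regex actually matches.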
You're right, let me come up with some examples. They primarily concern conversion features, although I just added a proptest for conversion features combined with subtraction and found a new bug, so let me solve that first.
So, there's a problem with character classes, and how to solve it is not obvious to me. I found this issue by adding a new property test that tries random features with exclusions and verifies that the excluded test cases are still not matched. Consider this input:
This produces … The issue is that the DFA produced for '#' is … It seems that Rust's regex crate supports character class intersection and subtraction, but I think this is a non-standard feature (not available in, say, PCRE). Also, right now I'm not entirely sure how to test whether a … Any suggestions on how to solve this?
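One way to sidestep non-standard class subtraction syntax is to represent each character class as sorted code point ranges, subtract on those ranges, and print the result with plain bracket syntax, which PCRE also understands. A std-only sketch; representing classes as inclusive `(u32, u32)` pairs and the `subtract_ranges` name are assumptions for illustration, not grex's internals.

```rust
/// Subtracts `excluded` from `class`. Both inputs are sorted, non-overlapping,
/// inclusive code point ranges; the result has the same shape.
fn subtract_ranges(class: &[(u32, u32)], excluded: &[(u32, u32)]) -> Vec<(u32, u32)> {
    let mut result = Vec::new();
    for &(start, end) in class {
        let mut lo = start; // lowest code point of this range not yet handled
        let mut exhausted = false;
        for &(ex_start, ex_end) in excluded {
            if ex_end < lo || ex_start > end {
                continue; // no overlap with the remaining part of this range
            }
            if ex_start > lo {
                result.push((lo, ex_start - 1)); // keep the part before the hole
            }
            if ex_end >= end {
                exhausted = true; // the hole covers the rest of this range
                break;
            }
            lo = ex_end + 1; // cannot overflow: ex_end < end
        }
        if !exhausted {
            result.push((lo, end));
        }
    }
    result
}
```

For instance, ASCII digits `0-9` minus `5` yields the ranges `0-4` and `6-9`, which print as `[0-46-9]`.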
Hi Allan, I don't have much time at the moment to dive into the code, so I'd like to suggest the following: I'm going to merge your changes now into the feature branch for the next version, 1.2.0, because I think you did some marvelous work here on the DFA algorithm. Even though it does not yet work well with character classes and probably some other features, I'm sure it is a solid foundation for various new functionality. As soon as I find the time, I will continue working on the current state. Are you ok with that? :)
This adds a new option, `--file-negative`, which points to a file containing a list of negative test cases. The resulting regex is guaranteed not to match any of these test cases. This fixes #16.

To support negation, a second DFA is built from the negative cases and then subtracted from the positive-case DFA using the standard DFA combination algorithm. To limit the number of nodes generated, combinations of nodes in the two DFAs are visited in depth-first order. Nodes that occur only in the negative-match DFA are not visited.
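The depth-first exploration described above can be sketched as follows: start from the pair of start states, follow only transitions that exist in the positive DFA, and pair each with the negative DFA's transition on the same symbol, or with `None` once the negative DFA has none. Pairs reachable only through the negative DFA are therefore never created. This is a simplified std-only illustration over single `char` transitions (no graphemes or repetition); the `Dfa` struct, `from_words`, and `subtract` are hypothetical, not the PR's code.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical DFA over `char`; a missing transition means "reject".
struct Dfa {
    start: usize,
    transitions: HashMap<usize, BTreeMap<char, usize>>,
    accepting: HashSet<usize>,
}

impl Dfa {
    /// Builds a trie-shaped DFA accepting exactly `words`.
    fn from_words(words: &[&str]) -> Dfa {
        let mut dfa = Dfa { start: 0, transitions: HashMap::new(), accepting: HashSet::new() };
        let mut next_state = 1;
        for word in words {
            let mut state = 0;
            for c in word.chars() {
                let edges = dfa.transitions.entry(state).or_default();
                state = *edges.entry(c).or_insert_with(|| {
                    next_state += 1;
                    next_state - 1
                });
            }
            dfa.accepting.insert(state);
        }
        dfa
    }

    fn accepts(&self, input: &str) -> bool {
        let mut state = self.start;
        for c in input.chars() {
            match self.transitions.get(&state).and_then(|edges| edges.get(&c)) {
                Some(&next) => state = next,
                None => return false,
            }
        }
        self.accepting.contains(&state)
    }
}

/// A positive state paired with a negative state, or `None` once the
/// negative DFA has no transition for the input read so far.
type Pair = (usize, Option<usize>);

/// Builds a DFA for L(positive) minus L(negative) by depth-first exploration of the product.
fn subtract(positive: &Dfa, negative: &Dfa) -> Dfa {
    let mut ids: HashMap<Pair, usize> = HashMap::new();
    let mut result = Dfa { start: 0, transitions: HashMap::new(), accepting: HashSet::new() };
    let start: Pair = (positive.start, Some(negative.start));
    ids.insert(start, 0);
    let mut stack = vec![start];

    while let Some((p, n)) = stack.pop() {
        let id = ids[&(p, n)];
        // Accept where the positive DFA accepts, unless the negative one does too.
        if positive.accepting.contains(&p) && !n.map_or(false, |n| negative.accepting.contains(&n)) {
            result.accepting.insert(id);
        }
        // Only symbols the positive DFA can read are explored.
        if let Some(edges) = positive.transitions.get(&p) {
            for (&c, &p_next) in edges {
                let n_next = n.and_then(|n| negative.transitions.get(&n)?.get(&c).copied());
                let next: Pair = (p_next, n_next);
                let known = ids.get(&next).copied();
                let next_id = match known {
                    Some(existing) => existing,
                    None => {
                        let fresh = ids.len();
                        ids.insert(next, fresh);
                        stack.push(next);
                        fresh
                    }
                };
                result.transitions.entry(id).or_default().insert(c, next_id);
            }
        }
    }
    result
}
```

The result can contain non-accepting states with no way forward, such as the one reached by an excluded string like `abd`; these are the kind of "dead ends" the description mentions below.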
Because the repetition feature can produce variable-length grapheme transitions in the DFA, code is added to calculate the overlap of two grapheme ranges.
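If a transition is modeled as a grapheme plus an inclusive repetition range `{min,max}`, the overlap of two such transitions is the same grapheme with the intersection of both ranges. A sketch under that assumption; the `RepeatedGrapheme` type and `overlap` method are hypothetical, and the PR's actual representation may differ.

```rust
/// Hypothetical DFA edge label: `value` repeated between `min` and `max` times.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RepeatedGrapheme {
    value: String,
    min: u32,
    max: u32,
}

impl RepeatedGrapheme {
    fn new(value: &str, min: u32, max: u32) -> Self {
        assert!(min <= max, "min must not exceed max");
        RepeatedGrapheme { value: value.to_string(), min, max }
    }

    /// Returns the label matched by both `self` and `other`, if any:
    /// the same grapheme with the intersection of both repetition ranges.
    fn overlap(&self, other: &RepeatedGrapheme) -> Option<RepeatedGrapheme> {
        if self.value != other.value {
            return None;
        }
        let min = self.min.max(other.min);
        let max = self.max.min(other.max);
        if min <= max {
            Some(RepeatedGrapheme { value: self.value.clone(), min, max })
        } else {
            None
        }
    }
}
```

For example, `a{2,4}` and `a{3,5}` overlap in `a{3,4}`. This simple model treats different grapheme strings as disjoint, so it ignores cases such as `aa` versus `a{2}`, which match the same text.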
The generated graphs can contain 'dead ends', so some code is added to remove those. Some bug fixes were also necessary in the `recreate_graph` function for corner cases that were previously not hit. Also, `find_next_state` was rewritten to use the new grapheme overlap function, to prevent it from occasionally creating multiple conflicting edges out of a node.

As part of this, a bug was fixed that previously caused blank lines in the input to be ignored in the final regex, because the "initial" state could never be an accept state.
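The dead-end removal mentioned above amounts to keeping only states from which some accept state is still reachable; every other state, and every edge into it, can be dropped without changing the accepted language. A std-only sketch over a plain adjacency map; the `Graph` representation and `remove_dead_ends` name are assumptions for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical DFA graph: state -> list of (label, target) edges.
type Graph = HashMap<usize, Vec<(char, usize)>>;

/// Keeps only states that can still reach an accept state.
/// The start state is always kept so the automaton stays well-formed.
fn remove_dead_ends(graph: &Graph, accepting: &HashSet<usize>, start: usize) -> Graph {
    // Reverse the edges so we can walk backwards from the accept states.
    let mut reverse: HashMap<usize, Vec<usize>> = HashMap::new();
    for (&from, edges) in graph {
        for &(_, to) in edges {
            reverse.entry(to).or_default().push(from);
        }
    }
    // Every state reachable backwards from an accept state is "live".
    let mut live: HashSet<usize> = accepting.clone();
    let mut stack: Vec<usize> = accepting.iter().copied().collect();
    while let Some(state) = stack.pop() {
        for &pred in reverse.get(&state).into_iter().flatten() {
            if live.insert(pred) {
                stack.push(pred);
            }
        }
    }
    live.insert(start);
    // Rebuild the graph with only live states and edges between them.
    graph
        .iter()
        .filter(|(state, _)| live.contains(*state))
        .map(|(&state, edges)| {
            let kept: Vec<(char, usize)> =
                edges.iter().copied().filter(|(_, to)| live.contains(to)).collect();
            (state, kept)
        })
        .collect()
}
```

Walking backwards from the accept states visits each edge at most once, so the pass is linear in the size of the graph.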
I got rid of `final_state_indices` and moved that information into the node label. I also added descriptive labels to nodes to aid debugging.

Adds appropriate tests. All pass. Ran through `cargo fmt` and `cargo clippy`.

I haven't written much Rust before, so please let me know if there are any issues.