Anagram add tests for graphemes #2445
base: main
…icode code points
The instructions are explicit about ASCII characters only. We should have a discussion around whether or not we're internationalizing the Anagrams. Currently, they are all English only, and do not contain umlauts or other accents. I don't think this exercise is a good candidate for Unicode as currently specified.
The tests already contain non-ASCII characters. I am pretty sure this "β" is the Greek letter beta (a non-ASCII letter). From my perspective, it could be argued that this exercise shouldn't have Unicode characters. But that discussion should have been held when the first Unicode tests were added.
Here is the PR for reference: #2366
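For anyone who wants to verify the claim about "β", a quick check in Python (used here only for illustration; nothing in the exercise prescribes it) shows that the character falls outside the ASCII range:

```python
import unicodedata

beta = "β"

# str.isascii() is True only when every code point is below 128.
print(beta.isascii())          # False
print(ord(beta))               # 946, i.e. U+03B2
print(unicodedata.name(beta))  # GREEK SMALL LETTER BETA
```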
Just because we didn't have the discussion then doesn't mean we can't have it now. I don't think those test cases should have been added, given the instructions, even with the scenario flag (I know I approved that, but I shouldn't have without looking at the docs). I think if we go this route, we need to change/clarify the instructions, and we also need to make sure that any test cases form valid words in the target language. For ref, the Wiki article on Anagrams.
The reasoning at the time of the original PR was that these tests are added under the
Yes - I know. 🙂 But at the time, I didn't realize that the Anagram instructions directed students specifically at ASCII. I think that directive should be removed, and the instructions made more general, or there should be explicit mention of Unicode handling. For example, I think if we're going to include Unicode here, the instructions should follow one or the other of those exercise examples. I also think that we should make sure that any test cases follow the rules of Anagram formation.
I think I would prefer to just not mention ASCII in the instructions, as we already have non-ASCII characters.
What do you mean by this? Just that we double-check if the test cases match the instructions?
An Anagram needs to be a valid word in a given language (capitalization notwithstanding), so as the instructions are written I think all candidates passed in test cases either need to be valid words, or obviously not valid, if that makes sense.
I feel like the only sensible interpretation of the candidates list is that students can assume them to be valid words. Otherwise, every single solution to this exercise would have to include an actual dictionary of the English language. Wikipedia may have some definition of the word Anagram, but natural language processing is not the goal of this exercise, right?
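To make that interpretation concrete: if candidates are assumed to be valid words, the check stays purely string-based and needs no dictionary. Below is a minimal sketch in Python. The name `find_anagrams` is hypothetical, `str.lower()` is assumed to be an adequate case-insensitive comparison for the inputs, and the rule that a word is not its own anagram is an assumption of this sketch.

```python
def find_anagrams(subject: str, candidates: list[str]) -> list[str]:
    """Return the candidates that are anagrams of subject (hypothetical helper)."""
    target = subject.lower()
    target_key = sorted(target)
    return [
        candidate
        for candidate in candidates
        # Assumption: a word is not counted as an anagram of itself,
        # regardless of capitalization.
        if candidate.lower() != target
        and sorted(candidate.lower()) == target_key
    ]


print(find_anagrams("listen", ["enlists", "google", "inlets", "banana"]))  # ['inlets']
```

Nothing here decides whether a candidate is a real word; it only compares the multiset of characters.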
...which is part of the point I am trying to make. In adding Unicode characters (or any extended ASCII, for that matter), we've taken this beyond English. With very few exceptions, English doesn't include any accented characters (nor Greek!). But that's fine! It just means that when we craft test cases, we need to make sure that the candidates are valid words, whatever language the candidates are written in. Conversely, I don't know of any languages that use a Euro sign within words, so "€a" (last test case) is "obviously" wrong (and the start word is a single letter followed by a non-word symbol anyway), which I think is fine. We might want to add "assume all candidates are valid words" to the instructions as well, though, just to be safe. And for the test case in Greek, we have And for this proposed test case, we need to pick a word that's valid, and craft candidates that are also valid words in the same language.
Because it's the lowercase version of I don't see the impact of whether the words are valid in some natural language on the user experience, but I'm fine with reimplementing the test cases with actual words that test the same thing.
Note that the test case using
I don't think considering languages is feasible. The strings If Unicode tests are to be included, I think it would be wise to have the instructions explicitly define what they mean by 'letter' for the purposes of this exercise, but not consider applicability to the world's writing systems.
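Why the definition of 'letter' matters can be shown with a small Python sketch. It compares two hypothetical strings, chosen only for illustration and not taken from the test suite, first by raw code points and then after NFC normalization. NFC is only a rough stand-in for grapheme clusters, because it helps only where a precomposed character exists. Python's standard library has no grapheme segmentation.

```python
import unicodedata


def code_point_key(word: str) -> list[str]:
    # Treats every code point, including combining marks, as a "letter".
    return sorted(word.lower())


def nfc_key(word: str) -> list[str]:
    # Composes base letter + combining mark where a precomposed form exists.
    return sorted(unicodedata.normalize("NFC", word).lower())


subject = "a\u0308o"    # decomposed "ä" followed by "o"
candidate = "ao\u0308"  # "a" followed by decomposed "ö"

print(code_point_key(subject) == code_point_key(candidate))  # True
print(nfc_key(subject) == nfc_key(candidate))                # False
```

By code points the two strings contain the same three items (`a`, `o`, U+0308), so they look like anagrams. Read as user-perceived characters, one is {ä, o} and the other is {a, ö}, so they are not.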
D'OH 🤦🏽‍♀️ Never mind, then.
Probably. In any case, I don't have the interest to find any. 🙂 If we change the instructions to remove the reference to ASCII and add in a note that the student should assume all candidate words are valid, then I am fine.
This has been discussed here: https://forum.exercism.org/t/unicode-testing-for-anagram-doesnt-actually-test-for-grapheme/10906