[Bug]: Batch load taxon names, unexpected behavior from a newcomer's perspective #4151

Open
weevil-see opened this issue Dec 11, 2024 · 2 comments
Labels
enhancement Suggest an improvement to an existing function. exemplary issue This is a great example of how to write an issue.

Comments

@weevil-see

Steps to reproduce the bug

This is not a true bug report; you could read it as an insight into how a newcomer tries to understand the software :)
Let me know if it is useful to report issues like this.

As a new user, the first thing I wanted to try was to upload a taxon list, as I consider it a very basic thing, and I wanted to have a few taxon names in the database to experiment with them afterwards.
1. I started a docker container of TaxonWorks DEVELOPMENT v0.44.3 
2. Clicked "Nomenclature" > "Data" Tab
3. Chose "Taxon Names"
4. Clicked "Batch Load"
5. Simple batch load 
6. Uploaded a .csv file, which should be formatted like the example. It is tab-separated.
7. For "Attach names as children of (defaults to Root if none provided)" I entered Curculionoidea. I am not sure whether I have to define Curculionoidea in the database beforehand or whether it is pulled from somewhere.
8. Chose the ICZN code
9. "Also create OTU?": I am not sure what this does, but I checked it.
10. Clicked the "Preview" button. This takes a while.
11. I get some data errors in the "Line breakdown", while the "Input file" beneath looks as expected. Most of the data errors are clear to me and I know how I could resolve them before uploading a refined file.
The only one which is not clear to me is the "children is invalid" error.

Screenshot

This is what the upload looks like:
Screenshot_20241211_202755

Expected behavior

To me, the "Line breakdown" looks very unfamiliar. I would have expected a breakdown per line, but instead every higher taxon is only recorded once, in the first line where it occurs.
I get that this is useful, to make sure that every data error is recorded only once, if it is for example in a family-group name.

As soon as I figured this out, the "children is invalid" error did not seem as strange to me. The thing which befuddled me is that it is called a "line breakdown", so I did not understand that there is an error for a line which seemingly did not contain any errors.
Now I understand that the "children is invalid" is raised in the first line where a higher taxon appears, which has a child in another line with an error.
I am not sure if this is really necessary, as the child error is also raised in the corresponding line.

Additional Screenshots

The two Bruchela species raise an error, which is why the "children is invalid" error shows up in the first line with the genus, the first line with the subfamily, and the first line with the family.
Screenshot_20241211_202609

Environment

Development (docker)

Sandbox Used

No response

Version

v0.44.3

Browser Used

firefox

@weevil-see weevil-see added the bug An existing function is broken. label Dec 11, 2024
@weevil-see weevil-see changed the title [Bug]: [Bug]: Batch load taxon names, unexpected behavior from a newcomer's perspective Dec 11, 2024
@mjy mjy added the exemplary issue This is a great example of how to write an issue. label Dec 12, 2024
@mjy
Member

mjy commented Dec 12, 2024

@weevil-see First, thanks very much for spending time with TW and taking the time to report your experience; we appreciate that.

I think you came to all the "right" conclusions.

To me, the "Line breakdown" looks very unfamiliar. I would have expected a breakdown per line, but instead every higher taxon is only recorded once, in the first line where it occurs.

You got it. This exactly reflects what happens when you import the data: we "normalize" it into the TaxonWorks model.

I get that this is useful, to make sure that every data error is recorded only once, if it is for example in a family-group name.

It's not just that. What you are seeing in the summary is a list of the unique "objects" that will be created in TaxonWorks.
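
As a rough illustration of that normalization (a hypothetical sketch, not TaxonWorks code: the column names, the placeholder taxon names other than Bruchela, and the `normalize` helper are all assumptions made for this example), each name can be recorded once, on the first input line where it appears:

```python
import csv
import io

# Hypothetical column layout; the real batch loader's format may differ.
RANKS = ["family", "subfamily", "genus", "species"]

def normalize(tsv_text):
    """Map each unique taxon (rank + full path of names) to its first line."""
    first_seen = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        path = ()
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                continue
            path = path + (name,)
            # setdefault keeps the line number of the first occurrence only.
            first_seen.setdefault((rank, path), line_no)
    return first_seen

# Placeholder data: two rows that share family, subfamily, and genus.
sample = (
    "family\tsubfamily\tgenus\tspecies\n"
    "Exampleidae\tExampleinae\tBruchela\talpha\n"
    "Exampleidae\tExampleinae\tBruchela\tbeta\n"
)

things = normalize(sample)
for (rank, path), line_no in things.items():
    print(f"line {line_no}: {rank} {' '.join(path[-2:]) if rank == 'species' else path[-1]}")
```

Two input rows yield five "things" here: the shared family, subfamily, and genus are attributed to line 1, and only the second species is attributed to line 2.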

As soon as I figured this out, the "children is invalid" error did not seem as strange to me. The thing which befuddled me is that it is called a "line breakdown", so I did not understand that there is an error for a line which seemingly did not contain any errors.

"Line breakdown" is a little obscure. It is meant to reflect that we are looking at one line of data and "breaking it down", or decomposing it, into one or more "things". We like to say that TaxonWorks takes data "from rows to things". Storing data as "things" gives us major advantages; for example, we can annotate all things with a wide range of metadata. It also has some drawbacks: it's more difficult to communicate what's going on in imports like this, which is precisely what you experienced.

Now I understand that the "children is invalid" is raised in the first line where a higher taxon appears, which has a child in another line with an error.

Precisely! You can see that via the Latinized error downstream. This particular importer will ignore that message and let you create the higher taxon, only failing at the lower taxon (assuming you select a particular mode).
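
The way an error on a lower taxon surfaces on every ancestor can be sketched roughly as follows. This is an illustrative model only, not the importer's actual code: the `Taxon` class, the `collect_errors` helper, the placeholder names, and the wording of the species-level message are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Taxon:
    name: str
    line_no: int  # first input line where this name appears
    own_errors: list = field(default_factory=list)
    children: list = field(default_factory=list)

def collect_errors(taxon, report):
    """Fill report[name] with messages; return True if the subtree is valid."""
    messages = list(taxon.own_errors)
    # Evaluate every child (no short-circuit) so each one gets reported.
    child_results = [collect_errors(child, report) for child in taxon.children]
    if not all(child_results):
        messages.append("children is invalid")
    if messages:
        report[taxon.name] = (taxon.line_no, messages)
    return not messages

MSG = "name must be latinized"  # placeholder wording for the species-level error

alpha = Taxon("Bruchela alpha", 1, own_errors=[MSG])
beta = Taxon("Bruchela beta", 2, own_errors=[MSG])
genus = Taxon("Bruchela", 1, children=[alpha, beta])
subfamily = Taxon("Exampleinae", 1, children=[genus])
family = Taxon("Exampleidae", 1, children=[subfamily])

report = {}
collect_errors(family, report)
for name, (line_no, messages) in report.items():
    print(f"line {line_no}: {name}: {', '.join(messages)}")
```

In this sketch the genus, subfamily, and family carry no errors of their own, yet each is flagged on line 1 because something beneath it failed.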

I am not sure if this is really necessary, as the child error is also raised in the corresponding line.

You're right, there are certainly nuances here. To understand this it helps to think a little broader. The "things" (models) in TW have various validations, both "hard" (preventing addition) and "soft" (allowing addition and then subsequently providing feedback on, say, logical inconsistencies or ways to fix the data). Because all data models act the same, we can write "meta" tools, like generic batch loaders, across them. This lets us re-use code, knowing that objects will have errors on them that need to be reported in a generic fashion. What you are seeing in this type of batch loader is the application of this generic framework. There are various other contexts in which you would create a name (other forms, batch scripts, etc.), and validations need to apply to all of those contexts. Deciding which validations to show in which context (in batch show these, in script show those, in form show these other) is, basically, not really worth it. What's important is that any data that enters the system does so following the same set of rules.
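
A minimal sketch of that idea, assuming a shared validation interface: the `Record`, `TaxonName`, and `Otu` classes, the `batch_load` function, and the specific rules below are hypothetical and are not taken from the TaxonWorks code base. Hard errors block a record, soft warnings are attached as feedback, and the loader never needs to know which model it is handling.

```python
class Record:
    """Hypothetical base class: every model exposes the same interface."""

    def hard_errors(self):
        return []  # problems that prevent the record from being saved

    def soft_warnings(self):
        return []  # feedback that does not prevent saving


class TaxonName(Record):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def hard_errors(self):
        return [] if self.name.strip() else ["name can't be blank"]

    def soft_warnings(self):
        return [] if self.parent else ["no parent assigned"]


class Otu(Record):
    def __init__(self, label):
        self.label = label

    def hard_errors(self):
        return [] if self.label else ["label can't be blank"]


def batch_load(records):
    """Generic loader: relies only on the shared Record interface."""
    saved, rejected = [], []
    for record in records:
        errors = record.hard_errors()
        if errors:
            rejected.append((record, errors))
        else:
            saved.append((record, record.soft_warnings()))
    return saved, rejected


saved, rejected = batch_load([TaxonName("Bruchela"), TaxonName(""), Otu("weevil 1")])
print(len(saved), "saved;", len(rejected), "rejected")
```

Because the same rules live on the records themselves, a form, a script, or a batch loader would all report them in the same way, at the cost of occasionally showing a message that feels redundant in one particular context.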

A couple of follow-ups:

  • Keep experimenting. Set up your system to try things and then wipe and restart (or try on sandboxes).
  • Create new projects to quickly give yourself a new "clean" playground.

Action items:

  • Add guidance on the task for interpreting the children error.
  • Move some of the answer/context here to docs.taxonworks.org

Thanks again!

@mjy mjy added enhancement Suggest an improvement to an existing function. and removed bug An existing function is broken. labels Dec 12, 2024
@weevil-see
Author

@mjy thanks a lot! I really appreciate the effort!
I'll continue with an export from an existing TaxonWorks and try to load that data into my local docker setup :)
