added sample irida_next sample field option #140
If these tests pass, a sample with the name |
Great work Matthew 😸
I don’t have any specific comments - this `sample_name` solution looks solid to me. I tried adding a helper function to simplify the `inx_string_suffix` extraction logic in `updated_samples` within `main.nf`, but it ended up making things more complicated than expected, haha!
This looks great Matthew. Thanks so much for your work on including sample names 😄
I have a few suggestions and comments for you (given in-line below).
Following up on my comment: `meta.id` needs to be replaced with `meta.external_id` when no `sample_name` is provided, because otherwise it becomes `null` and everything gets grouped together in the `COMBINE_DATA()` process. I tried using the `map{}` we have used in other pipelines but it wasn't working. I can give it more of a try.
What I tried doing was:
// Track processed IDs
def processedIDs = [] as Set
input = Channel.fromSamplesheet("input")
    // and remove non-alphanumeric characters in sample_names (meta.id), whilst also correcting for duplicate sample_names (meta.id)
    .map { meta ->
        if (!meta.id) {
            meta.id = meta.external_id
        } else {
            // Non-alphanumeric characters (excluding _,-,.) will be replaced with "_"
            meta.id = meta.id.replaceAll(/[^A-Za-z0-9_.\-]/, '_')
        }
        // Ensure ID is unique by appending meta.external_id if needed
        while (processedIDs.contains(meta.id)) {
            meta.id = "${meta.id}_${meta.external_id}"
        }
        // Add the ID to the set of processed IDs
        processedIDs << meta.id
        tuple(meta)
    }
    .view()
in the `input_check` subworkflow but it tells me it cannot perform `replaceAll` because it is an ArrayList type.
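A possible explanation, offered as a sketch rather than a confirmed diagnosis: if each item emitted by `Channel.fromSamplesheet` is a list wrapping the meta map (an assumption, though the error message points that way), a closure with a single parameter receives that whole list. Groovy's property access on a list collects the property from every element, so `meta.id` would be an ArrayList, which has no `replaceAll(regex, replacement)` method. The plain-Groovy example below mimics this without Nextflow; `items`, `normalizeMeta` and the sample values are hypothetical.

```groovy
// Hypothetical stand-ins for channel items. Each is ASSUMED to be a list
// wrapping the meta map; the values are made up for illustration.
def items = [
    [[id: 'sample A', external_id: 'ext1']],
    [[id: null,       external_id: 'ext2']],
    [[id: 'sample_A', external_id: 'ext3']],
]

// Property access on a list is spread over its elements, so this is a List,
// not a String. Calling replaceAll(regex, '_') on it would fail.
assert items[0].id instanceof List

// Hypothetical helper: returns a copy of meta with a cleaned, unique id.
def normalizeMeta(Map meta, Set<String> seen) {
    // Fall back to external_id when no sample name was supplied;
    // otherwise replace characters other than letters, digits, '_', '.', '-'.
    String id = meta.id ? meta.id.toString().replaceAll(/[^A-Za-z0-9_.\-]/, '_') : meta.external_id
    // Append external_id until the id is unique. The variable is typed as
    // String so the interpolated GString is coerced before it is compared
    // against, or stored in, the Set.
    while (seen.contains(id)) {
        id = "${id}_${meta.external_id}"
    }
    seen << id
    // Build a new map instead of mutating the one passed in
    return meta + [id: id]
}

def processedIDs = [] as Set<String>

// Take the map out of the wrapping list before touching its fields
def cleaned = items.collect { item -> normalizeMeta(item[0], processedIDs) }

cleaned.each { println it }
```

In the subworkflow, the equivalent change would be to index or destructure the item inside `.map { }` before reading `meta.id`. Whether that matches the actual samplesheet layout has not been checked here.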
One last comment! I promise, and a suggestion. Could we use |
This looks great. Thanks so much for all your work @mattheww95. A few inline comments.
Looks good! Thanks for implementing the changes. I will continue to do some testing (i.e. playing with the pipeline) but for the PR I think it looks good to merge. Thanks for working through this rather tedious PR!
Thanks so much Matthew Wells for the amazing work you've done with this. And thanks so much Steven Sutcliffe for all your help reviewing 😄
I have tested this out in IRIDA Next starting from both assemblies. The output files are named properly, they are stored properly with the respective sample records, and metadata is written properly.
This all looks great. Approving this PR. I only made note of 2 small typos/fixes to text in-line to change.
Added support for the irida_next sample id.