Merge pull request #78 from phac-nml/dev
Dev
mattheww95 authored May 14, 2024
2 parents bb93c35 + 90ce217 commit f1efb35
Showing 121 changed files with 3,912 additions and 1,032 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/linting_comment.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Download lint results
uses: dawidd6/action-download-artifact@f6b0bace624032e30a85a8fd9c1a7f8f611f5737 # v3
uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3
with:
workflow: linting.yml
workflow_conclusion: completed
1 change: 1 addition & 0 deletions .nf-core.yml
@@ -1,4 +1,5 @@
repository_type: pipeline
nf_core_version: "2.14.1"
lint:
files_exist:
- CODE_OF_CONDUCT.md
58 changes: 40 additions & 18 deletions CHANGELOG.md
@@ -3,32 +3,62 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v0.1.2 - [2024-05-02]
## v0.2.0 - [2024-05-14]

### `Added`

- Updated documentation for params. See [PR 66](https://github.com/phac-nml/mikrokondo/pull/66)

- Fixed param typos in schema, config and docs. See [PR 66](https://github.com/phac-nml/mikrokondo/pull/66)

- Added parameter to skip length filtering of sequences. See [PR 66](https://github.com/phac-nml/mikrokondo/pull/66)

- Added locidex for allele calling. See [PR 62](https://github.com/phac-nml/mikrokondo/pull/62)

- Updated directory output structure and names. See [PR 66](https://github.com/phac-nml/mikrokondo/pull/66)

- Added tests for Kraken2 contig binning. See [PR 66](https://github.com/phac-nml/mikrokondo/pull/66)

### `Fixed`

- If you choose to filter contigs by length, the filtered contigs will now be used for subsequent analysis. See [PR 66](https://github.com/phac-nml/mikrokondo/pull/66)

- Matched ECTyper and SISTR parameters to those set in the current IRIDA. See [PR 68](https://github.com/phac-nml/mikrokondo/pull/68)

- Updated StarAMR PointFinder database selection to resolve an error that occurred when no database was selected. See [PR 74](https://github.com/phac-nml/mikrokondo/pull/74)

### Added
- Fixed calculation of the SeqtkBaseCount value to include counts from both reads of paired-end data. See [PR 65](https://github.com/phac-nml/mikrokondo/pull/65).

### `Changed`

- Changed the specific files and metadata to store within IRIDA Next. See [PR 65](https://github.com/phac-nml/mikrokondo/pull/65)

- Added separate report fields for the (PASSED|FAILED|WARNING) status and for the actual value. See [PR 65](https://github.com/phac-nml/mikrokondo/pull/65)

- Updated StarAMR to version 0.10.0. See [PR 74](https://github.com/phac-nml/mikrokondo/pull/74)

## v0.1.2 - [2024-05-02]

### Changed

- Changed default values for database parameters `--dehosting_idx`, `--mash_sketch`, `--kraken2_db`, and `--bakta_db` to null.
- Enabled checking for existence of database files in JSON Schema to avoid issues with staging non-existent files in Azure.
- Set `--kraken2_db` to be a required parameter for the pipeline.
- Hide bakta parameters from IRIDA Next UI.
- Changed default values for database parameters `--dehosting_idx`, `--mash_sketch`, `--kraken2_db`, and `--bakta_db` to null. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71)
- Enabled checking for existence of database files in JSON Schema to avoid issues with staging non-existent files in Azure. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71).
- Set `--kraken2_db` to be a required parameter for the pipeline. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71)
- Hide bakta parameters from IRIDA Next UI. See [PR 71](https://github.com/phac-nml/mikrokondo/pull/71)

## v0.1.1 - [2024-04-22]

### Added

### Changed

- Switched the resource labels for **parse_fastp**, **select_pointfinder**, **report**, and **parse_kat** from `process_low` to `process_single` as they are all configured to run on the local Nextflow machine.
- Switched the resource labels for **parse_fastp**, **select_pointfinder**, **report**, and **parse_kat** from `process_low` to `process_single` as they are all configured to run on the local Nextflow machine. See [PR 67](https://github.com/phac-nml/mikrokondo/pull/67)

## v0.1.0 - [2024-03-22]

Initial release of phac-nml/mikrokondo. Mikrokondo currently supports: read trimming and quality control, contamination detection, assembly (isolate, metagenomic or hybrid), annotation, AMR detection and subtyping of genomic sequencing data targeting bacterial or metagenomic data.

- Bumped version number to 0.1.0

- Updated docs to include awesome-page plugin and restructured readme.
- Updated docs to include awesome-page plugin and restructured readme.

- Updated coverage defaults for Shigella, Escherichia and Vibrio

@@ -49,11 +79,3 @@ Initial release of phac-nml/mikrokondo. Mikrokondo currently supports: read trim
- Changed Salmonella default coverage to 40

- Added integration testing using [nf-test](https://www.nf-test.com/).

### `Added`

### `Fixed`

### `Dependencies`

### `Deprecated`
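The SeqtkBaseCount fix in v0.2.0 comes down to summing bases over both files of a paired-end sample rather than only one. A minimal sketch of that arithmetic, assuming plain four-line FASTQ records without line wrapping; `count_bases` and `paired_base_count` are hypothetical helpers for illustration, not mikrokondo's actual code:

```python
from io import StringIO
from typing import TextIO


def count_bases(fastq: TextIO) -> int:
    """Sum sequence lengths in a FASTQ stream (assumes 4-line, unwrapped records)."""
    total = 0
    for i, line in enumerate(fastq):
        if i % 4 == 1:  # the sequence line of each record
            total += len(line.strip())
    return total


def paired_base_count(r1: TextIO, r2: TextIO) -> int:
    """Total bases for a paired-end sample: both mates are counted."""
    return count_bases(r1) + count_bases(r2)


r1 = StringIO("@read1/1\nACGTACGT\n+\nIIIIIIII\n")
r2 = StringIO("@read1/2\nTTGCA\n+\nIIIII\n")
print(paired_base_count(r1, r2))  # 8 + 5 = 13; counting only R1 would report 8
```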
56 changes: 44 additions & 12 deletions README.md
@@ -8,6 +8,35 @@
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
<!-- [![Launch on Nextflow Tower](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Nextflow%20Tower-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/mk-kondo/mikrokondo) -->

- [Introduction](#introduction)
  * [What is mikrokondo?](#what-is-mikrokondo-)
  * [Is mikrokondo right for me?](#is-mikrokondo-right-for-me-)
  * [Citation](#citation)
    + [Contact](#contact)
- [Installing mikrokondo](#installing-mikrokondo)
  * [Step 1: Installing Nextflow](#step-1--installing-nextflow)
  * [Step 2: Choose a Container Engine](#step-2--choose-a-container-engine)
    + [Docker or Singularity?](#docker-or-singularity-)
  * [Step 3: Install dependencies](#step-3--install-dependencies)
    + [Dependencies listed](#dependencies-listed)
  * [Step 4: Further resources to download](#step-4--further-resources-to-download)
    + [Configuration and settings:](#configuration-and-settings-)
- [Getting Started](#getting-started)
  * [Usage](#usage)
    + [Data Input/formats](#data-input-formats)
    + [Output/Results](#output-results)
  * [Run example data](#run-example-data)
  * [Testing](#testing)
    + [Install nf-test](#install-nf-test)
    + [Run tests](#run-tests)
  * [Troubleshooting and FAQs:](#troubleshooting-and-faqs-)
  * [References](#references)
  * [Legal and Compliance Information:](#legal-and-compliance-information-)
  * [Updates and Release Notes:](#updates-and-release-notes-)

<small><i><a href='http://ecotrust-canada.github.io/markdown-toc/'>Table of contents generated with markdown-toc</a></i></small>


# Introduction

## What is mikrokondo?
@@ -127,18 +156,21 @@ For more information see the [useage docs](https://phac-nml.github.io/mikrokondo

### Output/Results

All output files will be written into the `outdir` (specified by the user). More explicit tool results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. Here is a brief description of the outdir structure:

- **annotations** - dir containing all annotation tool output.
- **assembly** - dir containing all assembly tool related output, including quality, 7 gene MLST and taxon determination.
- **pipeline_info** - dir containing all pipeline related information including software versions used and execution reports.
- **ReadQuality** - dir containing all read tool related output, including contamination, fastq, mash, and subsampled read sets (when present)
- **subtyping** - dir containing all subtyping tool related output, including SISTR, ECtyper, etc.
- **SummaryReport** - dir containing collated results files for all tools, including:
  - Individual sample flattened JSON reports
  - **final_report** - All tool results for all samples in both .json (including a flattened version) and .tsv format
- **bco.json** - data provenance file generated from the nf-prov plug-in
- **manifest.json** - data provenance file generated from the nf-prov plug-in
All output files will be written into the `outdir` (specified by the user). More explicit tool results can be found in both the [Workflow](workflows/CleanAssemble/) and [Subworkflow](subworkflows/) sections of the docs. Here is a brief description of the outdir structure (in short, the deeper a directory sits in the structure, the later in the workflow its tool was run):

- **Assembly** - contains all output files generated as a result of read assembly and tools using assembled contigs as input
  - **Annotation** - contains output files generated from tools applying annotation and/or gene characterization from assembled contigs
  - **Assembling** - contains output files generated as a part of the assembly process in nested order
  - **FinalAssembly** - this directory will always contain the final output contig files from the last step in the assembly process (taking into account any skip flags used in the run)
  - **PostProcessing** - contains output files from intermediary tools that run after assembly but before annotation takes place in the workflow
  - **Quality** - contains all output files generated as a result of quality tools after assembly
  - **Subtyping** - contains all output files from workflow subtyping tools, based on assembled contigs
- **FinalReports** - contains assorted reports including aggregated and flat reports
- **pipeline_info** - includes tool versions and other pipeline specific information
- **Reads** - contains all output files generated as a result of read processing and tools using reads as input
  - **FinalReads** - this directory will contain the final output read files from the last step in read processing (taking into account any skip flags used in the run)
  - **Processing** - contains output files from tools run to process reads in nested order
  - **Quality** - contains all output files generated from read quality tools
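Because **FinalAssembly** always holds the last contigs produced, downstream scripts can read from it without tracking which assembly steps were skipped. A minimal sketch, assuming **FinalAssembly** sits inside **Assembly** as the description above suggests; `final_assembly_files` is a hypothetical helper, and since the exact contig file names are not given here it simply lists every file found:

```python
from pathlib import Path
from typing import List


def final_assembly_files(outdir: str) -> List[Path]:
    """List every file under <outdir>/Assembly/FinalAssembly.

    The Assembly/FinalAssembly nesting is an assumption based on the README
    description; adjust the path if your run's layout differs.
    """
    final_dir = Path(outdir) / "Assembly" / "FinalAssembly"
    if not final_dir.is_dir():
        return []  # no final assemblies (e.g. wrong outdir or run incomplete)
    # rglob("*") also walks any per-sample subdirectories
    return sorted(p for p in final_dir.rglob("*") if p.is_file())
```

For example, `final_assembly_files("results")` would return the sorted paths for a run launched with `--outdir results`.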

## Run example data

3 changes: 2 additions & 1 deletion bin/kraken2_bin.py
@@ -13,6 +13,7 @@
from collections import defaultdict
import os
import sys
import re


kraken2_classifiers = frozenset(["U", "R", "D", "K", "P", "C", "O", "F", "G", "S"])
@@ -355,7 +356,7 @@ def write_fastas(self, sequences):
"""
for k, v in sequences.items():
with open(
f"{k.strip().replace(' ', '_').replace('(', '_').replace(')', '_').replace('.', '_')}_binned.fasta",
"{}.binned.fasta".format(re.sub(r'[^A-Za-z0-9\-_]', '_', k)),
"w",
encoding="utf8",
) as out_file:
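The `write_fastas` change in `bin/kraken2_bin.py` swaps a chain of `str.replace` calls for a single `re.sub` whitelist when building the binned FASTA file name. The sketch below mirrors the two expressions from the diff in hypothetical helpers `old_bin_name` and `new_bin_name` to show the difference: the old chain left characters such as `/` in place, which `open()` would interpret as a directory separator, while the new pattern maps every character outside `A-Z`, `a-z`, `0-9`, `-` and `_` to `_`. Note that the new expression also drops the `.strip()` call and uses a `.binned.fasta` suffix instead of `_binned.fasta`.

```python
import re


def old_bin_name(taxon: str) -> str:
    # Previous behaviour: only spaces, parentheses and periods were replaced
    cleaned = (
        taxon.strip()
        .replace(" ", "_")
        .replace("(", "_")
        .replace(")", "_")
        .replace(".", "_")
    )
    return f"{cleaned}_binned.fasta"


def new_bin_name(taxon: str) -> str:
    # New behaviour: any character outside [A-Za-z0-9-_] becomes '_'
    return "{}.binned.fasta".format(re.sub(r"[^A-Za-z0-9\-_]", "_", taxon))


for name in ["Escherichia coli", "Influenza A virus (A/PR/8/34(H1N1))"]:
    print(f"{old_bin_name(name)!r} -> {new_bin_name(name)!r}")
```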