
Commit

Merge pull request #126 from phac-nml/update/docs
updated docs
mattheww95 authored Oct 3, 2024
2 parents 7a3231a + 5ac0ceb commit 304bab3
Showing 5 changed files with 74 additions and 85 deletions.
4 changes: 4 additions & 0 deletions .wordlist.txt
@@ -171,3 +171,7 @@ Samplesheet
TSeemann's
RASUSA
downsampling
Christy
Marinier
Petkau

2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Documentation and workflow diagram have been updated. [PR 123](https://github.com/phac-nml/mikrokondo/pull/123)

- Documentation and Readme have been updated. [PR 126](https://github.com/phac-nml/mikrokondo/pull/126)

## [0.4.2] - 2024-09-25

### `Fixed`
25 changes: 8 additions & 17 deletions README.md
@@ -53,7 +53,7 @@ This workflow will detect what pathogen(s) is present and apply the applicable m

This software (currently unpublished) can be cited as:

- Matthew Wells, James Robertson, Aaron Petkau, Christy-Lynn Peterson, Eric Marinier. "mikrokondo" Github <https://github.com/phac-nml/mikrokondo/>

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

@@ -103,33 +103,24 @@ The above downloadable resources must be updated in the following places in your

```
// Bakta db path, note the quotation marks
bakta_db = "/PATH/TO/BAKTA/DB"
// Decontamination minimap2 index, note the quotation marks
dehosting_idx = "/PATH/TO/DECONTAMINATION/INDEX"
// Kraken2 db path, note the quotation marks
kraken2_db = "/PATH/TO/KRAKEN/DATABASE/"
// GTDB Mash sketch, note the quotation marks
mash_sketch = "/PATH/TO/MASH/SKETCH/"
// StarAMR database path, note the quotation marks
// Passing in a StarAMR database is optional; if one is not specified, the database in the container will be used. Leave the option as null if you do not wish to pass one.
staramr_db = "/PATH/TO/STARAMR/DB"
```

If not set in the `nextflow.config` file, the above parameters can also be passed to the pipeline as command-line arguments.
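For instance, the database parameters above could be supplied at runtime rather than edited into `nextflow.config`. A hypothetical sketch follows; the `-profile` choice is illustrative, not a verified mikrokondo default:

```shell
# Hypothetical invocation: database paths passed as command-line parameters
# instead of being set in nextflow.config. Everything other than the
# database parameters themselves is an illustrative placeholder.
nextflow run phac-nml/mikrokondo \
    -profile singularity \
    --bakta_db /PATH/TO/BAKTA/DB \
    --dehosting_idx /PATH/TO/DECONTAMINATION/INDEX \
    --kraken2_db /PATH/TO/KRAKEN/DATABASE/ \
    --mash_sketch /PATH/TO/MASH/SKETCH/
```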

# Getting Started
## Usage

120 changes: 56 additions & 64 deletions docs/usage/installation.md
@@ -1,64 +1,56 @@
# Installation

## Dependencies
- Python (>=3.10)
- Nextflow (>=22.10.1)
- Container service (Docker, Singularity, Apptainer have been tested)
- The source code: `git clone https://github.com/phac-nml/mikrokondo.git`

**Dependencies can be installed with Conda (e.g. Nextflow and Python)**.
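As a sketch, the two Conda-installable dependencies might be set up in a single environment. The channel and version choices below are typical conventions, not prescribed by mikrokondo:

```shell
# Hypothetical sketch: create an environment providing Nextflow and Python.
# bioconda/conda-forge are the usual channels for installing Nextflow.
conda create -n mikrokondo-env -c bioconda -c conda-forge nextflow python=3.10
conda activate mikrokondo-env
nextflow -version
```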

## To install mikrokondo
Once all dependencies are installed (see below for instructions), to download the pipeline run:

`git clone https://github.com/phac-nml/mikrokondo.git`

## Installing Nextflow
Nextflow is required to run mikrokondo (requires Linux), and instructions for its installation can be found at either: [Nextflow Home](https://www.nextflow.io/) or [Nextflow Documentation](https://www.nextflow.io/docs/latest/getstarted.html#installation)

## Container Engine
Nextflow and mikrokondo require a container engine to run the pipeline, such as Docker, Singularity (now Apptainer), Podman, Shifter, or Charliecloud.

> **NOTE:** Singularity was adopted by the Linux Foundation and is now called Apptainer. Singularity still exists, however newer installs will likely use Apptainer.
## Docker or Singularity?
Docker requires root privileges, which can make it a hassle to install on computing clusters; although there are workarounds, Apptainer/Singularity has no such requirement. Apptainer/Singularity is therefore the recommended way to run the mikrokondo pipeline.

### Issues
Containers are not perfect. Below is a list of some issues you may face using containers in mikrokondo; fixes for each issue will be detailed here as they are identified.

- **Exit code 137:** usually means the Docker container used too much memory.
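The 137 is not arbitrary: it is 128 + 9, i.e. the process was killed with SIGKILL (signal 9), which is what the kernel's OOM killer sends when a container exceeds its memory limit. A generic shell demonstration of that status mapping (not mikrokondo-specific):

```shell
# Run a child shell that kills itself with SIGKILL (signal 9),
# then inspect the exit status the parent observes.
bash -c 'kill -KILL $$'
echo "exit code: $?"   # 128 + 9 = 137, the same status an OOM-killed container reports
```

If you hit this, giving the offending process more memory (for example via Nextflow's `memory` directive in a process configuration) is the usual fix.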

## Resources to download
- [GTDB Mash Sketch](https://zenodo.org/record/8408361): Required for speciation and for determining when a sample is metagenomic
- [Decontamination Index](https://zenodo.org/record/8408557): Required for decontamination of reads (this is a minimap2 index)
- [Kraken2 std database](https://benlangmead.github.io/aws-indexes/k2): Required for binning of metagenomic data and is an alternative to using Mash for speciation
- [Bakta database](https://zenodo.org/record/7669534): Running Bakta is optional and there is a light database option; however, the full database is recommended. You will have to unzip and un-tar the database before use.

### Fields to update with resources
It is recommended to store the above resources within the `databases` folder in the mikrokondo folder; this allows you to update just the database names in `nextflow.config` rather than providing full paths.

Below shows where to update database resources in the `params` section of the `nextflow.config` file:

```
// Bakta db path, note the quotation marks
bakta_db = "/PATH/TO/BAKTA/DB"
// Decontamination minimap2 index, note the quotation marks
dehosting_idx = "/PATH/TO/DECONTAMINATION/INDEX"
// Kraken2 db path, note the quotation marks
kraken2_db = "/PATH/TO/KRAKEN/DATABASE/"
// GTDB Mash sketch, note the quotation marks
mash_sketch = "/PATH/TO/MASH/SKETCH/"
```
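Following the recommendation to keep resources in the `databases` folder, the four settings above might look like this inside the `params` scope of `nextflow.config`. The file and folder names below are hypothetical placeholders, not names shipped with mikrokondo:

```groovy
// Hypothetical example: databases stored in mikrokondo's `databases` folder,
// set inside the params block of nextflow.config (names are placeholders).
params {
    bakta_db      = "databases/bakta_db"
    dehosting_idx = "databases/dehosting_idx.mmi"
    kraken2_db    = "databases/k2_standard"
    mash_sketch   = "databases/gtdb_sketch.msh"
}
```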
8 changes: 4 additions & 4 deletions docs/usage/usage.md
@@ -109,7 +109,7 @@ Numerous steps within mikrokondo can be turned off without compromising the stab
- `--skip_subtyping`: to turn off automatic triggering of subtyping in the pipeline (useful when target organism does not have a subtyping tool installed within mikrokondo).
- `--skip_version_gathering`: prevents the collation of tool versions. This process generally takes only a couple of minutes (at worst), but skipping it can be useful during recurrent runs of the pipeline (such as when testing settings).
- `--skip_report`: Prevents creation of the final summary report amalgamating the outputs of all other files; this will also turn off the creation of individual sub-reports.
- `--skip_metagenomic_detection`: Skips classification of sample as metagenomic and forces a sample to be analyzed as an isolate.
- `--skip_raw_read_metrics`: Prevents generation of raw read metrics, e.g. metrics generated about the reads before any trimming or filtering is performed.
- `--skip_mlst`: Skip seven gene MLST.
- `--skip_length_filtering_contigs`: Skip length filtering of contigs based on the `--qt_min_contig_length` parameter.
@@ -128,7 +128,7 @@ Different databases/pre-computed files are required for usage within mikrokondo.
Allele scheme selection parameters.

- `--override_allele_scheme`: Provide the path to an allele scheme (currently only locidex is supported) that will be used for all samples provided, i.e. no automated allele database selection is performed and this scheme is applied to every sample.
- `--lx_allele_database`: A path to a `manifest.json` file used by locidex for automated allele selection. This option cannot be used alongside `--override_allele_scheme`.
>**Note:** Provide only the path to the directory containing the `manifest.json` file, i.e. `some/directory`, **NOT** `some/directory/manifest.json`

@@ -187,7 +187,7 @@ Top level parameters for Locidex. The currently implemented allele caller, do no
- `--lx_max_dna_len`: Global maximum query length of DNA strand.
- `--lx_max_aa_len`: Global maximum query length of Amino Acid strand.
- `--lx_min_dna_ident`: Global minimum DNA percent identity required for match. (float).
- `--lx_min_aa_ident`: Global minimum Amino Acid percent identity required for match. (float).
- `--lx_min_dna_match_cov`: Global minimum DNA percent hit coverage identity required for match (float).
- `--lx_min_aa_match_cov`: Global minimum Amino Acid hit coverage identity required for match (float).
- `--lx_max_target_seqs`: Maximum number of sequence hits per query.
@@ -219,7 +219,7 @@ Different container services can be specified from the command line when running
#### Slurm options

- `slurm_p true`: slurm executor will be used.
- `slurm_profile STRING`: a string to allow the user to specify which slurm partition to use.
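Putting the two Slurm options together, a run might be launched as sketched below. The profile, partition, and input/output flags are illustrative assumptions, not verified mikrokondo defaults:

```shell
# Hypothetical sketch: run the pipeline through the Slurm executor,
# selecting a partition. Names here are placeholders.
nextflow run phac-nml/mikrokondo \
    -profile singularity \
    --slurm_p true \
    --slurm_profile "my_partition" \
    --input samplesheet.csv \
    --outdir results
```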

## Output
