
Commit

Merge pull request #126 from phac-nml/update/docs
updated docs
mattheww95 authored Oct 3, 2024
2 parents 7a3231a + 5ac0ceb commit 304bab3
Showing 5 changed files with 74 additions and 85 deletions.
4 changes: 4 additions & 0 deletions .wordlist.txt
@@ -171,3 +171,7 @@ Samplesheet
TSeemann's
RASUSA
downsampling
Christy
Marinier
Petkau

2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Documentation and workflow diagram have been updated. [PR 123](https://github.com/phac-nml/mikrokondo/pull/123)

- Documentation and Readme have been updated. [PR 126](https://github.com/phac-nml/mikrokondo/pull/126)

## [0.4.2] - 2024-09-25

### `Fixed`
25 changes: 8 additions & 17 deletions README.md
@@ -53,7 +53,7 @@ This workflow will detect what pathogen(s) is present and apply the applicable m

This software (currently unpublished) can be cited as:

- Matthew Wells, James Robertson, Aaron Petkau, Christy-Lynn Peterson, Eric Marinier. "mikrokondo" Github <https://github.com/phac-nml/mikrokondo/>

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

@@ -103,33 +103,24 @@ The above downloadable resources must be updated in the following places in your

```
// Bakta db path, note the quotation marks
bakta_db = "/PATH/TO/BAKTA/DB"
// Decontamination minimap2 index, note the quotation marks
dehosting_idx = "/PATH/TO/DECONTAMINATION/INDEX"
// Kraken2 db path, note the quotation marks
kraken2_db = "/PATH/TO/KRAKEN/DATABASE/"
// GTDB Mash sketch, note the quotation marks
mash_sketch = "/PATH/TO/MASH/SKETCH/"
// StarAMR database path, note the quotation marks
// Passing in a StarAMR database is optional; if one is not specified, the database in the container will be used. Leave the option as null if you do not wish to pass one.
staramr_db = "/PATH/TO/STARAMR/DB"
```

If not set in the `nextflow.config` file, the above parameters can also be passed to the pipeline as command-line arguments.
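For instance, the database parameters above could be supplied at runtime rather than edited into `nextflow.config`. A hypothetical sketch follows; the `-profile` choice is illustrative, not a verified mikrokondo default:

```shell
# Hypothetical invocation: database paths passed as command-line parameters
# instead of being set in nextflow.config. Everything other than the
# database parameters themselves is an illustrative placeholder.
nextflow run phac-nml/mikrokondo \
    -profile singularity \
    --bakta_db /PATH/TO/BAKTA/DB \
    --dehosting_idx /PATH/TO/DECONTAMINATION/INDEX \
    --kraken2_db /PATH/TO/KRAKEN/DATABASE/ \
    --mash_sketch /PATH/TO/MASH/SKETCH/
```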

# Getting Started
## Usage

120 changes: 56 additions & 64 deletions docs/usage/installation.md
@@ -1,64 +1,56 @@
# Installation

## Dependencies
- Python (>=3.10)
- Nextflow (>=22.10.1)
- Container service (Docker, Singularity, Apptainer have been tested)
- The source code: `git clone https://github.com/phac-nml/mikrokondo.git`

**Dependencies can be installed with Conda (e.g. Nextflow and Python)**.
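As a sketch, the two Conda-installable dependencies might be set up in a single environment. The channel and version choices below are typical conventions, not prescribed by mikrokondo:

```shell
# Hypothetical sketch: create an environment providing Nextflow and Python.
# bioconda/conda-forge are the usual channels for installing Nextflow.
conda create -n mikrokondo-env -c bioconda -c conda-forge nextflow python=3.10
conda activate mikrokondo-env
nextflow -version
```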

## To install mikrokondo
Once all dependencies are installed (see below for instructions), to download the pipeline run:

`git clone https://github.com/phac-nml/mikrokondo.git`

## Installing Nextflow
Nextflow is required to run mikrokondo (requires Linux), and instructions for its installation can be found at either: [Nextflow Home](https://www.nextflow.io/) or [Nextflow Documentation](https://www.nextflow.io/docs/latest/getstarted.html#installation)

## Container Engine
Nextflow and mikrokondo require a container engine to run the pipeline, such as Docker, Singularity (now Apptainer), Podman, Shifter, or Charliecloud.

> **NOTE:** Singularity was adopted by the Linux Foundation and is now called Apptainer. Singularity still exists, however newer installs will likely use Apptainer.
## Docker or Singularity?
Docker requires root privileges, which can make it a hassle to install on computing clusters; although there are workarounds, Apptainer/Singularity has no such requirement. Apptainer/Singularity is therefore the recommended way to run the mikrokondo pipeline.

### Issues
Containers are not perfect. Below is a list of some issues you may face using containers in mikrokondo; fixes for each issue will be detailed here as they are identified.

- **Exit code 137:** usually means the Docker container used too much memory.
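The 137 is not arbitrary: it is 128 + 9, i.e. the process was killed with SIGKILL (signal 9), which is what the kernel's OOM killer sends when a container exceeds its memory limit. A generic shell demonstration of that status mapping (not mikrokondo-specific):

```shell
# Run a child shell that kills itself with SIGKILL (signal 9),
# then inspect the exit status the parent observes.
bash -c 'kill -KILL $$'
echo "exit code: $?"   # 128 + 9 = 137, the same status an OOM-killed container reports
```

If you hit this, giving the offending process more memory (for example via Nextflow's `memory` directive in a process configuration) is the usual fix.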

## Resources to download
- [GTDB Mash Sketch](https://zenodo.org/record/8408361): Required for speciation and for determining when a sample is metagenomic
- [Decontamination Index](https://zenodo.org/record/8408557): Required for decontamination of reads (this is a minimap2 index)
- [Kraken2 std database](https://benlangmead.github.io/aws-indexes/k2): Required for binning of metagenomic data and is an alternative to using Mash for speciation
- [Bakta database](https://zenodo.org/record/7669534): Running Bakta is optional and there is a light database option; however, the full database is recommended. You will have to unzip and un-tar the database before use.

### Fields to update with resources
It is recommended to store the above resources within the `databases` folder in the mikrokondo folder; this allows you to update just the database names in `nextflow.config` rather than providing full paths.

Below shows where to update database resources in the `params` section of the `nextflow.config` file:

```
// Bakta db path, note the quotation marks
bakta_db = "/PATH/TO/BAKTA/DB"
// Decontamination minimap2 index, note the quotation marks
dehosting_idx = "/PATH/TO/DECONTAMINATION/INDEX"
// Kraken2 db path, note the quotation marks
kraken2_db = "/PATH/TO/KRAKEN/DATABASE/"
// GTDB Mash sketch, note the quotation marks
mash_sketch = "/PATH/TO/MASH/SKETCH/"
```
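Following the recommendation to keep resources in the `databases` folder, the four settings above might look like this inside the `params` scope of `nextflow.config`. The file and folder names below are hypothetical placeholders, not names shipped with mikrokondo:

```groovy
// Hypothetical example: databases stored in mikrokondo's `databases` folder,
// set inside the params block of nextflow.config (names are placeholders).
params {
    bakta_db      = "databases/bakta_db"
    dehosting_idx = "databases/dehosting_idx.mmi"
    kraken2_db    = "databases/k2_standard"
    mash_sketch   = "databases/gtdb_sketch.msh"
}
```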
8 changes: 4 additions & 4 deletions docs/usage/usage.md
@@ -109,7 +109,7 @@ Numerous steps within mikrokondo can be turned off without compromising the stab
- `--skip_subtyping`: to turn off automatic triggering of subtyping in the pipeline (useful when target organism does not have a subtyping tool installed within mikrokondo).
- `--skip_version_gathering`: prevents the collation of tool versions. This process generally takes only a couple of minutes (at worst), but skipping it can be useful during recurrent runs of the pipeline (such as when testing settings).
- `--skip_report`: Prevents creation of the final summary report amalgamating the outputs of all other files; this will also turn off the creation of individual sub-reports.
- `--skip_metagenomic_detection`: Skips classification of sample as metagenomic and forces a sample to be analyzed as an isolate.
- `--skip_raw_read_metrics`: Prevents generation of raw read metrics, e.g. metrics generated about the reads before any trimming or filtering is performed.
- `--skip_mlst`: Skip seven gene MLST.
- `--skip_length_filtering_contigs`: Skip length filtering of contigs based on the `--qt_min_contig_length` parameter.
@@ -128,7 +128,7 @@ Different databases/pre-computed files are required for usage within mikrokondo.
Allele scheme selection parameters.

- `--override_allele_scheme`: Provide the path to an allele scheme (currently only locidex is supported) that will be used for all samples provided, i.e. no automated allele database selection is performed and this scheme is applied to every sample.
- `--lx_allele_database`: A path to a `manifest.json` file used by locidex for automated allele selection. This option cannot be used alongside `--override_allele_scheme`.
>**Note:** Provide only the path to the directory containing the `manifest.json` file, i.e. `some/directory`, **NOT** `some/directory/manifest.json`

@@ -187,7 +187,7 @@ Top level parameters for Locidex. The currently implemented allele caller, do no
- `--lx_max_dna_len`: Global maximum query length of DNA strand.
- `--lx_max_aa_len`: Global maximum query length of Amino Acid strand.
- `--lx_min_dna_ident`: Global minimum DNA percent identity required for match. (float).
- `--lx_min_aa_ident`: Global minimum Amino Acid percent identity required for match. (float).
- `--lx_min_dna_match_cov`: Global minimum DNA percent hit coverage identity required for match (float).
- `--lx_min_aa_match_cov`: Global minimum Amino Acid hit coverage identity required for match (float).
- `--lx_max_target_seqs`: Maximum number of sequence hits per query.
@@ -219,7 +219,7 @@ Different container services can be specified from the command line when running
#### Slurm options

- `slurm_p true`: slurm executor will be used.
- `slurm_profile STRING`: a string to allow the user to specify which slurm partition to use.
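Putting the two Slurm options together, a run might be launched as sketched below. The profile, partition, and input/output flags are illustrative assumptions, not verified mikrokondo defaults:

```shell
# Hypothetical sketch: run the pipeline through the Slurm executor,
# selecting a partition. Names here are placeholders.
nextflow run phac-nml/mikrokondo \
    -profile singularity \
    --slurm_p true \
    --slurm_profile "my_partition" \
    --input samplesheet.csv \
    --outdir results
```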

## Output
