- Added support for cell hashing and feature barcoding.
- Added `tests/run_tests.py` to verify installation and function. Implemented Travis CI.
- Added a full featured vignette in `sample_data`.
- Improvements in the `mGSSP` module, including a new script `5.5-score_sequences.py` to identify rare mutations.
- Added GSSPs from Sheng et al 2017 to `sample_data`.
- Improved how `1.0-preprocess.py` and `1.3-finalize_assignments.py` run on HPCs.
- SONAR now does UMI detection and consensus generation, with `1.0-preprocess.py` replacing `1.0-MiSeq_assembly.pl`. It is specifically designed to be compatible with 2x300 sequencing of cDNA generated using the 10x Chromium platform, but should work with almost any experimental design. See help message for details. All QC functionality in the old script has been ported to the new one.
- SONAR is now single-cell aware. `1.1-blast_V.py` will look for a `cell_id` tag in the input sequences and maintain that information. A new script, `1.5-single_cell_statistics.py`, can be used to collate the output information in the rearrangements.tsv file generated by `1.3-finalize_assignments.py` and create a cell-level summary.
- To accommodate these new workflows, I have changed the mechanism for daisy-chaining Module 1 scripts together. Each script will now directly accept all options that can be passed to downstream scripts, as well as `--runX` flags. Please see help messages for more details.
- In `1.4-cluster_sequences.py`, the `-f` parameter for specifying a nonstandard input file has changed to `--file`. In addition, a new `--maxgaps` parameter is available.
- `1.3-finalize_assignments.py` has been split, with a new `parse_blast.py` doing most of the work that was originally in the main script. This allows parallel processing/cluster submission of large datasets in the same way as done for `1.1-blast_V.py` and `1.2-blast_J.py`.
- `unique` has been deprecated as a read `status` to avoid creating problems when testing multiple clustering conditions. Instead, look for `centroid` == `sequence_id` and/or a non-null `cluster_count` field.
- `1.3-finalize_assignments.py` now does fall-back isotype detection using the first 3 bases of CH1. This is useful for protocols using primers in the 5' region of CH1 such that isotype cannot be found by BLAST. It can be disabled with the `--noFallBack` flag.
- MacOS-compatible binaries have been added for all third-party programs. Please re-run setup.py to have SONAR autodetect which binary to use.
- I've finally fixed the SONAR/sonar bug, resolving the issue in favor of retaining the all-caps name. If you're like me and you've lazily cloned the repo into a lower-case directory, you will need to rename it for things to continue to work.
- Fixed bugs in setup.py and the Dockerfile.
- SONAR now uses Python3. Python2 is not supported.
- All Python scripts now use DocOpt to manage argument parsing. This means that most single dash options are now double dash options.
- Output is now in AIRR format. A 'rearrangements.tsv' file has replaced the old 'all_seq_stats.txt' file and several field names have changed. I've tried to maintain backward compatibility in most cases, but there is also a new `convertToAIRR.py` utility to help pull data over, if necessary.
- I've finally removed the double SONAR folder, which was never meant to be there in the first place. If you've added the SONAR module directories to your PATH, you'll need to update those references.
- `1.1-blast_V.py` now includes an optional dereplication step, and will preserve replicate counts through the pipeline.
- `1.3-finalize_assignments.py` now distinguishes between `nonproductive` junctions and reads with other `indel`s.
- USearch has been replaced by VSearch, as the license of the latter allows me to include it in the SONAR distribution. In general, I've included most other programs that SONAR uses in the new `third_party/` folder, so that install is smoother and the user doesn't have to input paths to all sorts of things.
- In general, I've really tried to smooth out the install/setup process. Feedback welcome.
- I replaced the old master script `SONAR_master.pl` with a new one, `sonar`, that is more flexible and portable and therefore hopefully more useful. It is generated by `setup.sh` so as to include a hardcoded path to the SONAR home directory, allowing it to be copied/moved to a convenient location and keeping the PATH much cleaner. It also allows partial and fuzzy matching for target script names.
- IgPhyML replaces DNAML as the phylogenetic engine of choice. It is included in the `third_party/` folder.
- `1.4_dereplicate sequences.pl` has been replaced by `1.4-cluster_sequences.py` and `2.1-calculate_id-div.pl` has been replaced by `2.1-calculate_id-div.py`.
- The mGSSP pipeline has been reworked a little bit to allow for multithreading. It also now accommodates masking primer positions and building GSSPs from nonproductive repertoires. See the mGSSP readme for more details.
- I added another FASTA extraction utility, `getReadsByAnnotation.py`, to get more flexible subsets of reads. For instance, all reads assigned to IGHV1-2*02.
- Added a new mGSSP module. See our paper and the mGSSP readme for more details.
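
For downstream scripts that used to filter on the deprecated `unique` status, the replacement check described above (`centroid` == `sequence_id` and/or a non-null `cluster_count`) might look something like the sketch below. It is not part of SONAR. It assumes the rearrangements file is a tab-separated table with a header row containing those three column names. The function name `select_centroids` and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io


def select_centroids(handle):
    """Yield rows that look like cluster representatives.

    Assumption: `handle` is an open tab-separated file with a header row
    that includes `sequence_id`, `centroid`, and `cluster_count`.
    A row is kept if its centroid is its own sequence_id, or if it has a
    non-empty cluster_count.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        centroid = row.get("centroid") or ""
        is_centroid = centroid != "" and centroid == row.get("sequence_id")
        has_count = (row.get("cluster_count") or "") != ""
        if is_centroid or has_count:
            yield row


# Hypothetical sample data, not real SONAR output.
SAMPLE_TSV = (
    "sequence_id\tcentroid\tcluster_count\n"
    "seq1\tseq1\t3\n"
    "seq2\tseq1\t\n"
    "seq3\tseq3\t1\n"
    "seq4\t\t\n"
)

if __name__ == "__main__":
    for record in select_centroids(io.StringIO(SAMPLE_TSV)):
        print(record["sequence_id"], record["cluster_count"])
```

To run this against a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("rearrangements.tsv", newline="")`, adjusting the path to wherever your project writes its output.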