How to represent a pangenome using this tool #1

Open
GeorgeBGM opened this issue Nov 18, 2023 · 4 comments

@GeorgeBGM

Hi,
I have built a graph pan-genome. How should I use this tool to characterize it and dig deeper into the information it contains? Alternatively, is there any other analysis tool you would recommend?

Best, Du

@KiranJavkar
Owner

Hi Du,

Thanks a lot for using PRAWNS and for reaching out with your query! The output folder generated by PRAWNS should contain a couple of fastq files: metablocks.fastq and retained_blocks.fastq. Depending on the type of downstream analysis you would like to pursue, you can use these fastq files, primarily metablocks.fastq, to perform alignments or other related comparisons.

The PRAWNS manuscript provides a few instances of these use cases: https://academic.oup.com/bioinformatics/article/39/1/btac844/6965020 (Section 3.4)

For instance, you can identify the conserved regions of higher interest, say those with a length of at least 500 bp, and BLAST them through NCBI's web BLAST search against the nr database or against a more specialized database, such as one of antimicrobial genes.
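A minimal sketch of that length filter, assuming metablocks.fastq uses standard four-line FASTQ records (the record layout and the output file name are assumptions, not something PRAWNS documents here). It keeps sequences of at least 500 bp and writes them as FASTA, which is the format the BLAST web form accepts:

```python
def read_fastq(handle):
    """Yield (name, sequence) pairs from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header line, got: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        handle.readline()  # quality line, ignored
        yield header[1:], sequence


def write_long_blocks_as_fasta(fastq_in, fasta_out, min_len=500):
    """Copy records with len(sequence) >= min_len to FASTA; return how many were kept."""
    kept = 0
    for name, sequence in read_fastq(fastq_in):
        if len(sequence) >= min_len:
            fasta_out.write(f">{name}\n{sequence}\n")
            kept += 1
    return kept


# Example call (file names are illustrative):
# with open("metablocks.fastq") as src, open("metablocks_min500.fasta", "w") as dst:
#     print(write_long_blocks_as_fasta(src, dst, min_len=500), "blocks kept")
```

The 500 bp threshold is only the example value mentioned above; pass a different `min_len` to match your own analysis.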

Once you detect some functionally important conserved regions or paired regions, you can go back to the presence-absence and coords (coordinates) CSV files to identify the genomes where these regions exist and the genomic context of their presence.
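As a rough illustration of that lookup, the sketch below reads a presence-absence table and lists the genomes in which a given region is present. The layout is hypothetical: it assumes one row per region, an identifier column named `region`, and one column per genome holding 1 or 0. Check the header of the actual PRAWNS CSV and adjust the column name and the presence test accordingly.

```python
import csv


def genomes_with_region(csv_handle, region_id, id_column="region"):
    """Return the genome column names whose value marks region_id as present.

    Assumed (hypothetical) layout: one row per region, an identifier column
    named id_column, and one column per genome containing 1 (present) or 0.
    """
    for row in csv.DictReader(csv_handle):
        if row[id_column] == region_id:
            return [
                genome
                for genome, value in row.items()
                if genome != id_column and (value or "").strip() not in ("", "0")
            ]
    raise KeyError(f"region {region_id!r} not found")


# Example call (file name and region identifier are illustrative):
# with open("presence_absence.csv", newline="") as handle:
#     print(genomes_with_region(handle, "block_42"))
```

The same pattern applies to the coords CSV file: once you know which genomes carry the region, you can read the matching rows to see where in each genome it sits.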

Please let me know if this helps or if you have further questions. If you need specific assistance with your problem statement, please reach out to me over email and we can discuss it in detail.

Thanks again,
-Regards,
Kiran Javkar

@GeorgeBGM
Author

GeorgeBGM commented Dec 7, 2023 via email

@KiranJavkar
Owner

Using PRAWNS to analyze pan-genome graph constructs requires some pre-processing, since PRAWNS does not accept gfa files as input.
You would need to create separate fasta files, each containing the contigs to be compared. For instance, if you would like to compare 10 paths within a genome assembly subgraph from the gfa file, each of these 10 paths needs to be saved to its own fasta file. These 10 fasta files then act as 10 "genomes" that can be given as input to PRAWNS (through the appropriate CSV file listing them).

There are several gfa to fasta converters available:

The critical challenge is to decompose the human constructs you are interested in into separate fasta files.
Once you get to that point, you can run PRAWNS on these fasta files just as you would for any collection of genomes.
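A minimal sketch of the decomposition step, assuming a GFA 1.0 file whose paths are stored as `P` lines over `S` segments with blunt (zero-length) overlaps. Graphs that store haplotypes as GFA 1.1 `W` (walk) lines, or that omit segment sequences, are not handled here, and the function names are illustrative rather than part of PRAWNS:

```python
import os

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def gfa_paths_to_sequences(gfa_lines):
    """Spell out each P-line path by concatenating its oriented segments.

    Overlap fields are ignored, so this is only correct for blunt-ended graphs.
    """
    segments, paths = {}, []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "P":
            paths.append((fields[1], fields[2].split(",")))
    sequences = {}
    for name, steps in paths:
        parts = []
        for step in steps:
            segment_id, orientation = step[:-1], step[-1]
            if orientation not in "+-":
                raise ValueError(f"bad orientation in path step {step!r}")
            sequence = segments[segment_id]
            if sequence == "*":
                raise ValueError(f"segment {segment_id!r} has no sequence")
            parts.append(sequence if orientation == "+" else reverse_complement(sequence))
        sequences[name] = "".join(parts)
    return sequences


def write_one_fasta_per_path(sequences, out_dir):
    """Write each path to its own FASTA file and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name, sequence in sequences.items():
        safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
        file_path = os.path.join(out_dir, safe + ".fasta")
        with open(file_path, "w") as handle:
            handle.write(f">{name}\n{sequence}\n")
        written.append(file_path)
    return written


# Example call (file and directory names are illustrative):
# with open("subgraph.gfa") as gfa:
#     files = write_one_fasta_per_path(gfa_paths_to_sequences(gfa), "paths_fasta")
```

Each file returned by `write_one_fasta_per_path` can then be listed as one "genome" in the input CSV that PRAWNS expects.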

Hope this answers your question.
Let me know if you have further queries and thanks again for using PRAWNS!

@GeorgeBGM
Author

GeorgeBGM commented Dec 8, 2023 via email
