Optionality not to use BLAST #67

Open
borgesadair1 opened this issue Oct 6, 2023 · 1 comment

Comments

@borgesadair1
Member

BLAST requests often fail, and sending too many of them to the public NCBI service can get you banned from using BLAST. Right now the pipeline fails (in query mode) if it receives no BLAST input. It would be helpful to be able to opt out of BLAST for certain jobs. More generally, it would be nice to move away from the public NCBI BLAST service altogether, but that's probably a bigger change!
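One way the opt-out could look, as a sketch only: the flag names `--no-blast` and `--hits-file` below are hypothetical, not existing pipeline options. The idea is to validate the inputs at startup, so a query-mode job that skips BLAST without supplying an alternative source of hits fails immediately with a clear message instead of partway through the run.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical query-mode CLI; flag names are illustrative only."""
    parser = argparse.ArgumentParser(description="Query-mode run (sketch)")
    parser.add_argument("query", help="query sequence FASTA")
    parser.add_argument(
        "--no-blast",
        action="store_true",
        help="skip the remote NCBI BLAST search for this job",
    )
    parser.add_argument(
        "--hits-file",
        help="user-supplied MSA or hits table to read accessions from instead",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Fail at startup (usage error, exit status 2) rather than later,
    # when downstream steps would find no BLAST output to work with.
    if args.no_blast and args.hits_file is None:
        parser.error("--no-blast requires --hits-file")
    return args
```

With this, `parse_args(["query.fasta", "--no-blast", "--hits-file", "hits.tsv"])` is accepted, while `--no-blast` on its own exits with a usage error.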

@naailkhan28
Contributor

As far as I can see, the BLAST results are only used to build a list of RefSeq sequence accessions, which are then mapped to UniProt identifiers for use in the next steps of the pipeline. I can think of a couple of ways to get such a list of identifiers without using BLAST:
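Since the downstream steps only need that identifier list, the mapping step doesn't care where the accessions came from. A minimal sketch, assuming the RefSeq-to-UniProt lookup is available as a plain dictionary (a hypothetical stand-in for whatever mapping service the pipeline actually queries) and that version suffixes like `.1` should be ignored when matching:

```python
from typing import Iterable, List, Mapping


def strip_version(accession: str) -> str:
    """Drop a trailing RefSeq version, e.g. 'WP_000001.1' -> 'WP_000001'."""
    base, sep, version = accession.rpartition(".")
    return base if sep and version.isdigit() else accession


def to_uniprot(refseq_ids: Iterable[str], refseq_to_uniprot: Mapping[str, str]) -> List[str]:
    """Map RefSeq accessions to UniProt IDs in first-seen order.

    Accessions with no mapping are dropped, and several RefSeq versions
    that map to the same UniProt entry are collapsed into one.
    """
    seen = set()
    result = []
    for acc in refseq_ids:
        uniprot = refseq_to_uniprot.get(strip_version(acc))
        if uniprot is not None and uniprot not in seen:
            seen.add(uniprot)
            result.append(uniprot)
    return result
```

For instance, with the made-up table `{"WP_000001": "P11111", "NP_000002": "Q22222"}`, the input `["WP_000001.1", "NP_000002.2", "WP_000001.3"]` maps to `["P11111", "Q22222"]`; the second version of WP_000001 collapses onto the same UniProt entry.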

  • Allow users to supply their own MSA file or BLAST/mmseqs2 hits table, and read the sequence accessions from it. We'd have to require that users supply only RefSeq or UniProt accessions.

  • Use the ColabFold API to query mmseqs2 and get back hits. This is fast and seems to be much more reliable than BLAST requests. See the run_mmseqs2() function from this file in the ColabFold GitHub repository.
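For the first option, here is a rough sketch of reading accessions from a user-supplied file. Assumptions: MSAs are FASTA/A3M with the ID as the first token of each header line; hits tables are tab-separated with the subject/target ID in the second column (as in BLAST `-outfmt 6` and mmseqs2's default search output); and IDs may be bare accessions or pipe-delimited like `sp|P12345|NAME`. The function names are hypothetical. Anything that isn't a RefSeq protein or UniProt accession is returned separately rather than silently dropped, so the pipeline could warn about it (for an MSA, the query itself usually lands there).

```python
import re
from typing import Iterable, List, Optional, Tuple

# RefSeq protein accessions (AP_, NP_, XP_, YP_, WP_) with an optional version.
REFSEQ = re.compile(r"(?:AP|NP|XP|YP|WP)_\d+(?:\.\d+)?")
# UniProtKB accession format as documented by UniProt.
UNIPROT = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def accession_from_id(seq_id: str) -> Optional[str]:
    """First RefSeq/UniProt accession in an ID like 'sp|P12345|NAME' or 'NP_000001.1'."""
    for field in seq_id.split("|"):
        if REFSEQ.fullmatch(field) or UNIPROT.fullmatch(field):
            return field
    return None


def sequence_ids(lines: Iterable[str], kind: str) -> List[str]:
    """Raw IDs from an MSA (kind='msa', FASTA/A3M headers) or a hits table (kind='hits')."""
    if kind not in ("msa", "hits"):
        raise ValueError(f"unknown kind: {kind!r}")
    ids = []
    for line in lines:
        if kind == "msa":
            # Header lines only; the ID is the first token after '>'.
            fields = line[1:].split() if line.startswith(">") else []
        else:
            # Skip comment lines; column 2 holds the subject/target ID.
            fields = [] if line.startswith("#") else line.rstrip("\n").split("\t")[1:2]
        if fields and fields[0]:
            ids.append(fields[0])
    return ids


def read_accessions(lines: Iterable[str], kind: str) -> Tuple[List[str], List[str]]:
    """Return (accepted accessions, first-seen order, no duplicates; rejected IDs)."""
    accepted, rejected, seen = [], [], set()
    for seq_id in sequence_ids(lines, kind):
        acc = accession_from_id(seq_id)
        if acc is None:
            rejected.append(seq_id)
        elif acc not in seen:
            seen.add(acc)
            accepted.append(acc)
    return accepted, rejected
```

A caller would typically pass an open file handle straight in, e.g. `accessions, skipped = read_accessions(fh, "hits")`, and then hand `accessions` to the existing UniProt mapping step.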

I've been playing around with the latter and it might work well, although we'd need to reproduce the previous results to check that they're replicable with mmseqs2 instead of BLAST (which I would expect them to be).
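To check replicability, one simple measure is the overlap between the identifier lists each route produces for the same query. A sketch, assuming both runs have already been reduced to UniProt IDs (the level the rest of the pipeline consumes); the function name is hypothetical, and the metrics (Jaccard index plus the fraction of BLAST hits that mmseqs2 recovers) are just one reasonable choice, not an established benchmark for this pipeline.

```python
from typing import Dict, Iterable


def compare_hit_sets(blast_ids: Iterable[str], mmseqs_ids: Iterable[str]) -> Dict[str, float]:
    """Overlap statistics between the BLAST-derived and mmseqs2-derived ID sets."""
    a, b = set(blast_ids), set(mmseqs_ids)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_blast": len(a - b),
        "only_mmseqs": len(b - a),
        # Jaccard index: 1.0 means identical sets (two empty sets count as identical)
        "jaccard": len(shared) / len(union) if union else 1.0,
        # fraction of the previous (BLAST-based) hits that mmseqs2 also finds
        "blast_recall": len(shared) / len(a) if a else 1.0,
    }
```

With made-up IDs, `compare_hit_sets(["A", "B", "C", "D"], ["B", "C", "D", "E", "F"])` gives 3 shared hits, a Jaccard index of 0.5 and a BLAST recall of 0.75.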
