tabix returns row from VCF file multiple times #1470
Comments
It was pointed out to me right away that this is expected behavior: the length of the reference allele means the variant overlaps the right flanking region in the third BED file. I had assumed that VCF indexing would be based strictly on the indicated starting position of the variant.
Correct. VCF indices, like BAM indices, use overlap calculations rather than point positions, so a query finds all variants that overlap a region. In bcftools, if you use a combination of a broad region ( Edit: just remembered this is tabix, which likely doesn't have the same options, but bcftools will - obviously - also view VCF files. :-)
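To illustrate the overlap behavior described above, here is a minimal Python sketch. It is not tabix's actual implementation; it only mimics the idea that each region in a BED file is queried independently and that a record matches when the span of its REF allele overlaps the region. The coordinates, the REF allele, and the function names (`variant_span`, `bed_to_one_based`, `query`) are all hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # 1-based, inclusive (start, end)


def variant_span(pos: int, ref: str) -> Span:
    """Span covered by a variant: POS through POS + len(REF) - 1."""
    return pos, pos + len(ref) - 1


def bed_to_one_based(start: int, end: int) -> Span:
    """Convert a 0-based, half-open BED interval to 1-based inclusive."""
    return start + 1, end


def overlaps(a: Span, b: Span) -> bool:
    """True if two inclusive spans share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def query(variants: List[Tuple[int, str]],
          bed_regions: List[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Query each region independently, as a naive multi-region lookup would.

    A variant that overlaps several regions is reported once per region.
    """
    hits = []
    for start, end in bed_regions:
        region = bed_to_one_based(start, end)
        for pos, ref in variants:
            if overlaps(variant_span(pos, ref), region):
                hits.append((pos, ref))
    return hits


# Hypothetical variant: POS=100 with a 10-base REF allele, spanning 100..109.
variants = [(100, "ACGTACGTAC")]

# A region at the variant start plus a disjoint region to the LEFT: one hit.
print(len(query(variants, [(89, 95), (99, 100)])))    # 1

# A region at the variant start plus a disjoint region to the RIGHT that
# falls inside the REF span: the same record is reported twice.
print(len(query(variants, [(99, 100), (104, 106)])))  # 2
```

With a point-position model (matching on POS only), the second query would return the record once; the duplicate arises only because the REF allele extends into the right-hand region.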
I'm not really clear on whether multi-region iterators work in general or just for reads in BAM/CRAM files, e.g., there's an Anyway, if they do work in general, then this issue would be solved by
Thank you both for your helpful suggestions -- using bcftools in place of tabix is an option for me, and I've confirmed that specifying the BED format regions file together with
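For anyone who needs to stay with tabix, one alternative (not the approach discussed above, just a generic post-processing step) is to drop repeated rows from the query output. The sketch below assumes that duplicate rows are byte-for-byte identical lines; the function name `unique_records` and the sample rows are hypothetical.

```python
from typing import Iterable, Iterator


def unique_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, preserving first-seen order."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")
        if key in seen:
            continue  # the same record was already returned for another region
        seen.add(key)
        yield line


# Hypothetical tab-separated rows, with the first one returned twice.
rows = [
    "chr1\t100\t.\tACGTACGTAC\tA\n",
    "chr1\t100\t.\tACGTACGTAC\tA\n",
    "chr1\t250\t.\tG\tT\n",
]
for row in unique_records(rows):
    print(row, end="")
```

This keeps every seen line in memory, which is fine for small region queries but may be costly for very large outputs.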
Using tabix version 1.9 on CentOS 7
I am querying a public indexed dataset (gnomAD annotations from the Broad Institute) using tabix together with a regions file in BED format, and found that in some cases a matching row may be returned multiple times. A particular annotation at position 1007743 is returned without duplicates in two cases: when the BED file contains a single region (region.bed) or an additional disjoint region on the left (region_flL.bed):
However, adding a disjoint region on the right leads to the issue (region_flR.bed):
The following script reproduces the issue (assuming AWS credentials have been loaded into the appropriate environment variables):
Output: