Invalid gVCF lines #5

Open
abeconnelly opened this issue Jan 14, 2016 · 1 comment

@abeconnelly

Here is a snippet of a CGI-Var file:

1265    2       all     chr1    68316   68543   ref     =       =                                                       
1266    2       all     chr1    68543   68550   no-call =       ?                                                       
1267    2       all     chr1    68550   68640   ref     =       =                                                       
1268    2       all     chr1    68640   68640   no-call =       ?                                                       
1269    2       all     chr1    68640   68893   ref     =       =                                                       
1270    2       1       chr1    68893   68896   no-call TAG     ?                                                       
1270    2       2       chr1    68893   68896   snp     TAG     TAA     96      96      PASS            dbsnp.100:rs2854683             

which, after running cgivar2gvcf, produces:

chr1    68317   .       T       .       .       PASS    END=68543       GT      0/0
chr1    68544   .       T       .       .       NOCALL  END=68550       GT      ./.
chr1    68551   .       C       .       .       PASS    END=68640       GT      0/0
chr1    68641   .       T       .       .       NOCALL  END=68640       GT      ./.
chr1    68641   .       T       .       .       PASS    END=68893       GT      0/0
chr1    68894   rs2854683       TAG     TAA     .       NOCALL  .       GT      1/.

As you can see, there are two lines beginning at different start points (68551 and 68641) but ending at the same endpoint (68640), so the second of them has an END that precedes its own start. I'm not sure whether this is actually an error in the CGI-Var file, as the problem appears to stem from the 0-length 'no-call' line in the originating CGI-Var file.

I've attached a small test CGI-Var file that will produce the above gVCF when run through cgivar2gvcf.
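
For anyone wanting to spot these lines automatically, here is a rough checker sketch. It is not part of cgivar2gvcf; the function name `find_invalid_blocks` is made up for illustration, and it assumes whitespace-separated records with `END=` in the INFO column, as in the output above.

```python
def find_invalid_blocks(gvcf_lines):
    """Return (line, reason) pairs for records whose END precedes POS
    or whose span overlaps an earlier record on the same chromosome."""
    problems = []
    prev_chrom, prev_end = None, None
    for line in gvcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        fields = line.split()
        chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
        # Without an END tag, the record spans the length of REF.
        end = pos + len(ref) - 1
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
        if end < pos:
            problems.append((line, "END precedes POS"))
        elif chrom == prev_chrom and pos <= prev_end:
            problems.append((line, "overlaps previous record"))
        # Track the furthest position covered so far on this chromosome.
        if chrom == prev_chrom:
            prev_end = max(prev_end, end)
        else:
            prev_chrom, prev_end = chrom, end
    return problems


sample = [
    "chr1 68551 . C . . PASS END=68640 GT 0/0",
    "chr1 68641 . T . . NOCALL END=68640 GT ./.",
    "chr1 68641 . T . . PASS END=68893 GT 0/0",
]
for line, reason in find_invalid_blocks(sample):
    print(reason, "->", line)
```

On the three sample records, only the NOCALL line is reported, because its END (68640) is smaller than its POS (68641).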
indel_nstar.cgivar.txt

@madprime
Owner

This looks like a broader issue: zero-width positions are generally not handled well in the current gVCF translation.

The Complete Genomics format represents some types of variation with a zero-width reference length, but VCF needs a reference width of at least one position.
For insertion variants, the solution was to back up one position, use that base as the reference, and prepend it to the variant allele. That was fine for VCF, but the addition of reference and no-call lines in gVCF means more needs to be done (e.g. for an insertion, the preceding reference line should also be edited to shift its end backwards).
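
To make the idea concrete, here is a minimal sketch of one way the block bookkeeping could work. It is not the current cgivar2gvcf code; `cgi_to_vcf_span` and `add_block` are hypothetical helpers, and anchoring a zero-width no-call on the preceding base is an assumption about how such rows should be represented.

```python
def cgi_to_vcf_span(begin, end):
    """Map a CGI 0-based half-open interval to a 1-based inclusive (POS, END).

    Zero-width intervals are anchored on the preceding base (assumption).
    """
    if end > begin:
        return begin + 1, end
    return begin, begin  # 1-based coordinate of the base just before the gap


def add_block(blocks, begin, end, label):
    """Append a block, shifting the previous block's END back if they collide."""
    pos, vcf_end = cgi_to_vcf_span(begin, end)
    if blocks and blocks[-1]["end"] >= pos:
        blocks[-1]["end"] = pos - 1
        if blocks[-1]["end"] < blocks[-1]["pos"]:
            blocks.pop()  # previous block was fully consumed
    blocks.append({"pos": pos, "end": vcf_end, "label": label})
    return blocks


blocks = []
for begin, end, label in [
    (68550, 68640, "ref"),
    (68640, 68640, "no-call"),
    (68640, 68893, "ref"),
]:
    add_block(blocks, begin, end, label)

for b in blocks:
    print(b["pos"], b["end"], b["label"])
```

With the three CGI-Var rows from the report, this yields contiguous blocks 68551-68639 (ref), 68640-68640 (no-call), and 68641-68893 (ref), so no block ends before it starts. Whether the anchored base should really be marked no-call is a design choice left open here.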
