Inferring sex in 0.2.19 #140

Open
fellen31 opened this issue Aug 13, 2024 · 4 comments

@fellen31

Hi Brent,

I'm having trouble inferring sex after 7740a33. It should be a high-quality sample, but n_hom_alt / n_het ≈ 0.83 (5556 / 6696), so I think the sample fails the check below and the sex is therefore not updated.

# should have fewer hom-alts[2] than hets[1]
result = gt_counts[2][i].float / gt_counts[1][i].float < 0.7

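| family_id | sample_id | paternal_id | maternal_id | sex | phenotype | original_pedigree_sex | gt_depth_mean | gt_depth_sd | depth_mean | depth_sd | ab_mean | ab_std | n_hom_ref | n_het | n_hom_alt | n_unknown | p_middling_ab | X_depth_mean | X_n | X_hom_ref | X_het | X_hom_alt | Y_depth_mean | Y_n |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| - | - | 0 | 0 | 0 | 2 | unknown | 31.8 | 8.9 | 31.8 | 9.0 | 0.52 | 0.39 | 4691 | 6696 | 5556 | 441 | 0.010 | 17.26 | 349 | 168 | 0 | 181 | 18.73 | 15 |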

I also tested another sample reaching a similar ratio (~0.83). Could you help me understand?

Thanks,
Felix
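
As a worked check of the numbers in this report, here is a minimal Python sketch of the quoted Nim condition. The function name and the `cutoff` parameter are assumptions for illustration and are not part of somalier; only the 0.7 threshold and the two counts come from the post above.

```python
def passes_hom_alt_check(n_hom_alt: int, n_het: int, cutoff: float = 0.7) -> bool:
    """Re-expression of the quoted check: hom-alt count divided by het count
    must fall below the cutoff."""
    if n_het == 0:
        # Assumption: a sample with no het calls is treated as failing,
        # rather than raising ZeroDivisionError.
        return False
    return n_hom_alt / n_het < cutoff


# Counts copied from the samples.tsv row in this report.
n_het, n_hom_alt = 6696, 5556
ratio = n_hom_alt / n_het

print(f"n_hom_alt / n_het = {ratio:.2f}")
print("passes check:", passes_hom_alt_check(n_hom_alt, n_het))
```

With these counts the ratio rounds to 0.83, which is above 0.7, so the check returns `False` for this sample.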

@brentp
Owner

brentp commented Aug 13, 2024

Hi, that does seem like a lot of hom-alts.
I suppose we could bump 0.7 to another arbitrary cutoff like 0.85, but I wonder if something else is going on with your sample. When you look at the somalier html output, is it an outlier in the plots?
Perhaps I should separate out the relatedness inference from the sex inference since the sex inference is quite easy.
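
To illustrate what a sex inference decoupled from the relatedness check might look like, here is a hypothetical Python sketch based only on the X-chromosome genotype counts. It is not somalier's actual rule: the function name and all three thresholds (`min_sites`, `male_max`, `female_min`) are assumptions chosen for illustration. The idea is that a sample with a single X should show almost no heterozygous calls on X.

```python
def guess_sex(x_het: int, x_hom_alt: int, min_sites: int = 10,
              male_max: float = 0.05, female_min: float = 0.3) -> str:
    """Guess sex from X-chromosome genotype counts.

    All thresholds are illustrative assumptions, not values from somalier.
    """
    informative = x_het + x_hom_alt
    if informative < min_sites:
        # Too few non-reference X sites to make a call.
        return "unknown"
    het_fraction = x_het / informative
    if het_fraction <= male_max:
        return "male"
    if het_fraction >= female_min:
        return "female"
    # Intermediate het fractions are left for manual review.
    return "unknown"


# X_het and X_hom_alt from the samples.tsv row posted in this issue.
print(guess_sex(x_het=0, x_hom_alt=181))
```

Under these assumed thresholds, the posted sample (X_het = 0, X_hom_alt = 181) would be labeled male. A guess like this would still need to be confirmed against the plots in the HTML output.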

@fellen31
Author

Thanks for getting back so quickly. I ran the same sample together with 11 others; they all have ratios ranging from 0.81 to 0.93, except one outlier at 1.06. I can't see anything strange in the plots otherwise, but I might be missing something.

If possible, I think separating the sex inference from the relatedness inference would be great. I'd like to make a best guess for samples with unknown sex so that variant calling and other downstream processes can start. Once they finish, the user can go back to the html and check whether the inference was correct.
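
To see how the two cutoffs discussed in this thread (0.7 and 0.85) would treat the reported ratios, here is a small Python sketch. Only the range endpoints and the outlier were reported, so those three values stand in for the cohort; the per-sample ratios are not known, and the labels and helper name are assumptions for illustration.

```python
# Only these three values appear in the thread; the other samples'
# ratios lie somewhere between 0.81 and 0.93.
reported_ratios = {
    "low end of range": 0.81,
    "high end of range": 0.93,
    "outlier": 1.06,
}
candidate_cutoffs = (0.7, 0.85)


def passing(ratios: dict, cutoff: float) -> list:
    """Return the labels whose n_hom_alt / n_het ratio is below the cutoff."""
    return [label for label, ratio in ratios.items() if ratio < cutoff]


for cutoff in candidate_cutoffs:
    print(f"cutoff {cutoff}: passing = {passing(reported_ratios, cutoff)}")
```

None of the three values pass at 0.7, and only the low end of the range passes at 0.85. A cutoff of 0.85 would therefore still exclude at least the sample at 0.93 and the outlier.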

@brentp
Owner

brentp commented Aug 15, 2024

Were the counts for the sample extracted directly from the BAM files? If not, how was the VCF called?

@fellen31
Author

Directly from the bam files, using sites.hg38.vcf.gz and GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz. It's PacBio long-read data.
