Inferring sex in 0.2.19 #140

fellen31 · 2024-08-13T08:30:30Z

Hi Brent,

I'm having trouble inferring sex after 7740a33. It should be a high-quality sample, but n_hom_alt / n_het is ~0.83, so I think sex is not being updated.

somalier/src/somalierpkg/relate.nim, lines 460 to 461 in 7740a33:

    # should have fewer hom-alts[2] than hets[1]
    result = gt_counts[2][i].float / gt_counts[1][i].float < 0.7

family_id	sample_id	paternal_id	maternal_id	sex	phenotype	original_pedigree_sex	gt_depth_mean	gt_depth_sd	depth_mean	depth_sd	ab_mean	ab_std	n_hom_ref	n_het	n_hom_alt	n_unknown	p_middling_ab	X_depth_mean	X_n	X_hom_ref	X_het	X_hom_alt	Y_depth_mean	Y_n
-	-	0	0	0	2	unknown	31.8	8.9	31.8	9.0	0.52	0.39	4691	6696	5556	441	0.010	17.26	349	168	0	181	18.73	15
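To make the arithmetic concrete, here is a minimal Python sketch that re-expresses the quoted relate.nim check and applies it to the counts in the row above. It assumes, as Felix suspects, that this check gates whether the inferred sex is written; the helper names `hom_alt_het_ratio` and `passes_check` are hypothetical and not part of somalier.

```python
# Illustrative re-expression of the relate.nim check:
#   result = gt_counts[2][i].float / gt_counts[1][i].float < 0.7
# where gt_counts[2] is the hom-alt count and gt_counts[1] the het count.
# Helper names are hypothetical, not somalier's API.

def hom_alt_het_ratio(n_hom_alt: int, n_het: int) -> float:
    """Ratio of hom-alt to het genotype calls across the sites."""
    if n_het == 0:
        # Nim float division by zero yields inf or NaN, and both fail `< 0.7`.
        return float("inf")
    return n_hom_alt / n_het

def passes_check(n_hom_alt: int, n_het: int, cutoff: float = 0.7) -> bool:
    """True when there are sufficiently fewer hom-alts than hets."""
    return hom_alt_het_ratio(n_hom_alt, n_het) < cutoff

# n_het and n_hom_alt from the somalier output row above.
n_het, n_hom_alt = 6696, 5556

print(f"n_hom_alt / n_het = {hom_alt_het_ratio(n_hom_alt, n_het):.3f}")  # 0.830
print("passes at 0.70:", passes_check(n_hom_alt, n_het))                 # False
print("passes at 0.85:", passes_check(n_hom_alt, n_het, cutoff=0.85))    # True
```

The `cutoff` parameter shows why a single arbitrary threshold matters here: at 0.7 this sample fails, while a looser value such as 0.85 would let it through.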

I also tested another sample reaching a similar ratio (~0.83). Could you help me understand?

Thanks,
Felix


brentp · 2024-08-13T14:46:32Z

Hi, that does seem like a lot of hom-alts.
I suppose we could bump 0.7 to another arbitrary cutoff like 0.85, but I wonder if something else is going on with your sample. When you look at the somalier html output, is it an outlier in the plots?
Perhaps I should separate out the relatedness inference from the sex inference since the sex inference is quite easy.
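To illustrate why a sex call can stand apart from the relatedness quality check, here is a hypothetical best-guess heuristic built only from the X/Y columns of the output above. This is not somalier's algorithm. It assumes that `depth_mean` approximates autosomal depth, that one X copy gives roughly half that depth on X with almost no X hets, and that Y depth is near zero without a Y chromosome. All thresholds and names (`SexCounts`, `guess_sex`) are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical container for the relevant somalier output columns.
@dataclass
class SexCounts:
    depth_mean: float    # assumed to approximate autosomal depth
    x_depth_mean: float  # X_depth_mean
    x_het: int           # X_het
    x_hom_alt: int       # X_hom_alt
    y_depth_mean: float  # Y_depth_mean

def guess_sex(c: SexCounts,
              x_ratio_cutoff: float = 0.75,   # hypothetical threshold
              max_x_het_frac: float = 0.05,   # hypothetical threshold
              min_y_ratio: float = 0.2) -> str:  # hypothetical threshold
    """Best-guess sex call that ignores the hom-alt/het QC ratio."""
    if c.depth_mean <= 0:
        return "unknown"
    x_ratio = c.x_depth_mean / c.depth_mean  # ~0.5 for one X, ~1.0 for two
    y_ratio = c.y_depth_mean / c.depth_mean  # ~0 without a Y chromosome
    x_alt_calls = c.x_het + c.x_hom_alt
    x_het_frac = c.x_het / x_alt_calls if x_alt_calls else 0.0

    one_x = x_ratio < x_ratio_cutoff and x_het_frac <= max_x_het_frac
    two_x = x_ratio >= x_ratio_cutoff and x_het_frac > max_x_het_frac
    has_y = y_ratio >= min_y_ratio

    if one_x and has_y:
        return "male"
    if two_x and not has_y:
        return "female"
    return "unknown"  # conflicting signals: leave for manual review

# Values from Felix's output row: X_het = 0, X depth about half of depth_mean, Y covered.
sample = SexCounts(depth_mean=31.8, x_depth_mean=17.26,
                   x_het=0, x_hom_alt=181, y_depth_mean=18.73)
print(guess_sex(sample))  # male
```

Returning "unknown" when the X and Y signals disagree keeps the guess conservative, so a user can still check the HTML plots afterwards.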

fellen31 · 2024-08-15T09:11:57Z

Thanks for getting back so quickly. I ran the same sample together with 11 others; they all have ratios ranging from 0.81 to 0.93, except one outlier at 1.06. I can't really see anything strange in the plots otherwise, but I might be missing something.

If possible, I think that would be great. I'd like to make a best guess for samples with unknown sex so that variant calling and other downstream processes can start; once they finish, the user can go back to the HTML report and check whether the inference was correct.

brentp · 2024-08-15T13:42:43Z

Were the counts for the sample extracted directly from the BAM files? If not, how was the VCF called?

fellen31 · 2024-08-15T13:45:41Z

Directly from the bam files, using sites.hg38.vcf.gz and GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz. It's PacBio long-read data.
