Increase support for importing and converting data in multiple formats #3

Open
starsyi opened this issue Oct 24, 2023 · 3 comments

Comments

@starsyi

starsyi commented Oct 24, 2023

Would it also be possible to support importing other formats, such as the BED and TSV result files that modkit outputs with methylation modification information?

Since nanopolish does not support the latest R10.4 chemistry and dorado/remora is now the standard way to obtain nanopore methylation calls, it would be great to be able to use meth5 and pycometh with modBAMs generated by remora.

@snajder-r
Contributor

The nanopolish output that is currently supported is a TSV file. If you can format your TSV file to contain the required columns (see below), you can import it as if it were nanopolish output:

    chromosome    start        end          read_name      log_lik_ratio
    chr1          30012312     30012312     aksdlaksdlas   -4.542

The order of the columns does not matter, as long as there is a single header line with these column names. Personally, I don't have the capacity right now to implement explicit conversion commands for modkit or other tools, but I'll leave the issue open and will be happy to accept pull requests that come with test data.
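
As a rough illustration of the format described above, here is a minimal Python sketch that writes per-read calls into a TSV with the five required column names. The function name `write_nanopolish_style_tsv` and the shape of the input records (one dict per call, keyed by the required column names) are assumptions made for this example and are not part of meth5. How you get from another tool's output to such records depends on that tool and is not shown.

```python
import csv
import io

# Column names taken from the header shown above.
REQUIRED_COLUMNS = ["chromosome", "start", "end", "read_name", "log_lik_ratio"]


def write_nanopolish_style_tsv(records, handle):
    """Write an iterable of dicts as a TSV with a single header line.

    Hypothetical helper: each record must provide every required column.
    Extra keys in a record are ignored.
    """
    writer = csv.DictWriter(
        handle,
        fieldnames=REQUIRED_COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    for record in records:
        missing = [name for name in REQUIRED_COLUMNS if name not in record]
        if missing:
            raise ValueError(f"record is missing columns: {missing}")
        writer.writerow(record)


# Example using the single row from the table above.
records = [
    {
        "chromosome": "chr1",
        "start": 30012312,
        "end": 30012312,
        "read_name": "aksdlaksdlas",
        "log_lik_ratio": -4.542,
    }
]
buffer = io.StringIO()
write_nanopolish_style_tsv(records, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

In practice you would pass an open file handle instead of the in-memory buffer used here.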

@PanZiwei

PanZiwei commented Dec 14, 2023

Nanopolish calls 5mC using a log-likelihood ratio and applies a specific cutoff for methylation calling, but other tools like DeepSignal or Guppy predict a methylation probability for each site instead, and as far as I know these two values can't be converted into each other. How can this be solved?

In this case, is the log_lik_ratio column necessary for the conversion? Does the column support methylation probability? How does it contribute to the meth5 conversion? Thanks!

@snajder-r
Contributor

The column is required, so you'll need to convert from methylation probability (range 0 to 1) to log-likelihood ratio (range negative infinity to positive infinity).

Assuming an uninformative prior, use the logit function to convert:

log_lik_ratio = logit(p) = ln(p/(1-p)) 
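
The logit conversion above can be sketched in a few lines of Python. The function name `probability_to_llr` and the `eps` clipping are assumptions added for this example: clipping keeps probabilities of exactly 0 or 1 from producing infinite values, and the right bound for your data may differ. The sketch also assumes that `p` is the probability of the site being methylated, so that positive values point toward methylation.

```python
import math


def probability_to_llr(p, eps=1e-6):
    """Convert a methylation probability to a log-likelihood ratio via logit.

    Assumes an uninformative prior. `eps` is a hypothetical clipping bound
    chosen for this example so that p = 0 or p = 1 stays finite.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    # Clip away from 0 and 1 to avoid division by zero and log(0).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


for prob in (0.1, 0.5, 0.9, 1.0):
    print(prob, probability_to_llr(prob))
```

A probability of 0.5 maps to 0, and probabilities above or below 0.5 map to positive or negative values respectively.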
