error in run diffdomains.py dvsd multiple #28

Open
jinshangkun opened this issue Dec 19, 2024 · 8 comments

Comments

@jinshangkun

Thanks for a great tool for Hi-C data processing.
When I run it, I get an error that I cannot resolve myself. Please help me solve this problem. I look forward to your reply.
The command I used is as follows:
python /home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/diffdomains.py dvsd multiple A_10dpa_20000.matrix.cool B_10dpa_20000.matrix.cool A_10dpa_20000_domains.bed --reso 20000 --ofile A_B_10dpa --ncore 1

The error message:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/diffdomains.py", line 66, in
comp2domins_by_twtest_parallel(0)
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/diffdomains.py", line 57, in comp2domins_by_twtest_parallel
tmp_res = comp2domins_by_twtest(chrn=tadb.iloc[i, 0], start=tadb.iloc[i, 1],
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/utils.py", line 339, in comp2domins_by_twtest
mat1 = contact_matrix_from_hic(chrn, start, end, reso, fhic1, hicnorm)
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/utils.py", line 194, in contact_matrix_from_hic
mat = c.matrix(balance=False).fetch(region2)
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/cooler/api.py", line 402, in matrix
return RangeSelector2D(field, _slice, _fetch, (self._info["nbins"],) * 2)
KeyError: 'nbins'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/diffdomains.py", line 57, in comp2domins_by_twtest_parallel
tmp_res = comp2domins_by_twtest(chrn=tadb.iloc[i, 0], start=tadb.iloc[i, 1],
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/utils.py", line 339, in comp2domins_by_twtest
mat1 = contact_matrix_from_hic(chrn, start, end, reso, fhic1, hicnorm)
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/utils.py", line 194, in contact_matrix_from_hic
mat = c.matrix(balance=False).fetch(region2)
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/cooler/api.py", line 402, in matrix
return RangeSelector2D(field, _slice, _fetch, (self._info["nbins"],) * 2)
KeyError: 'nbins'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/site-packages/diffdomain_py3/diffdomains.py", line 76, in
result.append(i.get())
File "/home/jsk/miniconda3/envs/py39/lib/python3.9/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: 'nbins'

@liuyc27
Collaborator

liuyc27 commented Dec 19, 2024

@jinshangkun, thank you for your question.

The error indicates that cooler could not find the 'nbins' key in the metadata of one of the .cool files. Here's a detailed analysis of the error:

  1. KeyError: 'nbins': cooler looks up the 'nbins' key in the file's metadata when it builds the matrix selector for the requested region. If the .cool file was not generated correctly or its metadata is incomplete, this lookup fails.

  2. Possible causes:

    • The .cool files may not have been generated correctly or could be corrupted.
    • The .cool file's metadata may be incomplete and lack the 'nbins' field.
    • The format or resolution of the input files may not match what diffdomain expects.
  3. Solution:

    • Check the .cool files: Ensure your cool files (A_10dpa_20000.matrix.cool and B_10dpa_20000.matrix.cool) are correctly formatted and include the necessary metadata. You can use the following code to check the file's metadata:
      import cooler
      c = cooler.Cooler('A_10dpa_20000.matrix.cool')
      print(c.info)  # This should display metadata including 'nbins'
    • Regenerate the .cool files: If the metadata is missing, regenerate the cool files, ensuring they include the 'nbins' field.

By verifying these points, you should be able to resolve the issue. If the cool files are correct and the issue persists, don't hesitate to contact me for further assistance.
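
The metadata check above can be applied to both input files in one go. The following is a minimal sketch, not part of diffdomain or cooler: the helper names `missing_info_keys` and `check_cool_file` and the list of required keys are assumptions chosen for illustration. Only `cooler.Cooler(path).info` is taken from the check shown above.

```python
# Hypothetical helpers for illustration; not part of diffdomain or cooler.
# REQUIRED_KEYS is an assumed list of metadata fields worth checking.
REQUIRED_KEYS = ("nbins", "nchroms", "bin-size")


def missing_info_keys(info, required=REQUIRED_KEYS):
    """Return the required metadata keys that are absent from an info dict."""
    return [key for key in required if key not in info]


def check_cool_file(path):
    """Open a .cool file and list the required metadata keys it lacks."""
    import cooler  # imported lazily so the dict helper works without cooler

    return missing_info_keys(cooler.Cooler(path).info)
```

Calling `check_cool_file('A_10dpa_20000.matrix.cool')` and `check_cool_file('B_10dpa_20000.matrix.cool')` should return an empty list for each file when all of the checked keys are present.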

@jinshangkun
Author

@liuyc27 Thank you for your reply. I checked my .cool files and found that they do include the nbins field.

c = cooler.Cooler('A_10dpa_20000.matrix.cool')
print (c.info)
{'bin-size': 20000, 'bin-type': 'fixed', 'creation-date': '2024-05-30T13:45:24.088781', 'format': 'HDF5::Cooler', 'format-url': 'https://github.com/mirnylab/cooler', 'format-version': 3, 'generated-by': 'HiCMatrix-13', 'generated-by-cooler-lib': 'cooler-0.8.5', 'genome-assembly': 'unknown', 'metadata': {}, 'nbins': 125233, 'nchroms': 12098, 'nnz': 178053405, 'storage-mode': 'symmetric-upper', 'sum': 305740547.0, 'tool-url': 'https://github.com/deeptools/HiCMatrix'}
c = cooler.Cooler('B_10dpa_20000.matrix.cool')
print (c.info)
{'bin-size': 20000, 'bin-type': 'fixed', 'creation-date': '2024-06-08T14:47:30.708465', 'format': 'HDF5::Cooler', 'format-url': 'https://github.com/mirnylab/cooler', 'format-version': 3, 'generated-by': 'HiCMatrix-13', 'generated-by-cooler-lib': 'cooler-0.8.5', 'genome-assembly': 'unknown', 'metadata': {}, 'nbins': 125233, 'nchroms': 12098, 'nnz': 222449502, 'storage-mode': 'symmetric-upper', 'sum': 425445947.0, 'tool-url': 'https://github.com/deeptools/HiCMatrix'}

The .cool files were generated with 'hicConvertFormat -m A_10dpa_20000.matrix -bf A_10dpa_20000_abs.bed --inputFormat hicpro --outputFormat cool -o A_10dpa_20000.matrix.cool -r 20000' (maybe the problem is in this step).
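
As a quick cross-check of the two outputs above, the structural fields of both files can be compared directly. This is only a sketch: `differing_fields` is a hypothetical helper, and the choice of fields to compare is an assumption. The values are copied from the two `c.info` printouts.

```python
def differing_fields(info_a, info_b, fields=("bin-size", "nbins", "nchroms")):
    """Return {field: (value_a, value_b)} for every field whose values differ."""
    return {
        field: (info_a.get(field), info_b.get(field))
        for field in fields
        if info_a.get(field) != info_b.get(field)
    }


# Values copied from the two c.info outputs above (compared fields only).
info_a = {"bin-size": 20000, "nbins": 125233, "nchroms": 12098}
info_b = {"bin-size": 20000, "nbins": 125233, "nchroms": 12098}

print(differing_fields(info_a, info_b))  # prints {}
print(info_a["bin-size"] == 20000)       # prints True: matches --reso 20000
```

Both files agree on bin size, bin count, and chromosome count, and the bin size matches the `--reso 20000` argument, so these fields do not explain the error.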

@liuyc27
Collaborator

liuyc27 commented Dec 19, 2024

Hi @jinshangkun,

Could you please provide the two .cool files that are causing the issue? This will help us investigate the problem further.

Thank you!

@jinshangkun
Author

@liuyc27 Could you provide an e-mail address? The data is too big to upload to GitHub. Thanks!

@liuyc27
Collaborator

liuyc27 commented Dec 20, 2024

@jinshangkun You are welcome to send the files by email to [email protected].

Additionally, the solutions for '.cool file multiple processing errors #18' and 'output file #20' might be useful to you. I recommend checking them out for further insights.

Thank you once again for your support!

@jinshangkun
Author

@liuyc27 Thanks! I have sent the cool file to your email.

@liuyc27
Collaborator

liuyc27 commented Dec 20, 2024

@jinshangkun Thank you for your data. However, there is an issue with the bed file. Please check the "2. Input format" section in the wiki to make sure the bed file is correct. It must start with a header row and contain the following columns; the column names themselves do not matter:

  • chr1: Chromosome name
  • x1: TAD start point
  • x2: TAD end point

Please ensure that the chromosome names in the chr1 column match those in your cool file. You can use the following code to check the chromosome names in the cool file:

import cooler

# Load the .cool file
cool_file = cooler.Cooler('A_10dpa_20000.matrix.cool')

# chromnames is the list of chromosome names stored in the file;
# iterating over cool_file.chroms() does not yield the names themselves
chromosome_names = cool_file.chromnames

# Print chromosome names
print("Chromosomes:", chromosome_names)

If you encounter an "unknown label A01" error, follow these steps:

  1. Check if the chromosome names in the bed file's chr1 column match those in the cool file (A_10dpa_20000.matrix_20k_KR.cool). This error may occur because when you load the cool file, diffdomain generates a new cool file with resolution and normalization information in the filename suffix. The chr1 names in the bed file must match the new cool file.
  2. If they do not match, modify the chr1 column in the bed file to match the cool file.
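
The comparison in the steps above can be sketched as follows. This is a rough illustration, not diffdomain code: the helpers `bed_chromosomes` and `unmatched_chromosomes` are hypothetical, the bed file is assumed to be tab-separated with a single header row, and the example names are made up.

```python
import csv
import io


def bed_chromosomes(bed_handle):
    """Collect the distinct first-column values of a tab-separated bed file,
    skipping the header row (assumed to be the first line)."""
    reader = csv.reader(bed_handle, delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def unmatched_chromosomes(bed_names, cool_names):
    """Return the bed chromosome names that do not occur in the .cool file."""
    return sorted(set(bed_names) - set(cool_names))


# Made-up example data. With real files, pass an open file handle and
# cooler.Cooler(path).chromnames instead.
example_bed = io.StringIO("chr1\tx1\tx2\nA01\t0\t400000\nA02\t400000\t900000\n")
example_cool_names = ["Chr_A01", "Chr_A02"]

print(unmatched_chromosomes(bed_chromosomes(example_bed), example_cool_names))
# prints ['A01', 'A02']
```

An empty list means every chromosome name in the bed file also exists in the .cool file. Any names that are printed need to be renamed in the chr1 column.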

If you have any other questions, feel free to contact us.

@jinshangkun
Author

@liuyc27 Thanks! I will try it.
