While running SOS_busco.py in the busco4_dist process, I got the following error:
Command error:
Traceback (most recent call last):
  File "/mnt/data/software/TransPi/bin/SOS_busco.py", line 38, in <module>
    busco_df = pd.read_csv(input_busco_file, sep=',',header=0,names=['Busco_id','Status','Sequence','Score','Length'])
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1186, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2145, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 905, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 51, saw 8
I think the problem is with the input file for SOS_busco.py (in my case, Read_R_all_busco4.tsv).
Most lines of my Read_R_all_busco4.tsv have 6 commas (7 columns), like this:
0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,https://www.orthodb.org/v10?query=0at38820,sacsin
However, some lines have 7 or 8 commas (8 or 9 columns), like this:
121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type
I think this difference in the number of commas (columns) causes the pandas error: the parser expects 7 fields but finds 8 on line 51.
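One way to check this diagnosis is to count the comma-separated fields on each line. The following is only a minimal sketch, not part of TransPi: the function name `field_counts` is hypothetical, and the two sample lines are the ones quoted above.

```python
from collections import Counter


def field_counts(lines, sep=","):
    """Map 'number of fields on a line' -> 'how many lines have that count'.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    return Counter(
        len(line.rstrip("\r\n").split(sep))
        for line in lines
        if line.strip()
    )


# The two example lines quoted in this issue.
sample = [
    "0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,"
    "https://www.orthodb.org/v10?query=0at38820,sacsin\n",
    "121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,"
    "https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type\n",
]

counts = field_counts(sample)
for n_fields, n_lines in sorted(counts.items()):
    print(f"{n_fields} fields: {n_lines} line(s)")
```

To run it on a real file, pass an open file handle instead of `sample`, for example `field_counts(open("Read_R_all_busco4.tsv"))`. More than one distinct field count in the result means the file has ragged rows.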
SOS_busco.py doesn't seem to use columns 6 onwards of the input file (it names only Busco_id, Status, Sequence, Score, and Length).
If so, we could remove columns 6 onwards before running SOS_busco.py.
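The trimming idea could be sketched in Python as follows. This is an illustration under stated assumptions, not the change made in TransPi: the helper name `trim_line` is hypothetical, and it simply keeps the first five comma-separated fields of each line, so commas in the trailing description no longer matter.

```python
def trim_line(line, sep=",", keep=5):
    """Return only the first `keep` fields of a delimited line.

    Hypothetical helper for illustration. Lines with fewer than
    `keep` fields are returned unchanged (minus the line ending).
    """
    # Split at most `keep` times so the tail (URL, description and any
    # commas inside the description) stays in one discarded chunk.
    fields = line.rstrip("\r\n").split(sep, keep)[:keep]
    return sep.join(fields)


# The two example lines quoted in this issue.
sample = [
    "0at38820,Duplicated,SOAP.k25.scaffold27258,8202.3,4167,"
    "https://www.orthodb.org/v10?query=0at38820,sacsin\n",
    "121at38820,Complete,SOAP.k25.scaffold11722,3027.5,1446,"
    "https://www.orthodb.org/v10?query=121at38820,Zinc finger, RING-type\n",
]

trimmed = [trim_line(line) for line in sample]
for row in trimmed:
    print(row)
```

Each trimmed line then has at most five fields, matching the five column names passed to `pd.read_csv` in the traceback above.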
No worries. Thanks for finding issues and providing suggestions to TransPi. We appreciate it.
You are right, the last column will cause issues since the name has a comma and SOS_busco.py will fail. I think the easiest solution is what you suggested. I will do a test and modify the code. Thanks!
Hi, I apologize for contacting you so frequently.
TransPi/TransPi.nf
Lines 1591 to 1592 in 899d160
This is an example of my suggested revision.
I hope this helps you.
Thank you.