Segmentation Fault when using Document.text() between RawImageOutput.get() calls #9

Open
ashutoshvarma opened this issue Aug 4, 2020 · 2 comments

Comments

@ashutoshvarma
Owner

Steps to Reproduce:-

_test.py

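import pyxpdf.xpdf as x  # assumed import; the original snippet does not show how `x` is bound

doc = x.Document("samples/nonfree/mandarin.pdf")
iout = x.RawImageOutput(doc)
iout.get(0)  # first render works
doc.text()   # text extraction unloads xpdf's internal Page objects
iout.get(0)  # second render segfaults

python _test.py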

Output :-

Syntax Error: Couldn't find a font for 'TimesNewRomanPS-ItalicMT'
[1]    6040 segmentation fault (core dumped)  python _test.py

System:-

  • OS : Clear Linux OS x86_64
  • Python : 3.8.5 (debug)
  • pyxpdf version : v0.2.2
  • pyxpdf_data : v1.0.1
  • Pillow : v7.2.0
ashutoshvarma added the bug and segfault labels Aug 4, 2020
ashutoshvarma self-assigned this Aug 4, 2020
@ashutoshvarma
Owner Author

GDB Stack trace :

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff75817f2 in XRef::fetch (this=this@entry=0x20, num=7, gen=0, obj=obj@entry=0x7fffffffad30, recursion=recursion@entry=0)
    at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/XRef.cc:1039
1039	 if (num < 0 || num >= size) {

Backtrace:-

#0  0x00007ffff75817f2 in XRef::fetch (this=this@entry=0x20, num=7, gen=0, obj=obj@entry=0x7fffffffad30, recursion=recursion@entry=0)
    at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/XRef.cc:1039
#1  0x00007ffff75655bc in Object::fetch (this=this@entry=0x7fffffffad20, xref=xref@entry=0x20, obj=obj@entry=0x7fffffffad30, recursion=recursion@entry=0)
    at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/Object.cc:114
#2  0x00007ffff754a7c8 in GfxFontDict::GfxFontDict (this=0x555555b18340, xref=0x20, fontDictRef=0x0, fontDict=0x555555b16f80)
    at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/GfxFont.cc:2115
#3  0x00007ffff7536253 in GfxResources::GfxResources (this=0x555555b1c630, xref=0x20, resDict=0x555555b16d80, nextA=0x0)
    at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/Object.h:161
#4  0x00007ffff7536a3b in Gfx::Gfx (this=0x555555a60230, docA=<optimized out>, outA=0x555555a9cc70, pageNum=1, resDict=0x555555b16d80, hDPI=150, vDPI=150, 
    box=0x7fffffffaea0, cropBox=0x0, rotate=0, abortCheckCbkA=0x0, abortCheckCbkDataA=0x0) at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/Gfx.cc:509
#5  0x00007ffff756869a in Page::displaySlice (this=0x555555b17290, out=0x555555a9cc70, hDPI=150, vDPI=150, rotate=0, useMediaBox=<optimized out>, crop=<optimized out>, 
    sliceX=<optimized out>, sliceY=0, sliceW=1275, sliceH=1651, printing=0, abortCheckCbk=0x0, abortCheckCbkData=0x0)
    at /home/oreki/Projects/pyxpdf/build/tmp/libxpdf/xpdf-4.02/xpdf/Object.h:161
#6  0x00007ffff74b8de0 in __pyx_f_6pyxpdf_4xpdf_4Page_display_slice (__pyx_v_self=0x7ffff7757050, __pyx_v_out=0x555555a9cc70, __pyx_v_x1=0, __pyx_v_y1=0, __pyx_v_hgt=1275, 
    __pyx_v_wdt=1651, __pyx_optional_args=0x7fffffffb100) at src/pyxpdf/xpdf.cpp:28114
#7  0x00007ffff749429d in __pyx_f_6pyxpdf_4xpdf_14RawImageOutput__get_SplashBitmap (__pyx_v_self=0x7ffff6fd0ad0, __pyx_v_page_no=0, __pyx_v_x=0, __pyx_v_y=0, 
    __pyx_v_w=1275, __pyx_v_h=1651, __pyx_v_page_h=1650.0000000000002, __pyx_v_page_w=1275, __pyx_v_res_x=150, __pyx_v_res_y=150) at src/pyxpdf/xpdf.cpp:19167
#8  0x00007ffff7496671 in __pyx_f_6pyxpdf_4xpdf_14RawImageOutput__get_normalize_SplashBitmap (__pyx_v_self=0x7ffff6fd0ad0, __pyx_v_page_no=0, __pyx_v_crop_x=0, 
    __pyx_v_crop_y=0, __pyx_v_crop_h=0, __pyx_v_crop_w=0, __pyx_v_scale_x=0, __pyx_v_scale_y=0) at src/pyxpdf/xpdf.cpp:19650
#9  0x00007ffff7498258 in __pyx_f_6pyxpdf_4xpdf_14RawImageOutput_get (__pyx_v_self=0x7ffff6fd0ad0, __pyx_v_page_no=0, __pyx_skip_dispatch=1, 
    __pyx_optional_args=0x7fffffffb590) at src/pyxpdf/xpdf.cpp:19954
#10 0x00007ffff7498ed1 in __pyx_pf_6pyxpdf_4xpdf_14RawImageOutput_2get (__pyx_v_self=0x7ffff6fd0ad0, __pyx_v_page_no=0, __pyx_v_crop_box=(0, 0, 0, 0), 
    __pyx_v_scale_pixel_box=None) at src/pyxpdf/xpdf.cpp:20125
#11 0x00007ffff7498cd9 in __pyx_pw_6pyxpdf_4xpdf_14RawImageOutput_3get (__pyx_v_self=<pyxpdf.xpdf.RawImageOutput at remote 0x7ffff6fd0ad0>, __pyx_args=(0,), __pyx_kwds=0x0)
    at src/pyxpdf/xpdf.cpp:20102
#12 0x00007ffff750c00c in __Pyx_CyFunction_CallMethod (func=<cython_function_or_method at remote 0x7ffff6cee2f0>, 
    self=<pyxpdf.xpdf.RawImageOutput at remote 0x7ffff6fd0ad0>, arg=(0,), kw=0x0) at src/pyxpdf/xpdf.cpp:47923
#13 0x00007ffff750c3d0 in __Pyx_CyFunction_CallAsMethod (func=<cython_function_or_method at remote 0x7ffff6cee2f0>, 
    args=(<pyxpdf.xpdf.RawImageOutput at remote 0x7ffff6fd0ad0>, 0), kw=0x0) at src/pyxpdf/xpdf.cpp:47986
#14 0x00005555555d9f77 in _PyObject_MakeTpCall (callable=<cython_function_or_method at remote 0x7ffff6cee2f0>, args=<optimized out>, nargs=<optimized out>, keywords=0x0)
    at Objects/call.c:159
#15 0x00005555557b9a91 in _PyObject_Vectorcall (kwnames=0x0, nargsf=2, args=0x5555559dd630, callable=<cython_function_or_method at remote 0x7ffff6cee2f0>)
    at ./Include/cpython/abstract.h:125
#16 method_vectorcall (method=<optimized out>, args=0x5555559dd638, nargsf=<optimized out>, kwnames=0x0) at Objects/classobject.c:60
#17 0x00005555555c6044 in _PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x5555559dd638, callable=<method at remote 0x7ffff78564d0>)
    at ./Include/cpython/abstract.h:127
#18 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555975250) at Python/ceval.c:4963
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3469
#20 0x00005555556b0581 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x5555559dd4c0, for file _test.py, line 275, in <module> ()) at Python/ceval.c:741
#21 _PyEval_EvalCodeWithName (_co=_co@entry=<code at remote 0x7ffff77b9790>, 
    globals=globals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='_test.py') at remote 0x7ffff7

@ashutoshvarma
Owner Author

The problem is in Document.display_pages(), which wraps the call to xpdf's PDFDoc.displayPages():

void PDFDoc::displayPages(OutputDev *out, int firstPage, int lastPage,
			  double hDPI, double vDPI, int rotate,
			  GBool useMediaBox, GBool crop, GBool printing,
			  GBool (*abortCheckCbk)(void *data),
			  void *abortCheckCbkData) {
  int page;

  for (page = firstPage; page <= lastPage; ++page) {
    displayPage(out, page, hDPI, vDPI, rotate, useMediaBox, crop, printing,
		abortCheckCbk, abortCheckCbkData);
    catalog->doneWithPage(page);  // <-- this unloads the page
  }
}

The text() and text_bytes() methods both go through the Document.display_pages()
wrapper, which calls xpdf's PDFDoc.displayPages(). After each displayPage() call in
its loop, that method calls Catalog.doneWithPage(), which unloads xpdf's internal
Page object.
Our extension class Page, however, keeps holding the old pointer to that xpdf Page.
Any later operation that goes through it, such as displayPageSlice(), then uses a
stale pointer and segfaults. This matches the backtrace above, where XRef::fetch is
entered with this=0x20.
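
To make the lifetime problem concrete, here is a toy model in pure Python. It is only a sketch: NativePage, NativeCatalog, PageWrapper and display_pages below are hypothetical stand-ins, not pyxpdf or xpdf code, and the "unloaded" state is modelled with a flag and an exception because Python cannot reproduce a real stale pointer.

```python
class NativePage:
    """Hypothetical stand-in for xpdf's internal Page object."""

    def __init__(self, num):
        self.num = num
        self.alive = True

    def display_slice(self):
        if not self.alive:
            # In C++ this would be a stale-pointer access, not a clean exception.
            raise RuntimeError(f"page {self.num} used after being unloaded")
        return f"rendered page {self.num}"


class NativeCatalog:
    """Hypothetical stand-in for xpdf's Catalog, which owns the loaded pages."""

    def __init__(self):
        self._pages = {}

    def get_page(self, num):
        if num not in self._pages:
            self._pages[num] = NativePage(num)
        return self._pages[num]

    def done_with_page(self, num):
        page = self._pages.pop(num, None)
        if page is not None:
            page.alive = False  # models the native object being unloaded


class PageWrapper:
    """Hypothetical stand-in for the extension class holding a native pointer."""

    def __init__(self, catalog, num):
        self._native = catalog.get_page(num)  # stored once, never refreshed

    def render(self):
        return self._native.display_slice()


def display_pages(catalog, first, last):
    """Mirrors the displayPages() loop, including the unload step."""
    for num in range(first, last + 1):
        catalog.get_page(num).display_slice()
        catalog.done_with_page(num)


catalog = NativeCatalog()
wrapper = PageWrapper(catalog, 1)
first_render = wrapper.render()   # works: the native page is still loaded
display_pages(catalog, 1, 1)      # the text-extraction path unloads page 1
try:
    wrapper.render()
    stale_error = None
except RuntimeError as exc:
    stale_error = str(exc)        # the wrapper still points at the old page
```

The sequence at the bottom follows the same order as the reproduction script: render, extract text, render again. Only the last call fails, because the wrapper was never told that its page had been unloaded.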

ashutoshvarma added a commit that referenced this issue Aug 16, 2020
Fix:
Changed `Document.display_pages()` to do the same work as `displayPages()`, but
without unloading the pages.
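
One way to guard against a regression here is to run the reproduction in a child interpreter, so that a segfault shows up as an exit status instead of taking down the test runner. The following is a minimal sketch of that idea, not code from the referenced commit. run_isolated and REPRO are hypothetical names, and the import line inside REPRO is an assumption about how `x` was bound in the original script.

```python
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run `code` in a child Python interpreter and return its exit status.

    On POSIX, a child killed by a signal has a negative return code,
    so a segfault is reported as -signal.SIGSEGV.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


# The call sequence from this issue. The import line is an assumption,
# and the sample path must exist relative to the working directory.
REPRO = """
import pyxpdf.xpdf as x
doc = x.Document("samples/nonfree/mandarin.pdf")
iout = x.RawImageOutput(doc)
iout.get(0)
doc.text()
iout.get(0)
"""
```

In a test suite with pyxpdf installed and the sample file available, the check would be `assert run_isolated(REPRO) == 0`. Comparing against 0 rather than only against the SIGSEGV status also catches the case where the child fails for an unrelated reason, such as a missing sample file.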