Does cpptrace support setting a cache size? #193
Comments
Wow, 2GB of cache is huge. I haven't had to deal with anything of that magnitude yet, and this may be something cpptrace isn't currently well-equipped to handle. Thanks for opening this issue.

I have a couple of quick questions to help me understand your use-case better:
- Do you generate stack traces multiple times in a program?
- Do you need a trace generated reasonably quickly?
- Is there any context you could provide about how big your application is? Do you happen to know if it utilizes many third-party libraries?

Some initial thoughts:
- Currently the caching isn't particularly smart: cpptrace just loads all relevant information from a translation unit. This is almost always fine, except for exceptionally large amounts of debug info. It should definitely be possible to implement something smarter.
- Attempting to load only relevant sections (and remembering the information needed to continue searches) could prove beneficial, though DWARF doesn't make this as easy to do as I'd like.
- Something along the lines of an LRU cache might also be useful, though the use-case here is challenging, as debug symbols are a hierarchical structure.

I will need to put together a test application for myself that's large enough to reproduce these issues; I'll have to look into that later.
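For reference, a minimal sketch of the "LRU over translation units" idea: a byte-budgeted cache keyed by a compilation-unit offset. This is illustrative only, not cpptrace's actual caching code, and the keys, footprint accounting, and loading function are placeholders; the hierarchical cross-CU references mentioned above are exactly what a flat scheme like this doesn't handle.

```cpp
// Sketch of a size-bounded LRU cache over per-CU debug info.
// Not cpptrace code -- a generic illustration of the idea only.
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct cu_debug_info {
    // stand-in for the parsed line tables / DIEs of one translation unit
    std::string data;
    std::size_t memory_footprint() const { return data.size(); }
};

class cu_lru_cache {
public:
    explicit cu_lru_cache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    // Returns cached info for the CU, loading (and possibly evicting) as needed.
    std::shared_ptr<cu_debug_info> get(std::uint64_t cu_offset) {
        auto it = index_.find(cu_offset);
        if(it != index_.end()) {
            // move to front = most recently used
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        auto info = load_cu(cu_offset); // expensive: parse DWARF for this CU
        current_bytes_ += info->memory_footprint();
        order_.emplace_front(cu_offset, info);
        index_[cu_offset] = order_.begin();
        evict_if_needed();
        return info;
    }

private:
    void evict_if_needed() {
        // evict least-recently-used CUs until we're back under budget,
        // but never evict the entry we just loaded
        while(current_bytes_ > max_bytes_ && order_.size() > 1) {
            auto& [offset, info] = order_.back();
            current_bytes_ -= info->memory_footprint();
            index_.erase(offset);
            order_.pop_back();
        }
    }

    static std::shared_ptr<cu_debug_info> load_cu(std::uint64_t cu_offset) {
        // placeholder for the real DWARF parsing work
        return std::make_shared<cu_debug_info>(
            cu_debug_info{"debug info for CU at offset " + std::to_string(cu_offset)});
    }

    std::size_t max_bytes_;
    std::size_t current_bytes_ = 0;
    std::list<std::pair<std::uint64_t, std::shared_ptr<cu_debug_info>>> order_;
    std::unordered_map<std::uint64_t,
        std::list<std::pair<std::uint64_t, std::shared_ptr<cu_debug_info>>>::iterator> index_;
};

int main() {
    cu_lru_cache cache(64 * 1024 * 1024); // e.g. cap the CU cache at 64MB
    auto info = cache.get(0x1234);        // loaded on first access, cached afterwards
    (void)info;
}
```

Eviction here is purely size-based; as noted above, a real implementation would also need to remember enough information to resume DWARF searches after a CU has been evicted.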
Thanks for your reply. I'll provide that information, though not right away, as I'm diving into DWARF for more clues.
I ran some tests with a relatively large binary, and massif reported that most allocation was done by libdwarf internals pertaining to loading and parsing DWARF from the ELF. I'll do some more testing.
Sorry, I've been really busy recently. I'll profile it this weekend. In my situation, the allocated memory is retained after the stack trace is printed.
> Do you generate stack traces multiple times in a program? Do you need a trace generated reasonably quickly? Is there any context you could provide about how big your application is? Do you happen to know if it utilizes many third-party libraries?

I can reproduce this problem.

[2024-12-15 10:22:06.730] [error] [thread-75884] [Exception.cpp:59] <AssertFail> p_assert(false) (/home/crab/WorkSpace/polars/polars-llvm/cpp/core/operator/window/SortedWindowOperator.cpp:62:27)
Stack trace (most recent call first):
#0 (inlined) in auto polars::getStackTrace<(polars::StackTraceStrategy)1>(int, int) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/StackTrace.h:48
#1 0x000056320bfc315d in polars::Exception::Exception(polars::ErrorCode, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::source_location const&, bool)::$_2::operator()() const at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.cpp:56:27
#2 0x000056320bfc2e1f in polars::Exception::Exception(polars::ErrorCode, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::source_location const&, bool) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.cpp:70
#3 0x000056320bfc2bab in polars::Exception::Exception(polars::ErrorCode, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&, std::__1::source_location const&, bool) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.cpp:32
#4 0x000056320b064ed3 in polars::AssertFailException::AssertFailException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&, std::__1::source_location const&) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.h:54
#5 0x000056320be300ca in polars::core::window_operator::SortedWindowStateFactory::GlobalSinkStateImpl::nextPartition(unsigned char* const*&, unsigned char* const*&, char const*) at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/operator/window/SortedWindowOperator.cpp:62
#6 0x00007fb67abdd09c
#7 0x000056320b61e062 in std::__1::invoke_result<bool (*)(polars::core::RuntimeQueryResources*, long, long), polars::core::RuntimeQueryResources*, int, int>::type polars::codegen::VM::invoke<bool (*)(polars::core::RuntimeQueryResources*, long, long), polars::core::RuntimeQueryResources*, int, int>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, polars::core::RuntimeQueryResources*&&, int&&, int&&) at /home/crab/WorkSpace/polars/polars-llvm/cpp/codegen/VM.h:82
#8 0x000056320b61b21f in polars::core::PipelineTask::execute() at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/execution/PipelineTask.cpp:139
#9 0x000056320b60921a in polars::core::TaskExecutor::WorkForever(polars::core::TaskExecutor::ThreadState*) at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/execution/TaskExecutor.cpp:65
#10 0x000056320b60a88b in polars::core::TaskExecutor::setThreadNum(int)::$_0::operator()() const at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/execution/TaskExecutor.cpp:142
#11 0x000056320b60a844 in decltype(std::declval<polars::core::TaskExecutor::setThreadNum(int)::$_0>()()) std::__1::__invoke[abi:ne180100]<polars::core::TaskExecutor::setThreadNum(int)::$_0>(polars::core::TaskExecutor::setThreadNum(int)::$_0&&) at /usr/lib/llvm-18/bin/../include/c++/v1/__type_traits/invoke.h:344
#12 0x000056320b60a81c in void std::__1::__thread_execute[abi:ne180100]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, polars::core::TaskExecutor::setThreadNum(int)::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, polars::core::TaskExecutor::setThreadNum(int)::$_0>&, std::__1::__tuple_indices<...>) at /usr/lib/llvm-18/bin/../include/c++/v1/__thread/thread.h:193
#13 0x000056320b60a641 in void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, polars::core::TaskExecutor::setThreadNum(int)::$_0>>(void*) at /usr/lib/llvm-18/bin/../include/c++/v1/__thread/thread.h:202
#14 0x00007fb67aa41ac2 in start_thread at ./nptl/pthread_create.c:442
#15 0x00007fb67aad384f at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(stack trace cost: 69616ms)

With cache mode prioritize_memory, about 750MB of memory is retained after the stack trace; with prioritize_speed, about 2.8GB is retained. I suspected the JIT code was the cause (scanning the entire DWARF looking for a symbol that doesn't exist), but that turned out not to be the case. Is this a bug, or is it genuinely that expensive? I'm confused.
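For what it's worth, this is roughly how I'd expect the two measurements to be reproduced, assuming cpptrace's documented cache_mode enum and experimental::set_cache_mode(); the exact names are taken from the docs as I understand them and should be double-checked.

```cpp
// Sketch: measure trace cost under a given cpptrace cache mode.
// Assumes cpptrace::experimental::set_cache_mode() and cpptrace::cache_mode
// as described in the cpptrace documentation.
#include <cpptrace/cpptrace.hpp>

#include <chrono>
#include <cstdio>

int main() {
    // switch between prioritize_memory / hybrid / prioritize_speed here
    cpptrace::experimental::set_cache_mode(cpptrace::cache_mode::prioritize_memory);

    auto start = std::chrono::steady_clock::now();
    cpptrace::generate_trace().print(); // first trace pays the full DWARF-loading cost
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    std::printf("stack trace cost: %lld ms\n", static_cast<long long>(elapsed.count()));

    // memory retained after this point is the cache being discussed
    // (inspect with massif, or RSS from /proc/self/status)
    return 0;
}
```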
Thanks for the additional information. I'll see about testing more this week.
Problem:
Some stack traces are very large: they consume a lot of cache memory (about 2GB) and take a long time to construct (about 4s).
In our system, a 2GB memory cost is not acceptable, because only 4GB can be utilized in total.
Although cpptrace has three cache modes, the two modes other than prioritize_speed can't share lookup tables between trace calls, which could slow the system down and could (I guess?) use more memory when two "large" stack traces are being constructed at the same time.
Is there a way to both share lookup tables between traces and limit the total cache size? Thanks.
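One related pattern, though it is not a cache-size limit and so only a partial mitigation: capture a raw trace cheaply at the point of failure and defer the expensive DWARF resolution until the trace actually has to be printed. This assumes cpptrace's raw_trace / generate_raw_trace API as I understand it from the docs.

```cpp
// Sketch: defer symbol resolution so the expensive DWARF work (and the cache
// it builds) is only paid when a trace is actually displayed.
// Assumes cpptrace::generate_raw_trace() and raw_trace::resolve() per the docs.
#include <cpptrace/cpptrace.hpp>

int main() {
    // Fast: captures program-counter addresses only, no symbol/DWARF work yet.
    cpptrace::raw_trace raw = cpptrace::generate_raw_trace();

    // ... later, only if the trace really needs to be shown ...
    raw.resolve().print(); // the expensive resolution step happens here
    return 0;
}
```

This controls when the multi-second resolution cost is paid, but the question of capping the retained cache memory still stands.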