Does cpptrace support setting a cache size? #193
Comments
Wow, 2GB of cache is huge. I haven't had to deal with anything of that magnitude yet, and this may be something cpptrace isn't currently well-equipped to handle. Thanks for opening this issue.

I have a couple of quick questions to help me understand your use-case better:
- Do you generate stack traces multiple times in a program?
- Do you need a trace generated reasonably quickly?
- Is there any context you could provide about how big your application is? Do you happen to know if it utilizes many third-party libraries?

Some initial thoughts:
- Currently the caching isn't particularly smart: cpptrace just loads all relevant information from a translation unit. This is almost always fine, except for exceptionally large amounts of debug info. It should definitely be possible to implement something smarter.
- Attempting to load only relevant sections (and remembering the information needed to continue searches) could prove beneficial, though DWARF doesn't make this as easy to do as I'd like.
- Something along the lines of an LRU cache might also be useful, though the use-case here is challenging, as debug symbols are a hierarchical structure.

I will need to put together a test application for myself that's large enough to reproduce these issues; I'll have to look into that later.
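For reference, a minimal sketch of the "LRU over translation units" idea: a byte-budgeted cache keyed by a compilation-unit offset. This is illustrative only, not cpptrace's actual caching code, and the keys, footprint accounting, and loading function are placeholders; the hierarchical cross-CU references mentioned above are exactly what a flat scheme like this doesn't handle.

```cpp
// Sketch of a size-bounded LRU cache over per-CU debug info.
// Not cpptrace code -- a generic illustration of the idea only.
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

struct cu_debug_info {
    // stand-in for the parsed line tables / DIEs of one translation unit
    std::string data;
    std::size_t memory_footprint() const { return data.size(); }
};

class cu_lru_cache {
public:
    explicit cu_lru_cache(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    // Returns cached info for the CU, loading (and possibly evicting) as needed.
    std::shared_ptr<cu_debug_info> get(std::uint64_t cu_offset) {
        auto it = index_.find(cu_offset);
        if(it != index_.end()) {
            // move to front = most recently used
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        auto info = load_cu(cu_offset); // expensive: parse DWARF for this CU
        current_bytes_ += info->memory_footprint();
        order_.emplace_front(cu_offset, info);
        index_[cu_offset] = order_.begin();
        evict_if_needed();
        return info;
    }

private:
    void evict_if_needed() {
        // evict least-recently-used CUs until we're back under budget,
        // but never evict the entry we just loaded
        while(current_bytes_ > max_bytes_ && order_.size() > 1) {
            auto& [offset, info] = order_.back();
            current_bytes_ -= info->memory_footprint();
            index_.erase(offset);
            order_.pop_back();
        }
    }

    static std::shared_ptr<cu_debug_info> load_cu(std::uint64_t cu_offset) {
        // placeholder for the real DWARF parsing work
        return std::make_shared<cu_debug_info>(
            cu_debug_info{"debug info for CU at offset " + std::to_string(cu_offset)});
    }

    std::size_t max_bytes_;
    std::size_t current_bytes_ = 0;
    std::list<std::pair<std::uint64_t, std::shared_ptr<cu_debug_info>>> order_;
    std::unordered_map<std::uint64_t,
        std::list<std::pair<std::uint64_t, std::shared_ptr<cu_debug_info>>>::iterator> index_;
};

int main() {
    cu_lru_cache cache(64 * 1024 * 1024); // e.g. cap the CU cache at 64MB
    auto info = cache.get(0x1234);        // loaded on first access, cached afterwards
    (void)info;
}
```

Eviction here is purely size-based; as noted above, a real implementation would also need to remember enough information to resume DWARF searches after a CU has been evicted.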
Thanks for your reply. I'll provide that information, though not right away, as I'm diving into DWARF for more clues.
I ran some tests with a relatively large binary, and massif reported that most allocation was done by libdwarf internals pertaining to loading and parsing DWARF from the ELF. I'll do some more testing.
Sorry, I've been really busy recently. I'll profile it this weekend. In my situation, the allocated memory is retained after the stack trace is printed.
> Do you generate stack traces multiple times in a program? Do you need a trace generated reasonably quickly? Is there any context you could provide about how big your application is? Do you happen to know if it utilizes many third-party libraries?

I can reproduce this problem.

[2024-12-15 10:22:06.730] [error] [thread-75884] [Exception.cpp:59] <AssertFail> p_assert(false) (/home/crab/WorkSpace/polars/polars-llvm/cpp/core/operator/window/SortedWindowOperator.cpp:62:27)
Stack trace (most recent call first):
#0 (inlined) in auto polars::getStackTrace<(polars::StackTraceStrategy)1>(int, int) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/StackTrace.h:48
#1 0x000056320bfc315d in polars::Exception::Exception(polars::ErrorCode, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::source_location const&, bool)::$_2::operator()() const at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.cpp:56:27
#2 0x000056320bfc2e1f in polars::Exception::Exception(polars::ErrorCode, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, std::__1::source_location const&, bool) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.cpp:70
#3 0x000056320bfc2bab in polars::Exception::Exception(polars::ErrorCode, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&, std::__1::source_location const&, bool) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.cpp:32
#4 0x000056320b064ed3 in polars::AssertFailException::AssertFailException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&&, std::__1::source_location const&) at /home/crab/WorkSpace/polars/polars-llvm/cpp/common/Exception.h:54
#5 0x000056320be300ca in polars::core::window_operator::SortedWindowStateFactory::GlobalSinkStateImpl::nextPartition(unsigned char* const*&, unsigned char* const*&, char const*) at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/operator/window/SortedWindowOperator.cpp:62
#6 0x00007fb67abdd09c
#7 0x000056320b61e062 in std::__1::invoke_result<bool (*)(polars::core::RuntimeQueryResources*, long, long), polars::core::RuntimeQueryResources*, int, int>::type polars::codegen::VM::invoke<bool (*)(polars::core::RuntimeQueryResources*, long, long), polars::core::RuntimeQueryResources*, int, int>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&, polars::core::RuntimeQueryResources*&&, int&&, int&&) at /home/crab/WorkSpace/polars/polars-llvm/cpp/codegen/VM.h:82
#8 0x000056320b61b21f in polars::core::PipelineTask::execute() at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/execution/PipelineTask.cpp:139
#9 0x000056320b60921a in polars::core::TaskExecutor::WorkForever(polars::core::TaskExecutor::ThreadState*) at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/execution/TaskExecutor.cpp:65
#10 0x000056320b60a88b in polars::core::TaskExecutor::setThreadNum(int)::$_0::operator()() const at /home/crab/WorkSpace/polars/polars-llvm/cpp/core/execution/TaskExecutor.cpp:142
#11 0x000056320b60a844 in decltype(std::declval<polars::core::TaskExecutor::setThreadNum(int)::$_0>()()) std::__1::__invoke[abi:ne180100]<polars::core::TaskExecutor::setThreadNum(int)::$_0>(polars::core::TaskExecutor::setThreadNum(int)::$_0&&) at /usr/lib/llvm-18/bin/../include/c++/v1/__type_traits/invoke.h:344
#12 0x000056320b60a81c in void std::__1::__thread_execute[abi:ne180100]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, polars::core::TaskExecutor::setThreadNum(int)::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, polars::core::TaskExecutor::setThreadNum(int)::$_0>&, std::__1::__tuple_indices<...>) at /usr/lib/llvm-18/bin/../include/c++/v1/__thread/thread.h:193
#13 0x000056320b60a641 in void* std::__1::__thread_proxy[abi:ne180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, polars::core::TaskExecutor::setThreadNum(int)::$_0>>(void*) at /usr/lib/llvm-18/bin/../include/c++/v1/__thread/thread.h:202
#14 0x00007fb67aa41ac2 in start_thread at ./nptl/pthread_create.c:442
#15 0x00007fb67aad384f at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(stack trace cost: 69616ms)

With cache mode prioritize_memory, about 750MB of memory is retained after the stack trace; with prioritize_speed, about 2.8GB is retained. I suspected the JIT code was the cause (scanning the entire DWARF looking for a symbol that doesn't exist), but that turned out not to be the case. Is this a bug, or is it genuinely that expensive? I'm confused.
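For what it's worth, this is roughly how I'd expect the two measurements to be reproduced, assuming cpptrace's documented cache_mode enum and experimental::set_cache_mode(); the exact names are taken from the docs as I understand them and should be double-checked.

```cpp
// Sketch: measure trace cost under a given cpptrace cache mode.
// Assumes cpptrace::experimental::set_cache_mode() and cpptrace::cache_mode
// as described in the cpptrace documentation.
#include <cpptrace/cpptrace.hpp>

#include <chrono>
#include <cstdio>

int main() {
    // switch between prioritize_memory / hybrid / prioritize_speed here
    cpptrace::experimental::set_cache_mode(cpptrace::cache_mode::prioritize_memory);

    auto start = std::chrono::steady_clock::now();
    cpptrace::generate_trace().print(); // first trace pays the full DWARF-loading cost
    auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    std::printf("stack trace cost: %lld ms\n", static_cast<long long>(elapsed.count()));

    // memory retained after this point is the cache being discussed
    // (inspect with massif, or RSS from /proc/self/status)
    return 0;
}
```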
Thanks for the additional information. I'll see about testing more this week.
Problem:
Some stack traces are very large: they consume a lot of cache memory (about 2GB) and take a long time to construct (about 4s).
In our system, a 2GB memory cost is not acceptable, because only 4GB can be utilized in total.
Although cpptrace has three cache modes, the two modes other than prioritize_speed can't share lookup tables between trace calls, which could slow the system down and could (I guess?) use more memory when two "large" stack traces are being constructed at the same time.
Is there a way to both share lookup tables between traces and limit the total cache size? Thanks.
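One related pattern, though it is not a cache-size limit and so only a partial mitigation: capture a raw trace cheaply at the point of failure and defer the expensive DWARF resolution until the trace actually has to be printed. This assumes cpptrace's raw_trace / generate_raw_trace API as I understand it from the docs.

```cpp
// Sketch: defer symbol resolution so the expensive DWARF work (and the cache
// it builds) is only paid when a trace is actually displayed.
// Assumes cpptrace::generate_raw_trace() and raw_trace::resolve() per the docs.
#include <cpptrace/cpptrace.hpp>

int main() {
    // Fast: captures program-counter addresses only, no symbol/DWARF work yet.
    cpptrace::raw_trace raw = cpptrace::generate_raw_trace();

    // ... later, only if the trace really needs to be shown ...
    raw.resolve().print(); // the expensive resolution step happens here
    return 0;
}
```

This controls when the multi-second resolution cost is paid, but the question of capping the retained cache memory still stands.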