TCP: Use RunOnTCPIP() for the LwIP TCP EndPoint and Fix memory leak in PacketBufferHandle::New() #36962
base: master
PR #36962: Size comparison from 1b4c56c to 5256657 Increases above 0.2%:
Full report (14 builds for cc13x4_26x4, cc32xx, nrfconnect, qpg, stm32, tizen)
PR #36962: Size comparison from 1b4c56c to 267b09d Increases above 0.2%:
Full report (69 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
@wqx6 please fix build errors and add a |
src/system/SystemLayer.cpp
Outdated
CHIP_ERROR Layer::RunOnMatterContext(std::function<CHIP_ERROR()> func)
{
    CHIP_ERROR err = CHIP_NO_ERROR;
    PlatformEventing::LockMatterStack(*this);
LockMatterStack as implemented in this PR can be a no-op on some platforms, so cannot be used for anything that actually happens cross-platform.
I have moved it to LayerFreeRTOS.
PR #36962: Size comparison from f8d457a to cca01f2 Increases above 0.2%:
Full report (69 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
#else
    buffer.mBuffer->tot_len = aDataSize;
#endif
    PacketBuffer * currentBuffer = buffer.mBuffer;
Please give a detailed comment above this logic explaining the copying of the pbuf chain for easy future context.
Done.
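For future context, the pbuf-chain issue discussed here can be sketched roughly as follows. This is a toy model, not LwIP or Matter code: ToyBuf, AllocChain, FreeChain, ChainLength and gLiveBufs are hypothetical names, and the only LwIP behaviour assumed is that an allocation may return several chained segments whose head tot_len covers the whole chain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pbuf: one segment of a singly linked chain.
struct ToyBuf
{
    ToyBuf * next;       // next segment, or nullptr at the end of the chain
    std::size_t len;     // bytes held by this segment
    std::size_t tot_len; // bytes held by this segment and all following ones
};

static int gLiveBufs = 0; // number of segments currently allocated

// Allocate `total` bytes as a chain of segments of at most `segSize` bytes.
// Assumes segSize > 0.
ToyBuf * AllocChain(std::size_t total, std::size_t segSize)
{
    ToyBuf * head      = nullptr;
    ToyBuf ** link     = &head;
    std::size_t remain = total;
    do
    {
        std::size_t n = remain < segSize ? remain : segSize;
        *link         = new ToyBuf{ nullptr, n, remain };
        ++gLiveBufs;
        link = &(*link)->next;
        remain -= n;
    } while (remain > 0);
    return head;
}

// Free every segment reachable from `head`.
void FreeChain(ToyBuf * head)
{
    while (head != nullptr)
    {
        ToyBuf * next = head->next;
        delete head;
        --gLiveBufs;
        head = next;
    }
}

// Sum the per-segment lengths by walking the chain.
std::size_t ChainLength(const ToyBuf * head)
{
    std::size_t sum = 0;
    for (; head != nullptr; head = head->next)
    {
        sum += head->len;
    }
    return sum;
}
```

In this model, overwriting head->next with nullptr right after allocation leaves the remaining segments unreachable, so freeing the head alone no longer releases them. Walking the chain instead keeps every segment owned.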
src/inet/TCPEndPointImplLwIP.cpp
Outdated
System::Layer & lSystemLayer = listenEP->GetSystemLayer();
TCPEndPointImplLwIP * listenEP = static_cast<TCPEndPointImplLwIP *>(arg);
TCPEndPointImplLwIP * conEP = nullptr;
System::LayerFreeRTOS & lSystemLayer = static_cast<System::LayerFreeRTOS &>(listenEP->GetSystemLayer());
Why is this cast OK? Is it really the case that any time we are using LwIP we are using LayerFreeRTOS?
It really seems like the right thing to do is for any work that needs to happen on the Matter queue to actually happen on the Matter queue, async. If we need to snapshot some data for that, we snapshot that data....
There is a declared argument in system.gni:
chip_system_config_use_lwip = chip_with_lwip && current_os == "freertos"
We have no scenario in which LwIP is used on any other OS, so this cast seems OK for now.
It really seems like the right thing to do is for any work that needs to happen on the Matter queue to actually happen on the Matter queue, async. If we need to snapshot some data for that, we snapshot that data....
Yes, I tried using ScheduleLambda to post the endpoint allocation to the Matter queue, but ScheduleLambda is asynchronous and we have to use the allocated endpoint immediately after the call. I also tried posting all of the subsequent work in LwIPHandleIncomingConnection to the Matter queue, but hit another issue: if the whole of LwIPHandleIncomingConnection is scheduled with ScheduleLambda, the function returns immediately, LwIP completes the three-way handshake and starts receiving TCP packets, and the peer starts sending them, but the arg and recv function are not yet set at that point (setting them has been posted to the Matter queue). This results in a TCP timeout.
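The difference between posting work to a queue and running it synchronously under the stack lock can be sketched as below. This is a minimal single-threaded model under stated assumptions: ToyStack, Schedule, Drain, RunWithLock and NewEndpoint are hypothetical names standing in for the Matter event queue, the stack lock and endpoint allocation, not the SDK's actual API.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for an endpoint object.
struct Endpoint
{
    int id;
};

struct ToyStack
{
    std::mutex stackLock;                    // plays the role of the stack lock
    std::queue<std::function<void()>> queue; // plays the role of the event queue
    Endpoint pooled{ 7 };                    // the single endpoint this toy can hand out

    // Asynchronous: the work is only queued and runs later, in Drain().
    void Schedule(std::function<void()> work) { queue.push(std::move(work)); }

    // Run all queued work, holding the stack lock around each item.
    void Drain()
    {
        while (!queue.empty())
        {
            std::function<void()> work = std::move(queue.front());
            queue.pop();
            std::lock_guard<std::mutex> guard(stackLock);
            work();
        }
    }

    // Synchronous: take the lock, run the work on the caller's thread,
    // release the lock, and hand the result straight back.
    // The work must not block and must not take stackLock again.
    template <typename F>
    auto RunWithLock(F && work) -> decltype(work())
    {
        std::lock_guard<std::mutex> guard(stackLock);
        return work();
    }

    Endpoint * NewEndpoint() { return &pooled; }
};
```

With Schedule, the caller has no endpoint until the queue is drained, which is too late for a callback that must configure the connection before returning. With RunWithLock, the endpoint is available as soon as the call returns.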
I have moved the function back to System::Layer, but added a macro guard for the RunWithMatterContextLock() function so that it is only available when the system has a lock. I also changed the type here to System::Layer.
PR #36962: Size comparison from f8d457a to a331147 Increases above 0.2%:
Full report (69 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
src/system/PlatformEventSupport.h
Outdated
@@ -31,6 +31,10 @@ class PlatformEventing
public:
    static CHIP_ERROR ScheduleLambdaBridge(System::Layer & aLayer, LambdaBridge && bridge);
    static CHIP_ERROR StartTimer(System::Layer & aLayer, System::Clock::Timeout aTimeout);
#if CHIP_SYSTEM_CONFIG_USE_LWIP || CHIP_SYSTEM_CONFIG_USE_OPEN_THREAD_ENDPOINT
    static void LockMatterStack(System::Layer & aLayer);
what is the use of aLayer here? Can we guarantee that chip stack locking actually exists/is usable before making these available?
Removed the aLayer.
Can we guarantee that chip stack locking actually exists/is usable before making these available?
Yes, added #if !CHIP_SYSTEM_CONFIG_NO_LOCKING for it.
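A compile-time guard of this kind might look roughly like the sketch below. TOY_CONFIG_NO_LOCKING, ToyEventing and RunWithStackLock are hypothetical names modelled on the macro mentioned above; the held flag exists only so the single-threaded example can observe the lock state.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical config macro, modelled on CHIP_SYSTEM_CONFIG_NO_LOCKING.
#ifndef TOY_CONFIG_NO_LOCKING
#define TOY_CONFIG_NO_LOCKING 0
#endif

class ToyEventing
{
public:
#if !TOY_CONFIG_NO_LOCKING
    // Declared only when the platform really has a stack lock, so a build
    // without locking fails at compile time instead of getting a silent no-op.
    static void LockStack()
    {
        Mutex().lock();
        Held() = true;
    }
    static void UnlockStack()
    {
        Held() = false;
        Mutex().unlock();
    }
    static bool IsStackHeld() { return Held(); }

private:
    static std::mutex & Mutex()
    {
        static std::mutex m;
        return m;
    }
    // Illustration only: lets single-threaded code observe the lock state.
    static bool & Held()
    {
        static bool held = false;
        return held;
    }
#endif
};

#if !TOY_CONFIG_NO_LOCKING
// Helpers built on the lock sit behind the same guard.
// The work must return a value, must not block, and must not re-take the lock.
template <typename F>
auto RunWithStackLock(F && work) -> decltype(work())
{
    ToyEventing::LockStack();
    auto result = work();
    ToyEventing::UnlockStack();
    return result;
}
#endif
```

Callers of the guarded helpers need the same condition around their call sites, otherwise a no-locking build stops compiling at the first use.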
src/system/SystemLayer.cpp
Outdated
@@ -31,5 +31,16 @@ CHIP_ERROR Layer::ScheduleLambdaBridge(LambdaBridge && bridge)
    return lReturn;
}

#if CHIP_SYSTEM_CONFIG_USE_LWIP || CHIP_SYSTEM_CONFIG_USE_OPEN_THREAD_ENDPOINT
CHIP_ERROR LayerFreeRTOS::RunOnMatterContext(std::function<CHIP_ERROR()> func)
Maybe RunWithMatterContextLock or similar, since RunOn seems to imply thread loop transfer and this is not doing that.
Some description on why this is safe/does not deadlock would be useful.
Renamed the function and added comments explaining that the input must be a non-blocking function to avoid deadlock.
@wqx6 do we know why this costs 2K of flash? That is a lot.
I am not sure why it increases. My guess is that those examples enable the TCP endpoint, and this PR changes the TCP code from locking to RunOnTCPIP.
PR #36962: Size comparison from 3f62505 to 410e158 Increases above 0.2%:
Full report (70 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
Changes
Use RunOnTCPIP() in TCPEndPointImplLwIP instead of LOCK_TCPIP_CORE()/UNLOCK_TCPIP_CORE().
NewEndPoint() is required to be called in the Matter context, and the new endpoint is used immediately after it is created, so we cannot use ScheduleLambda(). This PR creates a new function, RunOnMatterContext(), for it.
pbuf_alloc() might return a pbuf with chained pbufs, but we set the newly allocated pbuf->next to nullptr, which might result in a memory leak. This PR fixes it.
Testing
chip_inet_config_enable_tcp_endpoint.