Using a large model on an ESP32 S3 N16R8 (TFMIC-37) #94
Comments
Hi @nicklasb, the ESP32-S3 has cache options that you can explore. If you have that much internal RAM underutilised, do increase the data cache and instruction cache sizes.
As far as moving some of the allocations to internal RAM goes, the current tflite structure is not flexible enough to allow that.
You may move some of the critical kernels (from esp-nn) to IRAM so that they always persist in RAM, which should boost performance further. Please explore this.
The 5 MB tensor arena requirement is indeed high, and I am not sure it should be the case. I will give the YOLO model a try and do some experiments myself.
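A minimal sketch of the IRAM suggestion, assuming ESP-IDF's `IRAM_ATTR` macro from `esp_attr.h`. The function `dot_s8` is a hypothetical stand-in for a hot kernel and is not an actual esp-nn function; off-target the macro is defined as empty so the sketch also compiles on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(ESP_PLATFORM)
#include "esp_attr.h"  // provides IRAM_ATTR on ESP-IDF builds
#else
#define IRAM_ATTR      // host build: no special placement
#endif

// Hypothetical hot loop (NOT an esp-nn function): int8 dot product with
// 32-bit accumulation. On ESP-IDF, IRAM_ATTR asks the linker to place the
// code in internal instruction RAM instead of executing it from flash
// through the instruction cache.
int32_t IRAM_ATTR dot_s8(const int8_t* a, const int8_t* b, size_t n) {
  int32_t acc = 0;
  for (size_t i = 0; i < n; ++i) {
    acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  }
  return acc;
}
```

Whether this helps depends on how often the kernel is evicted from the instruction cache; it only affects code placement, not where the tensor data lives.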
Hi, thanks for your answer!
On the cache settings: I am afraid they have not made much difference in my case; also, their maximum values aren't very high. I will revisit them and see if I have missed something.
On flexibility: it sort of is flexible enough IMO, but only up to a point. The MicroAllocator can be initialized with a non-persistent area, which I think could help to some degree. However, I have not been able to make that work; then again, C++ semantics is not my home turf (yet), and they seem to use all the tricks. There are some PRs that touch on this over at TFLite; if Espressif put some of their might behind them, I think it would make a difference.
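To illustrate the idea of separate persistent and non-persistent areas, here is a toy model. It is not the TFLite Micro API: `BumpArena` and `SplitArena` are hypothetical names, and the sketch only shows how two caller-supplied buffers (which on a device could live in different memories) can back two independent allocation regions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy bump allocator over a caller-supplied buffer (hypothetical helper).
class BumpArena {
 public:
  BumpArena(uint8_t* base, size_t size) : base_(base), size_(size) {}

  // Returns `bytes` of storage aligned to `align` (a power of two),
  // or nullptr if the buffer cannot hold the request.
  void* Allocate(size_t bytes, size_t align = 16) {
    const uintptr_t start = reinterpret_cast<uintptr_t>(base_);
    const uintptr_t addr = start + used_;
    const uintptr_t aligned =
        (addr + align - 1) & ~(static_cast<uintptr_t>(align) - 1);
    const size_t new_used = static_cast<size_t>(aligned - start) + bytes;
    if (new_used > size_) return nullptr;
    used_ = new_used;
    return reinterpret_cast<void*>(aligned);
  }

  // Discards everything allocated so far so the buffer can be reused.
  void Reset() { used_ = 0; }
  size_t used() const { return used_; }

 private:
  uint8_t* base_;
  size_t size_;
  size_t used_ = 0;
};

// Two independent regions: one for data that lives as long as the
// interpreter, one for scratch data that can be reset and reused.
struct SplitArena {
  BumpArena persistent;
  BumpArena non_persistent;
};
```

Which of the two regions would benefit most from internal RAM is an open question here; that would need measuring on the device.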
On moving kernels to IRAM: OK, I will take a look.
On the arena size: it is high, but perhaps that is not so strange, as the model is > 2 MB. Either way, the size doesn't matter as long as it fits well within PSRAM. It is more that the frequently accessed data needs a faster memory.
For your information, when testing out YOLO: its export.py can both quantize to int8 and export directly to .tflite format. I went on a long tangent before realizing that. Also, I used YOLOv5.
I am afraid that made no discernible difference. What I am going to do now is clone this library instead of using it as a stand-alone component, and focus on whether I can override the memory management in some way by using MicroAllocator. Maybe I can help out in some way.
@vikramdattu Weirdly, setting the compiler option to (x) Optimize for performance (-O2) makes inference about 20 percent slower.
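A comparison like this is easiest to reproduce with a small timing harness. The sketch below uses only the C++ standard library; `MeanMicros` and `PercentSlower` are hypothetical helper names, and the callable passed in would wrap whatever runs one inference.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls `fn` `runs` times and returns the mean
// wall-clock time per call in microseconds.
template <typename Fn>
double MeanMicros(Fn&& fn, int runs) {
  using Clock = std::chrono::steady_clock;
  const auto start = Clock::now();
  for (int i = 0; i < runs; ++i) fn();
  const auto stop = Clock::now();
  const double total_us =
      std::chrono::duration<double, std::micro>(stop - start).count();
  return total_us / runs;
}

// Relative change of `candidate_us` versus `baseline_us` in percent;
// a positive result means the candidate build is slower.
inline double PercentSlower(double baseline_us, double candidate_us) {
  return (candidate_us - baseline_us) / baseline_us * 100.0;
}
```

Averaging over several runs with the same input for both builds makes it easier to tell a real regression from run-to-run noise.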
Did you set
Hi, I set it from menuconfig, and basically the box coordinates ended up smaller; some of them even went negative.
I couldn't, as I am now deep into the custom MicroAllocator/planner. Generally, though, all values became smaller. It should not be too hard to replicate with any YOLO model; there is nothing special about mine.
Hi,
I am running inference with a large model (~2 MB) on the ESP32-S3, and it takes about 60 seconds, compared with about 50 ms on my PC.
As the tensor arena seems to need about 5 MB to satisfy TFLite, and the RGB image is larger than SRAM (it's a YOLO model, which prefers RGB), everything ends up in PSRAM, which slows things down significantly.
However, I don't think that alone accounts for a factor of a thousand, even though the ESP32-S3 is obviously slower as well.
What can I do? Can I override the memory allocator to put just some of the data in SRAM? I have about 300K available that isn't being used.
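For buffers the application allocates itself, one possible approach is a size-based policy that tries internal SRAM first and falls back to PSRAM. This is only a sketch: `ChooseTier` and `AllocTiered` are hypothetical names, the ESP-IDF branch assumes `heap_caps_malloc` with the `MALLOC_CAP_INTERNAL` and `MALLOC_CAP_SPIRAM` capability flags from `esp_heap_caps.h`, and it does not change where TFLite places individual tensors inside its arena.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#if defined(ESP_PLATFORM)
#include "esp_heap_caps.h"  // heap_caps_malloc and MALLOC_CAP_* on ESP-IDF
#endif

enum class Tier { kInternal, kExternal };

// Hypothetical policy: requests up to `internal_budget` bytes are
// candidates for internal SRAM; anything larger goes to external PSRAM.
inline Tier ChooseTier(size_t bytes, size_t internal_budget) {
  return bytes <= internal_budget ? Tier::kInternal : Tier::kExternal;
}

// Allocates `bytes` according to ChooseTier. On the host there is only
// one heap, so the policy is ignored and plain malloc is used.
inline void* AllocTiered(size_t bytes, size_t internal_budget) {
#if defined(ESP_PLATFORM)
  if (ChooseTier(bytes, internal_budget) == Tier::kInternal) {
    void* p = heap_caps_malloc(bytes, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
    if (p != nullptr) return p;  // otherwise fall through to PSRAM
  }
  return heap_caps_malloc(bytes, MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
#else
  (void)internal_budget;
  return std::malloc(bytes);
#endif
}
```

With a budget of 300 KB, a 100 KB buffer would be tried in SRAM, while a 5 MB arena would go straight to PSRAM. Memory returned by `AllocTiered` is released with `free`.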
I saw that the P4 will more or less have SRAM acting as a cache for a faster PSRAM. Is there something similar that could be done in the meantime? Or something else?