Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed/Added SPI DMA features in SPI library. #234

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

BriscoeTech
Copy link

Hello Everyone,

This are my fixes and additions to the SPI library. This address issue #230 and #225, as well as an additional undocumented bug.

Let me know what you think :-)

  • Bug Fix. SPI DMA required to wait for both read and write dmas to complete before we can say the entire spi transfer is completed.
  • Bug Fix. A read only spi dma is not possible because both dmas are required for spi signal timing to work properly. Verified on oscilloscope. Just call the function and pass it empty buffers if you dont care about the rx or tx results, and ignore them.
  • Bug Fix. SPI DMA buffers were initialized once, and subsequent calls would not allow for new buffer pointers.
  • Bug/Feature Added. Previous dma was not truly asynchronous. Was poling for dma completion. Callback added to allow for true asynchronous transfers. Dma is now rtos friendly.
  • Added example program to show how to do a spi dma and setup a callback function when completed.

@Timvrakas
Copy link

I will run this on my gauntlet of hardware and report back.

@ladyada ladyada requested a review from PaintYourDragon July 11, 2020 00:25
@PaintYourDragon
Copy link

PaintYourDragon commented Jul 11, 2020

Might best be kept in a fork for projects that require it. This breaks compatibility with several existing behaviors.

  • SPI::transfer() constructor has changed (removing the 'bool block' argument), which will break code that relies on it.
  • Doesn’t correctly handle transfers over 65,535 bytes (sometimes used when updating large color displays, etc.). Handling this in a non-blocking manner gets pretty involved and requires chained DMA descriptors (which is why the current code is "semi-blocking" in that case, it was a compromise to get the code finished).
  • Doesn’t use DMA unless specifically requested (via the added callback argument). Current code uses DMA even if blocking, reason being that DMA transfers are ~10% faster by avoiding small inter-byte delays present in "manual" SPI transfer loops.

Not a compatibility problem, but there were some comments copied-and-pasted from the current code that got truncated and are no longer descriptive of the situation.

@PaintYourDragon
Copy link

A different approach would be to keep the existing transfer code, behaviors and function prototypes, and add a separate function with a distinct prototype, that handles the background-with-callback case (possibly limited to 65,535 bytes unless you want to do chained descriptors).

@BriscoeTech
Copy link
Author

Hello @PaintYourDragon thanks for the quick check :-)

* SPI::transfer() constructor has changed (removing the 'bool block' argument), which will break code that relies on it.
* Doesn’t use DMA unless specifically requested (via the added callback argument). Current code uses DMA even if blocking, reason being that DMA transfers are ~10% faster by avoiding small inter-byte delays present in "manual" SPI transfer loops.

I didnt think about using the dma transfer just for speed reasons but still blocking. That would be easy enough to add back in as an additional overloaded function that also waits for the dma completion flags to be both set. Will add that in.

* Doesn’t correctly handle transfers over 65,535 bytes (sometimes used when updating large color displays, etc.). Handling this in a non-blocking manner gets pretty involved and requires chained DMA descriptors (which is why the current code is "semi-blocking" in that case, it was a compromise to get the code finished).

That makes sense, I did remove that early on to bootstrap my brain around the problem. I will take a look if there is a elegant way to chain another dma from within the dma completed callbacks, this should be doable.

Ill let you when I get those features added back in :-)

@BriscoeTech
Copy link
Author

@PaintYourDragon I have a question for you, where does the 65,535 byte limitation come from?

I am looking at the addDescriptor and changeDescriptor dma functions that accepts the parameter count for transfer size, and count is a uint32_t which suggests transfers can be up to 4,294,967,295 bytes.

@BriscoeTech
Copy link
Author

I added back in support for the original functions, will wait on addressing the chaining of dmas until I know where the 65,535 byte limitation is at.

@PaintYourDragon
Copy link

The BTCNT element of DmacDescriptor is a signed 16-bit type.

There’s shenanigans one could do with 32-bit SPI transfers (rather than bytes) to make this less a constraint, but the edge cases get ugly and it’s probably best just to use linked descriptors.

libraries/SPI/SPI.h Outdated Show resolved Hide resolved
Adafruit_ZeroDMA readChannel;
Adafruit_ZeroDMA writeChannel;
DmacDescriptor *readDescriptor = NULL;
DmacDescriptor *writeDescriptor = NULL;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given this doesn't change functionality and the original was correct, probably should not be changed.

Copy link
Author

@BriscoeTech BriscoeTech Jul 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a style choice, I'm usually explicit with variable names on every line

@BriscoeTech
Copy link
Author

Ok I added the automatic firing of multiple dma transfers if the size is over 65535, and fixed my spelling error :-)

I have not run it on hardware yet with my oscilloscope, this push is just so you can get eyes on it and review the architecture changes.

libraries/SPI/SPI.cpp Outdated Show resolved Hide resolved
@Timvrakas
Copy link

Timvrakas commented Aug 3, 2020

Alright, I had time to run this on my sensor test board (Feather M4)
I put together a FreeRTOS wrapper. Code here if you're interested.
The proposed changes are fully functional for:

  • BMI088 (accel and gyro)
  • BMP388
  • ADXL375

Admittedly these sensors have small transfer sizes. An LCD or SD card might be needed to make sure the chained DMA descriptors work.

I have one proposed change: the callback function should take uint8_t, and the when called, the argument should be the result of getSercomIndex() on the SPIClass. I needed this to map the callback to a specific semaphore in a static array.

I'm mixed when it comes to the discussion of forking into a library vs integrating here. I think having to use an external library ("RTOSPI") is an acceptable burden, but unfortunately, the transfer() methods of SPI.h are not virtual, which means in order to replace SPI.h we would need to modify every peripheral library and specify the derived class type. This is an unacceptable burden IMO, given the universality of the SPIClass interface. Could the transfer functions be made virtual? Or is there another way to pull this off? Perhaps a #define that allowed overriding transfer()?

Integrating into the upstream would address the above issue.

Also, it looks like we need a rebase to pass CI

@Timvrakas
Copy link

Timvrakas commented Aug 4, 2020

@PaintYourDragon can you comment the chaining approach? It looks like the proposed changes add their own chaining the the SPIClass Layer, but the ZeroDMA already has code to leverage the DMA-powered chaining. Is there a performance trade-off? It looks like calling addDescriptor several times would be enough to use it. Is that code mature?
What are the typical applications of SPI transfers this large? Is there a popular library that uses DMA chaining I could use as reference? I have a SPI E-Ink display I can dig up and test with at some point.

You said the BTCNT value is signed 16 bit, is that documented somewhere? the datasheet doesn't (that I can find), and it's described as a "count" not an offset, so my guess would be unsigned.

Are the proposed changes to ZeroDMA correct in your opinion? Obviously leaving having it to large won't break anything.

@BriscoeTech
Copy link
Author

Ok I got a change to run the current code on hardware with an oscilloscope. The waveforms look good and I ran all the different #define options in the new sample program to verify that the legacy blocking options are working. I also did some transfers around 64K and the wave forms look good. I could not count the bytes going by but there appears to be twice as much data flying by without any waveform issues.

I made a few typecasting fixes to eliminate compiling errors I was finding on sloeber when compiling the different tests.

I commented out the non dma function as its function signature is ambiguous when being compared to the legacy transfer with the optional boolean for blocking. I don't know of any case where the non dma is beneficial or mandatory, so I have commented it out just in case someone needs it in the future for reference.

I have one proposed change: the callback function should take uint8_t, and the when called, the argument should be the result of getSercomIndex() on the SPIClass. I needed this to map the callback to a specific semaphore in a static array.

The way I set it up is each SpiClass has a pointer for its own callback function, and the dma interrupts automatically calls the corresponding callback for the proper SpiClass. I envisioned the user would register different callbacks for the different spis being used, that way you don't have one callback you then have to arbitrarily decode. Let me know if this works as I have not tried it yet but the plumbing is there from the original implementation and should work.

@BriscoeTech
Copy link
Author

You said the BTCNT value is signed 16 bit, is that documented somewhere? the datasheet doesn't (that I can find), and it's described as a "count" not an offset, so my guess would be unsigned.

Here is what I found when digging deep into the code, the uint16_t register is defined in dmac,h of the cmsis library atmel has priveded for this chip

image

BriscoeTech added 6 commits August 7, 2020 13:10
* Bug Fix. SPI DMA required to wait for both read and write dmas to complete before we can say the entire spi transfer is completed.
* Bug Fix. A read only spi dma is not possible because both dmas are required for spi signal timing to work properly. Verified on oscilloscope. Just call the function and pass it empty buffers if you dont care about the rx or tx results, and ignore them.
* Bug Fix. SPI DMA buffers were initialized once, and subsequent calls would not allow for new buffer pointers.
* Bug/Feature Added. Previous dma was not truly asynchronous. Was poling for dma completion. Callback added to allow for true asynchronous transfers. Dma is now rtos friendly.
* Added example program to show how to do a spi dma and setup a callback function when completed.
…ing for completion.

* changed parameters count to be uint32_t instead of size_t. Dma transfer setup functions accept count as uint32_t. This provides more transparency on max byte size to user calling function.
…er size is over 65535 bytes. This should work for poling or asynchronous dma transfers.

* addDescriptor and changeDescriptor in Adafruit_ZeroDMA function signature changed. Count parameter changed to be a uint16_t instead of uint32_t. Reflects the limitation that desc->BTCNT.reg is a uint16_t and cannot actually accept a uint32_t value.
* Not tested on hardware yet, pushed so it can be read by others. Will test on hardware soon. Need to make sure line 312 and 313 pointer assignment math in checkDmaComplete actually works properly.
…licts with dma transfer with optional boolean for blocking. I dont know of a instance where the non dma is a requirement or better, so it is commented out for now.

* initialized variables so to avoid constructor error message in sloeber
* changed void pointer math to have a explicit type cast to uint8_t, void pointer math was generating compiler error and we know we are doing byte math.
* added explicit typecast of null function pointer, this prevents ambiguity on what function signature we are calling and prevents compiler warnings in sloeber
@BriscoeTech
Copy link
Author

Wooops! In the process of re-basing my pr and getting it in sync, looks like github auto closed this pr. Ill try and fix it soon if I don't get preempted by life here in a second.

@BriscoeTech BriscoeTech reopened this Aug 7, 2020
* removed non dma test from the example since it is commented out in library now
@BriscoeTech
Copy link
Author

At this point I think this pr is good to be merged. Let me know if you have any additional feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants