Improvment of the interface #3

QSOlePettersson · 2017-12-11T09:38:39Z

The decompress interface did not allow getting the necessary information from the JPEG header to determine which parameters to use for the decompression call. As far as I can tell, the JPEG file format supports either 8 bit or 24 bit images. To be able to load an image file "as is", one must be able to determine which of these cases are to be applied for the current image file.
Furthermore, I oppose the usage of the "unsafe" keyword on principle and mapping of managed memory directly to use in an unmanaged context and have therefore refactored the necessary parts.
With the changes from this fork, it is now possible to simply decompress a JPEG bytestream (without conversion) to a System.Drawing.Bitmap, as well as querying the format of the JPEG stream and allocating a suitable Bitmap object (for example from a pool for memory conservation) and decompress to it.

Added wrapper class for unmanaged memory allocated by native tj-functions. Added Compress overload to allow skipping marshalling to managed space. Added static grayscale palette for TJDecompressor. Added overloads for Decompress to allow decoding an image without specifying a pixelformat. Added overloads for GetImageInfo to get actual pixelformat from image header before decoding.

…ing.Imaging.PixelFormat.

qmfrederik · 2017-12-11T10:50:30Z

Thanks! The changes related to fetch the JPEG parameters such as bit depth look good to me.

Unfortunately, I believe I can't take the change related to the unsafe keyword as is. I think it introduces additional allocation and copying of memory. We use the TurboJpegWrapper in a tight loop to decompress a feed of JPEG images and this really impacts performance. I'll comment on the code.
That's one of the reasons we sometimes use unsafe code.

Would you mind splitting this PR in two?

qmfrederik · 2017-12-11T10:55:30Z

LibJpegWrapper/TJDecompressor.cs


-            fixed (byte* bufPtr = buf)


I believe this change introduces a performance regression.

In the original code, one buffer (buf) was allocated in managed code which was pinned and shared with unmanaged code.

In the changed code, you first create a new buffer in unmanaged code (new TJUnmanagedMemory(bufferSize)).
When you return outBuf, you implicitly cast it to a byte[]. That operator allocates a new byte array and returns that one.

Net, in the original code you had one allocation of a large object, in the changed code you have two allocations (one in unmanaged memory and one in managed memory); plus a copy operation.

This code is on a hot path for us so performance is important.

Perhaps using Span<T> may be a solution here, not sure.
If you think there's no performance difference, feel free to send over a BenchmarkDotNet test which compares both ;-).

qmfrederik

I would need to better understand the performance impact of removing the unsafe keyword.

The JPEG-related changes look good; feel free to split them off in a separate PR.

Thanks!

QSOlePettersson · 2017-12-11T11:55:05Z

Hi, I believe you could have a point there. I would personally do that tradeoff, in order to eliminate the unsafe blocks. We try to avoid (at our current stage eliminate) the transfer of the byte data into manged space unless we are actually looking at the raw byte data at that point of the pipeline. I therefore do not use that path you comment on, but instead use PInvoked methods to memory map a view of the file from disk and decompress it directly into the GDI Bitmap to avoid all big allocs. From our experience with a lot of images and resulting big object allocations, I would not be too concerned about the unmanaged alloc. You are of course correct, that a copy operation costs more than nothing. I will look into the benchmark check to get a measurement. How big are your images? We have either 4 MPixel or 9 MPixel (depending on camera) of 8 bit grayscale images (which would be my test material). If the decompress to byte array is your critical path, we could perhaps indeed make a compromise… I basically need the extended interface to do the operation described above. I also believe, that it is not a good idea to use a pinned byte array as backing store for the GDI Bitmap Scan0 parameter. My goal for the pull request is however, that we can use an updated NuGet package. For that I only really NEED the extended interface and I would rather have a NuGet package that fits our needs than enforce (my dogmatic) code policies. BTW, thanks for the tip with Span<T>. I do not think it would fit to this specific issue, but it sounds like a cool language feature for (possibly related) other situations. Best regards, Ole Pettersson. Von: Frederik Carlier Gesendet: Montag, 11. Dezember 2017 11:55 An: qmfrederik/AS.TurboJpegWrapper Cc: QSOlePettersson; Author Betreff: Re: [qmfrederik/AS.TurboJpegWrapper] Improvment of the interface (#3) @qmfrederik commented on this pull request. In LibJpegWrapper/TJDecompressor.cs:

- fixed (byte* bufPtr = buf) I believe this change introduces a performance regression. In the original code, one buffer (buf) was allocated in managed code which was pinned and shared with unmanaged code. In the changed code, you first create a new buffer in unmanaged code (new TJUnmanagedMemory(bufferSize)). When you return outBuf, you implicitly cast it to a byte[]. That operator allocates a new byte array and returns that one. Net, in the original code you had one allocation of a large object, in the changed code you have two allocations (one in unmanaged memory and one in managed memory); plus a copy operation. This code is on a hot path for us so performance is important. Perhaps using Span<T> may be a solution here, not sure. If you think there's no performance difference, feel free to send over a BenchmarkDotNet test which compares both ;-). — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

qmfrederik · 2017-12-11T15:03:47Z

In my case, the compressed images contain the framebuffers of Android devices, so resolutions of 1920x1200 at 32-bit pixel depth are not uncommon.
We decompress the images, calculate the delta, and then send incremental updates to clients.
We want to get at least 20fps, so we need to do this 20 times per second (more is better).

So in this example, we'd be allocating and copying an additional 320MB per second.
You notice that in CPU usage and garbage collection statistics ;-).

We don't interop with GDI+ in this case so our scenario is indeed different from yours. If you want to pass the unmanaged memory to GDI you can optimize differently.

I don't mind taking changes that optimize for a specific scenario; just want avoid a regression for other scenarios :-).

QSOlePettersson added 2 commits December 5, 2017 17:24

Changed TJUtils to public to allow converting to and from System.Draw…

83d9ad8

…ing.Imaging.PixelFormat.

qmfrederik reviewed Dec 11, 2017

View reviewed changes

qmfrederik suggested changes Dec 11, 2017

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvment of the interface #3

Improvment of the interface #3

QSOlePettersson commented Dec 11, 2017

qmfrederik commented Dec 11, 2017

qmfrederik Dec 11, 2017

qmfrederik left a comment

QSOlePettersson commented Dec 11, 2017 via email

qmfrederik commented Dec 11, 2017

Improvment of the interface #3

Are you sure you want to change the base?

Improvment of the interface #3

Conversation

QSOlePettersson commented Dec 11, 2017

qmfrederik commented Dec 11, 2017

qmfrederik Dec 11, 2017

Choose a reason for hiding this comment

qmfrederik left a comment

Choose a reason for hiding this comment

QSOlePettersson commented Dec 11, 2017 via email

qmfrederik commented Dec 11, 2017