Some basic questions about usage #47

Closed
jerch opened this issue Jul 1, 2022 · 16 comments
Labels
question Further information is requested

Comments

@jerch

jerch commented Jul 1, 2022

@Tyriar Wow, this lib looks really nice, congrats on getting it done in such a sophisticated way, that must have been a lot of work. I will definitely look into it.

First I have a few questions about proper usage:

  • In api.d.ts you state for decodePng:

    "Note that it's best to drop references to both metadata and rawChunks as soon as they are no longer needed as they may take up a significant amount of memory depending on the image."

    Does that mean, that the decoder carries the raw data along, thus should be removed manually from the returned IDecodedPng to free some memory?

  • For terminal output I'm only interested in pixel data (RGBA) plus the dimensions. I see that this is contained in IImage32 and IImage64. Is the pixel data there always fully RGBA colored, or is there a catch with some image modes (e.g. paletted contains an index, thus needs another indirection to resolve to real colors)?
  • The pixel being RGBA8888 - is that RGBA32 or ABGR32 aligned on LE? I'm not sure yet whether I'll need a 32-bit access pattern at all, but I found it to be way faster than single channel handling in the sixel lib.
  • What's the deal with 32 vs. 64? What exactly determines the bit depth of channels, and can I somehow tell the decoder to always output IImage32, or would I have to do the color reduction afterwards? I don't want to go the 16-bit channel route for now, as it just puts the memory under even more stress.
  • Later on I want to use the encoder for serialization. Are there any hints/pitfalls with PNG encoding in general (see me as a PNG noob here)?
  • About those promises - at which level is encoding/decoding async? Is it timer based/cooperative, or some real worker concurrency?
  • Is it possible to feed the decoder from arbitrary chunks (stream decoding)?

Sorry for so many questions, please don't feel bugged by that. I don't need detailed answers, just a yes/no or a code pointer here and there.

Btw #15 would be a killer feature - imho APNG is a very nice, but sadly totally underrated format.

@Tyriar
Member

Tyriar commented Jul 1, 2022

Thanks, it definitely did take a while 🙂

Does that mean, that the decoder carries the raw data along, thus should be removed manually from the returned IDecodedPng to free some memory?

Yeah, it's just calling out that IDecodedPng contains references to potentially some big buffers, for example:

png-codec/typings/api.d.ts

Lines 572 to 587 in 60576e5

export interface IPngMetadataEmbeddedIccProfile {
  /**
   * The type of metadata, this is typically the name of the chunk from which it originates.
   */
  type: 'iCCP';
  /**
   * The name of the profile.
   */
  name: string;
  /**
   * The raw bytes of the embedded ICC profile.
   */
  data: Uint8Array;
}
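
For illustration, a minimal sketch of what dropping those references could look like. It assumes the decoded result has an `image` field with `data`, `width` and `height` next to `metadata` and `rawChunks`; `keepPixelsOnly` is a hypothetical helper and not part of the library, and the `decoded` object is a hand-made stand-in for a real decode result.

```javascript
// Hypothetical helper: keep only the pixel data and dimensions of a decoded
// PNG so that metadata and rawChunks can be garbage collected.
// Assumes `decoded` has the shape { image: { data, width, height }, ... }.
function keepPixelsOnly(decoded) {
  const { data, width, height } = decoded.image;
  // Return a fresh object; nothing in it points back at `decoded`.
  return { data, width, height };
}

// Stand-in for a decode result, for illustration only.
let decoded = {
  image: { data: new Uint8Array(2 * 1 * 4), width: 2, height: 1 },
  metadata: [{ type: 'iCCP', name: 'example', data: new Uint8Array(1024) }],
  rawChunks: []
};
const pixels = keepPixelsOnly(decoded);
decoded = undefined; // drop the last reference to the larger structure
```

After this, only the pixel buffer stays reachable through `pixels`.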

For terminal output I'm only interested in pixel data (RGBA) plus the dimensions. I see that this is contained in IImage32 and IImage64. Is the pixel data there always fully RGBA colored, or is there a catch with some image modes (e.g. paletted contains an index, thus needs another indirection to resolve to real colors)?

It's always RGBA; the palette is used to decode and the image is passed out in a standard format. The palette is also available in the API as a potentially useful feature for image editors. Luna Paint lets you inspect an image, for example:

[image: Luna Paint screenshot]

The pixel being RGBA8888 - is that RGBA32 or ABGR32 aligned on LE? Not sure yet, if I'll need 32bit access pattern at all, but I found that being way faster than single channel handling in the sixel lib.

The decoded pixel data gets transferred to a regular ArrayBuffer via a Uint8Array initialized here:

const result = new Uint8Array(bplCeiled * header.height);

I think JS uses LE by default?

What's the deal with 32 vs. 64? What exactly determines the bit depth of channels, and can I somehow tell the decoder to always output IImage32, or would I have to do the color reduction afterwards? I don't want to go the 16-bit channel route for now, as it just puts the memory under even more stress.

PNGs can be encoded with 64 bits per pixel, if you don't handle 64 bit images, you can use force32:

export function decodePng(data: Readonly<Uint8Array>, options: IDecodePngOptions & { force32: true }): Promise<IDecodedPng<IImage32>>;

I wouldn't expect you to want to do this for terminal images.
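
To put the memory concern into numbers, here is a small sketch of the decoded buffer size for 8-bit versus 16-bit channels. `decodedImageBytes` is a made-up helper for the arithmetic only, and the 1920x1080 size is an arbitrary example.

```javascript
// Bytes needed for a decoded RGBA pixel buffer.
// bitsPerChannel is 8 for a 32-bit image and 16 for a 64-bit image.
function decodedImageBytes(width, height, bitsPerChannel) {
  const channels = 4; // R, G, B, A
  return width * height * channels * (bitsPerChannel / 8);
}

const bytes32 = decodedImageBytes(1920, 1080, 8);  // 8294400
const bytes64 = decodedImageBytes(1920, 1080, 16); // 16588800
```

So forcing 32-bit output halves the pixel buffer compared to a 64-bit image of the same dimensions.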

Later on I want to use the encoder for serialization. Are there any hints/pitfalls with PNG encoding in general (see me as a PNG noob here)?

I tried to make the encode API as simple as possible to a noob 😉, you will probably just want to encode without any options and some light analysis will happen to pick the right bit depth, color type, etc.

About those promises - at which level is encoding/decoding async? Is it timer based/cooperative, or some real worker concurrency?

The only reason the promises exist is for code splitting:

  • Importing the decoder will not load the encoder into the engine
  • There are a lot of chunks, the decoder will lazily load the optional ones only when needed, based on parseChunkTypes:

    png-codec/typings/api.d.ts

    Lines 216 to 221 in 60576e5

    /**
    * A list of optional chunk types to parse or `'*'` to parse all known chunk types. By default
    * only the chunk types required to extract the image data is parsed for performance reasons, if a
    * chunk type is of use this option can be used to do that.
    */
    parseChunkTypes?: OptionalParsedChunkTypes[] | '*';
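
The lazy loading described above could be sketched roughly like this. It is not the library's actual code: `createLazyLoader` and the stub loader are hypothetical, and in a real bundle each loader would presumably wrap a dynamic `import()`.

```javascript
// Sketch of lazy, cached loading of optional chunk decoders.
// `loaders` maps a chunk type to a function returning a Promise for its
// decoder module (a stand-in for a dynamic import()).
function createLazyLoader(loaders) {
  const cache = new Map();
  return function load(chunkType) {
    if (!cache.has(chunkType)) {
      const loader = loaders[chunkType];
      if (!loader) {
        throw new Error(`No decoder registered for chunk type ${chunkType}`);
      }
      cache.set(chunkType, loader()); // start loading once, reuse the promise
    }
    return cache.get(chunkType);
  };
}

// Illustration with a stub instead of a real import().
let loadCount = 0;
const load = createLazyLoader({
  tEXt: () => { loadCount++; return Promise.resolve({ name: 'tEXt decoder' }); }
});
const first = load('tEXt');
const second = load('tEXt'); // served from the cache, no second load
```

This is why the API has to return promises even though the decoding work itself is not concurrent.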

Is it possible to feed the decoder from arbitrary chunks (stream decoding)?

It only works off a full Uint8Array currently, I'm not sure stream decoding would give much since PNGs are typically tiny?

I considered progressive decoding but it seems really overkill considering the small size of pngs and the large bandwidth of most internet connections (plus my use case works off local or remote without streaming in vscode).
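
If the data arrives in pieces anyway, one workaround is to collect the pieces and join them before decoding. A minimal sketch, with `concatChunks` as a hypothetical helper name:

```javascript
// Collect arbitrary-sized chunks and join them into one Uint8Array, which
// can then be handed to a decoder that expects the complete file.
function concatChunks(chunks) {
  let total = 0;
  for (const chunk of chunks) total += chunk.length;
  const result = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    result.set(chunk, offset);
    offset += chunk.length;
  }
  return result;
}

// Example: the 8-byte PNG signature arriving in three pieces.
const joined = concatChunks([
  new Uint8Array([0x89, 0x50]),
  new Uint8Array([0x4e, 0x47, 0x0d]),
  new Uint8Array([0x0a, 0x1a, 0x0a])
]);
```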

Btw #15 would be a killer feature - imho APNG is a very nice, but sadly totally underrated format.

I'm guessing you mean #10? Yeah I'll find the motivation to do this at some point. I actually don't expect it to be that tough to decode APNGs, the hard parts are making a nice multi-image API that I'll want to use for future libs (gif?) and the UX side in Luna Paint would need a bit of work.

@Tyriar Tyriar added the question Further information is requested label Jul 1, 2022
@jerch
Author

jerch commented Jul 1, 2022

It's always rgba, the palette is used to decode and pass out the image in a standard. The palette is also available in the API as a potentially useful feature for image editors.

Ah right, I did it likewise in the sixel lib and exposed the palette for whatever usage. The only issue there: sixels are colored immediately (printer mode), which might be misused by hi-res encoders to shift palette colors to get more colors in. That makes the palette unstable during decoding and limits its usage afterwards (I was not keen to create a history-preserving data structure just for that). That's also one of the reasons why I don't want to re-encode a former sixel image back to sixel format for serialization (others being the bad compression/quality ratio and high CPU consumption).

I think JS uses LE by default?

Well, WebAssembly is always LE, while JS typed arrays use the native endianness. I learnt that the hard way during wasm adoption for the sixel lib and basically removed all BE stuff, as the wasm-LE <--> js-BE bridging became a code nuisance. Imho not addressing BE archs is a safe bet for xterm.js, as there is not a single desktop-grade BE system anymore. I think IBM's POWER arch is the last serious BE domain. Though I find it weird that LE won the race - BE is for example a lot easier in dynamic memory handling for vector processing.
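
To make the RGBA32 vs. ABGR32 point concrete: bytes stored as R, G, B, A read back through a Uint32Array as 0xAABBGGRR on a little-endian machine. A small sketch, where `isLittleEndian` and `packRgba` are illustrative helper names only:

```javascript
// Typed arrays use the platform byte order, so check it at runtime.
function isLittleEndian() {
  return new Uint8Array(new Uint32Array([0x11223344]).buffer)[0] === 0x44;
}

// Pack one pixel so that, written through a Uint32Array, the bytes land in
// memory as R, G, B, A regardless of the platform byte order.
function packRgba(r, g, b, a) {
  return isLittleEndian()
    ? ((a << 24) | (b << 16) | (g << 8) | r) >>> 0  // value reads as 0xAABBGGRR
    : ((r << 24) | (g << 16) | (b << 8) | a) >>> 0; // value reads as 0xRRGGBBAA
}

const pixelBytes = new Uint8Array(4);
new Uint32Array(pixelBytes.buffer)[0] = packRgba(0x10, 0x20, 0x30, 0xff);
// pixelBytes now holds 0x10, 0x20, 0x30, 0xff in that order
```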

PNGs can be encoded with 64 bits per pixel, if you don't handle 64 bit images, you can use force32:

Ah right, overlooked that in the api file.

I tried to make the encode API as simple as possible to a noob, you will probably just want to encode without any options and some light analysis will happen to pick the right bit depth, color type, etc.

Sweet, thats all I need I guess.

@stream decoder
Yes, that's no biggie. The stream decoding approach came as a logical consequence in the sixel lib, because the raw data is already chunked. But sixel images can grow really big (easily into several MBs) compared to PNGs.

@decoding/promise
Ah I see, you are awaiting module imports for the different chunk decoders. I think I'm gonna test first how things perform on the main thread, but most likely I'll have to move it over to the worker (I have not perf tested your decoder yet). My early tests with browser API PNG decoding months ago were not as fast as I hoped for - in fact it is slower than the sixel path for the same reduced pixel data, but that's a summation effect of PNG decoding and the needed base64 decoding on top. (I already started a much faster lib with a base64 variant in C/wasm, which eliminates atob as an ugly perf showstopper.)

So thanks for your detailed response.

@jerch
Author

jerch commented Jul 2, 2022

@Tyriar I did some first perf tests in a dummy project just testing decoding of a 6000x4000 pixel image 10 times with these results (numbers in msec):

  • png-codec: [2070, 1980, 3114, 3124, 3104, 3082, 3286, 3093, 3101, 3264]
  • upng: [1478, 1382, 1227, 1206, 1234, 1215, 1231, 1215, 1260, 1219]
  • browser: [395, 388, 387, 392, 389, 392, 386, 395, 389, 381]
  • sixel: [210, 198, 201, 242, 191, 191, 186, 184, 187, 185]

The browser method uses the createImageBitmap function on recent chrome (not available on all browser engines for my purpose). The sixel image contains the same pixel information (in fact I created the png from the sixel one to ensure that).
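
A timing loop for this kind of comparison might look roughly like the sketch below. It is not the script used for the numbers above: `timeRuns` is a hypothetical helper, it only handles synchronous functions, and a promise-returning decoder would need to be awaited inside the loop instead.

```javascript
// Run `fn` `runs` times and return the elapsed wall-clock time of each run
// in whole milliseconds. `fn` is any synchronous function to measure.
function timeRuns(fn, runs) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    timings.push(Math.round(performance.now() - start));
  }
  return timings;
}

// Placeholder workload; a real test would call the decoder here.
const timings = timeRuns(() => {
  let sum = 0;
  for (let i = 0; i < 1e5; i++) sum += i;
  return sum;
}, 10);
```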

To get a bit more insight, some profiling data:

Statistical profiling result from isolate-luna.log, (28788 ticks, 225 unaccounted, 0 excluded).

 [JavaScript]:
   ticks  total  nonlib   name
   6849   23.8%   26.7%  LazyCompile: *defilter /home/jerch/test-upng/node_modules/@lunapaint/png-codec/out-dist/decode/chunks/chunk_IDAT.js:39:18
   6475   22.5%   25.2%  LazyCompile: *<anonymous> /home/jerch/test-upng/node_modules/@lunapaint/png-codec/out-dist/decode/chunks/chunk_IDAT.js:119:24
   6137   21.3%   23.9%  LazyCompile: *inflate_fast /home/jerch/test-upng/node_modules/@lunapaint/png-codec/node_modules/pako/lib/zlib/inffast.js:61:39
   3020   10.5%   11.8%  LazyCompile: *mapPackedDataToRgba /home/jerch/test-upng/node_modules/@lunapaint/png-codec/out-dist/decode/chunks/chunk_IDAT.js:204:29
   1894    6.6%    7.4%  LazyCompile: *inflate /home/jerch/test-upng/node_modules/@lunapaint/png-codec/node_modules/pako/lib/zlib/inflate.js:363:17
    774    2.7%    3.0%  LazyCompile: *readChunk /home/jerch/test-upng/node_modules/@lunapaint/png-codec/out-dist/decode/decoder.js:220:19
     91    0.3%    0.4%  LazyCompile: *updateCrc /home/jerch/test-upng/node_modules/@lunapaint/png-codec/out-dist/shared/crc32.js:25:19
     64    0.2%    0.2%  LazyCompile: *inflate_table /home/jerch/test-upng/node_modules/@lunapaint/png-codec/node_modules/pako/lib/zlib/inftrees.js:53:23
     32    0.1%    0.1%  LazyCompile: *adler32 /home/jerch/test-upng/node_modules/@lunapaint/png-codec/node_modules/pako/lib/zlib/adler32.js:26:17
     11    0.0%    0.0%  LazyCompile: *Inflate.push /home/jerch/test-upng/node_modules/@lunapaint/png-codec/node_modules/pako/lib/inflate.js:196:35
      2    0.0%    0.0%  LazyCompile: *readChunks /home/jerch/test-upng/node_modules/@lunapaint/png-codec/out-dist/decode/decoder.js:187:20
      1    0.0%    0.0%  LazyCompile: *normalizeString path.js:52:25
      1    0.0%    0.0%  LazyCompile: *module.exports.flattenChunks /home/jerch/test-upng/node_modules/@lunapaint/png-codec/node_modules/pako/lib/utils/common.js:30:32

vs. upng:

Statistical profiling result from isolate-upng.log, (12588 ticks, 0 unaccounted, 0 excluded).

 [JavaScript]:
   ticks  total  nonlib   name
   5597   44.5%   46.6%  LazyCompile: *H.H.N /home/jerch/test-upng/upng.js:229:58
   3838   30.5%   32.0%  LazyCompile: *UPNG.decode._filterZero /home/jerch/test-upng/upng.js:323:35
   1817   14.4%   15.1%  LazyCompile: *UPNG.toRGBA8.decodeImage /home/jerch/test-upng/upng.js:34:36
    616    4.9%    5.1%  LazyCompile: *UPNG.decode /home/jerch/test-upng/upng.js:99:23
     29    0.2%    0.2%  LazyCompile: *H.H.A /home/jerch/test-upng/upng.js:247:25
     17    0.1%    0.1%  LazyCompile: *H.H.R /home/jerch/test-upng/upng.js:241:15
     10    0.1%    0.1%  LazyCompile: *H.H.V /home/jerch/test-upng/upng.js:243:98
      8    0.1%    0.1%  LazyCompile: *H.H.n /home/jerch/test-upng/upng.js:245:15
      1    0.0%    0.0%  LazyCompile: *H.H.e /home/jerch/test-upng/upng.js:250:98
      1    0.0%    0.0%  LazyCompile: *H.H.b /home/jerch/test-upng/upng.js:251:15

vs. sixel:

Statistical profiling result from isolate-sixel.log, (2081 ticks, 0 unaccounted, 0 excluded).

 [JavaScript]:
   ticks  total  nonlib   name
   1297   62.3%   96.2%  Function: *wasm-function[9]
      2    0.1%    0.1%  Function: wasm-function[9]
      1    0.0%    0.1%  RegExp: ^((?:@[^/\\%]+\/)?[^./\\%][^/\\%]*)(\/.*)?$

For browser decoding I cannot show you more detailed profiling data, as devtools just shows createImageBitmap as a big block of ~360ms on 4 worker threads plus ~30ms for blob conversion on the mainthread.

@Tyriar
Member

Tyriar commented Jul 2, 2022

Interesting, good to know! Tweaking perf hasn't been a big focus yet as it's been fast enough in my experience. 3 seconds seems like a lot, but that's for 24 megapixels, which a PNG would never be used for in practice. Going down to 2.4 megapixels is still a crazy size for a PNG and that's down to ~300ms. I also have the option of doing this in a worker if needed.

Since I haven't needed to focus on perf yet I expect there's probably some easy wins, like pulling variable declarations to the top of a function to reduce GC here and here, bit math tricks, etc.

I also notice for upng most time was spent in UPNG.decode._filterZero which indicates it's probably not a particularly compressed png file, which would decode faster than when using the different filter types.
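
One way to check which filter types an image actually uses is to look at the filter-type byte that precedes every scanline in the inflated IDAT data. A sketch under stated assumptions: the input is already zlib-decompressed, the image is not interlaced, and `countFilterTypes` is a hypothetical helper name.

```javascript
// Count how often each PNG filter type (0 None, 1 Sub, 2 Up, 3 Average,
// 4 Paeth) is used. `inflated` is the zlib-decompressed IDAT data of a
// non-interlaced image; every scanline starts with one filter-type byte.
function countFilterTypes(inflated, bytesPerLine, height) {
  const counts = [0, 0, 0, 0, 0];
  const stride = bytesPerLine + 1; // +1 for the filter-type byte
  for (let y = 0; y < height; y++) {
    const type = inflated[y * stride];
    if (type === undefined || type > 4) {
      throw new Error(`Invalid filter type ${type} in line ${y}`);
    }
    counts[type]++;
  }
  return counts;
}

// Synthetic 3-line image with 2 bytes per line: filters None, Sub, Sub.
const filterCounts = countFilterTypes(
  new Uint8Array([0, 10, 20, 1, 5, 5, 1, 0, 0]), 2, 3
);
```

Here `filterCounts` is `[1, 2, 0, 0, 0]`, i.e. one unfiltered line and two lines using the Sub filter.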

@jerch
Author

jerch commented Jul 2, 2022

Well, I created it in Chrome from the canvas via blob export. Is there a way to check its compression level? This is what ImageMagick has to say about the image:

Image: big.png
  Format: PNG (Portable Network Graphics)
  Mime type: image/png
  Class: DirectClass
  Geometry: 6000x4000+0+0
  Units: Undefined
  Type: PaletteAlpha
  Endianess: Undefined
  Colorspace: sRGB
  Depth: 8-bit
  Channel depth:
    red: 8-bit
    green: 8-bit
    blue: 8-bit
    alpha: 1-bit
  Channel statistics:
    Pixels: 24000000
    Red:
      min: 5 (0.0196078)
      max: 245 (0.960784)
      mean: 62.4418 (0.24487)
      standard deviation: 51.2455 (0.200963)
      kurtosis: 0.485273
      skewness: 1.01213
      entropy: 0.859009
    Green:
      min: 0 (0)
      max: 247 (0.968627)
      mean: 62.2639 (0.244172)
      standard deviation: 53.3428 (0.209187)
      kurtosis: -0.652367
      skewness: 0.620157
      entropy: 0.81602
    Blue:
      min: 0 (0)
      max: 232 (0.909804)
      mean: 68.4463 (0.268417)
      standard deviation: 64.3633 (0.252405)
      kurtosis: -1.50305
      skewness: 0.410485
      entropy: 0.836668
    Alpha:
      min: 255 (1)
      max: 255 (1)
      mean: 255 (1)
      standard deviation: 0 (0)
      kurtosis: 0
      skewness: 0
      entropy: 0.065006
  Image statistics:
    Overall:
      min: 0 (0)
      max: 247 (0.968627)
      mean: 48.288 (0.189365)
      standard deviation: 49.026 (0.192259)
      kurtosis: 1.88159
      skewness: 1.53239
      entropy: 0.644176
  Colors: 239
  Histogram:
    673437: (  5,  8, 13,255) #05080DFF srgba(5,8,13,1)
   3025258: (  8,  8,  8,255) #080808FF grey3
    910398: ( 15,  8,  0,255) #0F0800FF srgba(15,8,0,1)
   1065458: ( 15,  8, 15,255) #0F080FFF srgba(15,8,15,1)
     18736: ( 15, 15,  0,255) #0F0F00FF srgba(15,15,0,1)
      2059: ( 15, 15,  8,255) #0F0F08FF srgba(15,15,8,1)
    368105: ( 15, 15, 15,255) #0F0F0FFF grey6
    142238: ( 20, 15, 23,255) #140F17FF srgba(20,15,23,1)
      6462: ( 23,  0,  8,255) #170008FF srgba(23,0,8,1)
    243219: ( 23,  8,  0,255) #170800FF srgba(23,8,0,1)
    151498: ( 23,  8,  8,255) #170808FF srgba(23,8,8,1)
       226: ( 23,  8, 15,255) #17080FFF srgba(23,8,15,1)
    429605: ( 23, 15,  8,255) #170F08FF srgba(23,15,8,1)
     12867: ( 23, 23,  8,255) #171708FF srgba(23,23,8,1)
    507183: ( 23, 23, 15,255) #17170FFF srgba(23,23,15,1)
     41486: ( 23, 28, 33,255) #171C21FF srgba(23,28,33,1)
     53682: ( 23, 41, 79,255) #17294FFF srgba(23,41,79,1)
      5103: ( 23, 43, 84,255) #172B54FF srgba(23,43,84,1)
     50225: ( 23, 48, 89,255) #173059FF srgba(23,48,89,1)
        82: ( 23, 56,105,255) #173869FF srgba(23,56,105,1)
       348: ( 23, 64,232,255) #1740E8FF srgba(23,64,232,1)
      2528: ( 28, 28, 15,255) #1C1C0FFF srgba(28,28,15,1)
      8453: ( 28, 28, 56,255) #1C1C38FF srgba(28,28,56,1)
     23960: ( 28, 36,  0,255) #1C2400FF srgba(28,36,0,1)
     20905: ( 28, 51, 89,255) #1C3359FF srgba(28,51,89,1)
    347957: ( 33,  8,  0,255) #210800FF srgba(33,8,0,1)
    205233: ( 33,  8, 13,255) #21080DFF srgba(33,8,13,1)
    277925: ( 33, 15,  8,255) #210F08FF srgba(33,15,8,1)
    174193: ( 33, 15, 15,255) #210F0FFF srgba(33,15,15,1)
    156484: ( 33, 15, 23,255) #210F17FF srgba(33,15,23,1)
     37470: ( 33, 23,  8,255) #211708FF srgba(33,23,8,1)
     96233: ( 33, 23, 23,255) #211717FF srgba(33,23,23,1)
      5280: ( 33, 41, 84,255) #212954FF srgba(33,41,84,1)
     30940: ( 33, 48, 97,255) #213061FF srgba(33,48,97,1)
    121952: ( 33, 56, 97,255) #213861FF srgba(33,56,97,1)
    161663: ( 33, 56,105,255) #213869FF srgba(33,56,105,1)
       108: ( 33, 56,112,255) #213870FF srgba(33,56,112,1)
     41682: ( 33, 64,107,255) #21406BFF srgba(33,64,107,1)
    104751: ( 36, 28, 13,255) #241C0DFF srgba(36,28,13,1)
     65499: ( 36, 33, 20,255) #242114FF srgba(36,33,20,1)
      9395: ( 36, 61,115,255) #243D73FF srgba(36,61,115,1)
      3139: ( 36, 71,115,255) #244773FF srgba(36,71,115,1)
        92: ( 41,  8,  8,255) #290808FF srgba(41,8,8,1)
    136334: ( 41, 15,  8,255) #290F08FF srgba(41,15,8,1)
    108467: ( 41, 20, 20,255) #291414FF srgba(41,20,20,1)
     13938: ( 41, 33, 13,255) #29210DFF srgba(41,33,13,1)
     37568: ( 41, 33, 36,255) #292124FF srgba(41,33,36,1)
     21913: ( 41, 48, 89,255) #293059FF srgba(41,48,89,1)
     85255: ( 41, 61, 99,255) #293D63FF srgba(41,61,99,1)
     25528: ( 41, 64,105,255) #294069FF srgba(41,64,105,1)
    182137: ( 41, 64,112,255) #294070FF srgba(41,64,112,1)
    256515: ( 41, 64,120,255) #294078FF srgba(41,64,120,1)
       108: ( 41, 64,128,255) #294080FF srgba(41,64,128,1)
    131052: ( 41, 71,120,255) #294778FF srgba(41,71,120,1)
    210256: ( 41, 71,128,255) #294780FF srgba(41,71,128,1)
      5482: ( 41, 79,128,255) #294F80FF srgba(41,79,128,1)
       678: ( 41, 79,135,255) #294F87FF srgba(41,79,135,1)
     28843: ( 48,  5,  8,255) #300508FF srgba(48,5,8,1)
     92039: ( 48, 15,  0,255) #300F00FF srgba(48,15,0,1)
     90452: ( 48, 23,  8,255) #301708FF srgba(48,23,8,1)
    227072: ( 48, 23, 15,255) #30170FFF srgba(48,23,15,1)
    142712: ( 48, 28, 28,255) #301C1CFF srgba(48,28,28,1)
     67866: ( 48, 36, 41,255) #302429FF srgba(48,36,41,1)
       291: ( 48, 64,120,255) #304078FF srgba(48,64,120,1)
     86219: ( 48, 69,112,255) #304570FF srgba(48,69,112,1)
    150246: ( 48, 69,128,255) #304580FF srgba(48,69,128,1)
    104470: ( 48, 71,120,255) #304778FF srgba(48,71,120,1)
    368401: ( 48, 79,128,255) #304F80FF srgba(48,79,128,1)
    116650: ( 48, 79,135,255) #304F87FF srgba(48,79,135,1)
     41461: ( 48, 89,135,255) #305987FF srgba(48,89,135,1)
     50071: ( 48, 92,143,255) #305C8FFF srgba(48,92,143,1)
       232: ( 48, 97,135,255) #306187FF srgba(48,97,135,1)
    106812: ( 51, 20,  8,255) #331408FF srgba(51,20,8,1)
     46347: ( 51, 23, 20,255) #331714FF srgba(51,23,20,1)
     52523: ( 51, 33,  5,255) #332105FF srgba(51,33,5,1)
     60131: ( 51, 77,120,255) #334D78FF srgba(51,77,120,1)
     14758: ( 56, 13,  5,255) #380D05FF srgba(56,13,5,1)
     54288: ( 56, 23,  8,255) #381708FF srgba(56,23,8,1)
    114419: ( 56, 36, 23,255) #382417FF srgba(56,36,23,1)
     88341: ( 56, 41, 41,255) #382929FF srgba(56,41,41,1)
     12755: ( 56, 41, 48,255) #382930FF srgba(56,41,48,1)
     32669: ( 56, 51,  8,255) #383308FF srgba(56,51,8,1)
     40459: ( 56, 56, 23,255) #383817FF srgba(56,56,23,1)
     10724: ( 56, 56, 36,255) #383824FF srgba(56,56,36,1)
     22349: ( 56, 56, 48,255) #383830FF srgba(56,56,48,1)
      7153: ( 56, 64, 77,255) #38404DFF srgba(56,64,77,1)
     68102: ( 56, 79,128,255) #384F80FF srgba(56,79,128,1)
     59798: ( 56, 89,115,255) #385973FF srgba(56,89,115,1)
     22214: ( 56, 89,128,255) #385980FF srgba(56,89,128,1)
    455734: ( 56, 89,135,255) #385987FF srgba(56,89,135,1)
    135385: ( 56, 89,143,255) #38598FFF srgba(56,89,143,1)
      6343: ( 56, 97,135,255) #386187FF srgba(56,97,135,1)
    255353: ( 56, 97,143,255) #38618FFF srgba(56,97,143,1)
     99940: ( 56,105,148,255) #386994FF srgba(56,105,148,1)
     58884: ( 61, 20,  5,255) #3D1405FF srgba(61,20,5,1)
     33663: ( 61, 23, 20,255) #3D1714FF srgba(61,23,20,1)
     34300: ( 61, 71,107,255) #3D476BFF srgba(61,71,107,1)
     48142: ( 61, 89,133,255) #3D5985FF srgba(61,89,133,1)
     67602: ( 61, 92,143,255) #3D5C8FFF srgba(61,92,143,1)
      9072: ( 64, 13,  0,255) #400D00FF srgba(64,13,0,1)
     57158: ( 64, 23,  8,255) #401708FF srgba(64,23,8,1)
    152985: ( 64, 33,  5,255) #402105FF srgba(64,33,5,1)
     88654: ( 64, 36, 33,255) #402421FF srgba(64,36,33,1)
    151721: ( 64, 48, 48,255) #403030FF srgba(64,48,48,1)
     73325: ( 64, 97,128,255) #406180FF srgba(64,97,128,1)
     41855: ( 64, 97,153,255) #406199FF srgba(64,97,153,1)
     40222: ( 64,105,143,255) #40698FFF srgba(64,105,143,1)
    296013: ( 64,105,153,255) #406999FF srgba(64,105,153,1)
     26595: ( 64,112,153,255) #407099FF srgba(64,112,153,1)
     30327: ( 69,107,135,255) #456B87FF srgba(69,107,135,1)
    163230: ( 69,115,161,255) #4573A1FF srgba(69,115,161,1)
     18288: ( 71,105,153,255) #476999FF srgba(71,105,153,1)
    103113: ( 71,107,143,255) #476B8FFF srgba(71,107,143,1)
    191610: ( 71,112,153,255) #477099FF srgba(71,112,153,1)
    106046: ( 71,120,161,255) #4778A1FF srgba(71,120,161,1)
     13870: ( 77, 20, 13,255) #4D140DFF srgba(77,20,13,1)
     65952: ( 77, 36,  0,255) #4D2400FF srgba(77,36,0,1)
    161622: ( 77, 56, 51,255) #4D3833FF srgba(77,56,51,1)
     13630: ( 77, 71, 51,255) #4D4733FF srgba(77,71,51,1)
     66546: ( 77, 84, 92,255) #4D545CFF srgba(77,84,92,1)
     89080: ( 77, 92,115,255) #4D5C73FF srgba(77,92,115,1)
     10527: ( 77,112,148,255) #4D7094FF srgba(77,112,148,1)
     45489: ( 79, 20,  5,255) #4F1405FF srgba(79,20,5,1)
     74590: ( 79, 36, 20,255) #4F2414FF srgba(79,36,20,1)
     65570: ( 79, 41, 23,255) #4F2917FF srgba(79,41,23,1)
     77579: ( 79, 43, 15,255) #4F2B0FFF srgba(79,43,15,1)
      4649: ( 79, 56, 56,255) #4F3838FF srgba(79,56,56,1)
     51704: ( 79, 61, 41,255) #4F3D29FF srgba(79,61,41,1)
     37031: ( 79, 64, 15,255) #4F400FFF srgba(79,64,15,1)
     27979: ( 79,112,161,255) #4F70A1FF srgba(79,112,161,1)
     58649: ( 79,115,148,255) #4F7394FF srgba(79,115,148,1)
    141164: ( 79,120,161,255) #4F78A1FF srgba(79,120,161,1)
     95220: ( 79,125,163,255) #4F7DA3FF srgba(79,125,163,1)
    159620: ( 84, 61, 61,255) #543D3DFF srgba(84,61,61,1)
     81047: ( 84,107,133,255) #546B85FF srgba(84,107,133,1)
     49945: ( 84,125,148,255) #547D94FF srgba(84,125,148,1)
     72519: ( 89, 36,  0,255) #592400FF srgba(89,36,0,1)
     55758: ( 89, 84, 64,255) #595440FF srgba(89,84,64,1)
     30253: ( 89,112,143,255) #59708FFF srgba(89,112,143,1)
     67905: ( 89,120,168,255) #5978A8FF srgba(89,120,168,1)
     63514: ( 89,128,161,255) #5980A1FF srgba(89,128,161,1)
    214032: ( 89,128,168,255) #5980A8FF srgba(89,128,168,1)
     83453: ( 92, 89, 99,255) #5C5963FF srgba(92,89,99,1)
     59335: ( 92,107,115,255) #5C6B73FF srgba(92,107,115,1)
      1127: ( 92,133,161,255) #5C85A1FF srgba(92,133,161,1)
     86574: ( 97,128,161,255) #6180A1FF srgba(97,128,161,1)
     77805: ( 97,128,168,255) #6180A8FF srgba(97,128,168,1)
     12501: ( 97,135,161,255) #6187A1FF srgba(97,135,161,1)
    104639: ( 97,135,168,255) #6187A8FF srgba(97,135,168,1)
     86741: ( 97,135,176,255) #6187B0FF srgba(97,135,176,1)
    116828: ( 99, 64, 61,255) #63403DFF srgba(99,64,61,1)
     32936: ( 99, 77, 43,255) #634D2BFF srgba(99,77,43,1)
     15986: ( 99,115, 99,255) #637363FF srgba(99,115,99,1)
     61151: ( 99,120,133,255) #637885FF srgba(99,120,133,1)
     42044: (105, 41, 23,255) #692917FF srgba(105,41,23,1)
     61648: (105, 71, 15,255) #69470FFF srgba(105,71,15,1)
     79891: (105,128,148,255) #698094FF srgba(105,128,148,1)
     57581: (105,133,156,255) #69859CFF srgba(105,133,156,1)
     70901: (105,135,168,255) #6987A8FF srgba(105,135,168,1)
    121233: (105,135,176,255) #6987B0FF srgba(105,135,176,1)
       163: (105,143,168,255) #698FA8FF srgba(105,143,168,1)
     55879: (105,143,176,255) #698FB0FF srgba(105,143,176,1)
     22555: (107, 28,  0,255) #6B1C00FF srgba(107,28,0,1)
    109215: (107, 43,  0,255) #6B2B00FF srgba(107,43,0,1)
     75612: (107, 69, 61,255) #6B453DFF srgba(107,69,61,1)
       287: (107,133,156,255) #6B859CFF srgba(107,133,156,1)
     63753: (112, 71, 64,255) #704740FF srgba(112,71,64,1)
     68177: (112, 84, 64,255) #705440FF srgba(112,84,64,1)
     59716: (112,115,112,255) #707370FF srgba(112,115,112,1)
     25925: (112,125, 51,255) #707D33FF srgba(112,125,51,1)
     23877: (112,135,156,255) #70879CFF srgba(112,135,156,1)
     22892: (112,143,156,255) #708F9CFF srgba(112,143,156,1)
     72838: (112,143,168,255) #708FA8FF srgba(112,143,168,1)
    135048: (112,143,176,255) #708FB0FF srgba(112,143,176,1)
     90973: (115,140,161,255) #738CA1FF srgba(115,140,161,1)
     32534: (115,140,171,255) #738CABFF srgba(115,140,171,1)
     60246: (115,148,176,255) #7394B0FF srgba(115,148,176,1)
     79896: (120,128,133,255) #788085FF srgba(120,128,133,1)
     79702: (120,153,176,255) #7899B0FF srgba(120,153,176,1)
     15770: (125, 69, 89,255) #7D4559FF srgba(125,69,89,1)
     65300: (125, 77, 61,255) #7D4D3DFF srgba(125,77,61,1)
     95285: (125, 79, 69,255) #7D4F45FF srgba(125,79,69,1)
     16086: (125, 92, 64,255) #7D5C40FF srgba(125,92,64,1)
    104694: (125,107, 84,255) #7D6B54FF srgba(125,107,84,1)
     38079: (125,120,112,255) #7D7870FF srgba(125,120,112,1)
    101682: (125,135,143,255) #7D878FFF srgba(125,135,143,1)
     54942: (125,148,161,255) #7D94A1FF srgba(125,148,161,1)
    212598: (125,156,181,255) #7D9CB5FF srgba(125,156,181,1)
     34351: (128, 48, 20,255) #803014FF srgba(128,48,20,1)
     28761: (128,120, 97,255) #807861FF srgba(128,120,97,1)
     39151: (128,143,156,255) #808F9CFF srgba(128,143,156,1)
     87599: (128,153,168,255) #8099A8FF srgba(128,153,168,1)
      3592: (128,161,168,255) #80A1A8FF srgba(128,161,168,1)
     78572: (128,161,176,255) #80A1B0FF srgba(128,161,176,1)
     70110: (133, 79, 23,255) #854F17FF srgba(133,79,23,1)
     45823: (133, 84, 64,255) #855440FF srgba(133,84,64,1)
     68925: (133, 92, 69,255) #855C45FF srgba(133,92,69,1)
     41493: (133,120,112,255) #857870FF srgba(133,120,112,1)
     62094: (135, 43,  0,255) #872B00FF srgba(135,43,0,1)
     74313: (135,156,176,255) #879CB0FF srgba(135,156,176,1)
     43442: (135,161,168,255) #87A1A8FF srgba(135,161,168,1)
     71106: (140,125,105,255) #8C7D69FF srgba(140,125,105,1)
     89642: (140,143,140,255) #8C8F8CFF srgba(140,143,140,1)
     48689: (140,148,156,255) #8C949CFF srgba(140,148,156,1)
     17402: (140,153,163,255) #8C99A3FF srgba(140,153,163,1)
     22736: (140,161,156,255) #8CA19CFF srgba(140,161,156,1)
     69745: (143,133,128,255) #8F8580FF srgba(143,133,128,1)
     29012: (143,161,168,255) #8FA1A8FF srgba(143,161,168,1)
    143737: (143,161,176,255) #8FA1B0FF srgba(143,161,176,1)
    121562: (148, 92, 69,255) #945C45FF srgba(148,92,69,1)
     43242: (148,148,140,255) #94948CFF srgba(148,148,140,1)
    119580: (153, 77, 20,255) #994D14FF srgba(153,77,20,1)
     73652: (153,112, 89,255) #997059FF srgba(153,112,89,1)
     41377: (153,115, 41,255) #997329FF srgba(153,115,41,1)
     30459: (153,115, 77,255) #99734DFF srgba(153,115,77,1)
     46101: (153,125,105,255) #997D69FF srgba(153,125,105,1)
     22151: (153,156,156,255) #999C9CFF srgba(153,156,156,1)
     56411: (153,161,143,255) #99A18FFF srgba(153,161,143,1)
     63733: (153,161,161,255) #99A1A1FF srgba(153,161,161,1)
    152660: (153,168,163,255) #99A8A3FF srgba(153,168,163,1)
     64071: (156,140,120,255) #9C8C78FF srgba(156,140,120,1)
     35262: (156,148,140,255) #9C948CFF srgba(156,148,140,1)
    108005: (161,115, 84,255) #A17354FF srgba(161,115,84,1)
     12536: (161,133,120,255) #A18578FF srgba(161,133,120,1)
     50710: (161,153,135,255) #A19987FF srgba(161,153,135,1)
    104509: (168,128, 99,255) #A88063FF srgba(168,128,99,1)
    152640: (168,148,140,255) #A8948CFF srgba(168,148,140,1)
    142578: (171,140,115,255) #AB8C73FF srgba(171,140,115,1)
     67266: (181, 77, 23,255) #B54D17FF srgba(181,77,23,1)
     22518: (181,148, 51,255) #B59433FF srgba(181,148,51,1)
     19509: (196,196,120,255) #C4C478FF srgba(196,196,120,1)
     38063: (204,115, 51,255) #CC7333FF srgba(204,115,51,1)
     17746: (204,196,199,255) #CCC4C7FF srgba(204,196,199,1)
     49788: (204,199,135,255) #CCC787FF srgba(204,199,135,1)
     36897: (204,204, 84,255) #CCCC54FF srgba(204,204,84,1)
    124757: (212,115,  5,255) #D47305FF srgba(212,115,5,1)
     41941: (227,209, 28,255) #E3D11CFF srgba(227,209,28,1)
     78305: (232,148, 41,255) #E89429FF srgba(232,148,41,1)
     94959: (245,247,199,255) #F5F7C7FF srgba(245,247,199,1)
  Rendering intent: Perceptual
  Gamma: 0.454545
  Chromaticity:
    red primary: (0.64,0.33)
    green primary: (0.3,0.6)
    blue primary: (0.15,0.06)
    white point: (0.3127,0.329)
  Background color: white
  Border color: srgba(223,223,223,1)
  Matte color: grey74
  Transparent color: none
  Interlace: None
  Intensity: Undefined
  Compose: Over
  Page geometry: 6000x4000+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: Zip
  Orientation: Undefined
  Properties:
    date:create: 2021-03-26T17:18:52+01:00
    date:modify: 2021-03-26T15:43:37+01:00
    png:IHDR.bit-depth-orig: 8
    png:IHDR.bit_depth: 8
    png:IHDR.color-type-orig: 6
    png:IHDR.color_type: 6 (RGBA)
    png:IHDR.interlace_method: 0 (Not interlaced)
    png:IHDR.width,height: 6000, 4000
    png:sRGB: intent=0 (Perceptual Intent)
    signature: 839909d68908dc5b7809d46f5f6349af761bbddfffc7cfa34b0bcbd3588e8b67
  Artifacts:
    filename: big.png
    verbose: true
  Tainted: False
  Filesize: 19.19MB
  Number pixels: 24M
  Pixels per second: 32.88MB
  User time: 0.720u
  Elapsed time: 0:01.730
  Version: ImageMagick 6.9.7-4 Q16 x86_64 20170114 http://www.imagemagick.org

That's a lot of stats, dunno where the real compression info hides here. And yes, the image is quite big with 19MB, the sixel original is 25MB.

@jerch
Author

jerch commented Jul 2, 2022

Well, ImageMagick does not report the compression level, as stated here https://legacy.imagemagick.org/discourse-server/viewtopic.php?t=32574

pngcheck prints this:

...
    zlib: deflated, 32K window, fast compression
...

Does that mean, it is uncompressed? The image has highly different colors, could it be that the browser decided not to compress it because of high entropy?
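
As far as I can tell, that pngcheck line reflects the two-byte zlib header (RFC 1950) at the start of the IDAT stream, where the FLEVEL bits record which setting the encoder claims to have used. It is a hint about the encoder, not a statement that the data is stored uncompressed. A sketch of decoding that header, with `parseZlibHeader` as a hypothetical helper and the bytes 0x78 0x5e chosen as an example, not read from big.png:

```javascript
// Decode the two-byte zlib stream header (RFC 1950).
function parseZlibHeader(cmf, flg) {
  // The 16-bit value CMF*256 + FLG must be a multiple of 31.
  if (((cmf << 8) | flg) % 31 !== 0) {
    throw new Error('Invalid zlib header checksum');
  }
  const levels = ['fastest', 'fast', 'default', 'maximum'];
  return {
    method: (cmf & 0x0f) === 8 ? 'deflate' : 'unknown', // CM
    windowSize: 1 << ((cmf >> 4) + 8),                  // CINFO
    hasPresetDict: (flg & 0x20) !== 0,                  // FDICT
    level: levels[flg >> 6]                             // FLEVEL, only a hint
  };
}

const zlibHeader = parseZlibHeader(0x78, 0x5e);
// method 'deflate', windowSize 32768, hasPresetDict false, level 'fast'
```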

Edit:
Wow - just tried to re-encode it with gimp and level 9 - took like 2min and ended up at 25MB. What a fail 😆

Edit2:
If you want to try yourself, the png file is this one: https://github.com/jerch/xterm.js/raw/daf3a58cc1611f15610388d8b597d21e4c5dcb66/addons/xterm-addon-image/fixture/big.png

Edit3:
Ok managed to get it properly converted with gimp by indexing it first - 8.3MB, png indexed. New runtime numbers:

  • png-codec: [1173, 1025, 925, 929, 916, 918, 918, 939, 921, 926]
  • upng: [745, 698, 564, 641, 555, 648, 542, 657, 598, 537]

@jerch
Author

jerch commented Jul 3, 2022

A first optimization possibility I have found is convert16BitTo8BitData in decoder/array.ts. If rewritten with a right shift plus some loop unrolling, it runs way faster (only tested in JavaScript, and no tail handling yet):

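function convert16BitTo8BitData(data) {
  // data holds 16-bit channel values; keep only the high byte of each.
  const result = new Uint8Array(data.length);
  // Unrolled by 4; no tail handling yet, so data.length should be a multiple of 4.
  for (let i = 0; i < result.length; i += 4) {
    result[i+0] = data[i+0] >> 8;
    result[i+1] = data[i+1] >> 8;
    result[i+2] = data[i+2] >> 8;
    result[i+3] = data[i+3] >> 8;
  }
  return result;
}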

conversion throughput

  • current variant: 250 MB/s
  • shift+unroll: 360 MB/s
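The missing tail handling could look roughly like the sketch below: the unrolled loop covers the largest multiple of four, and a plain loop finishes the remaining 0 to 3 values. The name `convert16BitTo8BitDataWithTail` is made up for illustration and `data` is assumed to be a `Uint16Array`; this is not necessarily the code that ended up in the library.

```javascript
function convert16BitTo8BitDataWithTail(data) {
  const result = new Uint8Array(data.length);
  // Largest multiple of 4 that fits into data.length.
  const unrolledEnd = data.length - (data.length % 4);
  let i = 0;
  // Unrolled main loop: 4 channel values per iteration.
  for (; i < unrolledEnd; i += 4) {
    result[i]     = data[i]     >> 8;
    result[i + 1] = data[i + 1] >> 8;
    result[i + 2] = data[i + 2] >> 8;
    result[i + 3] = data[i + 3] >> 8;
  }
  // Tail: the remaining 0 to 3 values.
  for (; i < data.length; i++) {
    result[i] = data[i] >> 8;
  }
  return result;
}

// 6 values: 4 go through the unrolled loop, 2 through the tail loop.
const sample = new Uint16Array([0x0000, 0x00ff, 0x0100, 0x7fff, 0x8000, 0xffff]);
const converted = convert16BitTo8BitDataWithTail(sample);
console.log(Array.from(converted)); // [0, 0, 1, 127, 128, 255]
```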

Furthermore, this task would highly benefit from a wasm rewrite, as I'd expect it to boost throughput above 1GB/s (scalar). With wasm-simd it's a single _mm_shuffle_epi8 picking the high bytes of 8 elements at once (runs in one cycle), which should give >>6GB/s in wasm. That's more than 20 times faster.

To make this more explicit - conversion of 100M channel values (== 25M pixels in RGBA) takes:

  • current: 410 ms
  • shift+unroll: 270 ms
  • wasm scalar: <100 ms
  • wasm-simd: <16 ms

Edit: Found another faster version in JS:

function convert16BitTo8BitData(data, buffer) {
  // FIXME: add proper tail handling
  // Assumes little-endian and data.byteOffset === 0 (data.buffer is read from its start).
  // Pass data.buffer as buffer to convert in place.
  const result = buffer
    ? new Uint32Array(buffer, 0, data.length / 4)
    : new Uint32Array(data.length / 4);
  const view = new Uint32Array(data.buffer);
  let t1, t2;
  // Each 32-bit read holds two 16-bit channels; the high bytes of four
  // channels get packed into one 32-bit write. Unrolled by 4.
  // Note the unsigned >>> 16: with >> a set top bit would be sign-extended
  // into the two upper bytes.
  for (let i = 0, j = 0; i < result.length; i += 4, j += 8) {
    t1 = view[j+0], t2 = view[j+1];
    result[i+0] = (t1 & 0xFF00) >> 8 | (t1 & 0xFF000000) >>> 16 | (t2 & 0xFF00) << 8 | (t2 & 0xFF000000);
    t1 = view[j+2], t2 = view[j+3];
    result[i+1] = (t1 & 0xFF00) >> 8 | (t1 & 0xFF000000) >>> 16 | (t2 & 0xFF00) << 8 | (t2 & 0xFF000000);
    t1 = view[j+4], t2 = view[j+5];
    result[i+2] = (t1 & 0xFF00) >> 8 | (t1 & 0xFF000000) >>> 16 | (t2 & 0xFF00) << 8 | (t2 & 0xFF000000);
    t1 = view[j+6], t2 = view[j+7];
    result[i+3] = (t1 & 0xFF00) >> 8 | (t1 & 0xFF000000) >>> 16 | (t2 & 0xFF00) << 8 | (t2 & 0xFF000000);
  }
  return new Uint8Array(result.buffer, 0, data.length);
}

Runs the test above in 170 ms (~580 MB/s), or in 125 ms (~800 MB/s) with inplace writing (avoids new memory alloc by setting the original data buffer as buffer, can only be used if original gets discarded anyway).
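A quick way to sanity-check such a packed variant is to compare it against the plain per-channel shift, both with a fresh output buffer and in place. The sketch below is not unrolled, and the names `convertSimple` and `convertPacked` are made up for illustration. It assumes a little-endian machine, a `Uint16Array` that starts at offset 0 of its buffer, and a length that is a multiple of 4. It uses the unsigned shift `>>>` for the second byte so that a set top bit does not get sign-extended into the upper bytes.

```javascript
// Reference: one channel at a time.
function convertSimple(data) {
  const result = new Uint8Array(data.length);
  for (let i = 0; i < data.length; i++) result[i] = data[i] >> 8;
  return result;
}

// Packed variant: two 32-bit reads (4 channels) produce one 32-bit write.
function convertPacked(data, buffer) {
  const words = data.length / 4;
  const result = buffer ? new Uint32Array(buffer, 0, words) : new Uint32Array(words);
  const view = new Uint32Array(data.buffer, 0, data.length / 2);
  for (let i = 0, j = 0; i < words; i++, j += 2) {
    const t1 = view[j], t2 = view[j + 1];
    result[i] = (t1 & 0xFF00) >>> 8 | (t1 & 0xFF000000) >>> 16
              | (t2 & 0xFF00) << 8 | (t2 & 0xFF000000);
  }
  return new Uint8Array(result.buffer, 0, data.length);
}

// Deterministic test data that includes values with the top bit set.
const input = new Uint16Array(4096);
for (let i = 0; i < input.length; i++) input[i] = (i * 40503) & 0xffff;

const expected = convertSimple(input);
const copied = convertPacked(input);
const copyMatches = expected.every((v, i) => v === copied[i]);

// In place: the output overwrites the start of the input buffer,
// so the 16-bit input is destroyed afterwards.
const inPlace = convertPacked(input, input.buffer);
const inPlaceMatches = expected.every((v, i) => v === inPlace[i]);
console.log(copyMatches, inPlaceMatches);
```

In-place writing is safe here because the write position (word `i`) never runs ahead of the read position (words `2i` and `2i + 1`).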

Edit2 - some wasm numbers:

  • wasm-scalar: 105 ms (~950 MB/s) with alloc, 55 ms (~1800 MB/s) inplace
  • wasm-simd: 70 ms (~1400 MB/s) with alloc, 23 ms (~4300 MB/s) inplace - cannot get it any faster, as it is already dominated by typedarray.set speed (wasm code itself takes less than 5 ms) - so final wasm-simd speedup is ~18x compared to current impl and ~5x compared to fastest JS variant.
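The MB/s figures and speedups follow directly from the timings if one output byte is counted per channel value and 1 MB is taken as 1e6 bytes (that counting convention is an assumption, but it reproduces the numbers quoted here). A small sketch with a made-up helper `throughputMBps`:

```javascript
// Throughput in MB/s for converting `values` channel values in `ms` milliseconds,
// counting one output byte per value (assumption) and 1 MB = 1e6 bytes.
function throughputMBps(values, ms) {
  return values / 1e6 / (ms / 1000);
}

const VALUES = 100e6; // 100M channel values == 25M RGBA pixels
const timingsMs = {
  current: 410,
  shiftUnroll: 270,
  jsPacked: 170,
  jsPackedInPlace: 125,
  wasmScalar: 105,
  wasmScalarInPlace: 55,
  wasmSimd: 70,
  wasmSimdInPlace: 23,
};

for (const [name, ms] of Object.entries(timingsMs)) {
  console.log(name, Math.round(throughputMBps(VALUES, ms)), 'MB/s');
}

// Speedup of the fastest wasm variant over the current impl and the fastest JS variant.
const speedupVsCurrent = timingsMs.current / timingsMs.wasmSimdInPlace;
const speedupVsFastestJs = timingsMs.jsPackedInPlace / timingsMs.wasmSimdInPlace;
console.log(speedupVsCurrent.toFixed(1), speedupVsFastestJs.toFixed(1));
```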

@Tyriar
Member

Tyriar commented Jul 4, 2022

The image magick output doesn't say anything about filters, the filter selection is the hardest part with optimizing pngs and if it's using filter none primarily (ie. just raw pixel data), it's probably not very optimized.

> convert16BitTo8BitData

This is arguably not too important generally; I find PNG files with 16 bits per channel (i.e. 64 bits per RGBA pixel) are fairly rare. The TS change looks great though, want to make a PR? Testing is very comprehensive, so if the tests pass I'm very confident nothing broke 🙂

> Ok managed to get it properly converted with gimp by indexing it first - 8.3MB, png indexed. New runtime numbers:

Looks like it comes out to 8.7MB when saving with Luna Paint, which seems pretty good. Not sure how many colors it has, as Luna Paint is struggling to inspect the file (lunapaint/vscode-luna-paint#145). Do you mean png indexed as in you needed to reduce it to 256 colors?

> some wasm numbers

I haven't done much with wasm but I worry about the complexity it will bring, both in the implementation and in consuming the library (where do the wasm files go, do they play nicely with bundlers, etc.). I always had issues when I tried to set up emscripten, for example.

@jerch
Author

jerch commented Jul 4, 2022

> The image magick output doesn't say anything about filters, the filter selection is the hardest part with optimizing pngs and if it's using filter none primarily (ie. just raw pixel data), it's probably not very optimized.

Ic. Well, I used the browser to create it, and it was pretty fast compared to what gimp did (<2s vs. >20s), prolly it just dumped pixel data with little additional effort. Which is nice later on for term serialization because it's fast, but at the same time the created PNGs are overly big, which is bad when loading a serialized state back into the terminal. Hmm.

> This is arguably not too important generally; I find PNG files with 16 bits per channel (i.e. 64 bits per RGBA pixel) are fairly rare. The TS change looks great though, want to make a PR? Testing is very comprehensive, so if the tests pass I'm very confident nothing broke

Oh I know, it's just the first function I stumbled over, and it was quite a low-hanging fruit, too. I did not even bother yet to look at the more involved code paths from the profiling data (those take much more time than a Sunday afternoon). Sure, I can create a PR (but need to write the tail handling first 😸).

> Looks like it comes out to 8.7MB when saving with Luna Paint, which seems pretty good. Not sure how many colors it has, as Luna Paint is struggling to inspect the file (lunapaint/vscode-luna-paint#145). Do you mean png indexed as in you needed to reduce it to 256 colors?

Yes, 8.7MB is okish, that's what I got from ImageMagick's convert as well. By indexed I mean: the image is already indexed from its sixel origin and contains only 239 colors (that's what libsixel's quantization algo made of the original RGB image). But when converting it to PNG in the browser, it was not created as an indexed PNG but as sRGB. Gimp works likewise: loading that PNG in Gimp and just re-exporting it assumes RGB mode, thus I had to tell Gimp explicitly to do a palette reduction first. Since the PNG was built from a paletted sixel image in the first place, it contains exactly the same pixel information. Gimp just collects all colors and finds <256 of them, thus it can directly create the palette from all image colors (no color reduction/quantization is needed). The indexed PNG file takes only 8.3 MB from Gimp, and 8.7 MB from ImageMagick.
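The "collect all colors, no quantization needed" step can be sketched as follows. This is only an illustration of the idea, not what Gimp or ImageMagick do internally; `buildExactPalette` is a hypothetical helper that takes RGBA bytes and returns a palette plus one index per pixel, or `null` once the image turns out to have more than 256 distinct colors.

```javascript
// Hypothetical helper: lossless RGBA -> (palette, indices) conversion.
// Returns null when the image has more than maxColors distinct colors.
// maxColors must not exceed 256, since indices are stored as bytes.
function buildExactPalette(rgba, maxColors = 256) {
  // Assumes rgba.byteOffset is 4-byte aligned and rgba.length % 4 === 0.
  const pixels = new Uint32Array(rgba.buffer, rgba.byteOffset, rgba.length / 4);
  const indexOf = new Map(); // packed color -> palette index
  const indices = new Uint8Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) {
    let idx = indexOf.get(pixels[i]);
    if (idx === undefined) {
      if (indexOf.size === maxColors) return null; // would need real quantization
      idx = indexOf.size;
      indexOf.set(pixels[i], idx);
    }
    indices[i] = idx;
  }
  // Palette entries keep the same in-memory byte order as the input pixels.
  const palette = new Uint32Array(indexOf.size);
  for (const [color, idx] of indexOf) palette[idx] = color;
  return { palette: new Uint8Array(palette.buffer), indices };
}

// 4 pixels, 3 distinct colors.
const rgba = new Uint8Array([
  204, 204, 84, 255,   212, 115, 5, 255,
  204, 204, 84, 255,   245, 247, 199, 255,
]);
const indexed = buildExactPalette(rgba);
console.log(indexed.palette.length / 4, Array.from(indexed.indices));
```

With at most 256 colors every pixel shrinks from 4 bytes to 1 before compression even starts, which fits the size drop seen for the indexed file.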

> I haven't done much with wasm but I worry about the complexity it will bring, both in the implementation and in consuming the library (where do the wasm files go, do they play nicely with bundlers, etc.). I always had issues when I tried to set up emscripten, for example.

Can set up a toy project with the wasm impl for convert16BitTo8BitData if you like. I kinda managed to do emscripten and npm bundling fully automatic - works under linux, prolly also macos, no clue about windows though. The gh action of node-sixel uses it (https://github.com/jerch/node-sixel/runs/6261370627?check_suite_focus=true).

@Tyriar
Member

Tyriar commented Jul 4, 2022

Luna Paint uses the quick per-line filter selection, which I think is what the spec recommends. More involved methods try to guess which filters would be encoded better by zlib, hopefully doing less work than brute forcing every option.
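For reference, the quick per-line heuristic described in the PNG spec is: filter the scanline with all five filter types, treat the filtered bytes as signed values, and keep the type with the smallest sum of absolute values. Whether Luna Paint implements exactly this is an assumption. A minimal sketch with made-up function names:

```javascript
// PNG filter types: 0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth.
function paeth(a, b, c) {
  const p = a + b - c;
  const pa = Math.abs(p - a), pb = Math.abs(p - b), pc = Math.abs(p - c);
  if (pa <= pb && pa <= pc) return a;
  return pb <= pc ? b : c;
}

// Filter one scanline. `prev` is the previous unfiltered scanline
// (all zeros for the first line), `bpp` the number of bytes per pixel.
function filterLine(type, line, prev, bpp) {
  const out = new Uint8Array(line.length);
  for (let i = 0; i < line.length; i++) {
    const a = i >= bpp ? line[i - bpp] : 0; // left
    const b = prev[i];                      // above
    const c = i >= bpp ? prev[i - bpp] : 0; // upper left
    let predicted = 0;
    if (type === 1) predicted = a;
    else if (type === 2) predicted = b;
    else if (type === 3) predicted = (a + b) >> 1;
    else if (type === 4) predicted = paeth(a, b, c);
    out[i] = line[i] - predicted; // Uint8Array store wraps modulo 256
  }
  return out;
}

// Pick the filter with the smallest sum of absolute (signed) filtered bytes.
// On a tie the lower filter type wins.
function pickFilter(line, prev, bpp) {
  let best = { type: 0, cost: Infinity, data: null };
  for (let type = 0; type <= 4; type++) {
    const data = filterLine(type, line, prev, bpp);
    let cost = 0;
    for (const v of data) cost += v < 128 ? v : 256 - v;
    if (cost < best.cost) best = { type, cost, data };
  }
  return best;
}

const zeroLine = new Uint8Array(16);
const gradient = Uint8Array.from({ length: 16 }, (_, i) => 100 + i); // grayscale ramp, bpp = 1
const first = pickFilter(gradient, zeroLine, 1);    // ramp on the first line
const repeated = pickFilter(gradient, gradient, 1); // same ramp again on the next line
console.log(first.type, repeated.type);
```

This costs five filter passes per line but never runs the compressor, which is why it is so much cheaper than trying every filter combination through zlib.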

> works under linux, prolly also macos, no clue about windows though

Yeah, Windows is always the tricky part 😄

> Can set up a toy project with the wasm impl for convert16BitTo8BitData if you like.

Sure, with that I can answer those questions about setup/bundling.

@jerch
Author

jerch commented Jul 4, 2022

Oh well, this might explain why gimp took so long, maybe it was brute forcing things.

Early repo is up here: https://github.com/jerch/wasm-dummy (def. not yet working on windows, macos might work though).

Gonna close the issue, as I have enough info to get something rolling with the lib.

jerch closed this as completed Jul 4, 2022
@jerch
Author

jerch commented Jul 5, 2022

@Tyriar Added an early wasm generator, which should have proper TS typing and creates an inline wasm module. Works already for very simple source code. 😸 (see https://github.com/jerch/wasm-dummy/blob/a77bf141cb2720037ffbc4259ad520dc2f362bc3/src/inline.ts#L26)

@Tyriar
Member

Tyriar commented Jul 6, 2022

Never knew inline wasm was a thing. I'll have to play around with this some upcoming weekend, thanks 🙂

@jerch
Author

jerch commented Jul 6, 2022

Oh well - real in-lining across all stages is not possible, as wasm needs (foreign) compilation. The generator is a source compiler, that has to run at a certain build step and replaces things with wasm bootstrap logic, which is fully embedded in the final bundle though.
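To illustrate what "fully embedded in the final bundle" can mean: the compiled wasm bytes live inside the JS source and get instantiated at runtime, so no separate .wasm file has to be shipped. The sketch below is not how the wasm-dummy generator works; the module bytes are hand-assembled for illustration (a real build step would produce them from C source), and the exported name `hi8` is made up.

```javascript
// Hand-assembled wasm module exporting hi8(x: i32): i32, which returns x >>> 8.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,       // type section: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: 1 function of type 0
  0x07, 0x07, 0x01, 0x03, 0x68, 0x69, 0x38, 0x00, 0x00, // export section: "hi8" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: 1 body, 7 bytes, no locals
  0x20, 0x00, 0x41, 0x08, 0x76, 0x0b,                   // local.get 0; i32.const 8; i32.shr_u; end
]);

// Synchronous compilation is fine for a tiny module like this one.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});
const hi8 = wasmInstance.exports.hi8;
console.log(hi8(0xabcd)); // 171 (0xab)
```

For bigger modules the bytes would more likely be stored as a base64 string and compiled asynchronously with `WebAssembly.instantiate`.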

@Tyriar
Member

Tyriar commented Jul 6, 2022

Thought that's how it was working after having a little look. Still might be nice to keep the wasm source close if it's just a small function.

@jerch
Author

jerch commented Jul 6, 2022

> Still might be nice to keep the wasm source close if it's just a small function.

Yes, I think so, too. It makes things much more straightforward for small C helpers added for speed reasons. It prolly would also work for bigger sources, but no one really likes to code within a string of another language. Thus I don't see this for bigger wasm projects/additions.
