wip: 16bit shader conversions #1581

Julusian · 2024-10-07T16:51:39Z

This is something I started late last year, but I haven't had the motivation to finish it. So I am pushing it here, in case someone wants to use it as inspiration or to copy pieces.

The work here was focussed on SDR 16bit compositing. At some point that would have evolved to HDR, but that wasn't considered yet. The intention was to get lossless SDR 10bit yuv through the system, rather than the slightly lossy flow that we have today.

The basic design, on the producer side, was to replace the point where we tell opengl to copy a buffer into a texture with an opengl compute shader. This would allow us to do yuv->rgb conversion, and even to unpack certain common packed formats, such as the decklink yuv10 packing. This was not implemented yet.
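As a rough illustration of the per-pixel maths such a producer-side shader would need, here is a CPU sketch in C++. It assumes 10bit limited-range BT.709 input, which the description above does not actually specify, and the function name `yuv10_to_rgb_bt709` is made up for this example. It is not the code from this branch.

```cpp
#include <array>
#include <cstdint>

// Assumption: 10bit limited-range BT.709. The PR does not state which matrix
// or range it targets, so treat the constants here as one plausible choice.
std::array<double, 3> yuv10_to_rgb_bt709(std::uint16_t y, std::uint16_t cb, std::uint16_t cr)
{
    // Limited range: luma spans 64..940, chroma spans 64..960 centred on 512.
    const double yn  = (static_cast<double>(y) - 64.0) / 876.0;
    const double cbn = (static_cast<double>(cb) - 512.0) / 896.0;
    const double crn = (static_cast<double>(cr) - 512.0) / 896.0;

    // BT.709 coefficients derived from Kr = 0.2126, Kb = 0.0722.
    const double r = yn + 1.5748 * crn;
    const double g = yn - 0.1873 * cbn - 0.4681 * crn;
    const double b = yn + 1.8556 * cbn;

    // Deliberately left unclamped so out-of-range values survive a 16bit pipeline.
    return {{r, g, b}};
}
```

In a compute shader the same arithmetic would run once per output texel, reading from the uploaded buffer instead of taking three arguments.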

The hope was that doing it here (where opengl is likely already doing a copy, and rearranging the bytes) would have minimal cost on memory, and minimal cost on gpu power. I was trying to avoid doing this on the cpu, as in my experience that is typically under higher pressure (decoding video and deinterlacing). Compute shaders are supported in our current minimum opengl version.

On the consumer side, the intention was to do something similar and use a compute shader to do the final copy from the composited texture into the buffer that is copied into cpu memory.
This would also mean that the composite shader could have the existing colour format handling code removed.

This consumer portion is fairly complete, with a working (but not verified for accuracy) decklink v210 implementation. This does carry a risk of doing more downloads from the gpu than before, but I think having more than a couple of consumers on a channel is uncommon, so being slightly more costly on pcie bandwidth, and less costly on cpu than a simd implementation, is reasonable.
To support this, when constructing a consumer, it is passed a frame_converter, which it can use to convert the const_frame into whatever format it prefers. Additionally, the intention is that the key_only and subregion options in the decklink consumer would make their way into this converter, so that only the subregion needs to be converted and downloaded from the gpu.
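To make the shape of that idea concrete, here is a small C++ sketch. Every name in it (`region_sketch`, `convert_options_sketch`, `frame_converter_sketch`, `cpu_converter_sketch`) is hypothetical and does not match the types in this branch. The CPU loop only stands in for the GPU path, and the key-only behaviour (alpha copied into the colour channels, output fully opaque) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// All names below are hypothetical; they are not the types used in the PR.
struct region_sketch
{
    int x = 0, y = 0, width = 0, height = 0;
};

struct convert_options_sketch
{
    bool          key_only = false;
    region_sketch region;
};

// A consumer would be handed something shaped like this when it is constructed.
struct frame_converter_sketch
{
    virtual ~frame_converter_sketch() = default;
    virtual std::vector<std::uint8_t> convert(const std::vector<std::uint8_t>& bgra,
                                              int                              frame_width,
                                              const convert_options_sketch&    options) const = 0;
};

// CPU stand-in for the GPU path: only the requested subregion is touched.
struct cpu_converter_sketch final : frame_converter_sketch
{
    std::vector<std::uint8_t> convert(const std::vector<std::uint8_t>& bgra,
                                      int                              frame_width,
                                      const convert_options_sketch&    options) const override
    {
        const region_sketch&      r = options.region;
        std::vector<std::uint8_t> out;
        out.reserve(static_cast<std::size_t>(r.width) * static_cast<std::size_t>(r.height) * 4);
        for (int row = r.y; row < r.y + r.height; ++row) {
            for (int col = r.x; col < r.x + r.width; ++col) {
                const std::size_t i =
                    (static_cast<std::size_t>(row) * static_cast<std::size_t>(frame_width) +
                     static_cast<std::size_t>(col)) * 4;
                const std::uint8_t alpha = bgra[i + 3];
                for (std::size_t c = 0; c < 4; ++c) {
                    std::uint8_t value = bgra[i + c];
                    // Assumed key-only behaviour: alpha as grey, fully opaque.
                    if (options.key_only)
                        value = (c == 3) ? std::uint8_t{255} : alpha;
                    out.push_back(value);
                }
            }
        }
        return out;
    }
};
```

The point of routing `key_only` and the subregion through the converter is that the download shrinks to `width * height` pixels of the region, rather than the whole frame being fetched and cropped afterwards.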

For the status of this, it is possible to play yuv ffmpeg clips, or 16bit pngs, and output them as gpu generated yuv10 out of a decklink. The decklink consumer doesn't support k+f when fed yuv10 frames, but this can be done with a second port set to key-only using the sync-group added previously. (I wanted to explore using the 3D api to support k+f on the 4k extreme cards.)
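For readers unfamiliar with the yuv10 (v210) layout mentioned above, here is a CPU sketch in C++ of packing one group of six 4:2:2 pixels into four 32bit words. The function names `pack3x10` and `pack_v210_group` are made up for this example. This follows the v210 layout as commonly documented, and it is not the shader from this branch.

```cpp
#include <array>
#include <cstdint>

// Pack three 10bit samples into one 32bit word, lowest bits first.
// The top two bits of each word are left as zero.
inline std::uint32_t pack3x10(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
    return (a & 0x3FFu) | ((b & 0x3FFu) << 10) | ((c & 0x3FFu) << 20);
}

// Pack six 4:2:2 pixels (6 luma, 3 Cb, 3 Cr samples) into four v210 words.
// Sample order across the group: Cb0 Y0 Cr0 | Y1 Cb1 Y2 | Cr1 Y3 Cb2 | Y4 Cr2 Y5.
std::array<std::uint32_t, 4> pack_v210_group(const std::array<std::uint16_t, 6>& y,
                                             const std::array<std::uint16_t, 3>& cb,
                                             const std::array<std::uint16_t, 3>& cr)
{
    return {{
        pack3x10(cb[0], y[0], cr[0]),
        pack3x10(y[1], cb[1], y[2]),
        pack3x10(cr[1], y[3], cb[2]),
        pack3x10(y[4], cr[2], y[5]),
    }};
}
```

Because every group of six pixels maps to exactly 16 bytes, a compute shader can assign one invocation per group and write its four words independently of its neighbours.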

A lot of things are hardcoded in testing setups, as this didn't progress far.

Julusian and others added 30 commits December 30, 2023 14:50

wip: basic format mapping

a85012c

wip

8cd4cfc

wip

bf12120

wip

8a2a85b

wip

73fab17

things are hooked up, but has no output

c58ad0e

nope

90a6ecd

wip: something happens!

efd99f6

hack a mess

467062e

Add 16bit support to ogl texture

1f07c7f

add 16bit support to ogl device

a940412

Add create_frame override to specify bit_depth in frame_factory interface

1332a46

add native_depth property to caspar::array

8f87171

add 16bits support to image_mixer

723419d

wip: correct colour

b96d98e

simplify

79cde82

add 16bit yuv, untested

8e934cd

wip: propogate frame_converter type to consumers

0c69bb8

wip

57a3624

wip: boilerplate for frame conversion

f6a30be

wip: broke

111e8f7

fix: remove bit_depth property from array

95e30ec

wip: hackily expose composited texture inside const_frame

6618c4b

fix

1e3b38a

wip: incorrect conversion, but something semi identifiable

25fad45

fix colour and 8bit texture support

e7fc480

fix: rgba8 download was incorrectly 16bit packed

030308f

fix: remove unused windows only gl code

d6704df

wip: interleave shader and remove clamp

54e42bc

chore: remove some dead code

6155975

Julusian added 20 commits December 30, 2023 15:23

chore: format

a60827e

chore: add todos to ndi producer

9cb0282

wip: tidy

98419f6

wip: reimplement decklink key-only flag

4cf8a2a

feat: minimise cpu image conversions for image producer

4cc8a05

wip: start of 16bit png writing

219ada3

fix: image consumer 16bit generation

8cfc286

fix: image consumer 16bit defined by amcp

f22f6a2

Revert "fix: image consumer 16bit defined by amcp"

7476448

This reverts commit f22f6a2.

fix: typo

a99522e

feat: image producer can work in 64bit

c7984a4

fix: 64bit freeimage endianness

544b516

fix: propogate parameters from print command to image consumer

b9baeca

wip: tidy

60fca06

wip: tidying

69c29f5

wip: tidying

4f58d1a

wip: generic key-only implementation

e0047af

wip: fixes

f07b281

fix: allow 16bit from ffmpeg

bfcedf4

wip: boilerplate for decklink 12bit, but nothing happens

7adf098
