From 06df1712e17ecc6f18505504762e995c37193b58 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 10:01:15 +0000
Subject: [PATCH 01/93] Create Case_study.md
First entry
---
Doc/Case_study.md | 8 ++++++++
1 file changed, 8 insertions(+)
create mode 100644 Doc/Case_study.md
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
new file mode 100644
index 00000000..3c34344f
--- /dev/null
+++ b/Doc/Case_study.md
@@ -0,0 +1,8 @@
+# RAWcookedCase Study
+## BFI National Archive
+
+At the BFI National Archive we have been encoding DPX sequences to FFV1 Matroska since late 2019.
+
+```
+rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
+```
From 91d425189695c3d3c86b46504d94fde722838729 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 10:01:51 +0000
Subject: [PATCH 02/93] Update Case_study.md
---
Doc/Case_study.md | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 3c34344f..c7ed2f0d 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,5 +1,7 @@
-# RAWcookedCase Study
-## BFI National Archive
+# RAWcooked Case Study
+
+## BFI National Archive
+
At the BFI National Archive we have been encoding DPX sequences to FFV1 Matroska since late 2019.
From 57189d4adea1b6fbd0110337c523187f7ab2e138 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 10:05:59 +0000
Subject: [PATCH 03/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index c7ed2f0d..bc976622 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -3,7 +3,7 @@
## BFI National Archive
-At the BFI National Archive we have been encoding DPX sequences to FFV1 Matroska since late 2019.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019.
```
rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
From ed30cb7e016a77900c6582384b1613a8f271dfa0 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 10:19:20 +0000
Subject: [PATCH 04/93] Update Case_study.md
---
Doc/Case_study.md | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index bc976622..e0e10e90 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,9 +1,24 @@
# RAWcooked Case Study
## BFI National Archive
+### Joanna White, Knowledge & Collections Developer
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has developed and evolved with DPX resolutions and flavours and changes in project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff DPX. This workflow is built on some of the flags developed by the Media Area. This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019.
+This case study is broken into the stages our encoding workflow follows and the Media Area tools we use to achieve our goals. We have four distinct stages:
+* Image sequence assessment
+* Encoding the image sequence
+* Encoding log assessments
+* FFV1 MKV validation
+
+## Image sequence assessment
+
+
+
+
+
+
+## Encoding the image sequence
```
rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
From 7b67a828fc5b6c36e5b7eeb0f44f5b95fe663c06 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 11:33:51 +0000
Subject: [PATCH 05/93] Update Case_study.md
---
Doc/Case_study.md | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index e0e10e90..09edd276 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -3,7 +3,19 @@
## BFI National Archive
### Joanna White, Knowledge & Collections Developer
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has developed and evolved with DPX resolutions and flavours and changes in project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff DPX. This workflow is built on some of the flags developed by the Media Area. This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with DPX resolutions and flavours and changes in project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff DPX. This workflow is built on some of the flags developed by the Media Area. This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
+
+To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths.
+
+Intel Xeon Gold 5218 @ 2.30GHz
+It's 32-core with 64 threads so can manage a LOT of processes concurrently. We have read/write to NAS storage devices that are pretty fast across a 100GB network.
+
+Previously we were running speedy RAWcooking (only RAWcooking) on a Virtual Machine of a NAS storage device and we read from/wrote to it at times:
+AMD Opteron 22xx (Gen 2 Class Opteron)
+8-core @ 3 GHz (estimated)
+8 threads
+
+When encoding 2K RGB we generally reach 4-5 FPS from FFmpeg encoding on the lower machine. Now running 4K scans it's often 1 FPS or less on the top machine.
This case study is broken into the stages our encoding workflow follows and the Media Area tools we use to achieve our goals. We have four distinct stages:
* Image sequence assessment
From 70f60ca98a55c8041b2d063f1208e111077e987c Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 11:46:29 +0000
Subject: [PATCH 06/93] Update Case_study.md
---
Doc/Case_study.md | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 09edd276..2098164c 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,10 +1,18 @@
# RAWcooked Case Study
-
+
## BFI National Archive
### Joanna White, Knowledge & Collections Developer
-
+
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with DPX resolutions and flavours and changes in project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff DPX. This workflow is built on some of the flags developed by the Media Area. This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
-
+
+This case study is broken into the stages our encoding workflow follows and the Media Area tools we use to achieve our goals. We have four distinct stages:
+* Server configuration
+* Image sequence assessment
+* Encoding the image sequence
+* Encoding log assessments
+* FFV1 MKV validation
+
+### Server configuration
To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths.
Intel Xeon Gold 5218 @ 2.30GHz
@@ -15,16 +23,10 @@ AMD Opteron 22xx (Gen 2 Class Opteron)
8-core @ 3 GHz (estimated)
8 threads
-When encoding 2K RGB we generally reach 4-5 FPS from FFmpeg encoding on the lower machine. Now running 4K scans it's often 1 FPS or less on the top machine.
-
-This case study is broken into the stages our encoding workflow follows and the Media Area tools we use to achieve our goals. We have four distinct stages:
-* Image sequence assessment
-* Encoding the image sequence
-* Encoding log assessments
-* FFV1 MKV validation
-
+When encoding 2K RGB we generally reach between 4 and 5 FPS from FFmpeg encoding on the lower machine. Now running 4K scans it's often 1 FPS or less on the top machine.
+
## Image sequence assessment
-
+
From c1009bce6c50e36f42bba8eb27d35c84310f453e Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 12:48:47 +0000
Subject: [PATCH 07/93] Update Case_study.md
---
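Note on this patch: the reversibility test described in the conclusion section added below (whole-file MD5 manifests for the original and the demuxed sequence, followed by a diff) could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the BFI's own script. The helper names `md5_of_file`, `build_manifest` and `compare_manifests` are hypothetical, and it assumes the original and demuxed sequences sit in two separate folders with matching relative file names.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the whole-file MD5 hex digest, reading the file in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(sequence_dir: Path) -> dict:
    """Map each file path (relative to sequence_dir) to its MD5 checksum."""
    return {
        path.relative_to(sequence_dir).as_posix(): md5_of_file(path)
        for path in sorted(sequence_dir.rglob("*"))
        if path.is_file()
    }


def compare_manifests(original: dict, demuxed: dict) -> list:
    """Return the sorted relative paths that are missing or whose MD5 differs."""
    all_names = set(original) | set(demuxed)
    return sorted(
        name for name in all_names if original.get(name) != demuxed.get(name)
    )
```

An empty list from `compare_manifests(build_manifest(original_dir), build_manifest(demuxed_dir))` would indicate that every image file survived the mux and demux round trip unchanged; any names returned are the files to investigate.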
Doc/Case_study.md | 66 +++++++++++++++++++++++++++++++++++++----------
1 file changed, 53 insertions(+), 13 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 2098164c..34dc1299 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,29 +1,40 @@
# RAWcooked Case Study
## BFI National Archive
-### Joanna White, Knowledge & Collections Developer
+#### Joanna White, Knowledge & Collections Developer
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with DPX resolutions and flavours and changes in project priorities - Ev. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff DPX. This workflow is built on some of the flags developed by the Media Area. This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. To find out more take a look at my [NTTW5 presentation about the BFI's RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
-This case study is broken into the stages our encoding workflow follows and the Media Area tools we use to achieve our goals. We have four distinct stages:
+This case study is broken into the following sections:
* Server configuration
* Image sequence assessment
* Encoding the image sequence
* Encoding log assessments
* FFV1 MKV validation
+* Conclusion & helpful test approaches
+* Additional resources
-### Server configuration
-To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths.
-
-Intel Xeon Gold 5218 @ 2.30GHz
-It's 32-core with 64 threads so can manage a LOT of processes concurrently. We have read/write to NAS storage devices that are pretty fast across a 100GB network.
-
-Previously we were running speedy RAWcooking (only RAWcooking) on a Virtual Machine of a NAS storage device and we read from/wrote to it at times:
+### Server configurations
+
+To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths in parallel.
+
+Our current configuration:
+Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
+252 GB RAM
+32-core with 64 CPU threads
+Ubuntu 20.04 LTS
+40Gbps Network card
+NAS storage on 10GB network
+
+Our previous 2K film encoding configuration:
+Virtual Machine of a NAS storage device
AMD Opteron 22xx (Gen 2 Class Opteron)
+12GB RAM
8-core @ 3 GHz (estimated)
8 threads
-
-When encoding 2K RGB we generally reach between 4 and 5 FPS from FFmpeg encoding on the lower machine. Now running 4K scans it's often 1 FPS or less on the top machine.
+Ubuntu 18.04 LTS
+
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding on the lower machine. Now running 4K scans it's generally 1 fps or less.
## Image sequence assessment
@@ -31,9 +42,38 @@ When encoding 2K RGB we generally reach between 4 and 5 FPS from FFmpeg encoding
-
## Encoding the image sequence
+This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
```
rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
```
+
+## Encoding log assessment
+
+## FFV1 Matroska validation
+
+## Conclusion & some helpful test approaches
+
+The workflow covers most of the areas we think are essential for safe automated encoding of the DPX sequences. There is a need for manual intervention when repeated errors are encountered and an image sequence never makes it to our Digital Preservation Infrastructure. Most often this indicates a different image sequence flavour we do not have covered in our licence, but sometimes it can indicate a larger issue with either RAWcooked or FFmpeg encoding. Where errors are found these are reported to an error log named after the image sequence, for easier monitoring.
+
+When any upgrades occur we like to run some selected reversibility tests to ensure RAWcooked is still operating as we would expect. This can be for RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences is muxed using our usual ```--all``` command, and then demuxed again fully. Whole file MD5 checksums are then generated for every image in both the original and the demuxed sequence and saved to a manifest for each. These manifests are then ```diff``` checked to ensure that every single image file is identical.
+
+When we encounter an error there are a few commands I use that make reporting the issue a little easier at the Media Area RAWcooked GitHub issue tracker.
+```
+rawcooked -d -y --all --accept-gaps
+```
+The -d flag returns the command sent to FFmpeg instead of launching the command. This flag also leaves the reversibility data available to view as a text file and this is useful for finding errors.
+```
+head -c 1048576 sequence_name.mkv > dump_file.mkv
+```
+This command uses the UNIX ```head``` utility to copy the first 1 MiB (1,048,576 bytes) of data from a supplied file into a new file, which is easier to forward to Media Area for review. This contains the file's header data, often requested when a problem has occurred.
+```
+echo $?
+```
+This command should be run directly after a failed RAWcooked encoding, and it will tell you the exit code returned from that terminated run.
+
+The results of these three enquiries are always a brilliant way to open an Issue enquiry for Media Area and will help ensure a swift diagnosis of your problem. It may also be necessary to supply a DPX sequence, and your ```head``` command can be used again to extract the header data.
+
+
+## Additional resources
From d560eec15c347b7dcbcd9150f0fcebbdf4a97a81 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 13:38:47 +0000
Subject: [PATCH 08/93] Update Case_study.md
---
Doc/Case_study.md | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 34dc1299..0ca1e993 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,16 +1,17 @@
# RAWcooked Case Study
## BFI National Archive
-#### Joanna White, Knowledge & Collections Developer
+By Joanna White, Knowledge & Collections Developer
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. To find out more take a look at my [NTTW5 presentation about the BFI's RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
This case study is broken into the following sections:
* Server configuration
* Image sequence assessment
* Encoding the image sequence
* Encoding log assessments
-* FFV1 MKV validation
+* FFV1 Matroska validation
+* FFV1 Matroska demux to image sequence
* Conclusion & helpful test approaches
* Additional resources
@@ -53,6 +54,8 @@ rawcooked -y --all --no-accept-gaps -s 5281680 -o
Date: Mon, 19 Feb 2024 13:39:23 +0000
Subject: [PATCH 09/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 0ca1e993..0c5da83b 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,7 +1,7 @@
# RAWcooked Case Study
-## BFI National Archive
-By Joanna White, Knowledge & Collections Developer
+**BFI National Archive**
+**By Joanna White, Knowledge & Collections Developer**
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
From 7a1518f4ae31719adff07311dabb9434f474bbe1 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 13:40:13 +0000
Subject: [PATCH 10/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 0c5da83b..2eb24087 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -81,5 +81,5 @@ The results of these three enquiries is always a brilliant way to open an Issue
## Additional resources
-['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
-[RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
+*['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
+*[RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
From 6af0e101e43b506e164aeadd1fe4a4657b91a9ab Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 13:40:35 +0000
Subject: [PATCH 11/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 2eb24087..ae112971 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -81,5 +81,5 @@ The results of these three enquiries is always a brilliant way to open an Issue
## Additional resources
-*['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
-*[RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
+* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
+* [RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
From ab6094e401949bf4569266ee839978db1e97eb90 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 13:42:00 +0000
Subject: [PATCH 12/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index ae112971..b1f1ff0e 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -6,7 +6,7 @@
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
This case study is broken into the following sections:
-* Server configuration
+* [Server configuration](Server configurations)
* Image sequence assessment
* Encoding the image sequence
* Encoding log assessments
From 10cdb86f05543b041470897c798553fb0e99e799 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 13:43:32 +0000
Subject: [PATCH 13/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index b1f1ff0e..e4e90c93 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -6,7 +6,7 @@
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
This case study is broken into the following sections:
-* [Server configuration](Server configurations)
+* [Server configuration](#server_config)
* Image sequence assessment
* Encoding the image sequence
* Encoding log assessments
@@ -15,7 +15,7 @@ This case study is broken into the following sections:
* Conclusion & helpful test approaches
* Additional resources
-### Server configurations
+### Server configurations
To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths in parallel.
From e5086e287f8d61b5906faf1747fee51b08eb22ce Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 13:47:13 +0000
Subject: [PATCH 14/93] Update Case_study.md
Adding anchor points
---
Doc/Case_study.md | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index e4e90c93..76df8c23 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -7,13 +7,13 @@ At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we ha
This case study is broken into the following sections:
* [Server configuration](#server_config)
-* Image sequence assessment
-* Encoding the image sequence
-* Encoding log assessments
-* FFV1 Matroska validation
-* FFV1 Matroska demux to image sequence
-* Conclusion & helpful test approaches
-* Additional resources
+* [Image sequence assessment](#assessment)
+* [Encoding the image sequence](#muxing)
+* [Encoding log assessments](#muxing_log)
+* [FFV1 Matroska validation](#ffv1_valid)
+* [FFV1 Matroska demux to image sequence](#ffv1_demux)
+* [Conclusion & helpful test approaches](#conclusion)
+* [Additional resources](#links)
### Server configurations
@@ -37,26 +37,26 @@ Ubuntu 18.04 LTS
When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding on the lower machine. Now running 4K scans it's generally 1 fps or less.
-## Image sequence assessment
+## Image sequence assessment
-## Encoding the image sequence
+## Muxing the image sequence
This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
```
rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
```
-## Encoding log assessment
+## Muxing log assessment
-## FFV1 Matroska validation
+## FFV1 Matroska validation
-## FFV1 Matroska demux to image sequence
+## FFV1 Matroska demux to image sequence
-## Conclusion & some helpful test approaches
+## Conclusion & some helpful test approaches
The workflow covers most of the areas we think are essential for safe automated encoding of the DPX sequences. There is a need for manual intervention when repeated errors are encountered and an image sequence never makes it to our Digital Preservation Infrastructure. Most often this indicates a different image sequence flavour we do not have covered in our licence, but sometimes it can indicate a larger issue with either RAWcooked or FFmpeg encoding. Where errors are found these are reported to an error log named after the image sequence, for easier monitoring.
@@ -79,7 +79,7 @@ This command should be run directly after a failed RAWcooked encoding, and it wi
The results of these three enquiries are always a brilliant way to open an Issue enquiry for Media Area and will help ensure a swift diagnosis of your problem. It may also be necessary to supply a DPX sequence, and your ```head``` command can be used again to extract the header data.
-## Additional resources
+## Additional resources
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
From ce94d40224cfbb94f988d993c10de24cc7f58536 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 14:38:45 +0000
Subject: [PATCH 15/93] Update Case_study.md
1 year study data
---
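Note on this patch: the size-reduction percentages listed below can be derived from the two byte sizes the Python script is described as capturing (image sequence and completed FFV1 Matroska). A minimal sketch, assuming the reduction is measured relative to the original sequence size and that the average is the mean of the per-sequence percentages (the source does not say which average was used). The function names are hypothetical, and the byte sizes in the usage note are invented for illustration, not BFI data.

```python
def size_reduction_percent(sequence_bytes: int, mkv_bytes: int) -> float:
    """Percentage by which the FFV1 Matroska is smaller than the image sequence."""
    if sequence_bytes <= 0:
        raise ValueError("sequence_bytes must be a positive number of bytes")
    return 100.0 * (sequence_bytes - mkv_bytes) / sequence_bytes


def average_reduction_percent(size_pairs) -> float:
    """Mean of the per-sequence reductions for (sequence_bytes, mkv_bytes) pairs."""
    reductions = [size_reduction_percent(seq, mkv) for seq, mkv in size_pairs]
    if not reductions:
        raise ValueError("size_pairs must contain at least one pair")
    return sum(reductions) / len(reductions)
```

For example, a hypothetical 1,000-byte sequence muxed to a 120-byte file gives `size_reduction_percent(1000, 120)`, an 88% reduction.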
Doc/Case_study.md | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 76df8c23..7364fd09 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -3,13 +3,13 @@
**BFI National Archive**
**By Joanna White, Knowledge & Collections Developer**
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours, and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and TIFF image sequences. This workflow is built on some of the flags developed by the Media Area, is written in a mix of Bash shell scripts and Python3 scripts, and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our encoding processes do not include any alpha channels or audio file processing, but RAWcooked is capable of muxing both into the completed FFV1 Matroska, dependent upon your licence.
This case study is broken into the following sections:
-* [Server configuration](#server_config)
+* [Server configuration and throughput](#server_config)
* [Image sequence assessment](#assessment)
-* [Encoding the image sequence](#muxing)
-* [Encoding log assessments](#muxing_log)
+* [Muxing the image sequence](#muxing)
+* [Muxing log assessments](#log_assessment)
* [FFV1 Matroska validation](#ffv1_valid)
* [FFV1 Matroska demux to image sequence](#ffv1_demux)
* [Conclusion & helpful test approaches](#conclusion)
@@ -35,7 +35,20 @@ AMD Opteron 22xx (Gen 2 Class Opteron)
8 threads
Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding on the lower machine. Now running 4K scans it's generally 1 fps or less.
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding on the lower machine. Now running 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
+
+Between February 2023 and February 2024 the BFI encoded 1020 DPX sequences to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bit depth, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
+* 140 of the 1020 were 2K or smaller / 880 4K or larger
+* 222 were Luma Y / 798 were RGB
+* 143 were 10-bit / 279 12-bit / 598 16-bit
+* The largest reduction in size of any FFV1 from the DPX was 88%
+* The smallest reduction was just 0.3%
+* The largest reductions were from sequences both 10/12-bit, with RGB colorspace that had black and white filters applied
+* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
+* Across all 1020 muxed sequences the average size reduction was 71%
+
+Total time taken for the RAWcooked muxing from start to finish was captured for a small group of sequences, and showed an average of 24 hours per sequence. All of these sequences had MKV durations of between 5 and 10 minutes, yet muxing took anywhere from 7 hours to 46 hours. There appears to be no cause for this in the files themselves, so we must deduce that network activity and the (unknown) number of parallel processes impacted these times.
+
## Image sequence assessment
From 0709d92fdfa4b6dfecaf502f24a8820a2ca78f0a Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 14:40:17 +0000
Subject: [PATCH 16/93] Update Case_study.md
---
Doc/Case_study.md | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 7364fd09..73515517 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -6,7 +6,8 @@
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, with changing DPX resolutions and flavours, and with shifts in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and TIFF image sequences. This workflow is built on some of the flags developed by MediaArea, is written in a mix of Bash shell scripts and Python3 scripts, and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other MediaArea tools alongside RAWcooked to complete necessary stages of this workflow. Our encoding processes do not include any alpha channels or audio file processing, but RAWcooked is capable of muxing both into the completed FFV1 Matroska, dependent upon your licence.
This case study is broken into the following sections:
-* [Server configuration and throughput](#server_config)
+* [Server configuration](#server_config)
+* [One year study of throughput](#findings)
* [Image sequence assessment](#assessment)
* [Muxing the image sequence](#muxing)
* [Muxing log assessments](#log_assessment)
@@ -37,7 +38,11 @@ Ubuntu 18.04 LTS
When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding on the lower machine. Now running 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
+
+# One year study of throughput
+
Between February 2023 and February 2024 the BFI encoded 1020 DPX sequences to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bit depth, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
+
* 140 of the 1020 were 2K or smaller / 880 4K or larger
* 222 were Luma Y / 798 were RGB
* 143 were 10-bit / 279 12-bit / 598 16-bit
From 91eb2ef72e2d2a8cf9262d5d0333c5f58a89baf4 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 14:58:51 +0000
Subject: [PATCH 17/93] Update Case_study.md
Image sequence assessment
---
Doc/Case_study.md | 46 ++++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 73515517..32b967a0 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -8,11 +8,11 @@ At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we ha
This case study is broken into the following sections:
* [Server configuration](#server_config)
* [One year study of throughput](#findings)
-* [Image sequence assessment](#assessment)
-* [Muxing the image sequence](#muxing)
-* [Muxing log assessments](#log_assessment)
-* [FFV1 Matroska validation](#ffv1_valid)
-* [FFV1 Matroska demux to image sequence](#ffv1_demux)
+* [Workflow: Image sequence assessment](#assessment)
+* [Workflow: Muxing the image sequence](#muxing)
+* [Workflow: Muxing log assessments](#log_assessment)
+* [Workflow: FFV1 Matroska validation](#ffv1_valid)
+* [Workflow: FFV1 Matroska demux to image sequence](#ffv1_demux)
* [Conclusion & helpful test approaches](#conclusion)
* [Additional resources](#links)
@@ -36,10 +36,10 @@ AMD Opteron 22xx (Gen 2 Class Opteron)
8 threads
Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding on the lower machine. Now running 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
-
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding; with 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
+
-# One year study of throughput
+### One year study of throughput
Between February 2023 and February 2024 the BFI encoded 1020 DPX sequences to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bit depth, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
@@ -50,31 +50,41 @@ Between Febraury 2023 and February 2024 the BFI encoded 1020 DPX sequences to FF
* The smallest reduction was just 0.3%
* The largest reductions were from sequences both 10/12-bit, with RGB colorspace that had black and white filters applied
* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
-* Across all 1020 muxed sequences the average size reduction was 71%
+* Across all 1020 muxed sequences the average size reduction was 71%
-Total time taken for the RAWcooked muxing start to finish was captured for a small group of sequences, and showed an average of 24 hours per sequence. All sequences MKV durations were between 5 and 10 minutes, with some taking just 7 hours to 46 hours. There appears to be no cause for this and so must deduce that network activity and amount of parallel processes (unknown) would have impacted these.
+A small group of sequences had their total RAWcooked muxing time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations of between 5 and 10 minutes. The fastest encodings took just 7 hours, with some taking 46 hours. There appears to be no cause for these variations in the files themselves, so we must assume that general network activity and/or the number of parallel processes running influenced them.
-## Image sequence assessment
+# Workflow
+### Image sequence assessment
+
+For each image sequence processed the metadata of the first DPX or TIFF is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence.
+
+Next the first file within the image sequence is checked against a MediaConch policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)) and if it passes then we know it can be encoded by RAWcooked and by our current licence.
+
+The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the file based on previous encoding experience. We make an assumption that 2K RGB will always be at least one third smaller, so calculate a 1.3TB sequence to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place.
+
+| RAWcooked 2K RGB | RAWcooked Luma & RAWcooked 4K |
+| -------------------- | ----------------------------- |
+| 1.3TB reduces to 1TB | 1.0TB may only reduce to 1TB |
-
-
-## Muxing the image sequence
+### Muxing the image sequence
This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
```
rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
```
-## Muxing log assessment
+### Muxing log assessment
-## FFV1 Matroska validation
+### FFV1 Matroska validation
-## FFV1 Matroska demux to image sequence
+### FFV1 Matroska demux to image sequence
-## Conclusion & some helpful test approaches
+# Conclusion
+### Conclusion & some helpful test approaches
The workflow covers most of the areas we think are essential for safe automated encoding of the DPX sequences. There is a need for manual intervention when repeated errors are encountered and an image sequence never makes it to our Digital Preservation Infrastructure. Most often this indicates a different image sequence flavour we do not have covered in our licence, but sometimes it can indicate a larger issue with either RAWcooked or FFmpeg encoding. Where errors are found these are reported to an error log named after the image sequence, for easier monitoring.
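
The sizing rule in the image sequence assessment section of this patch can be sketched in Python. This is a minimal illustration, not the BFI's script: the function names, the `is_2k` flag and the decimal definition of a terabyte are assumptions made for the example; only the 1.3TB and 1TB figures come from the table.

```python
TB = 1000 ** 4  # assumption: decimal terabytes; the text does not say TB vs TiB


def max_sequence_bytes(is_2k: bool, colourspace: str) -> int:
    """Largest source sequence expected to mux to an FFV1 Matroska of 1TB or less."""
    if is_2k and colourspace.upper() == "RGB":
        # 2K RGB: the table allows 1.3TB of source for a 1TB output
        return 13 * TB // 10
    # 2K Luma and all 4K: assume almost no size reduction
    return TB


def within_limit(size_bytes: int, is_2k: bool, colourspace: str) -> bool:
    """True if the sequence can be queued for muxing as a single file."""
    return size_bytes <= max_sequence_bytes(is_2k, colourspace)


# A 1.2TB sequence is acceptable as 2K RGB but not as 4K RGB
print(within_limit(12 * TB // 10, True, "RGB"))
print(within_limit(12 * TB // 10, False, "RGB"))
```

Integer arithmetic is used for the 1.3TB threshold so the comparison does not depend on floating point rounding.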
From 23b656e2f8faaf1e08198d5912e746dca7762ef6 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 15:04:51 +0000
Subject: [PATCH 18/93] Update Case_study.md
---
Doc/Case_study.md | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 32b967a0..144edb69 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -21,20 +21,22 @@ This case study is broken into the following sections:
To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths in parallel.
Our current configuration:
-Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
-252 GB RAM
-32-core with 64 CPU threads
-Ubuntu 20.04 LTS
-40Gbps Network card
-NAS storage on 10GB network
+- Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
+- 252 GB RAM
+- 32-core with 64 CPU threads
+- Ubuntu 20.04 LTS
+- 40Gbps Network card
+- NAS storage on 10GB network
+
+The more CPU threads you have the better your FFmpeg encoding to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads per core) x 16 (cores per socket) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpbu```.
Our previous 2K film encoding configuration:
-Virtual Machine of a NAS storage device
-AMD Opteron 22xx (Gen 2 Class Opteron)
-12GB RAM
-8-core @ 3 GHz (estimated)
-8 threads
-Ubuntu 18.04 LTS
+- Virtual Machine of a NAS storage device
+- AMD Opteron 22xx (Gen 2 Class Opteron)
+- 12GB RAM
+- 8-core @ 3 GHz (estimated)
+- 8 threads
+- Ubuntu 18.04 LTS
When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding; with 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
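
The thread calculation described in this patch can be sketched in Python. This is a minimal illustration that assumes `lscpu` prints its usual `Thread(s) per core`, `Core(s) per socket` and `Socket(s)` fields; the function name `total_cpu_threads` and the sample text are hypothetical and are not taken from the BFI scripts.

```python
def total_cpu_threads(lscpu_text: str) -> int:
    """Multiply threads per core x cores per socket x sockets from lscpu output."""
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return (
        int(fields["Thread(s) per core"])
        * int(fields["Core(s) per socket"])
        * int(fields["Socket(s)"])
    )


# Sample values matching the configuration described above (illustrative text only)
SAMPLE = """Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
"""

print(total_cpu_threads(SAMPLE))
```

On a live server the same function could be fed the text returned by `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`.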
From aef832f07843499b9a447c76d2c2c0d757925de4 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 15:06:53 +0000
Subject: [PATCH 19/93] Update Case_study.md
---
Doc/Case_study.md | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 144edb69..3ca428af 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -26,7 +26,7 @@ Our current configuration:
- 32-core with 64 CPU threads
- Ubuntu 20.04 LTS
- 40Gbps Network card
-- NAS storage on 10GB network
+- NAS storage on 40GB network
The more CPU threads you have the better your FFmpeg encoding to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads per core) x 16 (cores per socket) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpbu```.
@@ -43,9 +43,10 @@ When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps)
### One year study of throughput
-Between Febraury 2023 and February 2024 the BFI encoded 1020 DPX sequences to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bits, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
-
-* 140 of the 1020 were 2K or smaller / 880 4K or larger
+Between February 2023 and February 2024 the BFI encoded **1020 DPX sequences** to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bit depth, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
+
+From 1020 total DPX sequences successfully muxed to FFV1 Matroska:
+* 140 were 2K or smaller / 880 were 4K
* 222 were Luma Y / 798 were RGB
* 143 were 10-bit / 279 12-bit / 598 16-bit
* The largest reduction in size of any FFV1 from the DPX was 88%
From f374a7452e4a89cb4791b33e3511ad6b216b8486 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 15:31:16 +0000
Subject: [PATCH 20/93] Update Case_study.md
---
Doc/Case_study.md | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 3ca428af..ad0761e7 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -28,7 +28,7 @@ Our current configuration:
- 40Gbps Network card
- NAS storage on 40GB network
-The more CPU threads you have the better your FFmpeg encoding to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our congiguration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpbu```.
+The more CPU threads you have the better your FFmpeg encoding to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads per core) x 16 (cores per socket) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpu```.
Our previous 2K film encoding configuration:
- Virtual Machine of a NAS storage device
@@ -61,11 +61,11 @@ A small group of sequences had their total RAWcooked muxing time recorded, revea
# Workflow
### Image sequence assessment
-For each image sequence processed the metadata of the first DPX or TIFF is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence.
+For each image sequence processed the metadata of the first DPX or TIFF is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using MediaArea's MediaInfo software and capture the output into script variables.
-Next the first file within the image sequence is checked against a MediaConch policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)) and if it passes then we know it can be encoded by RAWcooked and by our current licence.
+Next the first file within the image sequence is checked against a MediaArea MediaConch policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and is covered by our current licence. Any that fail are assessed for possible RAWcooked licence expansion, or for possible anomalies in the DPX.
-The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the file based on previous encoding experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequence to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place.
+The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the mux based on previous muxing experience. We make an assumption that 2K RGB will always be at least one third smaller, so calculate a 1.3TB sequence to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB.
| RAWcooked 2K RGB | RAWcooked Luma & RAWcooked 4K |
| -------------------- | ----------------------------- |
@@ -75,11 +75,31 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
### Muxing the image sequence
-This includes the '--all' flag, originally an idea suggested by [NYPL](https://www.nypl.org/), more details in the [Encoding]() section below.
+To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation-essential flags merged into this one simple flag. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when demuxed the retrieved sequence can be verified as bit-identical to the original source sequence.
+
+Our encoding command:
+```
+rawcooked -y --all --no-accept-gaps -s 5281680 -o >> 2>&1
+```
+| Command | Description |
+| ---------------------- | ------------------------------------ |
+| ```rawcooked``` | Calls the software |
+| ```-y``` | Answers 'yes' to software questions |
+| ```--all``` | Preservation command with checksums |
+| ```--no-accept-gaps``` | Exit with warning if sequence gaps |
+| ```-s 5281680``` | Set max attachment size to 5MB |
+| ```-o``` | Use output path for FFV1 MKV |
+| ```>>``` | Capture console output to text file |
+| ```2>&1``` | stderr and stdout messages captured |
+
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodings in parallel. This software makes it very simple to run multiple encodings, just by writing all the image sequence paths to one text file you can launch a parallel command like this:
```
-rawcooked -y --all --no-accept-gaps -s 5281680 -o &>>
+cat sequence_list.txt | parallel --jobs 10 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
+We always capture our console logs for every encoding. The ```2>&1``` ensures any error messages are output alongside the usual standard messages. These are essential for us to review if a problem is found with an encoding. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. In recent years these logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment generation. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection.
+
+
### Muxing log assessment
### FFV1 Matroska validation
From 191c5b85deb1345bc6a37225ccedfb20b0839fa4 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 15:34:53 +0000
Subject: [PATCH 21/93] Update Case_study.md
Add horizontal lines
---
Doc/Case_study.md | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index ad0761e7..c59f5e80 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -15,7 +15,8 @@ This case study is broken into the following sections:
* [Workflow: FFV1 Matroska demux to image sequence](#ffv1_demux)
* [Conclusion & helpful test approaches](#conclusion)
* [Additional resources](#links)
-
+
+---
### Server configurations
To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths in parallel.
@@ -39,8 +40,8 @@ Our previous 2K film encoding configuration:
- Ubuntu 18.04 LTS
When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding; with 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
-
-
+
+---
### One year study of throughput
Between February 2023 and February 2024 the BFI encoded **1020 DPX sequences** to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bit depth, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
@@ -56,8 +57,8 @@ From 1020 total DPX sequences successfully muxed to FFV1 Matroska:
* Across all 1020 muxed sequences the average size reduction was 71%
A small group of sequences had their total RAWcooked muxing time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations of between 5 and 10 minutes. The fastest encodings took just 7 hours, with some taking 46 hours. There appears to be no cause for these variations in the files themselves, so we must assume that general network activity and/or the number of parallel processes running influenced them.
-
-
+
+---
# Workflow
### Image sequence assessment
@@ -72,7 +73,7 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
| 1.3TB reduces to 1TB | 1.0TB may only reduce to 1TB |
-
+---
### Muxing the image sequence
To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation-essential flags merged into this one simple flag. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when demuxed the retrieved sequence can be verified as bit-identical to the original source sequence.
From 0d1450b5d71ca7be807d53807bb5244543b613a1 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 15:52:02 +0000
Subject: [PATCH 22/93] Update Case_study.md
---
Doc/Case_study.md | 53 +++++++++++++++++++++++++++++++++--------------
1 file changed, 37 insertions(+), 16 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index c59f5e80..f6f8530e 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -82,27 +82,48 @@ Our encoding command:
```
rawcooked -y --all --no-accept-gaps -s 5281680 -o >> 2>&1
```
-| Command | Description |
-| ---------------------- | ------------------------------------ |
-| ```rawcooked``` | Calls the software |
-| ```-y``` | Answers 'yes' to software questions |
-| ```-all``` | Preservation command with checksums |
-| ```--no-accept-gaps``` | Exit with warning if sequence gaps |
-| ```-s 5281680``` | Set max attachment size to 5MB |
-| ```-o``` | Use output path for FFV1 MKV |
-| ```>>``` | Capture console output to text file |
-| ```2>&1``` | stderr and stdout messages captured |
-
-This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodings in parallel. This software makes it very simple to run multiple encodings, just by writing all the image sequence paths to one text file you can launch a parallel command like this:
+
+| Command | Description |
+| ---------------------- | ------------------------------------------ |
+| ```rawcooked``` | Calls the software |
+| ```-y``` | Answers 'yes' to software questions |
+| ```--all``` | Preservation command with checksums |
+| ```--no-accept-gaps``` | Exit with warning if sequence gaps found |
+| ```-s 5281680``` | Set max attachment size to 5MB |
+| ```-o``` | Use output path for FFV1 MKV |
+| ```>>``` | Capture console output to text file |
+| ```2>&1``` | stderr and stdout messages captured in log |
+
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodings in parallel. This software makes it very simple to run multiple encodings, with the number of simultaneous jobs set by the ```--jobs``` flag. By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel encodings:
```
-cat ${sequence_list.txt} | parallel --jobs 10 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
+cat sequence_list.txt | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
-
-We always capture our console logs for every encoding. The ```2>&1``` ensures any error messages are output alongside the usual standard messages. These are essential for us to review if a problem is found with an encoding. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. In recent years these logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment generation. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection.
-
+We always capture our console logs for every encoding. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encoding. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by MediaArea. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
+
+---
### Muxing log assessment
+The muxing logs are critical for the automated assessment of the muxing process. Each log consists of four blocks of data:
+* The RAWcooked assessment of the sequence
+* The FFmpeg encoding data
+* The post-muxing RAWcooked assessment of the FFV1 Matroska
+* Text review of the success of the muxed sequence
+
+The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg muxing data contains sequence and FFV1 metadata, along with choices made by the software for the muxing process and logs of the fps for the muxing of the sequence. All this information is really important when there's an issue with the muxing. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you can be given different types of human readable message including:
+* Warnings about the image sequence files
+* Errors experienced during muxing
+* Information about the RAWcooked mux (RAWcooked version, if checksum hashes included)
+* Completion success or failure statement
+
+The automation scripts used at the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. Once found this FFV1 Matroska is deleted and the sequence is queued for a repeated encoding attempt. Similarly if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encoding.
+
+There is one error message that triggers a specific type of re-encoding:
+```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
+
+For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible.
+
+---
### FFV1 Matroska validation
### FFV1 Matroska demux to image sequence
From 07df4546823a4727b77a4ac6fc77fced202c4768 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 16:02:51 +0000
Subject: [PATCH 23/93] Update Case_study.md
---
Doc/Case_study.md | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index f6f8530e..2d027f7e 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -80,7 +80,7 @@ To encode our image sequences we use the ```--all``` flag released in RAWcooked
Our encoding command:
```
-rawcooked -y --all --no-accept-gaps -s 5281680 -o >> 2>&1
+rawcooked -y --all --no-accept-gaps -s 5281680 *path/sequence_name/* -o *path/sequence_name.mkv* >> *path/sequence_name.mkv.txt* 2>&1
```
| Command | Description |
@@ -116,15 +116,22 @@ The RAWcooked assessments themselves are lines of repeated data, counting from 0
* Information about the RAWcooked mux (RAWcooked version, if checksum hashes included)
* Completion success or failure statement
-The automation scripts used a the the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. Once found this FFV1 Matroska is deleted and the sequence is queued for a repeated encoding attempt. Similarly if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encoding.
+The automation scripts used at the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encoding attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encoding. A successful completion statement should always read:
+```Reversibility was checked, no issues detected.```
-There is one error message that triggers a specific type of re-encoding:
+There is one error message that triggers a specific type of remux:
```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible.
+For this error we know that we need to remux our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas encoded using this ```--output-version 2``` flag are not backward compatible with RAWcooked versions before v21.
---
### FFV1 Matroska validation
+
+When the logs have been assessed and the message ```Reversibility was checked, no issue detected``` was found, then the FFV1 Matroska has metadata validation using the [BFI's MediaConch policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_mkv_policy.xml). This policy ensures that the FFV1 Matroska is whole by looking for duration field entries, checks for reversibility data, and that the correct FFV1 and Matroska formats are being used. It also ensures that all the FFV1 error detection features are present, that slices are included, bit rate is over 300 and pixel aspect ratio is 1.000.
+
+If the policy passes then the FFV1 Matroska is moved onto the final stage, where the RAWcooked flag ```--check``` is used to ensure that the FFV1 Matroska is correctly formed.
+```rawcooked --check
+
### FFV1 Matroska demux to image sequence
From 084925b510f637a4521905857e746920fd84d86b Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 16:17:43 +0000
Subject: [PATCH 24/93] Update Case_study.md
Add demux section
---
Doc/Case_study.md | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 2d027f7e..ce434bf6 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -80,7 +80,7 @@ To encode our image sequences we use the ```--all``` flag released in RAWcooked
Our encoding command:
```
-rawcooked -y --all --no-accept-gaps -s 5281680 *path/sequence_name/* -o *path/sequence_name.mkv* >> *path/sequence_name.mkv.txt* 2>&1
+rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/sequence_name.mkv >> path/sequence_name.mkv.txt 2>&1
```
| Command | Description |
@@ -130,11 +130,22 @@ For this error we know that we need to remux our image sequence with the additio
When the logs have been assessed and the message ```Reversibility was checked, no issue detected``` was found, then the FFV1 Matroska has metadata validation using the [BFI's MediaConch policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_mkv_policy.xml). This policy ensures that the FFV1 Matroska is whole by looking for duration field entries, checks for reversibility data, and that the correct FFV1 and Matroska formats are being used. It also ensures that all the FFV1 error detection features are present, that slices are included, bit rate is over 300 and pixel aspect ratio is 1.000.
If the policy passes then the FFV1 Matroska is moved onto the final stage, where the RAWcooked flag ```--check``` is used to ensure that the FFV1 Matroska is correctly formed.
-```rawcooked --check
-
+```rawcooked --check path/sequence_name.mkv >> path/sequence_name.mkv.txt 2>&1```
+Again the stderr and stdout messages are captured to a log, and this log is checked for the message ```Reversibility was checked, no issues detected.``` When this check completes the FFV1 Matroska is moved to our Digital Preservation Infrastructure and the original image sequence is deleted under automation.
+
+---
### FFV1 Matroska demux to image sequence
+We have automation scripts that return an FFV1 Matroska to its original image sequence. These are essential for our film preservation colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` flag again, which selects demux when an FFV1 Matroska is supplied.
+
+This simple script runs this command:
+```
+rawcooked -y --all path/sequence_name.mkv -o path/demux_sequence >> path/sequence_name.txt 2>&1
+```
+It demuxes the FFV1 Matroska back to image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
+
+---
# Conclusion
### Conclusion & some helpful test approaches
From af21ebad8c1d02a791232c09a5d5e524bd422867 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 16:29:10 +0000
Subject: [PATCH 25/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index ce434bf6..917930f6 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -149,9 +149,9 @@ It demuxes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
# Conclusion
### Conclusion & some helpful test approaches
-The workflow covers most of the the areas we think are essential for safe automated encoding of the DPX sequences. There is a need for manual intervention when repeated errors are encountered and an image sequences never makes it to our Digital Preservation Infrastructure. Most often this indicates a different image sequence flavour we do not have covered in our licence, but sometimes it can indicate a larger issue with either RAWcooked of FFmpeg encoding. Where errors are found these are reported to an error log named after the image seqeuence, for easier monitoring.
+We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our Unlocking Film Heritage projet. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file muxing. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up will indicate repeated problems.
-When any upgrades occur we like to run some select reversibility test to ensure RAWcooked is still operating as we would expect. This can be for RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are muxed using our usual ```--all``` command, and then demuxed again fully. The image sequences of both the original and demuxed version then have whole file MD5 checksums generating and saving to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
+When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are muxed using our usual ```--all``` command, and then demuxed again fully. The image sequences of both the original and demuxed version then have whole file MD5 checksums generating and saving to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
When we encounter an error there are a few commands I use that make reporting the issue a little easier at the Media Area RAWcooked GitHub issue tracker.
```
From 0afbf395dfa4e95e6e7d96eef714c4cf3d838063 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 16:47:39 +0000
Subject: [PATCH 26/93] Update Case_study.md
---
Doc/Case_study.md | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 917930f6..6ff5f23b 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -3,7 +3,7 @@
**BFI National Archive**
**By Joanna White, Knowledge & Collections Developer**
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our encoding processes do not include any alpha channels or audio file processing, but RAWcooked is capable of muxing both into the completed FFV1 Matroska dependent upon your licence.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been muxing DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our muxing project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our muxing processes do not include any alpha channels or audio file processing, but RAWcooked is capable of muxing both into the completed FFV1 Matroska dependent upon your licence.
This case study is broken into the following sections:
* [Server configuration](#server_config)
@@ -29,9 +29,9 @@ Our current configuration:
- 40Gbps Network card
- NAS storage on 40GB network
-The more CPU threads you have the better your FFmpeg encoding to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our congiguration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```.
+The more CPU threads you have the better your FFmpeg mux to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our congiguration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```.
-Our previous 2K film encoding configuration:
+Our previous server configuration:
- Virtual Machine of a NAS storage device
- AMD Opteron 22xx (Gen 2 Class Opteron)
- 12GB RAM
@@ -39,7 +39,7 @@ Our previous 2K film encoding configuration:
- 8 threads
- Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg encoding, 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
+When muxing 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg, 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
---
### One year study of throughput
@@ -56,7 +56,7 @@ From 1020 total DPX sequences successfully muxed to FFV1 Matroska:
* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
* Across all 1020 muxed sequences the average size reduction was 71%
-A small group of sequences had their total RAWcooked muxing time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations were between 5 and 10 minutes. The fastest encodings took just 7 hours with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+A small group of sequences had their total RAWcooked muxing time recorded, revealing an average of 24 hours per sequence. The finished MKV durations of these sequences were all between 5 and 10 minutes. The fastest muxes took just 7 hours, with some taking 46 hours. There appears to be no cause for these variations in the files themselves, so we must assume that general network activity and/or the number of parallel processes running influenced them.
---
# Workflow
@@ -78,7 +78,7 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when demuxed the retrieved sequence can be varified as bit-identical to the original source sequence.
-Our encoding command:
+Our RAWcooked mux command:
```
rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/sequence_name.mkv >> path/sequence_name.mkv.txt 2>&1
```
@@ -94,19 +94,19 @@ rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/seque
| ```>>``` | Capture console output to text file |
| ```2>&1``` | stderr and stdout messages captured in log |
-This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodings in parallel. This software makes it very simple to run multiple encodings specified by the ```--job``` flag. By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel encodings:
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple muxes in parallel. This software makes it very simple to run multiple muxes specified by the ```--job``` flag. By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel muxes:
```
cat ${sequence_list.txt} | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
-We always capture our console logs for every encoding. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encoding. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
+We always capture our console logs for every mux. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with a mux. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX muxes, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
---
### Muxing log assessment
The muxing logs are critical for the automated assessment of the muxing process. Each log consists of four blocks of data:
* The RAWcooked assessment of the sequence
-* The FFmpeg encoding data
+* The FFmpeg muxing data
* The post-muxing RAWcooked assessment of the FFV1 Matroska
* Text review of the success of the muxed sequence
@@ -116,13 +116,13 @@ The RAWcooked assessments themselves are lines of repeated data, counting from 0
* Information about the RAWcooked mux (RAWcooked version, if checksum hashes included)
* Completion success or failure statement
-The automation scripts used a the the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encoding attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encoding. A successful completion statement should always read:
+The automation scripts used at the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. If any are found, the FFV1 Matroska is deleted and the sequence is queued for a repeated mux attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat mux. A successful completion statement should always read:
```Reversibility was checked, no issues detected.```
There is one error message that triggers a specific type of remux:
```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-For this error we know that we need to remux our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matoskas encoding using this ```--output-version 2``` flag are not backward compatible with RAWcooked version before V21.
+For this error we know that we need to remux our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once mux is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are muxed using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.01.
---
### FFV1 Matroska validation
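As a rough illustration of the log triage described above, the following Python sketch classifies a captured log. It is not the BFI's actual script: the function name `assess_log` and the three outcome labels are hypothetical. It matches on the prefix `Reversibility was checked, no issue` because this document quotes the completion statement both with and without the plural.

```python
# Hypothetical sketch of the log triage; names and labels are illustrative only.
SUCCESS_PREFIX = "Reversibility was checked, no issue"
LARGE_REVERSIBILITY_ERRORS = (
    "Error: the reversibility file is becoming big",
    "Error: undecodable file is becoming too big",
)


def assess_log(log_text):
    """Return 'remux_output_version_2', 'requeue' or 'pass' for one mux log."""
    lines = log_text.splitlines()
    # The one error that calls for a remux with --output-version 2.
    if any(marker in line for line in lines for marker in LARGE_REVERSIBILITY_ERRORS):
        return "remux_output_version_2"
    # Any other 'Error' message: delete the MKV and queue a repeat mux.
    if any("Error" in line for line in lines):
        return "requeue"
    # Warnings are ignored; only the completion statement counts as success.
    if any(SUCCESS_PREFIX in line for line in lines):
        return "pass"
    # No completion statement at all is treated as a failure.
    return "requeue"
```

Checking for the large reversibility error first matters, because that message also contains the word 'Error' and would otherwise be queued for an ordinary repeat mux.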
@@ -149,28 +149,33 @@ It demuxes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
# Conclusion
### Conclusion & some helpful test approaches
-We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our Unlocking Film Heritage projet. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file muxing. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up will indicate repeated problems.
+We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our Unlocking Film Heritage project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated muxing of business-as-usual DPX sequences with relatively little oversight. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequence doesn't make it to our Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file muxing. Where errors are found by our automations these are reported to an error log named after the image sequence; a build-up of entries indicates repeated problems.
When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are muxed using our usual ```--all``` command, and then demuxed again fully. The image sequences of both the original and demuxed version then have whole file MD5 checksums generating and saving to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
-When we encounter an error there are a few commands I use that make reporting the issue a little easier at the Media Area RAWcooked GitHub issue tracker.
+When we encounter an error there are a few commands I use that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
```
rawcooked -d -y -all --accept-gaps
```
-The -d flag returns the command sent to FFmpeg instead of launching the command. This flag also leaves the reversibility data available to view as a text file and this is useful for finding errors.
+Adding the ```-d``` flag doesn't run the muxing, but returns the command sent to FFmpeg. This flag also leaves the reversibility data available to view as a text file and this is useful for finding errors.
```
head -c 1048576 sequence_name.mkv > dump_file.mkv
```
-This command uses UNIX ```head``` software to cut the first 120KB of data from a supplied file, copying it to a new file which is easier to forward to Media Area for review. This contains the file's header data, often requested when a problem has occurred.
+This command uses the UNIX ```head``` command to copy the first 1MB (1,048,576 bytes) of data from a supplied file into a new file, which is easier to forward to Media Area for review. This contains the file's header data, often requested when a problem has occurred.
```
echo $?
```
-This command should be run directly after a failed RAWcooked encoding, and it will tell you the exit code returned from that terminated run.
+This command should be run directly after a failed RAWcooked mux, and it will tell you the exit code returned from that terminated run.
The results of these three enquiries is always a brilliant way to open an Issue enquiry for Media Area and will help ensure swift diagnose for your problem. It may also be necessary to supply a DPX sequence, and your ```head``` command can be used again to extract the header data.
## Additional resources
+* [RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
+* [Further conference presentations about BFI National Archive use of RAWcooked](https://github.com/MediaArea/RAWcooked/issues)
+* [DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
+* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
+* [RAWCooking With Gas: A Film Digitization and QC Workflow-in-progress by Genevieve Havemeyer-King](https://youtu.be/-cJxq7Vr3Nk?si=BjPWzsZ7LRKMVZNF)
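The reversibility test described in the conclusion above (whole-file MD5 manifests for the original and demuxed sequences, then a comparison) could be sketched in Python roughly as follows. This is an assumption-laden illustration rather than the BFI's own tooling: the function names `md5_manifest` and `manifest_differences` are hypothetical, and the comparison is done in memory instead of with ```diff``` on saved manifest files.

```python
import hashlib
from pathlib import Path


def md5_manifest(sequence_dir):
    """Map each file path (relative to sequence_dir) to its whole-file MD5."""
    root = Path(sequence_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.md5()
            with path.open("rb") as handle:
                # Read in 1MB chunks so large image files are not loaded whole.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def manifest_differences(original_dir, demuxed_dir):
    """Return sorted relative paths that are missing or differ between the two."""
    original = md5_manifest(original_dir)
    demuxed = md5_manifest(demuxed_dir)
    return sorted(
        name
        for name in original.keys() | demuxed.keys()
        if original.get(name) != demuxed.get(name)
    )
```

An empty list from `manifest_differences` means every image file in the demuxed sequence matched its original.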
From 32d1077c81b7379a66f3f3d2d378152134a4d444 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 16:58:37 +0000
Subject: [PATCH 27/93] Update Case_study.md
---
Doc/Case_study.md | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 6ff5f23b..e55096ba 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -94,7 +94,9 @@ rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/seque
| ```>>``` | Capture console output to text file |
| ```2>&1``` | stderr and stdout messages captured in log |
-This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple muxes in parallel. This software makes it very simple to run multiple muxes specified by the ```--job``` flag. By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel muxes:
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple muxes in parallel. This software makes it very simple to run a number of muxes specified by the ```--jobs``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work to save time. If not run in parallel a computer will usually process jobs one after another. As well as parallelisation, FFmpeg uses multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (from 64 slices per RAWcooked frame) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so for our server that is 64 separate tasks per thread, one slice per frame of the FFV1 file.
+
+By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel muxes:
```
cat ${sequence_list.txt} | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
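For anyone driving the same mux from Python instead of GNU Parallel, a minimal sketch might look like the following. It is only an illustration under stated assumptions: the helper names (`build_mux_command`, `run_mux`, `run_all`) are hypothetical, the flags are copied from the command above, and stripping a trailing slash before adding `.mkv` is a choice made here, not something the original command does.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_mux_command(sequence_path):
    """Return the rawcooked argument list for one image sequence folder."""
    # Assumption: drop any trailing slash so the MKV sits beside the folder.
    output_path = sequence_path.rstrip("/") + ".mkv"
    return ["rawcooked", "-y", "--all", "--no-accept-gaps", "-s", "5281680",
            sequence_path, "-o", output_path]


def run_mux(sequence_path):
    """Run one mux, appending stdout and stderr to <sequence>.mkv.txt."""
    log_path = sequence_path.rstrip("/") + ".mkv.txt"
    with open(log_path, "a") as log:
        result = subprocess.run(build_mux_command(sequence_path),
                                stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def run_all(sequence_list_file, jobs=5):
    """Mux every path listed in the text file, at most `jobs` at a time."""
    with open(sequence_list_file) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(zip(paths, pool.map(run_mux, paths)))
```

Threads are sufficient here because each worker only waits on an external `rawcooked` process; `run_all` returns the exit code for each sequence path so failed muxes can be queued again.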
@@ -174,7 +176,7 @@ The results of these three enquiries is always a brilliant way to open an Issue
* [RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
-* [RAWcooked cheat sheet](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
+* [RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
* [Further conference presentations about BFI National Archive use of RAWcooked](https://github.com/MediaArea/RAWcooked/issues)
* [DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
From b31ccda05ac0c94c4fb0f705cbcb9b6300432c04 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:11:04 +0000
Subject: [PATCH 28/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index e55096ba..3fd202d9 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -124,7 +124,7 @@ The automation scripts used a the the BFI National Archive largely ignore the wa
There is one error message that triggers a specific type of remux:
```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-For this error we know that we need to remux our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once mux is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are muxed using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.01.
+For this error we know that we need to remux our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once the mux is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is a lot of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes successfully and the data remains verifiably reversible. FFV1 Matroskas that are muxed using the ```--output-version 2``` flag are not backward compatible with RAWcooked versions before v21.09.
---
### FFV1 Matroska validation
From 615f78dcc38ff2b171d1e5d38c62e76b2c2403f9 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:30:43 +0000
Subject: [PATCH 29/93] Update Case_study.md
Update mux to encode
---
Doc/Case_study.md | 72 +++++++++++++++++++++++------------------------
1 file changed, 36 insertions(+), 36 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 3fd202d9..17660561 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,18 +1,18 @@
# RAWcooked Case Study
**BFI National Archive**
-**By Joanna White, Knowledge & Collections Developer**
+**Joanna White, Knowledge & Collections Developer**
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been muxing DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our muxing project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our muxing processes do not include any alpha channels or audio file processing, but RAWcooked is capable of muxing both into the completed FFV1 Matroska dependent upon your licence.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our encoding processes do not include any alpha channels or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska dependent upon your licence.
This case study is broken into the following sections:
* [Server configuration](#server_config)
* [One year study of throughput](#findings)
* [Workflow: Image sequence assessment](#assessment)
-* [Workflow: Muxing the image sequence](#muxing)
-* [Workflow: Muxing log assessments](#log_assessment)
+* [Workflow: Encoding the image sequence](#muxing)
+* [Workflow: Encoding log assessments](#log_assessment)
* [Workflow: FFV1 Matroska validation](#ffv1_valid)
-* [Workflow: FFV1 Matroska demux to image sequence](#ffv1_demux)
+* [Workflow: FFV1 Matroska decode to image sequence](#ffv1_demux)
* [Conclusion & helpful test approaches](#conclusion)
* [Additional resources](#links)
@@ -29,7 +29,7 @@ Our current configuration:
- 40Gbps Network card
- NAS storage on 40GB network
-The more CPU threads you have the better your FFmpeg mux to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our congiguration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```.
+The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply Threads x Cores x Sockets. So for our configuration this would be 2 (threads per core) x 16 (cores per socket) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpu```.
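
As a quick check of that arithmetic, the calculation can be sketched in Python. This is a minimal illustration only: the function and argument names are hypothetical, and the three values are assumed to be read by hand from the `Thread(s) per core`, `Core(s) per socket` and `Socket(s)` fields reported by ```lscpu```.

```python
def total_cpu_threads(threads_per_core: int, cores_per_socket: int, sockets: int) -> int:
    """Return the total number of CPU threads available on a server."""
    return threads_per_core * cores_per_socket * sockets


# The three figures quoted above for our current server
print(total_cpu_threads(2, 16, 2))  # prints 64
```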
Our previous server configuration:
- Virtual Machine of a NAS storage device
@@ -39,14 +39,14 @@ Our previous server configuration:
- 8 threads
- Ubuntu 18.04 LTS
-When muxing 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg, 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg; for 4K scans it's generally 1 fps or less. These figures can be impacted by the number of parallel processes running at any one time.
---
### One year study of throughput
-Between Febraury 2023 and February 2024 the BFI encoded **1020 DPX sequences** to FFV1 Matroska. A Python script was written to capture certain data about these muxed files, including sequence pixel size, colourspace, bits, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
+Between February 2023 and February 2024 the BFI encoded **1020 DPX sequences** to FFV1 Matroska. A Python script was written to capture certain data about these encoded files, including sequence pixel size, colourspace, bit depth, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
-From 1020 total DPX sequences successfully muxed to FFV1 Matroska:
+From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 140 were 2K or smaller / 880 were 4K
* 222 were Luma Y / 798 were RGB
* 143 were 10-bit / 279 12-bit / 598 16-bit
@@ -54,9 +54,9 @@ From 1020 total DPX sequences successfully muxed to FFV1 Matroska:
* The smallest reduction was just .3%
* The largest reductions were from sequences both 10/12-bit, with RGB colorspace that had black and white filters applied
* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
-* Across all 1020 muxed sequences the average size reduction was 71%
+* Across all 1020 encoded sequences the average size reduction was 71%
-A small group of sequences had their total RAWcooked muxing time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations were between 5 and 10 minutes. The fastest muxes took just 7 hours with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+A small group of sequences had their total RAWcooked encoding time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations of between 5 and 10 minutes. The fastest encodes took just 7 hours, with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or the number of parallel processes running have influenced these variations.
---
# Workflow
@@ -66,7 +66,7 @@ For each image sequence processed the metadata of the first DPX or TIFF is colle
Next the first file within the image sequence is checked against a Media Area MediaConch policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion, or possible anomalies in the DPX resulting.
-The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the mux based on previous muxing experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequences to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB.
+The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the encode based on previous encoding experience. We make an assumption that 2K RGB will always be at least one third smaller, so we calculate that a 1.3TB sequence will make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure, where we currently have a maximum verifiable ingest file size of 1TB.
| RAWcooked 2K RGB | RAWcooked Luma & RAWcooked 4K |
| -------------------- | ----------------------------- |
@@ -76,9 +76,9 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
---
### Muxing the image sequence
-To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when demuxed the retrieved sequence can be varified as bit-identical to the original source sequence.
+To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation-essential flags merged into this one simple flag. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.
-Our RAWcooked mux command:
+Our RAWcooked encode command:
```
rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/sequence_name.mkv >> path/sequence_name.mkv.txt 2>&1
```
@@ -94,37 +94,37 @@ rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/seque
| ```>>``` | Capture console output to text file |
| ```2>&1``` | stderr and stdout messages captured in log |
-This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple muxes in parallel. This software makes it very simple to run a number of muxes specified by the ```--job``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work to save time. If not run in parallel a computer will usually process jobs one after another. As well as parallelisation, FFmpeg usinges multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (from 64 slices per RAWcooked frame) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so four our server that 64 separate tasks per thread, one slice per frame of the FFV1 file.
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodes in parallel. This software makes it very simple to run a number of encodes specified by the ```--jobs``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work to save time. If not run in parallel a computer will usually process jobs one after another. As well as parallelisation, FFmpeg uses multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (from 64 slices per RAWcooked frame) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so for our server that is 64 separate tasks, one per thread, with each task handling one slice of a frame of the FFV1 file.
-By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel muxes:
+By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel encodes:
```
cat sequence_list.txt | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
-We always capture our console logs for every mux. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with a mux. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX muxes, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
+We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodes, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
---
-### Muxing log assessment
+### Encoding log assessment
-The muxing logs are critical for the automated assessment of the muxing process. Each log consists of four blocks of data:
+The encoding logs are critical for the automated assessment of the encoding process. Each log consists of four blocks of data:
* The RAWcooked assessment of the sequence
-* The FFmpeg muxing data
-* The post-muxing RAWcooked assessment of the FFV1 Matroska
-* Text review of the success of the muxed sequence
+* The FFmpeg encoding data
+* The post-encoding RAWcooked assessment of the FFV1 Matroska
+* Text review of the success of the encoded sequence
-The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg muxing data contains sequence and FFV1 metadata, along with choices made by the software for the muxing process and logs of the fps for the muxing of the sequence. All this information is really important when there's an issue with the muxing. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you can be given different types of human readable message including:
+The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg encoding data contains sequence and FFV1 metadata, along with choices made by the software for the encoding process and logs of the fps for the encoding of the sequence. All this information is really important when there's an issue with the encoding. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you can be given different types of human readable message including:
* Warnings about the image sequence files
-* Errors experienced during muxing
-* Information about the RAWcooked mux (RAWcooked version, if checksum hashes included)
+* Errors experienced during encoding
+* Information about the RAWcooked encode (RAWcooked version, if checksum hashes included)
* Completion success or failure statement
-The automation scripts used a the the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated mux attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat mux. A successful completion statement should always read:
+The automation scripts used at the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
```Reversibility was checked, no issue detected.```
-There is one error message that triggers a specific type of remux:
+There is one error message that triggers a specific type of re-encode:
```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-For this error we know that we need to remux our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once mux is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are muxed using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.09.
+For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2```, which writes the large reversibility data to the FFV1 Matroska once the encode is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked versions before v21.09.
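
The log checks described in this section could be sketched along the following lines. This is a hedged illustration rather than the BFI's actual automation: the function name and the three return labels are hypothetical, and matching on the opening words of the completion statement is an assumption made so that small wording differences in the final message do not cause a false failure.

```python
# Message fragments quoted in this case study
SUCCESS_PREFIX = "Reversibility was checked, no issue"
LARGE_FILE_ERRORS = (
    "Error: the reversibility file is becoming big",
    "Error: undecodable file is becoming too big",
)


def assess_log(log_text: str) -> str:
    """Return 'pass', 're-encode' or 're-encode-output-version-2' (hypothetical labels)."""
    lines = log_text.splitlines()
    # The oversized reversibility file error needs the --output-version 2 re-encode
    if any(err in line for line in lines for err in LARGE_FILE_ERRORS):
        return "re-encode-output-version-2"
    # Any other error means the FFV1 Matroska is deleted and the encode repeated
    if any("Error" in line for line in lines):
        return "re-encode"
    # Warnings are ignored; only the completion statement counts as success
    if any(SUCCESS_PREFIX in line for line in lines):
        return "pass"
    return "re-encode"
```

Checking for the oversized reversibility file error first matters, because that message also contains the word 'Error' and would otherwise be caught by the general check.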
---
### FFV1 Matroska validation
@@ -137,29 +137,29 @@ If the policy passes then the FFV1 Matroska is moved onto the final stage, where
Again the stderr and stdout messages are captured to a log, and this log is checked for the message ```Reversibility was checked, no issue detected.``` When this check completes the FFV1 Matroska is moved to our Digital Preservation Infrastructure and the original image sequence is deleted under automation.
---
-### FFV1 Matroska demux to image sequence
+### FFV1 Matroska decode to image sequence
-We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preseration colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` command again which can select demux when an FFV1 Matroska is supplied.
+We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preservation colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` flag again, which selects decode when an FFV1 Matroska is supplied.
This simple script runs this command:
```
-rawcooked -y --all path/sequence_name.dpx -o path/demux_sequence >> path/sequence_name.txt 2>&1
+rawcooked -y --all path/sequence_name.mkv -o path/decode_sequence >> path/sequence_name.txt 2>&1
```
-It demuxes the FFV1 Matroska back to image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
+It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
---
# Conclusion
### Conclusion & some helpful test approaches
-We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our Unlocking Film Heritage projet. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated muxing of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file muxing. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up will indicate repeated problems.
+We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our Unlocking Film Heritage project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little oversight. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequence doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image sequence; a build up will indicate repeated problems.
-When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are muxed using our usual ```--all``` command, and then demuxed again fully. The image sequences of both the original and demuxed version then have whole file MD5 checksums generating and saving to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
+When any system upgrades occur we like to run a reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates and FFmpeg updates, but also updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generated and saved to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
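
A reversibility test of this kind might be sketched in Python as below. This is an illustration under stated assumptions, not our production script: the function names are hypothetical, and instead of writing two manifest files and running ```diff``` it compares the two manifests in memory as dictionaries, which answers the same question of whether every image file is identical.

```python
import hashlib
from pathlib import Path


def md5_manifest(sequence_dir: str) -> dict[str, str]:
    """Map each file path (relative to sequence_dir) to its whole file MD5."""
    root = Path(sequence_dir)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.md5()
        with path.open("rb") as handle:
            # Read in 1MB chunks so large image files are not loaded whole
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def sequences_identical(original_dir: str, decoded_dir: str) -> bool:
    """True when both folders hold the same file names with the same checksums."""
    return md5_manifest(original_dir) == md5_manifest(decoded_dir)
```

Because the manifest keys are relative paths, a missing or renamed image file in the decoded sequence fails the comparison just as a changed checksum would.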
When we encounter an error there are a few commands I use that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
```
rawcooked -d -y --all --accept-gaps
```
-Adding the ```-d``` flag doesn't run the muxing, but returns the command sent to FFmpeg. This flag also leaves the reversibility data available to view as a text file and this is useful for finding errors.
+Adding the ```-d``` flag doesn't run the encoding, but returns the command sent to FFmpeg. This flag also leaves the reversibility data available to view as a text file and this is useful for finding errors.
```
head -c 1048576 sequence_name.mkv > dump_file.mkv
```
@@ -167,7 +167,7 @@ This command uses UNIX ```head``` command to cut the first 1MB of data from a su
```
echo $?
```
-This command should be run directly after a failed RAWcooked mux, and it will tell you the exit code returned from that terminated run.
+This command should be run directly after a failed RAWcooked encode, and it will tell you the exit code returned from that terminated run.
The results of these three enquiries are always a brilliant way to open an Issue enquiry for Media Area and will help ensure swift diagnosis of your problem. It may also be necessary to supply a DPX sequence, and your ```head``` command can be used again to extract the header data.
From a785a2939d9ee18987f457ceb715337162ea5c23 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:46:47 +0000
Subject: [PATCH 30/93] Update Case_study.md
Spaces and conclusion
---
Doc/Case_study.md | 50 +++++++++++++++++++++++------------------------
1 file changed, 24 insertions(+), 26 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 17660561..6e89197c 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -13,7 +13,8 @@ This case study is broken into the following sections:
* [Workflow: Encoding log assessments](#log_assessment)
* [Workflow: FFV1 Matroska validation](#ffv1_valid)
* [Workflow: FFV1 Matroska decode to image sequence](#ffv1_demux)
-* [Conclusion & helpful test approaches](#conclusion)
+* [Conclusion](#conclusion)
+* [Useful test approaches](#tests)
* [Additional resources](#links)
---
@@ -41,23 +42,6 @@ Our previous server configuration:
When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg; for 4K scans it's generally 1 fps or less. These figures can be impacted by the number of parallel processes running at any one time.
----
-### One year study of throughput
-
-Between Febraury 2023 and February 2024 the BFI encoded **1020 DPX sequences** to FFV1 Matroska. A Python script was written to capture certain data about these encoded files, including sequence pixel size, colourspace, bits, and byte size of the image sequence and completed FFV1 Matroska. We captured the following data:
-
-From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
-* 140 were 2K or smaller / 880 were 4K
-* 222 were Luma Y / 798 were RGB
-* 143 were 10-bit / 279 12-bit / 598 16-bit
-* The largest reduction in size of any FFV1 from the DPX was 88%
-* The smallest reduction was just .3%
-* The largest reductions were from sequences both 10/12-bit, with RGB colorspace that had black and white filters applied
-* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
-* Across all 1020 encoded sequences the average size reduction was 71%
-
-A small group of sequences had their total RAWcooked encoding time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations were between 5 and 10 minutes. The fastest encodes took just 7 hours with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
-
---
# Workflow
### Image sequence assessment
@@ -73,7 +57,6 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
| 1.3TB reduces to 1TB | 1.0TB may only reduce to 1TB |
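
The size assumptions in this table could be applied with a check like the following before a sequence is queued for encoding. It is a sketch under stated assumptions rather than our production script: the function and constant names are hypothetical, and 1TB is taken to be a decimal terabyte (10^12 bytes).

```python
TB = 10**12  # assumption: decimal terabyte

# Largest source sequence expected to produce an FFV1 Matroska of 1TB or less
MAX_SOURCE_BYTES_2K_RGB = 13 * TB // 10  # 1.3TB reduces to 1TB
MAX_SOURCE_BYTES_OTHER = 1 * TB          # Luma and 4K: 1.0TB may only reduce to 1TB


def fits_ingest_limit(sequence_bytes: int, is_2k_rgb: bool) -> bool:
    """True when the sequence is small enough to encode to a single ingestible file."""
    limit = MAX_SOURCE_BYTES_2K_RGB if is_2k_rgb else MAX_SOURCE_BYTES_OTHER
    return sequence_bytes <= limit
```

For example, a 1.2TB sequence would pass as 2K RGB but fail as 4K or Luma, where almost no size reduction can be relied upon.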
----
### Muxing the image sequence
To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation-essential flags merged into this one simple flag. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.
@@ -103,7 +86,7 @@ cat ${sequence_list.txt} | parallel --jobs 5 "rawcooked -y --all --no-accept-gap
We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodes, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
----
+
### Encoding log assessment
The encoding logs are critical for the automated assessment of the encoding process. Each log consists of four blocks of data:
@@ -126,7 +109,7 @@ There is one error message that triggers a specific type of re-encode:
For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2```, which writes the large reversibility data to the FFV1 Matroska once the encode is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked versions before v21.09.
----
+
### FFV1 Matroska validation
When the logs have been assessed and the message ```Reversibility was checked, no issue detected``` was found, then the FFV1 Matroska has metadata validation using the [BFI's MediaConch policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_mkv_policy.xml). This policy ensures that the FFV1 Matroska is whole by looking for duration field entries, checks for reversibility data, and that the correct FFV1 and Matroska formats are being used. It also ensures that all the FFV1 error detection features are present, that slices are included, bit rate is over 300 and pixel aspect ratio is 1.000.
@@ -136,7 +119,7 @@ If the policy passes then the FFV1 Matroska is moved onto the final stage, where
Again the stderr and stdout messages are captured to a log, and this log is checked for the message ```Reversibility was checked, no issue detected.``` When this check completes the FFV1 Matroska is moved to our Digital Preservation Infrastructure and the original image sequence is deleted under automation.
----
+
### FFV1 Matroska decode to image sequence
We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preservation colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` flag again, which selects decode when an FFV1 Matroska is supplied.
@@ -146,13 +129,28 @@ This simple script runs this command:
rawcooked -y --all path/sequence_name.mkv -o path/decode_sequence >> path/sequence_name.txt 2>&1
```
It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
-
+
---
# Conclusion
-### Conclusion & some helpful test approaches
-
-We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our Unlocking Film Heritage projet. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up will indicate repeated problems.
+We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little oversight. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequence doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image sequence; a build up will indicate repeated problems.
+
+In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths. Between February 2023 and February 2024 the BFI collected data about its business as usual encoding, capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bit depth, and total byte size of the image sequence and completed FFV1 Matroska.
+
+From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
+* 140 were 2K or smaller / 880 were 4K
+* 222 were Luma Y / 798 were RGB
+* 143 were 10-bit / 279 12-bit / 598 16-bit
+* The largest reduction in size of any FFV1 from the DPX was 88%
+* The smallest reduction was just 0.3%
+* The largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied
+* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
+* Across all 1020 encoded sequences the average size of the finished FFV1 was 71% of the original image sequence
+
+A small group of sequences had their total RAWcooked encoding time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations of between 5 and 10 minutes. The fastest encodes took just 7 hours, with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or the number of parallel processes running have influenced these variations.
+
+### Some useful test approaches
+
When any system upgrades occur we like to run a reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates and FFmpeg updates, but also updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generated and saved to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
When we encounter an error there are a few commands I use that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
From b07018877dfee75fdf88e88fc1751453766232a6 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:50:24 +0000
Subject: [PATCH 31/93] Update Case_study.md
Link to MediaArea
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 6e89197c..a3bbccb5 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -46,9 +46,9 @@ When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps)
# Workflow
### Image sequence assessment
-For each image sequence processed the metadata of the first DPX or TIFF is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using Media Area's MediaInfo software and capture the output into script variables.
+For each image sequence processed the metadata of the first DPX or TIFF is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using [Media Area's MediaInfo software](https://mediaarea.net/MediaInfo) and capture the output into script variables.
-Next the first file within the image sequence is checked against a Media Area MediaConch policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion, or possible anomalies in the DPX resulting.
+Next the first file within the image sequence is checked against a [Media Area's MediaConch software](https://mediaarea.net/MediaConch) policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion, or possible anomalies in the DPX resulting.
The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the encode based on previous encoding experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequences to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB.
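
The size rule described in the paragraph above can be sketched in Python, one of the workflow's two scripting languages. This is only an illustration under stated assumptions: the function names, the `is_2k` flag and the use of decimal terabytes are hypothetical and are not taken from the BFI scripts. The only figures used are the ones quoted above (a 1.3TB source limit for 2K RGB, and the 1TB ingest limit for everything else).

```python
TB = 1000 ** 4  # assumption: decimal terabytes (10^12 bytes)

INGEST_LIMIT = 1 * TB  # maximum verifiable ingest file size quoted in the text


def max_source_size(is_2k: bool, colourspace: str) -> int:
    """Largest source sequence (in bytes) expected to fit the ingest limit once encoded."""
    if is_2k and colourspace.upper() == "RGB":
        # 2K RGB is assumed to shrink enough that 1.3TB becomes roughly 1TB
        return 13 * TB // 10
    # 2K Luma and all 4K: assume very little size reduction
    return INGEST_LIMIT


def fits_ingest(folder_bytes: int, is_2k: bool, colourspace: str) -> bool:
    """True if a sequence of folder_bytes is expected to stay within the ingest limit."""
    return folder_bytes <= max_source_size(is_2k, colourspace)
```

A check like this would run before encoding, so that oversized sequences can be split or held back rather than failing at ingest.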
From 070657246524d8a6bd34e98b2a4123f48b09e29e Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:51:09 +0000
Subject: [PATCH 32/93] Update Case_study.md
---
Doc/Case_study.md | 1 -
1 file changed, 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index a3bbccb5..bc2814c5 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -7,7 +7,6 @@ At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we ha
This case study is broken into the following sections:
* [Server configuration](#server_config)
-* [One year study of throughput](#findings)
* [Workflow: Image sequence assessment](#assessment)
* [Workflow: Encoding the image sequence](#muxing)
* [Workflow: Encoding log assessments](#log_assessment)
From 1c1d2e75595fd644786e0f6208c964ba2e466d07 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:52:47 +0000
Subject: [PATCH 33/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index bc2814c5..3e843af9 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -39,7 +39,7 @@ Our previous server configuration:
- 8 threads
- Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg, 4K scans it's generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg, 4K scans is generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
---
# Workflow
@@ -134,7 +134,7 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up will indicate repeated problems.
-In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths. Between Febraury 2023 and February 2024 the BFI collected data about it's business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
+In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 140 were 2K or smaller / 880 were 4K
From e5f50c9cb017bfcd6dc035f77c5eed42a9820ceb Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:57:15 +0000
Subject: [PATCH 34/93] Update Case_study.md
---
Doc/Case_study.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 3e843af9..2df294f1 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -56,7 +56,7 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
| 1.3TB reduces to 1TB | 1.0TB may only reduce to 1TB |
-### Muxing the image sequence
+### Encoding the image sequence
To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be varified as bit-identical to the original source sequence.
@@ -148,19 +148,19 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
A small group of sequences had their total RAWcooked encoding time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations were between 5 and 10 minutes. The fastest encodes took just 7 hours with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
-### Some useful test approaches
+### Useful test approaches
-When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generating and saving to a manifest. These manifests are then ```diff``` checked to ensure that every single image file is identical.
+When any system upgrades occur we like to run a reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked or FFmpeg software updates, but also to updates to our operating system. To perform a reversibility test, a cross-section of image sequences is encoded using our usual ```--all``` command, and then decoded again fully. Whole file MD5 checksums are then generated for every image file in both the original and the decoded sequence, and saved into two manifests, one for the source and one for the decoded version. These manifests are then ```diff``` checked to ensure that every single image file is identical.
When we encounter an error there are a few commands I use that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
```
rawcooked -d -y -all --accept-gaps
```
-Adding the ```-d``` flag doesn't run the encoding, but returns the command sent to FFmpeg. This flag also leaves the reversibility data available to view as a text file and this is useful for finding errors.
+Adding the ```-d``` flag doesn't run the encoding but returns the command that would be sent to FFmpeg. This flag also leaves the reversibility data available as a text file and this is useful for sending to Media Area to help with finding errors.
```
head -c 1048576 sequence_name.mkv > dump_file.mkv
```
-This command uses UNIX ```head``` command to cut the first 1MB of data from a supplied file, copying it to a new file which is easier to forward to Media Area for review. This contains the file's header data, often requested when a problem has occurred.
+This command uses the Linux ```head``` command to copy the first 1MB of data from a supplied file into a new file, which is easier to forward to Media Area for review. This contains the file's header data, often requested when a problem has occurred.
```
echo $?
```
@@ -171,7 +171,7 @@ The results of these three enquiries is always a brilliant way to open an Issue
## Additional resources
-* [RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
+* [RAWcooked GitHub page](https://github.com/Media Area/RAWcooked)
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
* [Further conference presentations about BFI National Archive use of RAWcooked](https://github.com/MediaArea/RAWcooked/issues)
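
The manifest comparison used in the reversibility test described in this patch could be sketched as follows. This is a minimal Python illustration and not the BFI's own script: the function names and the manifest layout (relative path mapped to MD5 hex digest) are assumptions, and the comparison is done in memory rather than by running ```diff``` on saved manifest files.

```python
import hashlib
from pathlib import Path


def md5_manifest(sequence_dir):
    """Map each file's path (relative to sequence_dir) to its whole-file MD5."""
    root = Path(sequence_dir)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.md5()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large DPX files are not loaded whole
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def manifest_differences(source_dir, decoded_dir):
    """Return the sorted relative paths that are missing or differ between the two trees."""
    source = md5_manifest(source_dir)
    decoded = md5_manifest(decoded_dir)
    names = source.keys() | decoded.keys()
    return sorted(n for n in names if source.get(n) != decoded.get(n))
```

An empty list from `manifest_differences` means every image file in the decoded sequence matched the original.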
From 152b3340adcfbf2464fc5f443045f527d4c2347c Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 17:57:53 +0000
Subject: [PATCH 35/93] Update Case_study.md
---
Doc/Case_study.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 2df294f1..a0139dba 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,7 +1,5 @@
-# RAWcooked Case Study
-
-**BFI National Archive**
-**Joanna White, Knowledge & Collections Developer**
+# BFI National Archive RAWcooked Case Study
+**by Joanna White, Knowledge & Collections Developer**
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our encoding processes do not include any alpha channels or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska dependent upon your licence.
From 70ad59f8489813cbee6159f008063f9a3168495e Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:04:05 +0000
Subject: [PATCH 36/93] Update Case_study.md
Spell checking, review.
---
Doc/Case_study.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index a0139dba..626f81d2 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -130,9 +130,9 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
# Conclusion
-We began using RAWcooked to convert 3PB of 2K image sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space. Our workflows run 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals a different image sequence 'flavour' that we do not have in our licence, but sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up will indicate repeated problems.
+We began using RAWcooked to convert 3PB of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which is estimated to be around £45,000 savings in tape media for this collection. Undoubtedly this software offers amazing financial and preservation incentives, while also making a viewable video file of an otherwise invisible DPX scan. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals some permission issues from different scan suppliers, a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
-In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
+In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 140 were 2K or smaller / 880 were 4K
@@ -144,7 +144,7 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
* Across all 1020 encoded sequences the average size of the finished FFV1 was 71% of the original image sequence
-A small group of sequences had their total RAWcooked encoding time recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations were between 5 and 10 minutes. The fastest encodes took just 7 hours with some taking 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequences. The fastest encodes took just 7 hours, with some taking up to 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or the number of parallel processes running have influenced these variations.
### Useful test approaches
@@ -165,10 +165,10 @@ echo $?
This command should be run directly after a failed RAWcooked encode, and it will tell you the exit code returned from that terminated run.
The results of these three enquiries is always a brilliant way to open an Issue enquiry for Media Area and will help ensure swift diagnose for your problem. It may also be necessary to supply a DPX sequence, and your ```head``` command can be used again to extract the header data.
-
-
+
+
## Additional resources
-
+
* [RAWcooked GitHub page](https://github.com/Media Area/RAWcooked)
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
From e18e266c069f8a509acd597b27c47a310f4369e6 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:10:56 +0000
Subject: [PATCH 37/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 626f81d2..2c0dc84c 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,7 +1,7 @@
# BFI National Archive RAWcooked Case Study
**by Joanna White, Knowledge & Collections Developer**
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K, 4K, RGB, Luma, DPX and Tiff image sequences. This workflow is built on some of the flags developed by the Media Area and written in a mix of BASH shell scripts and Python3 scripts and is available to view from the [BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding). In addition to our RAWcooked use I will also consider how we use other Media Area tools alongside RAWcooked to complete necessary stages of this workflow. Our encoding processes do not include any alpha channels or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska dependent upon your licence.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K and 4K image sequences. This workflow is built on some of the flags developed in RAWcooked by Media Area and written in a mix of Bash shell and Python3 scripts ([BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding)). In addition to RAWcooked we use other Media Area tools to complete necessary stages of this workflow. Our encoding processes do not include any alpha channel or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska.
This case study is broken into the following sections:
* [Server configuration](#server_config)
From de03f1694878244cff5533dba97dc58e8a36c46e Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:13:56 +0000
Subject: [PATCH 38/93] Update Case_study.md
---
Doc/Case_study.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 2c0dc84c..591d0c9f 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -4,7 +4,7 @@
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K and 4K image sequences. This workflow is built on some of the flags developed in RAWcooked by Media Area and written in a mix of Bash shell and Python3 scripts ([BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding)). In addition to RAWcooked we use other Media Area tools to complete necessary stages of this workflow. Our encoding processes do not include any alpha channel or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska.
This case study is broken into the following sections:
-* [Server configuration](#server_config)
+* [Server configurations](#server_config)
* [Workflow: Image sequence assessment](#assessment)
* [Workflow: Encoding the image sequence](#muxing)
* [Workflow: Encoding log assessments](#log_assessment)
@@ -17,11 +17,11 @@ This case study is broken into the following sections:
---
### Server configurations
-To encode our DPX and TIFF sequences we have a single server that completes this work for all our different NAS storage paths in parallel.
+To encode our DPX sequences we have a single server that completes this work against 6 different Network Attached Storage (NAS) devices in parallel.
-Our current configuration:
+Our current server configuration:
- Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
-- 252 GB RAM
+- 252GB RAM
- 32-core with 64 CPU threads
- Ubuntu 20.04 LTS
- 40Gbps Network card
@@ -37,13 +37,13 @@ Our previous server configuration:
- 8 threads
- Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps) from FFmpeg, 4K scans is generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps), but 4K scans are generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
---
# Workflow
### Image sequence assessment
-For each image sequence processed the metadata of the first DPX or TIFF is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using [Media Area's MediaInfo software](https://mediaarea.net/MediaInfo) and capture the output into script variables.
+For each image sequence processed the metadata of the first DPX is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using [Media Area's MediaInfo software](https://mediaarea.net/MediaInfo) and capture the output into script variables.
Next the first file within the image sequence is checked against a [Media Area's MediaConch software](https://mediaarea.net/MediaConch) policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion, or possible anomalies in the DPX resulting.
From 2f071ede206c6eb0f35421607840359d64db5b66 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:17:58 +0000
Subject: [PATCH 39/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 591d0c9f..5dce19bb 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -130,7 +130,7 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
# Conclusion
-We began using RAWcooked to convert 3PB of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which is estimated to be around £45,000 savings in tape media for this collection. Undoubtedly this software offers amazing financial and preservation incentives, while also making a viewable video file of an otherwise invisible DPX scan. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this signals some permission issues from different scan suppliers, a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which is approximately £45,000 saved. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From d3726839e55b78915b6339b217dac42f6ff034ca Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:18:28 +0000
Subject: [PATCH 40/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 5dce19bb..5185d145 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -40,7 +40,7 @@ Our previous server configuration:
When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps), but 4K scans are generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
---
-# Workflow
+## Workflow
### Image sequence assessment
For each image sequence processed the metadata of the first DPX is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using [Media Area's MediaInfo software](https://mediaarea.net/MediaInfo) and capture the output into script variables.
@@ -128,7 +128,7 @@ rawcooked -y --all path/sequence_name.dpx -o path/decode_sequence >> path/sequen
It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
---
-# Conclusion
+## Conclusion
We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which is approximately £45,000 saved. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
From dc6d9a80f16bd862238bfc918d912a8d1a42038e Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:19:32 +0000
Subject: [PATCH 41/93] Update Case_study.md
Fix link
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 5185d145..34221ebf 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -169,7 +169,7 @@ The results of these three enquiries is always a brilliant way to open an Issue
## Additional resources
-* [RAWcooked GitHub page](https://github.com/Media Area/RAWcooked)
+* [RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
* [Further conference presentations about BFI National Archive use of RAWcooked](https://github.com/MediaArea/RAWcooked/issues)
From 182f3b1e0d7c518c4b37ba0ef85b3583c1174a4a Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Mon, 19 Feb 2024 18:21:12 +0000
Subject: [PATCH 42/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 34221ebf..e7e912b0 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -172,7 +172,7 @@ The results of these three enquiries is always a brilliant way to open an Issue
* [RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
-* [Further conference presentations about BFI National Archive use of RAWcooked](https://github.com/MediaArea/RAWcooked/issues)
+* [Further conference presentations about BFI National Archive use of RAWcooked](https://youtu.be/4cG5RL_CZqg?si=w-iEICSfXqBco5NB)
* [DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
* [RAWCooking With Gas: A Film Digitization and QC Workflow-in-progress by Genevieve Havemeyer-King](https://youtu.be/-cJxq7Vr3Nk?si=BjPWzsZ7LRKMVZNF)
From 05cfc86c9e5c67bc33e90a926e3f33637dfc4e04 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 09:44:59 +0000
Subject: [PATCH 43/93] Update Case_study.md
Spelling error
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index e7e912b0..76d6a9eb 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -27,7 +27,7 @@ Our current server configuration:
- 40Gbps Network card
- NAS storage on 40GB network
-The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our congiguration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```.
+The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply Threads x Cores x Sockets. So for our configuration this would be 2 (threads) x 16 (cores) x 2 (sockets) = 64. To retrieve these figures we would use Linux's ```lscpu```.
Our previous server configuration:
- Virtual Machine of a NAS storage device
From 52a1dd9a0e0eba3ebe4ef3afa3f095c65860cc5a Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:04:17 +0000
Subject: [PATCH 44/93] Update Case_study.md
Re read
---
Doc/Case_study.md | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 76d6a9eb..4ed33bc8 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -45,9 +45,9 @@ When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps)
For each image sequence processed the metadata of the first DPX is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using [Media Area's MediaInfo software](https://mediaarea.net/MediaInfo) and capture the output into script variables.
-Next the first file within the image sequence is checked against a [Media Area's MediaConch software](https://mediaarea.net/MediaConch) policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion, or possible anomalies in the DPX resulting.
+Next the first file within the image sequence is checked against a policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)) using [Media Area's MediaConch software](https://mediaarea.net/MediaConch). If it passes then we know it can be encoded by RAWcooked and is covered by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX.
-The pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the encode based on previous encoding experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequences to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions will take place. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB.
+The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be at least one third smaller, so we calculate that a 1.3TB sequence will make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur, so we map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size.
| RAWcooked 2K RGB | RAWcooked Luma & RAWcooked 4K |
| -------------------- | ----------------------------- |
@@ -56,7 +56,7 @@ The pixel size and colourspace of the sequence are used to calculate the potenti
### Encoding the image sequence
-To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be varified as bit-identical to the original source sequence.
+To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation-essential flags merged into this one simple flag. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.
Our RAWcooked encode command:
```
@@ -67,54 +67,55 @@ rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/seque
| ---------------------- | ------------------------------------------ |
| ```rawcooked``` | Calls the software |
| ```-y``` | Answers 'yes' to software questions |
-| ```-all``` | Preservation command with checksums |
+| ```--all``` | Preservation command with CRC-32 hashes |
| ```--no-accept-gaps``` | Exit with warning if sequence gaps found |
+| | --all command defaults to accepting gaps |
| ```-s 5281680``` | Set max attachment size to 5MB |
| ```-o``` | Use output path for FFV1 MKV |
| ```>>``` | Capture console output to text file |
| ```2>&1``` | stderr and stdout messages captured in log |
-This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodes in parallel. This software makes it very simple to run a number of encodes specified by the ```--job``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work to save time. If not run in parallel a computer will usually process jobs one after another. As well as parallelisation, FFmpeg usinges multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (from 64 slices per RAWcooked frame) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so four our server that 64 separate tasks per thread, one slice per frame of the FFV1 file.
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodes at the same time. This software makes it very simple to set the number of simultaneous encodes with the ```--jobs``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work to save time. If not run in parallel a computer will usually process jobs serially, one after another. As well as parallelisation, FFmpeg uses multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (a minimum of 64 slices per frame in RAWcooked) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so for our server that works as 64 separate tasks per thread, one slice per frame of the FFV1 file.
By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel encodes:
```
cat sequence_list.txt | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
-We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with a encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and data of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodes, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation metadata collection for an image sequence.
+We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and valuable metadata of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation.
### Encoding log assessment
The encoding logs are critical for the automated assessment of the encoding process. Each log consists of four blocks of data:
* The RAWcooked assessment of the sequence
-* The FFmpeg encoding data
+* The FFmpeg console output with encoding data
* The post-encoding RAWcooked assessment of the FFV1 Matroska
-* Text review of the success of the encoded sequence
+* Text review of the success/failures of the encoded sequence
-The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg encoding data contains sequence and FFV1 metadata, along with choices made by the software for the encoding process and logs of the fps for the encoding of the sequence. All this information is really important when there's an issue with the encoding. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you can be given different types of human readable message including:
+The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg encoding data contains sequence and FFV1 metadata, along with choices made by the software for the encoding process and logs of the fps for the encoding of the sequence. All this information is really important when there's an issue with the encoding. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you will see different types of human readable message including:
* Warnings about the image sequence files
* Errors experienced during encoding
-* Information about the RAWcooked encode (RAWcooked version, if checksum hashes included)
+* Information about the RAWcooked encode (RAWcooked version, if checksum hashes are included)
* Completion success or failure statement
-The automation scripts used a the the BFI National Archive largely ignore the warning messages, but look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
+The automation scripts used at the BFI National Archive look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
```Reversibility was checked, no issues detected.```
There is one error message that triggers a specific type of re-encode:
```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encode is completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.09.
+For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding has completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked versions before v21.09.
### FFV1 Matroska validation
-When the logs have been assessed and the message ```Reversibility was checked, no issue detected``` was found, then the FFV1 Matroska has metadata validation using the [BFI's MediaConch policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_mkv_policy.xml). This policy ensures that the FFV1 Matroska is whole by looking for duration field entries, checks for reversibility data, and that the correct FFV1 and Matroska formats are being used. It also ensures that all the FFV1 error detection features are present, that slices are included, bit rate is over 300 and pixel aspect ratio is 1.000.
+When the logs have been assessed and the message ```Reversibility was checked, no issue detected``` is found, then the FFV1 Matroska is compared against the [BFI's MediaConch policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_mkv_policy.xml). This policy ensures that the FFV1 Matroska is whole by looking for duration field entries, checks for reversibility data, and that the correct FFV1 and Matroska formats are being used. It also ensures that all the FFV1 error detection features are present, that slices are included, bit rate is over 300 and pixel aspect ratio is 1.000.
If the policy passes then the FFV1 Matroska is moved onto the final stage, where the RAWcooked flag ```--check``` is used to ensure that the FFV1 Matroska is correctly formed.
```rawcooked --check path/sequence_name.mkv >> path/sequence_name.mkv.txt 2>&1```
-Again the stderr and strout messages are captured to a log, and this log is checked for the message ```Reversibility was checked, no issues detected.``` When this check completes the FFV1 Matroska is moved to our Digital Preservation Infrastructure and the original image sequence is deleted under automation.
+Again the stderr and stdout messages are captured to a log, and this log is checked for the same confirmation message ```Reversibility was checked, no issue detected.``` When this check completes the FFV1 Matroska is moved to our Digital Preservation Infrastructure and the original DPX sequence is deleted under automation.
### FFV1 Matroska decode to image sequence
@@ -148,9 +149,11 @@ A small group of sequences had their RAWcooked encoding times recorded, revealin
### Useful test approaches
-When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generated for every and saved into one manifest for the source and one for the decoded version. These manifests are then ```diff``` checked to ensure that every single image file is identical.
+When any system upgrades occur we like to run reversibility tests to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generated for every file and saved into one manifest for the source and one for the decoded version. These manifests are then ```diff``` checked to ensure that every single image file is identical.
-When we encounter an error there are a few commands I use that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
+To have confidence in the ```--check``` feature, which confirms for us that a DPX sequence can be deleted, we ran several ```--check``` command tests that included editing test FFV1 Matroska metadata using hex editor software, and altering test DPX files in the same way while partially encoded. The encoding and check features always identified these data breakages correctly, which helped build our confidence in the ```--all``` and ```--check``` flags.
+
+When we encounter an error there are a few commands used that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
```
rawcooked -d -y --all --accept-gaps
```
From 1874ef04592ccbcbe1c6ce29061b86590f3c6ba3 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:23:37 +0000
Subject: [PATCH 45/93] Update Case_study.md
Examples from logs
---
Doc/Case_study.md | 64 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 4ed33bc8..d6bcc3e8 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -82,17 +82,79 @@ By listing all the image sequence paths in one text file you can launch a parall
cat sequence_list.txt | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
-We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and valuable metadata of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by Media Area. We definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation.
+We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and valuable metadata of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by Media Area. We would definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation of their RAWcooked sequences.
### Encoding log assessment
The encoding logs are critical for the automated assessment of the encoding process. Each log consists of four blocks of data:
* The RAWcooked assessment of the sequence
+```
+Analyzing files (0%)
+Analyzing files (0.01%), 1 files/s
+Analyzing files (0.02%), 1 files/s
+Analyzing files (0.03%), 1 files/s
+Analyzing files (0.04%), 1 files/s
+Analyzing files (0.05%), 1 files/s
+Analyzing files (0.06%), 1 files/s
+Analyzing files (0.07%), 1 files/s
+...
+```
* The FFmpeg console output with encoding data
+```
+Track 1:
+ Scan01/2150x1820/%08d.dpx
+ (00000000 --> 00033766)
+ DPX/Raw/Y/16bit/U/BE
+
+Attachments:
+ Scan01/N_9623089_01of04_00000000.dpx_metadata.txt
+ Scan01/N_9623089_01of04_directory_contents.txt
+ Scan01/N_9623089_01of04_directory_total_byte_size.txt
+
+ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
+ built with gcc 11 (Ubuntu 11.4.0-1ubuntu1~22.04)
+ configuration: --prefix=/home/linuxbrew/.linuxbrew/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=gcc-11 --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack
+ libavutil 58. 2.100 / 58. 2.100
+ libavcodec 60. 3.100 / 60. 3.100
+ libavformat 60. 3.100 / 60. 3.100
+ libavdevice 60. 1.100 / 60. 1.100
+ libavfilter 9. 3.100 / 9. 3.100
+ libswscale 7. 1.100 / 7. 1.100
+ libswresample 4. 10.100 / 4. 10.100
+ libpostproc 57. 1.100 / 57. 1.100
+Input #0, image2, from 'N_9623089_01of04/Scan01/2150x1820/%08d.dpx':
+ Duration: 00:23:26.96, start: 0.000000, bitrate: N/A
+ Stream #0:0: Video: dpx, gray16be, 2150x1820 [SAR 1:1 DAR 215:182], 24 fps, 24 tbr, 24 tbn
+Stream mapping:
+ Stream #0:0 -> #0:0 (dpx (native) -> ffv1 (native))
+ File N_9623089_01of04/Scan01/N_9623089_01of04_00000000.dpx_metadata.txt -> Stream #0:1
+ File N_9623089_01of04/Scan01/N_9623089_01of04_directory_contents.txt -> Stream #0:2
+ File N_9623089_01of04/Scan01/N_9623089_01of04_directory_total_byte_size.txt -> Stream #0:3
+ File ../encoded/mkv_cooked/N_9623089_01of04.mkv.rawcooked_reversibility_data -> Stream #0:4
+Press [q] to stop, [?] for help
+Output #0, matroska, to '../encoded/mkv_cooked/N_9623089_01of04.mkv':
+```
* The post-encoding RAWcooked assessment of the FFV1 Matroska
+```
+...
+Time=00:23:22 (99.93%), 3.0 MiB/s, 0.03x realtime
+Time=00:23:22 (99.94%), 1.0 MiB/s, 0.04x realtime
+Time=00:23:23 (99.95%), 1.3 MiB/s, 0.05x realtime
+Time=00:23:24 (99.96%), 1.2 MiB/s, 0.04x realtime
+Time=00:23:25 (99.97%), 1.1 MiB/s, 0.05x realtime
+Time=00:23:25 (99.98%), 1.2 MiB/s, 0.04x realtime
+Time=00:23:26 (99.99%), 1.6 MiB/s, 0.04x realtime
+3.3 MiB/s, 0.02x realtime
+```
* Text review of the success/failures of the encoded sequence
+```
+Info: Reversibility data created by RAWcooked 23.12.
+Info: Uncompressed file hashes (used by reversibility check) present.
+Reversibility was checked, no issue detected.
+```
+
The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg encoding data contains sequence and FFV1 metadata, along with choices made by the software for the encoding process and logs of the fps for the encoding of the sequence. All this information is really important when there's an issue with the encoding. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you will see different types of human readable message including:
* Warnings about the image sequence files
* Errors experienced during encoding
From 4db031c3c465ab00695c0f66eefb4854ca7505f6 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:32:29 +0000
Subject: [PATCH 46/93] Update Case_study.md
Log updates
---
Doc/Case_study.md | 30 ++++++++++++++++++++++++++----
1 file changed, 26 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index d6bcc3e8..319375a6 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -154,13 +154,35 @@ Info: Uncompressed file hashes (used by reversibility check) present.
Reversibility was checked, no issue detected.
```
-
-The RAWcooked assessments themselves are lines of repeated data, counting from 0% to 100%. The FFmpeg encoding data contains sequence and FFV1 metadata, along with choices made by the software for the encoding process and logs of the fps for the encoding of the sequence. All this information is really important when there's an issue with the encoding. The final text review is generated by the RAWcooked assessment of the image sequence and the FFV1 Matroska. In this last section you will see different types of human readable message including:
+
+
+If an encoding has completed then in this last section you might see different types of human readable message including:
* Warnings about the image sequence files
* Errors experienced during encoding
-* Information about the RAWcooked encode (RAWcooked version, if checksum hashes are included)
-* Completion success or failure statement
+* Information about the RAWcooked encode (shown above)
+* Completion success or failure statement (shown above)
+Error example:
+```
+Reversibility was checked, issues detected, see below.
+
+Error: undecodable files are not same.
+ N_7192293_01of06/Scan01/2150x1582/00014215.dpx
+ N_7192293_01of06/Scan01/2150x1582/00014216.dpx
+ ...
+Error: undecodable files from output are not same as files from source.
+ N_7192293_01of06/Scan01/2150x1582/00014215.dpx
+ N_7192293_01of06/Scan01/2150x1582/00014216.dpx
+ ...
+At least 1 file is not conform to specifications.
+```
+
+Sometimes an encoding will not even start, and a single error message may be found in your log:
+```
+Error: unsupported DPX (non conforming) alternate end of line non padding
+Please contact info@mediaarea.net if you want support of such content.
+```
+
The automation scripts used at the BFI National Archive look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
```Reversibility was checked, no issues detected.```
From 9033a3d7d2f6527d3c6b673477af8ebb9016d75a Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:33:54 +0000
Subject: [PATCH 47/93] Update Case_study.md
---
Doc/Case_study.md | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 319375a6..3014c06e 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -154,14 +154,14 @@ Info: Uncompressed file hashes (used by reversibility check) present.
Reversibility was checked, no issue detected.
```
-
-
+
+
If an encoding has completed then in this last section you might see different types of human readable message including:
* Warnings about the image sequence files
* Errors experienced during encoding
* Information about the RAWcooked encode (shown above)
* Completion success or failure statement (shown above)
-
+
Error example:
```
Reversibility was checked, issues detected, see below.
@@ -183,13 +183,13 @@ Error: unsupported DPX (non conforming) alternate end of line non padding
Please contact info@mediaarea.net if you want support of such content.
```
-The automation scripts used at the BFI National Archive look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
-```Reversibility was checked, no issues detected.```
+The automation scripts used at the BFI National Archive look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
+```Reversibility was checked, no issues detected.```
-There is one error message that triggers a specific type of re-encode:
-```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-
-For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding has completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.09.
+There is one error message that triggers a specific type of re-encode:
+```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
+
+For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding has completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.09.
### FFV1 Matroska validation
From b959546dd54615d1b77fab89a8dcba9eb1f15a91 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:39:04 +0000
Subject: [PATCH 48/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 3014c06e..32cb0b27 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -215,7 +215,7 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
## Conclusion
-We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which is approximately £45,000 saved. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From 95de52d98225f1bced336f843f54d3af93c6ffbf Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:43:01 +0000
Subject: [PATCH 49/93] Update Case_study.md
---
Doc/Case_study.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 32cb0b27..9287dcdb 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -223,10 +223,10 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 140 were 2K or smaller / 880 were 4K
* 222 were Luma Y / 798 were RGB
* 143 were 10-bit / 279 12-bit / 598 16-bit
-* The largest reduction in size of any FFV1 from the DPX was 88%
-* The smallest reduction was just 0.3%
-* The largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied
-* The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
+* The largest reduction in size of any FFV1 was 88% smaller than the source DPX
+ The largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied
+* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX
+ The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
* Across all 1020 encoded sequences the average size of the finished FFV1 was 71% of the original image sequence
A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequebces. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
From 3ab19529910fef135c1bbb8d21bbc216c290f889 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:44:03 +0000
Subject: [PATCH 50/93] Update Case_study.md
---
Doc/Case_study.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 9287dcdb..8162dc7d 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -223,10 +223,8 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 140 were 2K or smaller / 880 were 4K
* 222 were Luma Y / 798 were RGB
* 143 were 10-bit / 279 12-bit / 598 16-bit
-* The largest reduction in size of any FFV1 was 88% smaller than the source DPX
- The largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied
-* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX
- The smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame
+* The largest reduction in size of any FFV1 was 88% smaller than the source DPX (the largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied)
+* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX (the smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame)
* Across all 1020 encoded sequences the average size of the finished FFV1 was 71% of the original image sequence
A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequebces. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
From 9c8b3e2b6de315924c3a1b3a166c7d2af675b0dd Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:46:11 +0000
Subject: [PATCH 51/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
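
Note: the hunk below rewrites "71% of the original" as "29% smaller than the source". The two figures are the same measurement expressed in different ways, as this small sketch shows. The function name and the byte counts are illustrative assumptions, not BFI data.

```python
def percent_smaller(dpx_bytes: int, ffv1_bytes: int) -> float:
    """Percentage by which the FFV1 Matroska is smaller than its source DPX."""
    if dpx_bytes <= 0:
        raise ValueError("source size must be positive")
    return (1 - ffv1_bytes / dpx_bytes) * 100


# An FFV1 that is 71% of the source size is 29% smaller than the source.
print(round(percent_smaller(100, 71)))
```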
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 8162dc7d..1f5d6a0f 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -225,7 +225,7 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 143 were 10-bit / 279 12-bit / 598 16-bit
* The largest reduction in size of any FFV1 was 88% smaller than the source DPX (the largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied)
* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX (the smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame)
-* Across all 1020 encoded sequences the average size of the finished FFV1 was 71% of the original image sequence
+* Across all 1020 encoded sequences the average size of the finished FFV1 was 29% smaller than the source image sequence
A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequebces. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
From affc385853985cec31583488d23a79539eb2715d Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:47:38 +0000
Subject: [PATCH 52/93] Update Case_study.md
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
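
Note: the reversibility test described in the context of the hunk below (whole-file MD5 checksums for every image, one manifest per folder, then a diff of the two manifests) could be approximated like this. It is a sketch under stated assumptions rather than the script used at the BFI: the function names are hypothetical, and the manifests are held as in-memory dictionaries instead of being written to files and compared with `diff`.

```python
import hashlib
from pathlib import Path


def md5_manifest(folder: str) -> dict:
    """Map each file's path (relative to folder) to its whole-file MD5."""
    root = Path(folder)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.md5()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large DPX frames are not loaded whole.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def manifest_differences(source: dict, decoded: dict) -> list:
    """List files that are missing from one side or whose checksums differ."""
    names = source.keys() | decoded.keys()
    return sorted(n for n in names if source.get(n) != decoded.get(n))
```

An empty list from `manifest_differences` corresponds to a clean `diff` between the source and decoded manifests.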
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 1f5d6a0f..93e5827d 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -227,13 +227,13 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX (the smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame)
* Across all 1020 encoded sequences the average size of the finished FFV1 was 29% smaller than the source image sequence
-A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequebces. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequences. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
### Useful test approaches
When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generated for every and saved into one manifest for the source and one for the decoded version. These manifests are then ```diff``` checked to ensure that every single image file is identical.
-To have confidence in the --check feature, which confirms for us a DPX sequence can be deleted, we ran several --check command tests that included editing test FFV1 Matroska metadata using hexeditor software, and altering test DPX files in the same way while partially encoded. The encoding/check features always identified these data breakages correctly which helped build our confidence in the --all and --check flags.
+To have confidence in the --check feature, which confirms for us a DPX sequence can be deleted, we ran several --check command tests that included editing test FFV1 Matroska metadata using hex editor software, and altering test DPX files in the same way while partially encoded. The encoding/check features always identified these data breakages correctly which helped build our confidence in the --all and --check flags.
When we encounter an error there are a few commands used that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
```
From a13bbd0644e0dcaa7255d8ad8c37ba5a19dd9d21 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:49:02 +0000
Subject: [PATCH 53/93] Update Case_study.md
---
Doc/Case_study.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 93e5827d..4040d260 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -254,10 +254,10 @@ The results of these three enquiries is always a brilliant way to open an Issue
## Additional resources
-* [RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
-* ['No Time To Wait! 5' presentation about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
-* [RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
-* [Further conference presentations about BFI National Archive use of RAWcooked](https://youtu.be/4cG5RL_CZqg?si=w-iEICSfXqBco5NB)
-* [DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
-* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
+* [Media Area's RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
+* ['No Time To Wait! 5' presentation by Joanna White about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
+* [BFI National Archive RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
+* [Further conference presentations about BFI National Archive use of RAWcooked, by Joanna White](https://youtu.be/4cG5RL_CZqg?si=w-iEICSfXqBco5NB)
+* [BFI National Archive DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [RAWCooking With Gas: A Film Digitization and QC Workflow-in-progress by Genevieve Havemeyer-King](https://youtu.be/-cJxq7Vr3Nk?si=BjPWzsZ7LRKMVZNF)
+* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
From 1f9bce0879e861109236b2062f9f26c50c1ebc74 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Wed, 28 Feb 2024 10:49:58 +0000
Subject: [PATCH 54/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 4040d260..2cd809e9 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -254,10 +254,10 @@ The results of these three enquiries is always a brilliant way to open an Issue
## Additional resources
+* [BFI National Archive DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [Media Area's RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
* ['No Time To Wait! 5' presentation by Joanna White about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
* [BFI National Archive RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
* [Further conference presentations about BFI National Archive use of RAWcooked, by Joanna White](https://youtu.be/4cG5RL_CZqg?si=w-iEICSfXqBco5NB)
-* [BFI National Archive DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [RAWCooking With Gas: A Film Digitization and QC Workflow-in-progress by Genevieve Havemeyer-King](https://youtu.be/-cJxq7Vr3Nk?si=BjPWzsZ7LRKMVZNF)
* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
From 1c1e232c664ef8cdaeaec661f1ad34d72920ad42 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 5 Mar 2024 11:56:10 +0000
Subject: [PATCH 55/93] Update Case_study.md
Add link to NTTW5 vid
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 2cd809e9..cbb52b12 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -256,7 +256,7 @@ The results of these three enquiries is always a brilliant way to open an Issue
* [BFI National Archive DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [Media Area's RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
-* ['No Time To Wait! 5' presentation by Joanna White about the BFI's evolving RAWcooked use](https://www.youtube.com/@MediaAreaNet/streams). Link to follow.
+* ['No Time To Wait! 5' presentation by Joanna White about the BFI's evolving RAWcooked use](https://www.youtube.com/watch?v=Mgo_DKHJEfI)
* [BFI National Archive RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
* [Further conference presentations about BFI National Archive use of RAWcooked, by Joanna White](https://youtu.be/4cG5RL_CZqg?si=w-iEICSfXqBco5NB)
* [RAWCooking With Gas: A Film Digitization and QC Workflow-in-progress by Genevieve Havemeyer-King](https://youtu.be/-cJxq7Vr3Nk?si=BjPWzsZ7LRKMVZNF)
From 53c1d146ddbc030d700db576fd54fc14b501e842 Mon Sep 17 00:00:00 2001
From: Stephen
Date: Wed, 6 Mar 2024 21:40:54 +0000
Subject: [PATCH 56/93] Update Case_study.md
Tiny change in NAS wording in Server Config section
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index cbb52b12..2662b429 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -17,7 +17,7 @@ This case study is broken into the following sections:
---
### Server configurations
-To encode our DPX sequences we have a single server that completes this work against 6 different Network Accessed Storage (NAS) in parallel.
+To encode our DPX sequences we have a single server that completes this work against 6 different Network Attached Storage (NAS) devices in parallel.
Our current server configuration:
- Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
From f33ed87566726995a06c5db08d67993c2a2d42f6 Mon Sep 17 00:00:00 2001
From: Stephen
Date: Wed, 6 Mar 2024 21:42:47 +0000
Subject: [PATCH 57/93] Update Case_study.md
Modified 'current server config' detail
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
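
Note: the context line at the end of the hunk below explains the CPU thread count as threads x cores x sockets, read from `lscpu`. A minimal sketch of that calculation follows. It assumes the usual `lscpu` field labels ("Thread(s) per core", "Core(s) per socket", "Socket(s)") and reads the documented 32-core, 64-thread server as 2 threads per core, 16 cores per socket and 2 sockets. The function name and the sample text are hypothetical.

```python
def threads_from_lscpu(lscpu_output: str) -> int:
    """Multiply threads per core x cores per socket x sockets."""
    fields = {}
    for line in lscpu_output.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return (
        int(fields["Thread(s) per core"])
        * int(fields["Core(s) per socket"])
        * int(fields["Socket(s)"])
    )


# Hand-written sample in lscpu's "Label: value" layout, not real server output.
sample = "Thread(s) per core:  2\nCore(s) per socket:  16\nSocket(s):  2\n"
print(threads_from_lscpu(sample))
```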
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index cbb52b12..d2ad7fad 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -25,7 +25,7 @@ Our current server configuration:
- 32-core with 64 CPU threads
- Ubuntu 20.04 LTS
- 40Gbps Network card
-- NAS storage on 40GB network
+- NAS storage with 40Gbps network card
The more CPU threads you have the better your FFmpeg encode to FFV1 will perform. To calculate the CPU threads for your server you can multiply the Threads x Cores x Sockets. So for our configuration this would be 2 (threads) x 16 (sockets) x 2 (cores) = 64. To retrieve these figures we would use Linux's ```lscpu```.
From d499e964af00da30e674725830c7a7d56097bbb2 Mon Sep 17 00:00:00 2001
From: Stephen
Date: Wed, 6 Mar 2024 21:44:25 +0000
Subject: [PATCH 58/93] Update Case_study.md
Typo fix
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index cbb52b12..83b2436b 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -37,7 +37,7 @@ Our previous server configuration:
- 8 threads
- Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps), but 4K scans are generally 1 fps or less. These figures can be impacted by the quantity of parellel processes running at any one time.
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps), but 4K scans are generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
---
## Workflow
From 2b2d3ca61aaaff1bf06642227e466e832bfbc664 Mon Sep 17 00:00:00 2001
From: Stephen
Date: Wed, 6 Mar 2024 21:49:35 +0000
Subject: [PATCH 59/93] Update Case_study.md
Tiny edit in the Conclusion text
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index cbb52b12..95f0c68b 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -215,7 +215,7 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
## Conclusion
-We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From 7e7b865e97138bb03426c5d5d5c0bca394672d3a Mon Sep 17 00:00:00 2001
From: Stephen
Date: Wed, 6 Mar 2024 21:51:33 +0000
Subject: [PATCH 60/93] Update Case_study.md
Typos only!
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index cbb52b12..93551f3c 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -217,7 +217,7 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to Digital Preservation Infrastructure. Most often this caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
-In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between Febraury 2023 and February 2024 the BFI collected data about its business as usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
+In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between February 2023 and February 2024 the BFI collected data about its business-as-usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* 140 were 2K or smaller / 880 were 4K
From 5caa4ee8465fcf2fe608980f499698b9f062c88a Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 10:46:33 +0000
Subject: [PATCH 61/93] Update Case_study.md
---
Doc/Case_study.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 44a04e52..4c748ed9 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -215,7 +215,7 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
## Conclusion
-We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business as usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequence doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image sequence; a build-up of reported errors will indicate repeated problems.
In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between February 2023 and February 2024 the BFI collected data about its business-as-usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
From d7a5939015f309e3a82ae6d7bae51ab1f5aa37de Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 11:16:43 +0000
Subject: [PATCH 62/93] Update User_Manual.md
--all command additions
---
Doc/User_Manual.md | 40 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 38 insertions(+), 2 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 3a64e53d..6a33cd64 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -1,8 +1,43 @@
# RAWcooked User Manual
+## Preservation standard encoding
+
+```
+rawcooked --all /
+```
+
+To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag which concatenates several important flags into one:
+
+| Commands with --all | Description |
+| ------------------------- | ------------------------------------------ |
+| ```--info``` | Supplies useful file information |
+| ```--conch``` | Conformance checks file format where |
+| | supported (partially implemented for DPX) |
+| ```--encode / --decode``` | Selected based on supplied file type |
+| ```--hash``` | Important flag which computes hashes and |
+| | embeds them in reversibility data stored |
+| | in MKV wrapper allowing reversibility test |
+| | assurance when original sequences absent. |
+| ```--coherency``` | Ensures package and content are coherent |
+| | eg, sequence gap checks and audio duration |
+| | matches image sequence duration |
+| ```--check``` | Checks that an encoded file can be decoded |
+| | correctly. Requires hashes to be present |
+| | for checking compressed content. |
+| ```--check_padding``` | Runs padding checks for DPX files that do |
+| | not have zero padding. Ensures additional |
+| | padding data is stored in reversibility |
+| | file for perfect restoration of the DPX |
+| ```--accept-gaps``` | Where gaps in a sequence are found this |
+| | flag ensures the encoding completes |
+| | successfully. If you require that gaps |
+| | are not encoded then follow the ```--all```|
+| | command with ```--no-accept-gaps``` |
+
+
## Encode
-### Folder
+### Simple folder encoding
```
rawcooked
@@ -32,7 +67,7 @@ This behaviour could help to manage different use cases, according to local pref
Note that maximum permitted video tracks is encoded in the `RAWcooked` licence, so users may have to request extended track allowance as required.
-### File
+### Simple file encoding
```
rawcooked
@@ -46,6 +81,7 @@ The file contains RAW data (e.g. it is a .dpx or .wav file). The `RAWcooked` too
The filenames usually end with a numbered sequence. Enter one frame and the tool will generate the regex to parse all the frames in the folder.
+
## Decode
```
From 6061dc298c6c84a8e0ad9b553c3ed5545556a079 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 11:29:41 +0000
Subject: [PATCH 63/93] Update User_Manual.md
Incorporate --all command
---
Doc/User_Manual.md | 39 +++++++++++++++------------------------
1 file changed, 15 insertions(+), 24 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 6a33cd64..0aa4449c 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -1,11 +1,17 @@
# RAWcooked User Manual
-## Preservation standard encoding
+## Encoding
```
rawcooked --all /
```
+When using the `RAWcooked` tool:
+- encodes an image sequence by supplying the folder path to the sequence, or by supplying the path to a media file within your sequence folder
+- encodes with the FFV1 video codec all single-image files or video files in the folder path/folder containing the file
+- encodes with the FLAC audio codec all audio files in the folder
+- muxes these into a Matroska container (.mkv)
+
To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag which concatenates several important flags into one:
| Commands with --all | Description |
@@ -27,27 +33,24 @@ To encode your sequences using the best preservation flags within RAWcooked then
| ```--check_padding``` | Runs padding checks for DPX files that do |
| | not have zero padding. Ensures additional |
| | padding data is stored in reversibility |
-| | file for perfect restoration of the DPX |
+| | file for perfect restoration of the DPX. |
+| | Can be time consuming. |
| ```--accept-gaps``` | Where gaps in a sequence are found this |
| | flag ensures the encoding completes |
| | successfully. If you require that gaps |
| | are not encoded then follow the ```--all```|
| | command with ```--no-accept-gaps``` |
-
-## Encode
-
-### Simple folder encoding
-
+If you do not require all of these flags you can build your own command with just the flags you prefer, for example:
```
-rawcooked
+rawcooked --info --conch --encode --hash --check --no-accept-gaps /
```
-The `RAWcooked` tool
+For more information about all the available flags in RAWcooked please visit the help page:
+```
+rawcooked --help / -h
+```
-- encodes with the FFV1 video codec all single-image files in the folder
-- encodes with the FLAC audio codec all audio files in the folder
-- muxes these into a Matroska container (.mkv)
The filenames of the single-image files must end with a numbered sequence. `RAWcooked` will generate the regular expression (regex) to parse in the correct order all the frames in the folder.
@@ -67,19 +70,7 @@ This behaviour could help to manage different use cases, according to local pref
Note that maximum permitted video tracks is encoded in the `RAWcooked` licence, so users may have to request extended track allowance as required.
-### Simple file encoding
-
-```
-rawcooked
-```
-
-The file contains RAW data (e.g. it is a .dpx or .wav file). The `RAWcooked` tool
-
-- encodes with the FFV1 video codec all single-image video files in the folder containing the given file
-- encodes with the FLAC audio codec all audio files in the folder containing the given file
-- muxes these into a Matroska container (.mkv).
-The filenames usually end with a numbered sequence. Enter one frame and the tool will generate the regex to parse all the frames in the folder.
## Decode
From 0f4e63d11809f2911c8a66aa0bb81cf4fa498d6b Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 11:31:04 +0000
Subject: [PATCH 64/93] Update User_Manual.md
---
Doc/User_Manual.md | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 0aa4449c..87dbc261 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -17,13 +17,9 @@ To encode your sequences using the best preservation flags within RAWcooked then
| Commands with --all | Description |
| ------------------------- | ------------------------------------------ |
| ```--info``` | Supplies useful file information |
-| ```--conch``` | Conformance checks file format where |
-| | supported (partially implemented for DPX) |
+| ```--conch``` | Conformance checks file format where supported (partially implemented for DPX) |
| ```--encode / --decode``` | Selected based on supplied file type |
-| ```--hash``` | Important flag which computes hashes and |
-| | embeds them in reversibility data stored |
-| | in MKV wrapper allowing reversibility test |
-| | assurance when original sequences absent. |
+| ```--hash``` | Important flag which computes hashes and embeds them in reversibility data stored in MKV wrapper allowing reversibility test assurance when original sequences absent. |
| ```--coherency``` | Ensures package and content are coherent |
| | eg, sequence gap checks and audio duration |
| | matches image sequence duration |
From 7490c52d4087a0b8d9f95f1dcea41d9a0b15ce18 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 11:34:38 +0000
Subject: [PATCH 65/93] Update User_Manual.md
---
Doc/User_Manual.md | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 87dbc261..5ce5b0b5 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -17,25 +17,15 @@ To encode your sequences using the best preservation flags within RAWcooked then
| Commands with --all | Description |
| ------------------------- | ------------------------------------------ |
| ```--info``` | Supplies useful file information |
-| ```--conch``` | Conformance checks file format where supported (partially implemented for DPX) |
-| ```--encode / --decode``` | Selected based on supplied file type |
-| ```--hash``` | Important flag which computes hashes and embeds them in reversibility data stored in MKV wrapper allowing reversibility test assurance when original sequences absent. |
-| ```--coherency``` | Ensures package and content are coherent |
-| | eg, sequence gap checks and audio duration |
-| | matches image sequence duration |
-| ```--check``` | Checks that an encoded file can be decoded |
-| | correctly. Requires hashes to be present |
-| | for checking compressed content. |
-| ```--check_padding``` | Runs padding checks for DPX files that do |
-| | not have zero padding. Ensures additional |
-| | padding data is stored in reversibility |
-| | file for perfect restoration of the DPX. |
-| | Can be time consuming. |
-| ```--accept-gaps``` | Where gaps in a sequence are found this |
-| | flag ensures the encoding completes |
-| | successfully. If you require that gaps |
-| | are not encoded then follow the ```--all```|
-| | command with ```--no-accept-gaps``` |
+| ```--conch``` | Conformance checks file format where supported (partially implemented for DPX) |
+| ```--encode``` | Select encode when an image sequence path is supplied |
+| ```--decode``` | Select decode when an FFV1 Matroska file is supplied |
+| ```--hash``` | Important flag which computes hashes and embeds them in reversibility data stored in MKV wrapper allowing reversibility test assurance when original sequences absent |
+| ```--coherency``` | Ensures package and content are coherent. Eg, sequence gap checks and audio duration matches image sequence duration |
+| ```--check``` | Checks that an encoded file can be decoded correctly. Requires hashes to be present for checking compressed content |
+| ```--check_padding``` | Runs padding checks for DPX files that do not have zero padding. Ensures additional padding data is stored in reversibility |
+| | file for perfect restoration of the DPX. Can be time consuming |
+| ```--accept-gaps``` | Where gaps in a sequence are found this flag ensures the encoding completes successfully. If you require that gaps are not encoded then follow the ```--all``` command with ```--no-accept-gaps``` |
If you do not require all of these flags you can build your own command with just the flags you prefer, for example:
```
From 392871790fc39bd5bbf6be0604a20d613c1ea622 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 11:43:37 +0000
Subject: [PATCH 66/93] Update User_Manual.md
---
Doc/User_Manual.md | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 5ce5b0b5..3cc7d162 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -6,7 +6,7 @@
rawcooked --all /
```
-When using the `RAWcooked` tool:
+Using the `RAWcooked` tool:
- encodes an image sequence by supplying the folder path to the sequence, or by supplying the path to a media file within your sequence folder
- encodes with the FFV1 video codec all single-image files or video files in the folder path/folder containing the file
- encodes with the FLAC audio codec all audio files in the folder
@@ -14,7 +14,7 @@ When using the `RAWcooked` tool:
To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag which concatenates several important flags into one:
-| Commands with --all | Description |
+| Flags | Description |
| ------------------------- | ------------------------------------------ |
| ```--info``` | Supplies useful file information |
| ```--conch``` | Conformance checks file format where supported (partially implemented for DPX) |
@@ -22,7 +22,7 @@ To encode your sequences using the best preservation flags within RAWcooked then
| ```--decode``` | Select decode when an FFV1 Matroska file is supplied |
| ```--hash``` | Important flag which computes hashes and embeds them in reversibility data stored in MKV wrapper allowing reversibility test assurance when original sequences absent |
| ```--coherency``` | Ensures package and content are coherent. Eg, sequence gap checks and audio duration matches image sequence duration |
-| ```--check``` | Checks that an encoded file can be decoded correctly. Requires hashes to be present for checking compressed content |
+| ```--check``` | Checks that an encoded file can be decoded correctly. If input is raw content, after encoding it checks that output would be the same as the input content. Whereas if input is compressed content, it checks that output would be the same as the original content where hashes are present |
| ```--check_padding``` | Runs padding checks for DPX files that do not have zero padding. Ensures additional padding data is stored in reversibility |
| | file for perfect restoration of the DPX. Can be time consuming |
| ```--accept-gaps``` | Where gaps in a sequence are found this flag ensures the encoding completes successfully. If you require that gaps are not encoded then follow the ```--all``` command with ```--no-accept-gaps``` |
@@ -37,6 +37,7 @@ For more information about all the available flags in RAWcooked please visit the
rawcooked --help / -h
```
+### For successful encoding
The filenames of the single-image files must end with a numbered sequence. `RAWcooked` will generate the regular expression (regex) to parse in the correct order all the frames in the folder.
@@ -57,12 +58,12 @@ This behaviour could help to manage different use cases, according to local pref
Note that maximum permitted video tracks is encoded in the `RAWcooked` licence, so users may have to request extended track allowance as required.
-
-
## Decode
```
-rawcooked
+rawcooked --all
```
The file is a Matroska container (.mkv). The `RAWcooked` tool decodes back the video and the audio of file to its original formats. All metadata accompanying the original data are preserved **bit-by-bit**.
+
+For the best decoding experience you should always ensure you encode with the ```--all``` command which includes hashes within the reversibility data of the encoded Matroska file. This ensures the the decoded files can be compared to the original source file hashes, ensuring bit perfect reversibility.
From 6048c8034e8b84896ac0e86d265385c6d0272daa Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 11:55:22 +0000
Subject: [PATCH 67/93] Update User_Manual.md
---
Doc/User_Manual.md | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 3cc7d162..b3b27664 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -11,6 +11,7 @@ Using the `RAWcooked` tool:
- encodes with the FFV1 video codec all single-image files or video files in the folder path/folder containing the file
- encodes with the FLAC audio codec all audio files in the folder
- muxes these into a Matroska container (.mkv)
+- uses FFmpeg for this encoding process
To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag which concatenates several important flags into one:
@@ -57,6 +58,11 @@ This behaviour could help to manage different use cases, according to local pref
Note that maximum permitted video tracks is encoded in the `RAWcooked` licence, so users may have to request extended track allowance as required.
+If your encodings do not succeed and you receive these messages, then you will need to encode your image sequence with the additional flag ```--output-version 2```:
+```
+Error: the reversibility file is becoming big | Error: undecodable file is becoming too big
+```
+This is caused by padding data that is not zeros and which must be written into your reversibility data file attachment for restoration to the DPX images when decoded. As this data can exceed FFmpeg's maximum attachment size limit of 1GB, this flag appends the attachment to the FFV1 Matroska file after encoding has completed. This feature is not backward compatible with `RAWcooked` software before version 21.09.
## Decode
@@ -64,6 +70,10 @@ Note that maximum permitted video tracks is encoded in the `RAWcooked` licence,
rawcooked --all
```
-The file is a Matroska container (.mkv). The `RAWcooked` tool decodes back the video and the audio of file to its original formats. All metadata accompanying the original data are preserved **bit-by-bit**.
+The file supplied must be a Matroska container (.mkv) created by the `RAWcooked` software. The `RAWcooked` tool decodes the video, audio and any attachments within the file to its original format. All metadata accompanying the original data are preserved **bit-by-bit**.
+
+### For successful decoding
For the best decoding experience you should always ensure you encode with the ```--all``` command which includes hashes within the reversibility data of the encoded Matroska file. This ensures the the decoded files can be compared to the original source file hashes, ensuring bit perfect reversibility.
+
+
From 3e0d770401866b3d514d294e1d35a34a84ab1d1b Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:25:11 +0000
Subject: [PATCH 68/93] Update User_Manual.md
---
Doc/User_Manual.md | 50 ++++++++++++++++++++++++++++++++++++----------
1 file changed, 40 insertions(+), 10 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index b3b27664..d5a3a100 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -1,7 +1,8 @@
# RAWcooked User Manual
## Encoding
-
+
+
```
rawcooked --all /
```
@@ -12,8 +13,9 @@ Using the `RAWcooked` tool:
- encodes with the FLAC audio codec all audio files in the folder
- muxes these into a Matroska container (.mkv)
- uses FFmpeg for this encoding process
-
+
To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag which concatenates several important flags into one:
+
| Flags | Description |
| ------------------------- | ------------------------------------------ |
@@ -27,43 +29,47 @@ To encode your sequences using the best preservation flags within RAWcooked then
| ```--check_padding``` | Runs padding checks for DPX files that do not have zero padding. Ensures additional padding data is stored in reversibility |
| | file for perfect restoration of the DPX. Can be time consuming |
| ```--accept-gaps``` | Where gaps in a sequence are found this flag ensures the encoding completes successfully. If you require that gaps are not encoded then follow the ```--all``` command with ```--no-accept-gaps``` |
+
If you do not require all of these flags you can build your own command with just the flags you prefer, for example:
```
rawcooked --info --conch --encode --hash --check --no-accept-gaps /
```
-
+
For more information about all the available flags in RAWcooked please visit the help page:
```
rawcooked --help / -h
```
+
### For successful encoding
-The filenames of the single-image files must end with a numbered sequence. `RAWcooked` will generate the regular expression (regex) to parse in the correct order all the frames in the folder.
+The filenames of the single-image files must end with a numbered sequence. `RAWcooked` will generate the regular expression (regex) to parse all of the frames in the folder in the correct order.
The number sequence within the filename can be formed with leading zero padding - e.g. 000001.dpx, 000002.dpx... 500000.dpx - or no leading zero padding - e.g. 1.dpx, 2.dpx... 500000.dpx.
-`RAWcooked` has no strict expectations of a complete, continuous number sequence, so breaks in the sequence - e.g. 47.dpx, 48.dpx, 65.dpx, 66.dpx - will cause no error or failure in `RAWcooked`.
+`RAWcooked` has no strict expectations of a complete, continuous number sequence, so breaks in the sequence - e.g. 47.dpx, 48.dpx, 65.dpx, 66.dpx - will cause no error or failure in `RAWcooked`, unless you specify that you want this with the ```--no-accept-gaps``` flag.
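If you want to know about breaks in a sequence before deciding whether to use ```--no-accept-gaps```, the frame numbers can be checked ahead of encoding. The following is a minimal Python sketch and is not part of `RAWcooked`; the function name `find_gaps` and the filename pattern are assumptions made for illustration only.

```python
import re


def find_gaps(filenames):
    """Return (first_missing, last_missing) frame-number ranges.

    Assumes each filename ends with a frame number followed by an
    extension, e.g. 000047.dpx or 47.dpx. Names without a trailing
    number are ignored.
    """
    numbers = sorted(
        int(match.group(1))
        for name in filenames
        if (match := re.search(r"(\d+)\.[^.]+$", name))
    )
    gaps = []
    # Compare each frame number with its successor to spot missing runs
    for previous, current in zip(numbers, numbers[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps
```

For the example above, `find_gaps(["47.dpx", "48.dpx", "65.dpx", "66.dpx"])` reports one missing run, frames 49 to 64.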
`RAWcooked` has expectations about the folder and subfolder structures containing the image files, and the Matroska that is created will manage subfolders in this way:
-
+
- a single folder of image files, or a folder with a single subfolder of image files, will result in a Matroska with one video track
- a folder with multiple subfolders of image files, will result in a Matroska with multiple video tracks, one track per subfolder
-
+
This behaviour could help to manage different use cases, according to local preference. For example:
+
- multiple reels in a single Matroska, one track per reel
- multiple film scan attempts (rescanning to address a technical issue in the first scan) in a single Matroska, one track per scan attempt
- multiple overscan approaches (e.g. no perfs, full perfs) in a single Matroska, one track per overscan approach
-
-Note that maximum permitted video tracks is encoded in the `RAWcooked` licence, so users may have to request extended track allowance as required.
-
+
+Note that maximum permitted video tracks is encoded in the `RAWcooked` licence (see licence section below), so users may have to request extended track allowance as required.
+
If your encodings do not succeed and you receive these messages, then you will need to encode your image sequence with the additional flag ```--output-version 2```:
```
Error: the reversibility file is becoming big | Error: undecodable file is becoming too big
```
This is caused by padding data that is not zeros and which must be written into your reversibility data file attachment for restoration to the DPX images when decoded. As this data can exceed FFmpeg's maximum attachment size limit of 1GB, this flag appends the attachment to the FFV1 Matroska file after encoding has completed. This feature is not backward compatible with `RAWcooked` software before version 21.09.
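In an automated workflow, a captured console log can be searched for these two messages to decide whether a sequence should be re-queued with ```--output-version 2```. This is a hedged Python sketch only; the names `OVERSIZE_MESSAGES` and `needs_output_version_2` are hypothetical, and the matching relies on the message text quoted above staying unchanged between `RAWcooked` versions.

```python
# Message text copied from the errors quoted above; treat it as an
# assumption that the wording is identical in your RAWcooked version.
OVERSIZE_MESSAGES = (
    "Error: the reversibility file is becoming big",
    "Error: undecodable file is becoming too big",
)


def needs_output_version_2(log_text):
    """Return True if the log contains either oversize-attachment error."""
    return any(message in log_text for message in OVERSIZE_MESSAGES)
```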
+
## Decode
```
@@ -76,4 +82,28 @@ The file supplied must be a Matroska container (.mkv) created by the `RAWcooked`
For the best decoding experience you should always ensure you encode with the ```--all``` command which includes hashes within the reversibility data of the encoded Matroska file. This ensures the the decoded files can be compared to the original source file hashes, ensuring bit perfect reversibility.
+
+## Default licence and expansion
+
+The default `RAWcooked` license allows you to encode and decode without any additional purchases for these few limited flavours:
+
+| From | To |
+| -------------------- | --------------------- |
+| DPX 8-bit | FFV1 / Matroska |
+| DPX 10-bit LE Filled A | FFV1 / Matroska |
+| DPX 10-bit BE Filled A | FFV1 / Matroska |
+| PCM 48kHz 16-bit 2 channel in WAV, BWF, RF64, AIFF, AVI | FLAC / Matroska |
+
+`RAWcooked` is an open-source project and so the software can be built from binary, but to ensure long-term support for this project we ask you install this software using our guide and support the project with development sponsorship and by purchasing licence additions that support your file formats.
+
+When you purchase and additional licence you will need to update your software installation with the licence number supplied by Media Area.
+```
+rawcooked --store-license
+```
+
+To review your licence details you can use this command:
+```
+rawcooked --show-license
+```
+You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use. To find out more please contact Media Area by email on [info@mediaarea.net](mailto:info@mediaarea.net)
From 4742e9cae9b99cccb8dfc876206e2d9394729764 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:26:56 +0000
Subject: [PATCH 69/93] Update User_Manual.md
---
Doc/User_Manual.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index d5a3a100..01cb7408 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -14,8 +14,8 @@ Using the `RAWcooked` tool:
- muxes these into a Matroska container (.mkv)
- uses FFmpeg for this encoding process
-To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag which concatenates several important flags into one:
-
+To encode your sequences using the best preservation flags within RAWcooked then you can use the ```--all``` flag. This flag concatenates several important flags into one, ensuring lossless compression and assured reversibility:
+
| Flags | Description |
| ------------------------- | ------------------------------------------ |
From 9983b2b3b4bc816796a386a6491bbfde34bd95fe Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:38:07 +0000
Subject: [PATCH 70/93] Update User_Manual.md
---
Doc/User_Manual.md | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 01cb7408..1455348c 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -70,7 +70,7 @@ Error: the reversibility file is becoming big | Error: undecodable file is becom
This is caused by padding data that is not zeros and which must be written into your reversibility data file attachment for restoration to the DPX images when decoded. As this data can exceed FFmpeg's maximum attachment size limit of 1GB, this flag appends the attachment to the FFV1 Matroska file after encoding has completed. This feature is not backward compatible with `RAWcooked` software before version 21.09.
-## Decode
+## Decoding
```
rawcooked --all
@@ -80,8 +80,21 @@ The file supplied must be a Matroska container (.mkv) created by the `RAWcooked`
### For successful decoding
-For the best decoding experience you should always ensure you encode with the ```--all``` command which includes hashes within the reversibility data of the encoded Matroska file. This ensures the the decoded files can be compared to the original source file hashes, ensuring bit perfect reversibility.
+For the best decoding experience you should always ensure you encode with the ```--all``` command which includes hashes within the reversibility data of the encoded Matroska file. This ensures that the decoded files can be compared to the original source file hashes, ensuring bit perfect reversibility.
+
+
+## Capturing encoding and decoding logs
+
+It is advisable to always capture the console output of your `RAWcooked` encoding and decoding for review over time. The console output will include `RAWcooked` software information, warning or error messages, plus confirmation of a successful encode or decode. The console also outputs important encoding information from the FFmpeg encoding software including FFmpeg version, file metadata and stream encoding configurations. Over time this information can be valuable for understanding your compressed files. To capture console log outputs for standard output and standard errors you can use the following commands.
+MacOS/Linux:
+```
+rawcooked --all >> 2>&1
+```
+Windows:
+```
+rawcooked --all 1> 2>&1
+```
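Where encoding is launched from a script instead of a shell, the same capture of standard output and standard error can be done in the script itself. The following Python sketch is illustrative only: `run_and_log` is a hypothetical helper name, and the command list and log path are supplied by the caller.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(command, log_path):
    """Run a command, appending stdout and stderr to one log file.

    Returns the exit code of the command.
    """
    log_path = Path(log_path)
    with log_path.open("ab") as log:
        # stderr=subprocess.STDOUT mirrors the shell's 2>&1 redirection
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example call (paths are placeholders, not real locations):
# run_and_log(["rawcooked", "--all", "path/to/sequence"], "path/to/sequence.log")
```

Opening the log in append mode keeps output from repeated attempts in one file, which matches the ```>>``` behaviour of the MacOS/Linux command.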
## Default licence and expansion
From 1842df56a1593b90eee5e1d331faef1ed628f852 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:38:38 +0000
Subject: [PATCH 71/93] Update User_Manual.md
---
Doc/User_Manual.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 1455348c..045cc69b 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -83,7 +83,7 @@ The file supplied must be a Matroska container (.mkv) created by the `RAWcooked`
For the best decoding experience you should always ensure you encode with the ```--all``` command which includes hashes within the reversibility data of the encoded Matroska file. This ensures that the decoded files can be compared to the original source file hashes, ensuring bit perfect reversibility.
-## Capturing encoding and decoding logs
+## Capturing logs
It is advisable to always capture the console output of your `RAWcooked` encoding and decoding for review over time. The console output will include `RAWcooked` software information, warning or error messages, plus confirmation of a successful encode or decode. The console also outputs important encoding information from the FFmpeg encoding software including FFmpeg version, file metadata and stream encoding configurations. Over time this information can be valuable for understanding your compressed files. To capture console log outputs for standard output and standard errors you can use the following commands.
From 3b9bbfc043b5b2f405ea1336283a97eece50f8ce Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:39:08 +0000
Subject: [PATCH 72/93] Update User_Manual.md
---
Doc/User_Manual.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 045cc69b..430be19d 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -98,7 +98,7 @@ rawcooked --all 1> 2>&1
## Default licence and expansion
-The default `RAWcooked` license allows you to encode and decode without any additional purchases for these few limited flavours:
+The default `RAWcooked` licence allows you to encode and decode without any additional purchases for these flavours:
| From | To |
| -------------------- | --------------------- |
From cd1ebb4f7bf379ff5b42a10c967e249a91f40360 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:41:34 +0000
Subject: [PATCH 73/93] Update User_Manual.md
---
Doc/User_Manual.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 430be19d..b4a36f1a 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -107,9 +107,9 @@ The default `RAWcooked` license allows you to encode and decode without any addi
| DPX 10-bit BE Filled A | FFV1 / Matroska |
| PCM 48kHz 16-bit 2 channel in WAV, BWF, RF64, AIFF, AVI | FLAC / Matroska |
-`RAWcooked` is an open-source project and so the software can be built from binary, but to ensure long-term support for this project we ask you install this software using our guide and support the project with development sponsorship and by purchasing licence additions that support your file formats.
+`RAWcooked` is an open-source project and so the software can be built from source, but to ensure long-term support and development for this project we ask that you install this software using our simple [installation guidelines](https://mediaarea.net/RAWcooked/Download) and support the project by purchasing licence additions to support your file formats, or by sponsorship of new feature development.
-When you purchase and additional licence you will need to update your software installation with the licence number supplied by Media Area.
+When you purchase an additional licence you will need to update your software installation with the new licence number, supplied by Media Area.
```
rawcooked --store-license
```
From 3a8d4046f2b5087790fa684c05f95718f48db845 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 12:44:06 +0000
Subject: [PATCH 74/93] Update User_Manual.md
---
Doc/User_Manual.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index b4a36f1a..bb4a446c 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -119,4 +119,4 @@ To review your licence details you can use this command:
rawcooked --show-license
```
-You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use. To find out more please contact Media Area by email on [info@mediaarea.net](mailto:info@mediaarea.net)
+You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use. To find out about more about licences or any other feature development please contact Media Area - [info@mediaarea.net](mailto:info@mediaarea.net).
From 656f4c3a198dff71e5dd4ff7f851e75aa4588ccf Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 13:53:17 +0000
Subject: [PATCH 75/93] Update User_Manual.md
---
Doc/User_Manual.md | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index bb4a446c..158225d7 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -26,8 +26,7 @@ To encode your sequences using the best preservation flags within RAWcooked then
| ```--hash``` | Important flag which computes hashes and embeds them in reversibility data stored in MKV wrapper allowing reversibility test assurance when original sequences absent |
| ```--coherency``` | Ensures package and content are coherent. Eg, sequence gap checks and audio duration matches image sequence duration |
| ```--check``` | Checks that an encoded file can be decoded correctly. If input is raw content, after encoding it checks that output would be same as the input content. Whereas if input is compressed content, it checksthat output would be same as the original content where hashes are present |
-| ```--check_padding``` | Runs padding checks for DPX files that do not have zero padding. Ensures additional padding data is stored in reversibility |
-| | file for perfect restoration of the DPX. Can be time consuming |
+| ```--check_padding``` | Runs padding checks for DPX files that do not have zero padding. Ensures additional padding data is stored in reversibility file for perfect restoration of the DPX. Can be time consuming |
| ```--accept-gaps``` | Where gaps in a sequence are found this flag ensures the encoding completes successfully. If you require that gaps are not encoded then follow the ```--all``` command with ```--no-accept-gaps``` |
From 4dd0d3b038a240a9b896abbf3969b3e1f3ac524c Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 13:53:46 +0000
Subject: [PATCH 76/93] Update User_Manual.md
---
Doc/User_Manual.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 158225d7..9c2ef4d4 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -25,7 +25,7 @@ To encode your sequences using the best preservation flags within RAWcooked then
| ```--decode``` | Select decode when an FFV1 Matroska file is supplied |
| ```--hash``` | Important flag which computes hashes and embeds them in reversibility data stored in MKV wrapper allowing reversibility test assurance when original sequences absent |
| ```--coherency``` | Ensures package and content are coherent. Eg, sequence gap checks and audio duration matches image sequence duration |
-| ```--check``` | Checks that an encoded file can be decoded correctly. If input is raw content, after encoding it checks that output would be same as the input content. Whereas if input is compressed content, it checksthat output would be same as the original content where hashes are present |
+| ```--check``` | Checks that an encoded file can be decoded correctly. If input is raw content, after encoding it checks that output would be same as the input content. Whereas if input is compressed content, it checks that output would be same as the original content where hashes are present |
| ```--check_padding``` | Runs padding checks for DPX files that do not have zero padding. Ensures additional padding data is stored in reversibility file for perfect restoration of the DPX. Can be time consuming |
| ```--accept-gaps``` | Where gaps in a sequence are found this flag ensures the encoding completes successfully. If you require that gaps are not encoded then follow the ```--all``` command with ```--no-accept-gaps``` |
From bb0aa27062f942ff19ffd339adfd88d742d78b22 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Thu, 7 Mar 2024 13:56:06 +0000
Subject: [PATCH 77/93] Update User_Manual.md
---
Doc/User_Manual.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 9c2ef4d4..6d892867 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -84,15 +84,15 @@ For the best decoding experience you should always ensure you encode with the ``
## Capturing logs
-It is advisable to always capture the console output of your `RAWcooked` encoding and decoding for review over time. The console output will include `RAWcooked` software information, warning or error messagess, plus confirmation of a successful encode or decode. The console also outputs important encoding information from the FFmpeg encoding software including FFmpeg version, file metadata and stream encoding configurations. Over time this information can be valuable for understanding your compressed files. To capture console log outputs for standard output and standard errors you can use the following commands.
+It is advisable to always capture the console output of your `RAWcooked` encoding and decoding for review over time. The console output will include `RAWcooked` software information, warning or error messages, plus confirmation of a successful encode or decode. The console also outputs important encoding information from the FFmpeg encoding software including FFmpeg version, file metadata and stream encoding configurations. Over time this information can be valuable for understanding your compressed files. To capture console log outputs for standard output and standard errors you can use the following commands. You may want to add ```-y``` or ```-n```, which answer yes or no to any questions asked by the `RAWcooked` software, unless you're happy monitoring the logs as they are created to intercept any questions.
MacOS/Linux:
```
-rawcooked --all >> 2>&1
+rawcooked --all -y >> 2>&1
```
Windows:
```
-rawcooked --all 1> 2>&1
+rawcooked --all -y 1> 2>&1
```
## Default licence and expansion
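The log capture commands in the "Capturing logs" hunk above can also be driven from a script. The following is a minimal Python sketch, not part of the patch: the function names, the position of the sequence path as the final argument, and the log file location are all assumptions made for illustration. Only the ```--all```, ```-y``` and ```-n``` flags come from the manual text.

```python
import subprocess


def build_command(sequence_dir, answer_yes=True):
    """Build the rawcooked argument list (path placement is an assumption)."""
    answer_flag = "-y" if answer_yes else "-n"
    return ["rawcooked", "--all", answer_flag, str(sequence_dir)]


def encode_with_log(sequence_dir, log_path):
    """Run rawcooked, appending stdout and stderr to one log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        result = subprocess.run(
            build_command(sequence_dir),
            stdout=log,
            stderr=subprocess.STDOUT,  # same effect as 2>&1 in the shell
            check=False,
        )
    return result.returncode
```

Appending rather than overwriting mirrors the ```>>``` redirect used in the MacOS/Linux command, so repeated runs against the same sequence build up a single log.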
From 9e9ca24e0e5859956a5f0a96efd411edf057727b Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 10:53:24 +0100
Subject: [PATCH 78/93] Update Case_study.md
Add 2k bits
---
Doc/Case_study.md | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 4c748ed9..bd5d7a84 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -227,7 +227,12 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX (the smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame)
* Across all 1020 encoded sequences the average size of the finished FFV1 was 29% smaller than the source image sequence
-A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 16-bit sequences. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 4K 16-bit sequences. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+
+A separate 2K solo and parallel encoding test revealed much quicker encoding times from our servers:
+* Solo 341GB 2K RGB 12-bit sequence took 80 minutes to complete RAWcooked encoding.
+* Solo 126GB 2K RGB 16-bit sequence tool 62 minutes to complete.
+* Parallel 367GB/325GB 2K RGB 16-bit sequences took 160 minutes/140 minutes to complete respectively.
### Useful test approaches
From 080ce97680ac85e0de171d24d4e047f30a5bc4c9 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 10:55:38 +0100
Subject: [PATCH 79/93] Update Case_study.md
---
Doc/Case_study.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index bd5d7a84..91512eb8 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -230,9 +230,9 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 4K 16-bit sequences. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
A separate 2K solo and parallel encoding test revealed much quicker encoding times from our servers:
-* Solo 341GB 2K RGB 12-bit sequence took 80 minutes to complete RAWcooked encoding.
-* Solo 126GB 2K RGB 16-bit sequence tool 62 minutes to complete.
-* Parallel 367GB/325GB 2K RGB 16-bit sequences took 160 minutes/140 minutes to complete respectively.
+* Solo 341GB 2K RGB 12-bit sequence took 80 minutes to complete RAWcooked encoding. MKV was 22.5% smaller than DPX.
+* Solo 126GB 2K RGB 16-bit sequence tool 62 minutes to complete. MKV was 30.6% smaller than the DPX.
+* Parallel 367GB/325GB 2K RGB 16-bit sequences took 160 minutes/140 minutes to complete. MKVs were 27.6% and 24.4% smaller than their DPX respectively.
### Useful test approaches
From 5e086997439dbce04066a436aee0be1b4a287fea Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 11:02:32 +0100
Subject: [PATCH 80/93] Update User_Manual.md
Add sublicense data
---
Doc/User_Manual.md | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 6d892867..45dfe653 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -118,4 +118,16 @@ To review your licence details you can use this command:
rawcooked --show-license
```
-You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use. To find out about more about licences or any other feature development please contact Media Area - [info@mediaarea.net](mailto:info@mediaarea.net).
+You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use.
+
+To issue a sublicence to be loan to a third party company for one month:
+```
+--sublicense
+```
+The value entry would be your own unique number. To set an expiry date, if you were to create this licence the beginning of May for example:
+```
+--sublicense-dur 0
+```
+This would create a licence that would last until the end of May. The default value is 1, which would provide an active licence until the end of the following month of issue.
+
+To find out about more about licences or any other feature development please contact Media Area - [info@mediaarea.net](mailto:info@mediaarea.net).
From 4b8deee03eea463c9893751c8f355290a494f8f4 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 11:03:58 +0100
Subject: [PATCH 81/93] Update User_Manual.md
---
Doc/User_Manual.md | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 45dfe653..59fa40f9 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -118,9 +118,7 @@ To review your licence details you can use this command:
rawcooked --show-license
```
-You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use.
-
-To issue a sublicence to be loan to a third party company for one month:
+You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use. To issue a sublicence to be loaned to a third party company supplying RAWcooked files to you:
```
--sublicense
```
From 1bd00b024fa104a8951b0fe735bae4d85a35fb02 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 11:07:59 +0100
Subject: [PATCH 82/93] Update User_Manual.md
Typing error fixes
---
Doc/User_Manual.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index 59fa40f9..fa459f51 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -106,7 +106,7 @@ The default `RAWcooked` license allows you to encode and decode without any addi
| DPX 10-bit BE Filled A | FFV1 / Matroska |
| PCM 48kHz 16-bit 2 channel in WAV, BWF, RF64, AIFF, AVI | FLAC / Matroska |
-`RAWcooked` is an open-source project and so the software can be built from binary, but to ensure long-term support and development for this project we ask you install this software using our simple [installation guidelines](https://mediaarea.net/RAWcooked/Download) and support the project with by purchasing licence additions to support your file formats, or by sponsorship of new feature development.
+`RAWcooked` is an open-source project and so the software can be built from source, but to ensure long-term support and development for this project we ask that you install this software using our simple [installation guidelines](https://mediaarea.net/RAWcooked/Download) and support the project by purchasing licence additions to support your file formats, or by sponsorship of new feature development.
When you purchase an additional licence you will need to update your software installation with the new licence number, supplied by Media Area.
```
@@ -117,15 +117,15 @@ To review your licence details you can use this command:
```
rawcooked --show-license
```
-
-You may purchase a sublicence from Media Area which can be loaned to third party suppliers in the creation of assets for the purchaser's use. To issue a sublicence to be loaned to a third party company supplying RAWcooked files to you:
+
+You may purchase a sublicence from Media Area which can be loaned to third party suppliers for the creation of assets for the purchaser's use. To issue a sublicence that can be loaned to a third party company supplying RAWcooked files to you:
```
--sublicense
```
-The value entry would be your own unique number. To set an expiry date, if you were to create this licence the beginning of May for example:
+The value entry would be your own unique number, and to set a unique expiry date:
```
--sublicense-dur 0
```
-This would create a licence that would last until the end of May. The default value is 1, which would provide an active licence until the end of the following month of issue.
+This would create a licence that would last until the end of May if created at the beginning of that month. The default value is 1, which would provide an active licence until the end of the following month of issue.
-To find out about more about licences or any other feature development please contact Media Area - [info@mediaarea.net](mailto:info@mediaarea.net).
+To find out more about licences or any other feature development please contact Media Area - [info@mediaarea.net](mailto:info@mediaarea.net).
From 69522a5e013e751124e4004192d1239b8fd16084 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 11:15:07 +0100
Subject: [PATCH 83/93] Update Case_study.md
Add MKV durations
---
Doc/Case_study.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 91512eb8..28a7ac65 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -230,9 +230,9 @@ From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 4K 16-bit sequences. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
A separate 2K solo and parallel encoding test revealed much quicker encoding times from our servers:
-* Solo 341GB 2K RGB 12-bit sequence took 80 minutes to complete RAWcooked encoding. MKV was 22.5% smaller than DPX.
-* Solo 126GB 2K RGB 16-bit sequence tool 62 minutes to complete. MKV was 30.6% smaller than the DPX.
-* Parallel 367GB/325GB 2K RGB 16-bit sequences took 160 minutes/140 minutes to complete. MKVs were 27.6% and 24.4% smaller than their DPX respectively.
+* Solo 341GB 2K RGB 12-bit sequence took 80 minutes to complete RAWcooked encoding. MKV was 22.5% smaller than DPX. The MKV was 16 minutes and 16 seconds.
+* Solo 126GB 2K RGB 16-bit sequence tool 62 minutes to complete. MKV was 30.6% smaller than the DPX. The duration of the MKV was 11 mins 42 secs.
+* Parallel 367GB/325GB 2K RGB 16-bit sequences took 160 minutes/140 minutes to complete. MKVs were 27.6% and 24.4% smaller than their DPX respectively. The durations were 11 mins 34 secs, and 10 mins 15 secs.
### Useful test approaches
From 69ac58b9ff755a433d42b68d859107472960fac4 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Tue, 9 Apr 2024 11:21:28 +0100
Subject: [PATCH 84/93] Update User_Manual.md
Clean up month statement
---
Doc/User_Manual.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Doc/User_Manual.md b/Doc/User_Manual.md
index fa459f51..68766971 100644
--- a/Doc/User_Manual.md
+++ b/Doc/User_Manual.md
@@ -126,6 +126,6 @@ The value entry would be your own unique number, and to set a unique expiry date
```
--sublicense-dur 0
```
-This would create a licence that would last until the end of May if created at the beginning of that month. The default value is 1, which would provide an active licence until the end of the following month of issue.
+This would create a licence that would last until the end of the current month. The default value is 1, which would provide an active licence until the end of the month following the month of issue.
To find out more about licences or any other feature development please contact Media Area - [info@mediaarea.net](mailto:info@mediaarea.net).
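The expiry rule described for ```--sublicense-dur``` can be illustrated with a short Python sketch. The manual text only states the behaviour for the values 0 and 1; treating larger values as "end of the month N months after issue" is an assumption here, and the function name is hypothetical rather than anything provided by `RAWcooked`.

```python
import calendar
from datetime import date


def sublicence_expiry(issued, duration_months=1):
    """Return the last day of the month `duration_months` after `issued`.

    0 means the end of the issue month and 1 (the documented default)
    means the end of the following month. Larger values are an assumed
    extension of the same rule.
    """
    # Count months from year 0 so that December rolls over cleanly.
    total_months = issued.year * 12 + (issued.month - 1) + duration_months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)
```

For example, a sublicence issued at the start of May with a duration of 0 would expire on 31 May under this reading, and with the default of 1 on 30 June.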
From 913f81a566557cd2194c261ae0d6fb8c47b26e79 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 12 Apr 2024 15:04:33 +0100
Subject: [PATCH 85/93] Update Case_study.md
---
Doc/Case_study.md | 32 ++++++++++++++++++--------------
1 file changed, 18 insertions(+), 14 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 28a7ac65..f0b71dc1 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -215,24 +215,28 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
## Conclusion
-We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered, usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little oversight. There is a need for manual intervention when repeated errors are encountered. This is usually indicated when an image sequence doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image sequence; a build-up of reported errors will indicate repeated problems.
-In recent years we have been encoding a larger variety of DPX sequences, a mix of 2K and 4K of various bit depths has seen our licence expand. Between February 2023 and February 2024 the BFI collected data about its business-as-usual encoding capturing details of 1020 DPX encodings to CSV. A Python script was written to capture data about these encoded files, including sequence pixel size, colourspace, bits, total byte size of the image sequence and completed FFV1 Matroska.
+In recent years we have been encoding a mix of 2K and 4K of various bit depths, seeing our licence expand. When we solely encoded 2K sequences we found we could run multiple parallel processes with good efficiency, seeing 32 concurrent encodings running at once. This was before we implemented the '--all' command, which calculates checksums and adds them to the reversibility data, then runs a checksum comparison of the Matroska after encoding has completed, which lengthens the encoding process. We saw our concurrency drop to accommodate the more detailed encoding process, particularly as our workflow introduced a final '--check' pass against the Matroska file that automated the deletion of the DPX sequence when successful.
-From 1020 total DPX sequences successfully encoded to FFV1 Matroska:
-* 140 were 2K or smaller / 880 were 4K
-* 222 were Luma Y / 798 were RGB
-* 143 were 10-bit / 279 12-bit / 598 16-bit
-* The largest reduction in size of any FFV1 was 88% smaller than the source DPX (the largest reductions were from 10/12-bit sequences, with RGB colorspace that had black and white filters applied)
-* The smallest reduction saw the FFV1 just 0.3% smaller than the DPX (the smallest reductions were from RGB and Y-Luma 16-bit image sequences scanned full frame)
-* Across all 1020 encoded sequences the average size of the finished FFV1 was 29% smaller than the source image sequence
+Since running an increasing number of 4K sequences we find we have better '--all' encoding and parallel '--check' efficiency running just two parallel encodings at any given time. We recently ran a review of our 4K and 2K encoding timings. Below are some recent 4K DPX encoding times using RAWcooked's '--all' command with a maximum of two parallel encodings, and where we can assume another single '--check' run is underway from the server:
-A small group of sequences had their RAWcooked encoding times recorded, revealing an average of 24 hours per sequence. The sequences all had finished MKV durations between 5 and 10 minutes and were mostly 4K 16-bit sequences. The fastest encodes took just 7 hours with some taking upto 46 hours. There appears to be no cause for these variations in the files themselves and so we must assume that general network activity and/or amount of parallel processes running have influenced these variations.
+* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
+* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
+* Parallel 4K RGB 16-bit DPX (1078.7 GB) - MKV duration 9:56 (954.9 GB) - encoding time 7:49:20 - MKV 11.5% smaller than DPX
+* Parallel 4K RGB 12-bit DPX (796.3 GB) - MKV duration 9:47 (194.1 GB) - encoding time 5:13:22 - MKV 75.6% smaller than DPX *
+* Parallel 4K RGB 12-bit DPX (118.1 GB) - MKV duration 1:27 (87.1 GB) - encoding time 1:06:02 - MKV 26.3% smaller than DPX
+* Parallel 4K RGB 12-bit DPX (121.6 GB) - MKV duration 1:29 (87.3 GB) - encoding time 0:54:00 - MKV 28.2% smaller than DPX
+* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX *
-A separate 2K solo and parallel encoding test revealed much quicker encoding times from our servers:
-* Solo 341GB 2K RGB 12-bit sequence took 80 minutes to complete RAWcooked encoding. MKV was 22.5% smaller than DPX. The MKV was 16 minutes and 16 seconds.
-* Solo 126GB 2K RGB 16-bit sequence tool 62 minutes to complete. MKV was 30.6% smaller than the DPX. The duration of the MKV was 11 mins 42 secs.
-* Parallel 367GB/325GB 2K RGB 16-bit sequences took 160 minutes/140 minutes to complete. MKVs were 27.6% and 24.4% smaller than their DPX respectively. The durations were 11 mins 34 secs, and 10 mins 15 secs.
+Note *: Where the MKV is significantly smaller than the DPX we can assume there is either lots of spare padding data in the file, or a b/w filter has been applied to an RGB scan.
+
+A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the '--all' command and where we can assume another single '--check' run is underway:
+
+* Solo 2K RGB 12-bit DPX (341 GB) - MKV duration 16:16 - encoding time 1:20:00 - MKV 22.5% smaller than DPX
+* Solo 2K RGB 16-bit DPX (126 GB) - MKV duration 11:42 - encoding time 1:02:00 - MKV was 30.6% smaller than the DPX
+* Parallel 2K RGB 16-bit DPX (367 GB) - MKV duration 11:34 - encoding time 2:40:00 - MKV was 27.6% smaller than the DPX
+* Parallel 2K RGB 16-bit DPX (325 GB) - MKV duration 10:15 - encoding time 2:21:00 - MKV was 24.4% smaller than the DPX
### Useful test approaches
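The "smaller than DPX" percentages in the encoding lists above follow from the DPX and MKV sizes quoted alongside them. As a minimal Python sketch of that arithmetic (the function name is hypothetical, and the sizes are the rounded GB figures from the list, so results may differ slightly from percentages calculated on exact byte counts):

```python
def percent_smaller(dpx_gb, mkv_gb):
    """Return how much smaller the MKV is than the DPX, as a percentage."""
    if dpx_gb <= 0:
        raise ValueError("DPX size must be greater than zero")
    return (dpx_gb - mkv_gb) / dpx_gb * 100


# (DPX size in GB, MKV size in GB) pairs taken from the 4K list above.
examples = [(699.4, 639.8), (796.3, 194.1), (887.3, 208.7)]
for dpx_gb, mkv_gb in examples:
    reduction = percent_smaller(dpx_gb, mkv_gb)
    print(f"DPX {dpx_gb} GB -> MKV {mkv_gb} GB: {reduction:.1f}% smaller")
```

The same calculation run over the byte sizes recorded for each sequence would make it straightforward to flag outliers, such as the two 12-bit sequences with reductions above 75%.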
From 4fd29987d3cb818f9fb7cfc3195a6041edf43b0e Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 12 Apr 2024 15:06:10 +0100
Subject: [PATCH 86/93] Update Case_study.md
---
Doc/Case_study.md | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index f0b71dc1..042c8141 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -224,12 +224,11 @@ Since running an increasing number of 4K sequences we find we have better '--all
* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
* Parallel 4K RGB 16-bit DPX (1078.7 GB) - MKV duration 9:56 (954.9 GB) - encoding time 7:49:20 - MKV 11.5% smaller than DPX
-* Parallel 4K RGB 12-bit DPX (796.3 GB) - MKV duration 9:47 (194.1 GB) - encoding time 5:13:22 - MKV 75.6% smaller than DPX *
+* Parallel 4K RGB 12-bit DPX (796.3 GB) - MKV duration 9:47 (194.1 GB) - encoding time 5:13:22 - MKV 75.6% smaller than DPX **
* Parallel 4K RGB 12-bit DPX (118.1 GB) - MKV duration 1:27 (87.1 GB) - encoding time 1:06:02 - MKV 26.3% smaller than DPX
* Parallel 4K RGB 12-bit DPX (121.6 GB) - MKV duration 1:29 (87.3 GB) - encoding time 0:54:00 - MKV 28.2% smaller than DPX
-* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX *
-
-Note *: Where the MKV is significantly smaller than the DPX we can assume there is either lots of spare padding data in the file, or a b/w filter has been applied to an RGB scan.
+* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX **
+** Where the MKV is significantly smaller than the DPX then a black and white filter will have been applied to an RGB scan, as in these cases.
A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the '--all' command and where we can assume another single '--check' run is underway:
From f286b0849b59e4c93b8df138bac3d765a434a6c3 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 12 Apr 2024 15:07:20 +0100
Subject: [PATCH 87/93] Update Case_study.md
Update encoding timings with set parallel encoding number
---
Doc/Case_study.md | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 042c8141..4a43c96d 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -221,13 +221,13 @@ In recent years we have been encoding a mix of 2K and 4K of various bit depths,
Since running an increasing number of 4K sequences we find we have better '--all' encoding and parallel '--check' efficiency running just two parallel encodings at any given time. We recently ran a review of our 4K and 2K encoding timings. Below are some recent 4K DPX encoding times using RAWcooked's '--all' command with a maximum of two parallel encodings, and where we can assume another single '--check' run is underway from the server:
-* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
-* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
-* Parallel 4K RGB 16-bit DPX (1078.7 GB) - MKV duration 9:56 (954.9 GB) - encoding time 7:49:20 - MKV 11.5% smaller than DPX
-* Parallel 4K RGB 12-bit DPX (796.3 GB) - MKV duration 9:47 (194.1 GB) - encoding time 5:13:22 - MKV 75.6% smaller than DPX **
-* Parallel 4K RGB 12-bit DPX (118.1 GB) - MKV duration 1:27 (87.1 GB) - encoding time 1:06:02 - MKV 26.3% smaller than DPX
-* Parallel 4K RGB 12-bit DPX (121.6 GB) - MKV duration 1:29 (87.3 GB) - encoding time 0:54:00 - MKV 28.2% smaller than DPX
-* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX **
+* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
+* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
+* Parallel 4K RGB 16-bit DPX (1078.7 GB) - MKV duration 9:56 (954.9 GB) - encoding time 7:49:20 - MKV 11.5% smaller than DPX
+* Parallel 4K RGB 12-bit DPX (796.3 GB) - MKV duration 9:47 (194.1 GB) - encoding time 5:13:22 - MKV 75.6% smaller than DPX **
+* Parallel 4K RGB 12-bit DPX (118.1 GB) - MKV duration 1:27 (87.1 GB) - encoding time 1:06:02 - MKV 26.3% smaller than DPX
+* Parallel 4K RGB 12-bit DPX (121.6 GB) - MKV duration 1:29 (87.3 GB) - encoding time 0:54:00 - MKV 28.2% smaller than DPX
+* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX **
** Where the MKV is significantly smaller than the DPX then a black and while filter will have been applied to an RGB scan, as in these cases.
A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the '--all' command and where we can assume another single '--check' run is underway:
From 89e35cf42ec6f679dd5849aece5f9f8e626b8653 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 12 Apr 2024 15:09:49 +0100
Subject: [PATCH 88/93] Update Case_study.md
---
Doc/Case_study.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 4a43c96d..4018b172 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -217,9 +217,9 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
-In recent years we have been encoding a mix of 2K and 4K of various bit depths, seeing our licence expand. When we solely encoded 2K sequences we found we could run multiple parallel processes with good efficiency, seeing 32 concurrent encodings running at once. This was before we implemented the '--all' command which calculates checksums adding them to the reversibility data and runs a checksum comparison of the Matroska after encoding has completed which expands the encoding process. We saw our concurrency drop to accomodate the more detailed encoding process, particularly as our workflow introduced a final '--check' pass against the Matroska file that automated the deletion of the DPX sequence when successful.
+In recent years we have been encoding a mix of 2K and 4K of various bit depths, seeing our licence expand. When we solely encoded 2K sequences we found we could run multiple parallel processes with good efficiency, seeing 32 concurrent encodings running at once. This was before we implemented the ```--all``` command which calculates checksums adding them to the reversibility data and runs a checksum comparison of the Matroska after encoding has completed which expands the encoding process. We saw our concurrency drop to accomodate this more detailed encoding process, particularly as our workflow introduced a final ```--check``` pass against the Matroska file that automated the deletion of the DPX sequence, when successful.
-Since running an increasing number of 4K sequences we find we have better '--all' encoding and parallel '--check' efficiency running just two parallel encodings at any given time. We recently ran a review of our 4K and 2K encoding timings. Below are some recent 4K DPX encoding times using RAWcooked's '--all' command with a maximum of two parallel encodings, and where we can assume another single '--check' run is underway from the server:
+Since running an increasing number of 4K sequences we find we have better ```--all``` encoding and parallel ```--check``` efficiency running just two parallel encodings at any given time. We recently ran a review of our 4K and 2K encoding timings. Below are some recent 4K DPX encoding times using RAWcooked's ```--all``` command with a maximum of two parallel encodings, and where we can assume another single ```--check``` run is underway from the server:
* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
@@ -230,7 +230,7 @@ Since running an increasing number of 4K sequences we find we have better '--all
* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX **
** Where the MKV is significantly smaller than the DPX then a black and while filter will have been applied to an RGB scan, as in these cases.
-A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the '--all' command and where we can assume another single '--check' run is underway:
+A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the ```--all``` command and where we can assume another single ```--check``` run is also working in parallel:
* Solo 2K RGB 12-bit DPX (341 GB) - MKV duration 16:16 - encoding time 1:20:00 - MKV 22.5% smaller than DPX
* Solo 2K RGB 16-bit DPX (126 GB) - MKV duration 11:42 - encoding time 1:02:00 - MKV was 30.6% smaller than the DPX
From 63e3ab49bc21a390bb16e1e332cba665f514939c Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 12 Apr 2024 15:46:04 +0100
Subject: [PATCH 89/93] Update Case_study.md
Rewrite last paragraph, for review
---
Doc/Case_study.md | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 4018b172..e021d495 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -215,11 +215,13 @@ It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reve
---
## Conclusion
-We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film. We plan to begin testing RAWcooked encoding of TIFF image sequences shortly with the intention of moving DCDM image sequences to FFV1. Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg file encoding a specific DPX scan - there can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence, a build up of reported errors will indicate repeated problems.
+We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film.
-In recent years we have been encoding a mix of 2K and 4K of various bit depths, seeing our licence expand. When we solely encoded 2K sequences we found we could run multiple parallel processes with good efficiency, seeing 32 concurrent encodings running at once. This was before we implemented the ```--all``` command which calculates checksums adding them to the reversibility data and runs a checksum comparison of the Matroska after encoding has completed which expands the encoding process. We saw our concurrency drop to accomodate this more detailed encoding process, particularly as our workflow introduced a final ```--check``` pass against the Matroska file that automated the deletion of the DPX sequence, when successful.
+Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg while encoding a specific DPX scan. There can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence.
-Since running an increasing number of 4K sequences we find we have better ```--all``` encoding and parallel ```--check``` efficiency running just two parallel encodings at any given time. We recently ran a review of our 4K and 2K encoding timings. Below are some recent 4K DPX encoding times using RAWcooked's ```--all``` command with a maximum of two parallel encodings, and where we can assume another single ```--check``` run is underway from the server:
+When we solely encoded 2K sequences we found we could run multiple parallel processes with good efficiency, seeing as many as 32 concurrent encodings running at once. This was before we implemented the ```--all``` command which calculates checksums adding them to the reversibility data and runs a checksum comparison of the Matroska after encoding has completed which expands the encoding process. When introducing this command we reduced our concurrency, particularly as our workflow introduced a final ```--check``` pass against the Matroska file that automated the deletion of the DPX sequence, when successful. We generally set 6 to 8 concurrent encodings on our busier QNAP storage, and 2 concurrent encodings on other storage.
+
+In recent years we have seen a shift from majority 2K DPX to majority 4K DPX with mostly 12- or 16-bit depths. Very recently we have found ```--all``` encoding and parallel ```--check``` efficiency increases when running just two parallel encodings at any given time. Below are some recent 4K DPX encoding times using RAWcooked's ```--all``` command with a maximum of two parallel encodings, and where we can assume another single ```--check``` run is underway from the server:
* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
@@ -236,6 +238,8 @@ A separate 2K solo and parallel encoding test revealed much quicker encoding tim
* Solo 2K RGB 16-bit DPX (126 GB) - MKV duration 11:42 - encoding time 1:02:00 - MKV was 30.6% smaller than the DPX
* Parallel 2K RGB 16-bit DPX (367 GB) - MKV duration 11:34 - encoding time 2:40:00 - MKV was 27.6% smaller than the DPX
* Parallel 2K RGB 16-bit DPX (325 GB) - MKV duration 10:15 - encoding time 2:21:00 - MKV was 24.4% smaller than the DPX
+
+It provides us with great reassurance to implement the ```--all``` command and we remain highly satisfied with RAWcooked encoding of DPX sequences despite the reduction in our concurrent encodings. The embedded DPX hashes which ```all``` includes are critical for long-term preservation of the digitised film. In addition there are checksums embedded in the slices of every video frame (upto 576 per frame so 576 checksums per video frame) allowing granular analysis of any problems found with digital FFV1 preservation files, should they arise. This is thanks to the FFV1 codec, and it allows us to pinpoint exactly where digital damage may have ocurred. This means we can easily replace the impacted DPX files with duplicates from our duplicate preservation copies. Open-source RAWcooked, FFV1 and Matroska allow open access to their source code which means reduced likelihood of obsolescence long into the future. Finally, we plan to begin testing RAWcooked encoding of TIFF image sequences with the intention of encoding DCDM image sequences to FFV1 also.
### Useful test approaches
From 291e0e37031b4419e3cd482fa98fcfa36471b408 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 12 Apr 2024 16:33:05 +0100
Subject: [PATCH 90/93] Update Case_study.md
Clean up spelling errors
---
Doc/Case_study.md | 44 ++++++++++++++++++++++----------------------
1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index e021d495..070fb2bc 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,7 +1,7 @@
# BFI National Archive RAWcooked Case Study
**by Joanna White, Knowledge & Collections Developer**
-At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions and flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K and 4K image sequences. This workflow is built on some of the flags developed in RAWcooked by Media Area and written in a mix of Bash shell and Python3 scripts ([BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding)). In addition to RAWcooked we use other Media Area tools to complete necessary stages of this workflow. Our encoding processes do not include any alpha channel or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska.
+At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions, flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K and 4K image sequences. This workflow is built on some of the flags developed in RAWcooked by Media Area and written in a mix of Bash shell and Python3 scripts ([BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding)). In addition to RAWcooked we use other Media Area tools to complete necessary stages of this workflow. Our encoding processes do not include any alpha channel or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska.
This case study is broken into the following sections:
* [Server configurations](#server_config)
@@ -45,9 +45,9 @@ When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps)
For each image sequence processed the metadata of the first DPX is collected and saved into the sequence folder, along with total folder size in bytes and a list of all contents of the sequence. We collect this information using [Media Area's MediaInfo software](https://mediaarea.net/MediaInfo) and capture the output into script variables.
-Next the first file within the image sequence is checked against a [Media Area's MediaConch software](https://mediaarea.net/MediaConch) policy for the file ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX.
+Next the first file within the image sequence is checked against a DPX policy created using [Media Area's MediaConch software](https://mediaarea.net/MediaConch) ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX.
-The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequences to make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur so map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size.
+The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be at least one third smaller, so we calculate that a 1.3TB sequence will make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur, so we map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size.
| RAWcooked 2K RGB | RAWcooked Luma & RAWcooked 4K |
| -------------------- | ----------------------------- |
@@ -56,7 +56,7 @@ The frame pixel size and colourspace of the sequence are used to calculate the p
### Encoding the image sequence
-To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into this one simple flag. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.
+To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation-essential flags merged into one. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.
Our RAWcooked encode command:
```
@@ -67,7 +67,7 @@ rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/seque
| ---------------------- | ------------------------------------------ |
| ```rawcooked``` | Calls the software |
| ```-y``` | Answers 'yes' to software questions |
-| ```-all``` | Preservation command with CRC-32 hashes |
+| ```--all``` | Preservation command with CRC-32 hashes |
| ```--no-accept-gaps``` | Exit with warning if sequence gaps found |
| | --all command defaults to accepting gaps |
| ```-s 5281680``` | Set max attachment size to 5MB |
@@ -75,14 +75,14 @@ rawcooked -y --all --no-accept-gaps -s 5281680 path/sequence_name/ -o path/seque
| ```>>``` | Capture console output to text file |
| ```2>&1``` | stderr and stdout messages captured in log |
-This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run multiple encodes at the same time. This software makes it very simple to fix a specific number of encodes specified by the ```--jobs``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work to save time. If not run in parallel a computer will usually process jobs serially, one after another. As well as parallelisation, FFmpeg usinges multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (64 slice minimum in RAWcooked frame) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so for our server that works as 64 separate tasks per thread, one slice per frame of the FFV1 file.
+This command is generally launched from within a Bash script, and is passed to [GNU Parallel](https://www.gnu.org/software/parallel/) to run concurrent encodings. This software makes it very simple to fix a specific number of encodes using the ```--jobs``` flag. Parallelisation is the act of processing jobs in parallel, dividing up the work across threads to maximise efficiency. If not run in parallel a computer will usually process jobs serially, one after another. As well as parallelisation, FFmpeg uses multi-threading to create the FFV1 file. The FFV1 codec has slices through each frame (usually between 64 and 576 slices) which allows for granular checksum verification, but also allows FFmpeg multi-threading. Each slice block is split into different processing tasks and run across your CPU threads, so for our server that works as 64 separate tasks per thread, one slice per frame of the FFV1 file.
By listing all the image sequence paths in one text file you can launch a parallel command like this to run 5 parallel encodes:
```
cat ${sequence_list.txt} | parallel --jobs 5 "rawcooked -y --all --no-accept-gaps -s 5281680 {} -o {}.mkv >> {}.mkv.txt 2>&1"
```
-We always capture our console logs for every encode. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an encode. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and valuable metadata of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by Media Area. We would definitely encourage all RAWcooked users to capture and retain this information as part of their long-term preservation of their RAWcooked sequences.
+We always capture our console logs for every encode job. The ```2>&1``` ensures any error messages are output alongside the usual standard console messages for the software. These are essential for us to review if a problem is found with an FFV1 Matroska. Over time they also provide a very clear record of changes encountered in FFmpeg and RAWcooked software, and valuable metadata of our own image sequence files. These logs have been critical in identifying unanticipated edge cases with some DPX encodings, allowing for impact assessment by Media Area. We would definitely encourage all RAWcooked users to capture and retain this information as part of the long-term preservation of RAWcooked encoded sequences.
### Encoding log assessment
@@ -156,7 +156,7 @@ Reversibility was checked, no issue detected.
```
-If an encoding has completed then in this last section you might see different types of human readable message including:
+If an encoding has completed then in this last section you might see different types of messages that include:
* Warnings about the image sequence files
* Errors experienced during encoding
* Information about the RAWcooked encode (shown above)
@@ -183,13 +183,13 @@ Error: unsupported DPX (non conforming) alternate end of line non padding
Please contact info@mediaarea.net if you want support of such content.
```
-The automation scripts used at the BFI National Archive look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encode attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
+The automation scripts used at the BFI National Archive look for any messages that have 'Error' in them. If any are found the FFV1 Matroska is deleted and the sequence is queued for a repeated encoding attempt. Likewise, if the completion statement suggests a failure then the FFV1 is deleted and the sequence is queued for a repeat encode. A successful completion statement should always read:
```Reversibility was checked, no issues detected.```
There is one error message that triggers a specific type of re-encode:
```Error: the reversibility file is becoming big | Error: undecodable file is becoming too big```
-For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding has completed. FFmpeg has an upper size limit of 1GB for attachments. If there is lots of additional data stored in your DPX file headers then this flag will ensure that your FFV1 Matroska completes fine and the data remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked version before V 21.09.
+For this error we know that we need to re-encode our image sequence with the additional flag ```--output-version 2``` which writes the large reversibility data to the FFV1 Matroska once encoding has completed. FFmpeg has an upper size limit of 1GB for attachments. If there is additional data stored in your DPX file headers (not zero padding) then this flag will ensure that this data is stored safely in the reversibility data and that the FFV1 Matroska remains verifiably reversible. FFV1 Matroskas that are encoded using the ```--output-version 2``` flag are not backward compatible with RAWcooked versions before V21.09.
### FFV1 Matroska validation
@@ -204,24 +204,24 @@ Again the stderr and stdout messages are captured to a log, and this log is chec
### FFV1 Matroska decode to image sequence
-We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preseration colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` command again which can select decode when an FFV1 Matroska is supplied.
+We have automation scripts that return an FFV1 Matroska back to the original image sequence. These are essential for our film preservation colleagues who may need to perform grading or enhancement work on preserved films. For this we use the ```--all``` command again which automatically selects decode when an FFV1 Matroska is supplied.
This simple script runs this command:
```
rawcooked -y --all path/sequence_name.dpx -o path/decode_sequence >> path/sequence_name.txt 2>&1
```
-It decodes the FFV1 Matroska back to image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
+It decodes the FFV1 Matroska back to its original form as a DPX image sequence, checks the logs for ```Reversibility was checked, no issue detected``` and reports the outcome to a script log.
---
## Conclusion
We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film.
-Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg while encoding a specific DPX scan. There can be many differences found in DPX metadata depending on the scanning technology. Where errors are found by our automations these are reported to an error log named after the image seqeuence.
+Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little oversight. There is a need for manual intervention when repeated errors are encountered. This is usually indicated in error logs or when an image sequence doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg while encoding a specific DPX scan. There can be many differences found in DPX metadata depending on the scanning technology used. Where errors are found by our automations these are reported to an error log named after the image sequence.
-When we solely encoded 2K sequences we found we could run multiple parallel processes with good efficiency, seeing as many as 32 concurrent encodings running at once. This was before we implemented the ```--all``` command which calculates checksums adding them to the reversibility data and runs a checksum comparison of the Matroska after encoding has completed which expands the encoding process. When introducing this command we reduced our concurrency, particularly as our workflow introduced a final ```--check``` pass against the Matroska file that automated the deletion of the DPX sequence, when successful. We generally set 6 to 8 concurrent encodings on our busier QNAP storage, and 2 concurrent encodings on other storage.
+Our 2K workflows could run multiple parallel processes with good efficiency, seeing as many as 32 concurrent encodings running at once against a single storage device. This was before we implemented the ```--all``` command, which calculates checksums, adds them to the reversibility data, and runs a checksum comparison of the Matroska after encoding has completed, all of which lengthens the encoding process. When introducing this command we reduced our concurrency, particularly as our workflow introduced a final ```--check``` pass against the Matroska file that automates the deletion of the DPX sequence when successful. We also expanded our storage devices for RAWcooking and currently have 8 storage devices (a mix of Isilon, QNAPs and G-Rack NAS), generally set for between 2 and 8 concurrent encodings with the aim of not exceeding 32.
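The concurrency limits described above could be enforced in many ways. The following is only a minimal Python sketch, not the BFI's implementation: the storage names, the per-storage limits, the reading of 32 as an overall ceiling, and the `encode` callable are all hypothetical stand-ins chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-storage limits; names and numbers are illustrative only.
STORAGE_LIMITS = {"qnap_01": 2, "isilon_01": 8}
# Assumed overall ceiling on concurrent encodings across all storage.
TOTAL_LIMIT = 32


def run_with_limits(jobs, encode, storage_limits=STORAGE_LIMITS, total_limit=TOTAL_LIMIT):
    """Run encode(storage, sequence) for every job without exceeding either limit.

    jobs is a list of (storage, sequence) pairs; encode is any callable.
    Results are returned in the same order as jobs.
    """
    # One semaphore per storage device caps encodings against that device.
    gates = {
        name: threading.BoundedSemaphore(limit)
        for name, limit in storage_limits.items()
    }

    def guarded(storage, sequence):
        with gates[storage]:
            return encode(storage, sequence)

    # The pool size caps the total number of jobs in flight at once.
    with ThreadPoolExecutor(max_workers=total_limit) as pool:
        futures = [pool.submit(guarded, storage, sequence) for storage, sequence in jobs]
        return [future.result() for future in futures]
```

In a real workflow `encode` would launch the RAWcooked command for one sequence; here it is left as a parameter so the limiting logic can be tried with any function.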
-In recent years we have seen a shift from majority 2K DPX to majority 4K DPX with mostly 12- or 16-bit depths. Very recently we have found ```--all``` encoding and parallel ```--check``` efficiency increases when running just two parallel encodings at any given time. Below are some recent 4K DPX encoding times using RAWcooked's ```--all``` command with a maximum of two parallel encodings, and where we can assume another single ```--check``` run is underway from the server:
+In recent years we have seen a shift from majority 2K DPX to majority 4K DPX with mostly 12- or 16-bit depths. To maintain encoding speed for each DPX sequence it is better to limit our parallel encodings to two DPX sequences per storage at any given time. Below are some recent 4K DPX encoding times using RAWcooked's ```--all``` command with a maximum of just two parallel encodings per server targeting a single QNAP storage, and where we can assume a single ```--check``` run is also underway:
* Parallel 4K RGB 16-bit DPX (699.4 GB) - MKV duration 5:10 (639.8 GB) - encoding time 5:17:00 - MKV 8.5% smaller than DPX
* Parallel 4K RGB 16-bit DPX (723.1 GB) - MKV duration 5:20 (648.9 GB) - encoding time 5:40:07 - MKV 10.25% smaller than DPX
@@ -232,24 +232,24 @@ In recent years we have seen a shift from majority 2K DPX to majority 4K DPX wit
* Parallel 4K RGB 12-bit DPX (887.3 GB) - MKV duration 10:54 (208.7 GB) - encoding time 5:02:00 - MKV 76.5% smaller than DPX **
** Where the MKV is significantly smaller than the DPX, a black and white filter will have been applied to an RGB scan, as in these cases.
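The "smaller than DPX" percentages in the list above follow from the two sizes, and an average data rate follows from the DPX size and encoding time. As a small illustrative Python sketch (the function names are our own, and 1 GB is assumed to be 1000 MB), using the first 4K result:

```python
def percent_smaller(dpx_gb, mkv_gb):
    """Return how much smaller the MKV is than the DPX, as a percentage."""
    return (dpx_gb - mkv_gb) / dpx_gb * 100


def throughput_mb_per_s(dpx_gb, encoding_time):
    """Return DPX data processed per second, given an h:mm:ss encoding time."""
    hours, minutes, seconds = (int(part) for part in encoding_time.split(":"))
    elapsed = hours * 3600 + minutes * 60 + seconds
    return dpx_gb * 1000 / elapsed  # assumes 1 GB = 1000 MB


# First 4K example above: 699.4 GB DPX, 639.8 GB MKV, encoded in 5:17:00.
print(round(percent_smaller(699.4, 639.8), 1))          # 8.5
print(round(throughput_mb_per_s(699.4, "5:17:00"), 1))  # 36.8
```

The throughput figure is an average over the whole run, including the checksum work that ```--all``` adds, so it will not match instantaneous speeds reported during encoding.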
-A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the ```--all``` command and where we can assume another single ```--check``` run is also working in parallel:
+A separate 2K solo and parallel encoding test revealed much quicker encoding times for >10 minute sequences, again using the ```--all``` command against a single QNAP storage, and where we can assume another single ```--check``` run is also working in parallel:
* Solo 2K RGB 12-bit DPX (341 GB) - MKV duration 16:16 - encoding time 1:20:00 - MKV 22.5% smaller than DPX
* Solo 2K RGB 16-bit DPX (126 GB) - MKV duration 11:42 - encoding time 1:02:00 - MKV was 30.6% smaller than the DPX
* Parallel 2K RGB 16-bit DPX (367 GB) - MKV duration 11:34 - encoding time 2:40:00 - MKV was 27.6% smaller than the DPX
* Parallel 2K RGB 16-bit DPX (325 GB) - MKV duration 10:15 - encoding time 2:21:00 - MKV was 24.4% smaller than the DPX
-It provides us with great reassurance to implement the ```--all``` command and we remain highly satisfied with RAWcooked encoding of DPX sequences despite the reduction in our concurrent encodings. The embedded DPX hashes which ```all``` includes are critical for long-term preservation of the digitised film. In addition there are checksums embedded in the slices of every video frame (upto 576 per frame so 576 checksums per video frame) allowing granular analysis of any problems found with digital FFV1 preservation files, should they arise. This is thanks to the FFV1 codec, and it allows us to pinpoint exactly where digital damage may have ocurred. This means we can easily replace the impacted DPX files with duplicates from our duplicate preservation copies. Open-source RAWcooked, FFV1 and Matroska allow open access to their source code which means reduced likelihood of obsolescence long into the future. Finally, we plan to begin testing RAWcooked encoding of TIFF image sequences with the intention of encoding DCDM image sequences to FFV1 also.
+It provides us with great reassurance to implement the ```--all``` command and we remain highly satisfied with RAWcooked encoding of DPX sequences despite the reduction in our concurrent encodings. The embedded DPX hashes which ```--all``` includes are critical for long-term preservation of the digitised film. In addition, there are checksums embedded in the slices of every video frame (up to 576 checksums *per* video frame), allowing granular analysis of any problems found with digital FFV1 preservation files, should they arise. This is thanks to the FFV1 codec, and it allows us to pinpoint exactly where digital damage may have occurred. This means we can easily replace the impacted DPX files using our duplicate preservation copies. Open-source RAWcooked, FFV1 and Matroska allow open access to their source code, which means reduced likelihood of obsolescence long into the future. Finally, we plan to begin testing RAWcooked encoding of TIFF image sequences with the intention of also encoding DCDM image sequences to FFV1.
### Useful test approaches
-When any system upgrades occur we like to run reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates, FFmpeg updates, but also for updates to our operating system. To perform a reversibility test, a cross-section of image sequences are encoded using our usual ```--all``` command, and then decoded again fully. The image sequences of both the original and decoded version then have whole file MD5 checksums generated for every and saved into one manifest for the source and one for the decoded version. These manifests are then ```diff``` checked to ensure that every single image file is identical.
+When any system upgrades occur we like to run a reversibility test to ensure RAWcooked is still operating as we would expect. This is usually in response to RAWcooked software updates and FFmpeg updates, but also to updates to our operating system. To perform a reversibility test, a cross-section of image sequences is encoded using our usual ```--all``` command, and then decoded again fully. Whole file MD5 checksums are then generated for every DPX in both the original and decoded image sequences, and written into one manifest for the source DPX and one manifest for the decoded DPX. These manifests are then ```diff``` checked to ensure that every single image file has identical checksums.
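The manifest comparison in the reversibility test can be pictured with a short Python sketch. This is an illustration under our own assumptions rather than the script used in production: the function names are hypothetical, and the comparison is done in memory instead of writing two manifest files and running ```diff```.

```python
import hashlib
from pathlib import Path


def md5_manifest(folder):
    """Map each file's path (relative to folder) to its whole-file MD5."""
    manifest = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.md5()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large DPX files are not loaded whole.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[path.relative_to(folder).as_posix()] = digest.hexdigest()
    return manifest


def manifest_differences(source, decoded):
    """Return relative paths that are missing or whose checksums differ."""
    return sorted(
        name
        for name in set(source) | set(decoded)
        if source.get(name) != decoded.get(name)
    )
```

An empty list from `manifest_differences(md5_manifest(source_dir), md5_manifest(decoded_dir))` corresponds to a clean ```diff``` of the two manifests.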
-To have confidence in the --check feature, which confirms for us a DPX sequence can be deleted, we ran several --check command tests that included editing test FFV1 Matroska metadata using hex editor software, and altering test DPX files in the same way while partially encoded. The encoding/check features always identified these data breakages correctly which helped build our confidence in the --all and --check flags.
+To have confidence in the ```--check``` feature, which confirms for us a DPX sequence can be deleted, we ran several ```--check``` command tests that included editing test FFV1 Matroska metadata using hex editor software, and altering test DPX files in the same way during the encoding run. The encoding/check features always identified these data breakages correctly which helped build our confidence in the ```--all``` and ```--check``` flags.
When we encounter an error there are a few commands used that make reporting the issue a little easier at the [Media Area RAWcooked GitHub issue tracker](https://github.com/MediaArea/RAWcooked/issues).
```
-rawcooked -d -y -all --accept-gaps
+rawcooked -d -y --all --no-accept-gaps
```
Adding the ```-d``` flag doesn't run the encoding but returns the command that would be sent to FFmpeg. This flag also leaves the reversibility data available as a text file and this is useful for sending to Media Area to help with finding errors.
```
@@ -261,15 +261,15 @@ echo $?
```
This command should be run directly after a failed RAWcooked encode, and it will tell you the exit code returned from that terminated run.
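In an automated workflow the same exit code can be captured by the script that launches the encode, rather than by typing ```echo $?``` afterwards. A minimal Python sketch follows; the `run_and_report` helper is hypothetical, and a child Python process that exits with code 1 stands in for a failed encode so the example does not depend on RAWcooked being installed.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (exit_code, stderr_text) for an error log."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr


# Stand-in for a failed encode: a child Python process that exits with code 1.
code, errors = run_and_report([sys.executable, "-c", "import sys; sys.exit(1)"])
print(code)
```

A non-zero exit code, together with the captured error text, is the kind of detail worth including when reporting an issue.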
-The results of these three enquiries is always a brilliant way to open an Issue enquiry for Media Area and will help ensure swift diagnose for your problem. It may also be necessary to supply a DPX sequence, and your ```head``` command can be used again to extract the header data.
+The results of these three enquiries are always a great help when opening an Issue enquiry for Media Area, aiding diagnosis of your problem. It may also be necessary to supply a DPX image, and your ```head``` command can be used again to extract the header data.
## Additional resources
-* [BFI National Archive DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [Media Area's RAWcooked GitHub page](https://github.com/MediaArea/RAWcooked)
* ['No Time To Wait! 5' presentation by Joanna White about the BFI's evolving RAWcooked use](https://www.youtube.com/watch?v=Mgo_DKHJEfI)
* [BFI National Archive RAWcooked cheat sheet for optimization](https://github.com/bfidatadigipres/dpx_encoding/blob/main/RAWcooked_Cheat_Sheet.pdf)
+* [BFI National Archive DPX Preservation Workflows](https://digitensions.home.blog/2019/11/08/dpx-preservation-workflow/)
* [Further conference presentations about BFI National Archive use of RAWcooked, by Joanna White](https://youtu.be/4cG5RL_CZqg?si=w-iEICSfXqBco5NB)
* [RAWCooking With Gas: A Film Digitization and QC Workflow-in-progress by Genevieve Havemeyer-King](https://youtu.be/-cJxq7Vr3Nk?si=BjPWzsZ7LRKMVZNF)
* [Introduction to FFV1 and Matroska for film scans by Kieran O’Leary](https://kieranjol.wordpress.com/2016/10/07/introduction-to-ffv1-and-matroska-for-film-scans/)
From 224a7ddd784c702fd51fbf83bfe004f2b243a75c Mon Sep 17 00:00:00 2001
From: Stephen
Date: Mon, 15 Apr 2024 17:48:07 +0100
Subject: [PATCH 91/93] Update Case_study.md
Added network contention info to the introduction section
---
Doc/Case_study.md | 2 ++
1 file changed, 2 insertions(+)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index 070fb2bc..ef00a27c 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -2,6 +2,8 @@
**by Joanna White, Knowledge & Collections Developer**
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions, flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K and 4K image sequences. This workflow is built on some of the flags developed in RAWcooked by Media Area and written in a mix of Bash shell and Python3 scripts ([BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding)). In addition to RAWcooked we use other Media Area tools to complete necessary stages of this workflow. Our encoding processes do not include any alpha channel or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska.
+
+A note on our RAWcooked performance in the context of the operational network: this case study covers DPX sequence processing within the operational network, where we run dozens of Windows and Mac workstations and a similar number of Linux servers, addressing 20 nodes of NAS storage, with all these devices connected via a mixture of 10Gbps, 25Gbps and 40Gbps links. Data flows to the network storage from automated off-air television recording (over 20 channels), born-digital acquisition of cinema and streaming platform content, videotape and film digitisation workflows, as well as media restored from our data tape libraries. Data also flows from the network storage to the data tape libraries as we ingest at high volume. We are in the process of upgrading the network to use only 100Gbps switches with higher-than-10Gbps cards on all critical devices. Meanwhile, the throughput we achieve is constrained by the very heavy concurrent use of the network for many high-bitrate audiovisual data workflows, and this network contention impacts the speed of our RAWcooked workflows.
This case study is broken into the following sections:
* [Server configurations](#server_config)
From d8deb5ed86c16d768f96716f00a777c756f86e9f Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 6 Dec 2024 10:20:17 +0000
Subject: [PATCH 92/93] Update Case_study.md
Update 4K results following NAS upgrades
---
Doc/Case_study.md | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index ef00a27c..f487a1c0 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -39,7 +39,7 @@ Our previous server configuration:
- 8 threads
- Ubuntu 18.04 LTS
-When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps), but 4K scans are generally 1 fps or less. These figures can be impacted by the quantity of parallel processes running at any one time.
+When encoding 2K RGB we generally reach between 3 and 10 frames per second (fps), and 4K scans can reach up to 5.5 fps. These figures can be impacted by the quantity of parallel processes running at any one time.
---
## Workflow
@@ -137,6 +137,14 @@ Stream mapping:
Press [q] to stop, [?] for help
Output #0, matroska, to '../encoded/mkv_cooked/N_9623089_01of04.mkv':
```
+* The encoding outputs which give frame fps, size processed, timecode locations in FFV1, bitrate and speed data
+```
+frame= 0 fps=0.0 q=-0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed=N/A
+frame= 1 fps=0.7 q=-0.0 size= 4864kB time=00:00:00.04 bitrate=948711.6kbits/s speed=0.028x
+frame= 3 fps=1.3 q=-0.0 size= 52736kB time=00:00:00.12 bitrate=3456106.5kbits/s speed=0.0531x
+frame= 5 fps=1.6 q=-0.0 size= 153344kB time=00:00:00.20 bitrate=6039394.5kbits/s speed=0.0665x
+frame= 7 fps=1.8 q=-0.0 size= 254464kB time=00:00:00.29 bitrate=7138935.2kbits/s speed=0.0749x
+```
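Progress lines like those above can be read from a log to track encoding speed over time. The sketch below is one possible approach in Python, assuming the line layout shown in the sample; the pattern and the `parse_progress` name are our own, and lines where ```speed=N/A``` appears (such as the very first) are deliberately skipped.

```python
import re

# Matches the frame, fps, time and speed fields of a progress line.
PROGRESS = re.compile(
    r"frame=\s*(?P<frame>\d+)\s+fps=\s*(?P<fps>[\d.]+).*?"
    r"time=(?P<time>[\d:.]+).*?speed=\s*(?P<speed>[\d.]+)x"
)


def parse_progress(line):
    """Return the numeric fields of a progress line, or None if it doesn't match."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return {
        "frame": int(match.group("frame")),
        "fps": float(match.group("fps")),
        "time": match.group("time"),
        "speed": float(match.group("speed")),
    }


sample = ("frame= 7 fps=1.8 q=-0.0 size= 254464kB time=00:00:00.29 "
          "bitrate=7138935.2kbits/s speed=0.0749x")
print(parse_progress(sample))
```

Collecting the `fps` values across a whole log gives a quick view of how encoding speed varies as other parallel processes start and stop.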
* The post-encoding RAWcooked assessment of the FFV1 Matroska
```
...
@@ -149,12 +157,11 @@ Time=00:23:25 (99.98%), 1.2 MiB/s, 0.04x realtime
Time=00:23:26 (99.99%), 1.6 MiB/s, 0.04x realtime
3.3 MiB/s, 0.02x realtime
```
-* Text review of the success/failures of the encoded sequence
+* Text review of the successes/failures of the encoded sequence provided by the ```--info``` flag
```
-Info: Reversibility data created by RAWcooked 23.12.
+Info: Reversibility data created by RAWcooked 24.11.
Info: Uncompressed file hashes (used by reversibility check) present.
-
-Reversibility was checked, no issue detected.
+Info: Reversibility was checked, no issue detected.
```
From 0e2d28255de4abecb85944139d6dade6ea1ccae6 Mon Sep 17 00:00:00 2001
From: Joanna White <37188631+digitensions@users.noreply.github.com>
Date: Fri, 6 Dec 2024 15:48:57 +0000
Subject: [PATCH 93/93] Typos.
---
Doc/Case_study.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Doc/Case_study.md b/Doc/Case_study.md
index f487a1c0..2c4a5704 100644
--- a/Doc/Case_study.md
+++ b/Doc/Case_study.md
@@ -1,5 +1,5 @@
# BFI National Archive RAWcooked Case Study
-**by Joanna White, Knowledge & Collections Developer**
+**by Joanna White, Knowledge Learning & Collections Developer**
At the [BFI National Archive](https://www.bfi.org.uk/bfi-national-archive) we have been encoding DPX sequences to FFV1 Matroska since late 2019. In that time our RAWcooked workflow has evolved with the development of RAWcooked, DPX resolutions, flavours and changes in our encoding project priorities. Today we have a fairly hands-off automated workflow which handles 2K and 4K image sequences. This workflow is built on some of the flags developed in RAWcooked by Media Area and written in a mix of Bash shell and Python3 scripts ([BFI Data & Digital Preservation GitHub](https://github.com/bfidatadigipres/dpx_encoding)). In addition to RAWcooked we use other Media Area tools to complete necessary stages of this workflow. Our encoding processes do not include any alpha channel or audio file processing, but RAWcooked is capable of encoding both into the completed FFV1 Matroska.
@@ -137,7 +137,7 @@ Stream mapping:
Press [q] to stop, [?] for help
Output #0, matroska, to '../encoded/mkv_cooked/N_9623089_01of04.mkv':
```
-* The encoding outputs which give frame fps, size processed, timecode locations in FFV1, bitrate and speed data
+* The encoding outputs which give current frame, frames per second, size of data processed, timecode for FFV1, bitrate and speed data
```
frame= 0 fps=0.0 q=-0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed=N/A
frame= 1 fps=0.7 q=-0.0 size= 4864kB time=00:00:00.04 bitrate=948711.6kbits/s speed=0.028x