Add batch size tuning docs #341

Open · wants to merge 2 commits into master
Conversation

Infernaught
Collaborator

Add docs for clarity on batch size tuning and the differences between ECD and LLM batch size tuning. DO NOT MERGE YET. Waiting on LLM batch size tuning PR to land.

@@ -0,0 +1,10 @@
To maximize efficiency, Ludwig performs automatic batch size tuning when the `batch_size` parameter is not set in the configuration, in order to best saturate the GPU. Batch size tuning does not occur during CPU training due to the lack of effective parallelization; Ludwig instead sets the batch size to a fixed value.
Collaborator


Suggestion for a minor rewrite:

"In Ludwig, users have the option to set batch_size to a fixed value as part of the training config.

trainer:
  batch_size: 128

If the batch size is unspecified Ludwig sets batch_size=auto.

trainer:
  batch_size: auto

auto enables Ludwig to select an efficient batch size automatically. The actual value of the batch size can be found in training logs and in the model output directory.

Batch size tuning is supported in single-node and multi-node CPU and GPU settings.

ECD Models

Batch size tuning for ECD models follows this procedure, starting from batch size 1:

  1. Perform a small number of forward passes through the model using a sample from the dataset, observing whether the model hits a memory error and measuring overall throughput (examples/sec).
  2. If the model hits a memory error or throughput decreases, use the last valid batch size. Otherwise, double the batch size and repeat step 1. (A sketch of this search appears below.)
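
For illustration, a minimal sketch of this doubling search, assuming a hypothetical `run_forward_passes(batch_size)` helper that performs the trial forward passes and returns throughput in examples/sec, raising `MemoryError` when the batch does not fit; this is not Ludwig's actual API:

```python
def tune_batch_size(run_forward_passes, max_batch_size=2 ** 16):
    """Double the batch size until an OOM error or a throughput drop.

    `run_forward_passes(batch_size)` is a hypothetical helper that runs a
    few forward passes and returns throughput (examples/sec), raising
    MemoryError if the batch does not fit in device memory.
    """
    best_batch_size = 1
    best_throughput = 0.0
    batch_size = 1
    while batch_size <= max_batch_size:
        try:
            throughput = run_forward_passes(batch_size)
        except MemoryError:
            break  # Batch no longer fits: fall back to the last valid size.
        if throughput <= best_throughput:
            break  # Throughput decreased: doubling no longer helps.
        best_batch_size, best_throughput = batch_size, throughput
        batch_size *= 2
    return best_batch_size
```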

LLMs

The main element that separates LLM batch size tuning from its ECD counterpart is the sequence length. LLMs thus undergo the same batch size tuning process as ECD models, with the exception that, instead of using a random sample from the dataset, the forward passes use a synthetic data sample with a sequence length equal to the specified max sequence length (or the longest sequence length in the provided dataset if the max sequence length is unspecified).
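
As a rough illustration, here is how such a full-length synthetic sample might be built; the vocabulary size and the use of random token IDs are assumptions for the sketch, not Ludwig's actual implementation:

```python
import torch

def make_synthetic_batch(batch_size, max_sequence_length, vocab_size=32000):
    """Build a worst-case synthetic batch for LLM batch size tuning.

    Every row is a full-length sequence of random token IDs, so memory
    usage during the trial forward passes reflects the longest sequences
    the model can see. (vocab_size=32000 is an illustrative assumption.)
    """
    input_ids = torch.randint(
        low=0,
        high=vocab_size,
        size=(batch_size, max_sequence_length),
        dtype=torch.long,
    )
    attention_mask = torch.ones_like(input_ids)  # All positions are real tokens.
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```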
