update README for DS-FastGen

microsoft · Nov 7, 2023 · e85f98a
1 parent 98ae963
commit e85f98a
Showing 1 changed file with 8 additions and 16 deletions.
diff --git a/benchmarks/inference/mii/README.md b/benchmarks/inference/mii/README.md
@@ -3,38 +3,30 @@
 ## Run the Benchmark
 
 The benchmarking scripts use DeepSpeed-FastGen in the persistent mode.
-You can launch the server by the following command. 
+You can start the server with the command below:
 
 ```bash
 python server.py [options] start
 ```
 
-`-h` option shows all options.
-You can also stop the server by the following command.
+Use the `-h` option to view all available options. To stop the server, use this command:
 
 ```bash
 python server.py stop
 ```
 
-After you launch the server, you can run the client by the following command.
-`-h` option shows all options.
+Once the server is running, start the client with the command below. The `-h` option lists all available options.
 
 ```bash
 python run_benchmark_client.py [options]
 ```
 
-`run_all.sh` sweeps different model sizes and number of clients.
-`run_all_vllm.sh` runs the same benchmark for VLLM.
-These script saves the log in the directory named `logs.[BENCHMARK_PARAMETERS]`.
-
+The `run_all.sh` script sweeps over different model sizes and numbers of clients. To run the same benchmark for vLLM, use `run_all_vllm.sh`. Both scripts save their logs in a directory named `logs.[BENCHMARK_PARAMETERS]`.
 
 ## Analyze the Benchmark Results
 
-We used these scripts to plot the results in our blog.
-Set the root directory of log directories to `--log_dir`.
-
-- `plot_th_lat.py`: Plot throughput and latency for different model sizes and number of clients
-- `plot_effective_throughput.py`: Plot effective throughput
-- `plot_latency_percentile.py`: Plot P50/P90/P95 latency
-
+We used the scripts below to generate the plots in our blog. Pass the root directory that contains the log directories via `--log_dir`.
 
+- `plot_th_lat.py`: This script generates charts for throughput and latency across different model sizes and client counts.
+- `plot_effective_throughput.py`: Use this to chart effective throughput.
+- `plot_latency_percentile.py`: This script will plot the 50th, 90th, and 95th percentile latencies.
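
For readers unfamiliar with the P50/P90/P95 notation in the last bullet, the sketch below shows one common way to compute such percentiles from a list of per-request latencies. It is only an illustration under stated assumptions: it uses the nearest-rank definition, the sample latencies are made up, and the helper names `percentile` and `latency_summary` are hypothetical. It does not reflect how `plot_latency_percentile.py` is actually implemented, and that script may use a different percentile definition (for example, linear interpolation).

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the sample that covers p percent of the data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(latencies):
    """Return the P50, P90, and P95 latencies as a dict (hypothetical helper)."""
    return {f"P{p}": percentile(latencies, p) for p in (50, 90, 95)}


# Made-up per-request latencies in seconds, for illustration only.
latencies = [0.8, 1.1, 0.9, 2.4, 1.0, 1.3, 0.7, 3.0, 1.2, 1.5]
summary = latency_summary(latencies)
print(summary)
```

P90 and P95 are usually reported alongside the median because a few slow requests can dominate the experience of a serving system while barely moving the P50.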