update README for DS-FastGen
tohtana committed Nov 7, 2023
1 parent 98ae963 commit e85f98a
Showing 1 changed file with 8 additions and 16 deletions.
24 changes: 8 additions & 16 deletions benchmarks/inference/mii/README.md
@@ -3,38 +3,30 @@
## Run the Benchmark

The benchmarking scripts use DeepSpeed-FastGen in the persistent mode.
You can launch the server by the following command.
You can start the server with the command below:

```bash
python server.py [options] start
```

`-h` option shows all options.
You can also stop the server by the following command.
Use the `-h` option to view all available options. To stop the server, use this command:

```bash
python server.py stop
```

After you launch the server, you can run the client by the following command.
`-h` option shows all options.
Once the server is up and running, start the client with the command below. The `-h` option lists all available options.

```bash
python run_benchmark_client.py [options]
```
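
The start, client, and stop commands above can be chained from Python so the server is stopped even when the client fails. This is a minimal sketch, not part of the repository: `run_benchmark_once` and its `runner` parameter are hypothetical names, and the sketch assumes that `server.py start` returns once the persistent server is ready and that it runs from the directory containing the scripts.

```python
import subprocess
import sys


def run_benchmark_once(client_args=(), runner=subprocess.run):
    """Start the server, run the client once, and always stop the server.

    `runner` is injectable so the sequence can be exercised without
    launching real processes.
    """
    python = sys.executable
    # Assumption: `start` returns once the persistent server is ready.
    runner([python, "server.py", "start"], check=True)
    try:
        # Extra client options (see `-h`) are passed through unchanged.
        runner([python, "run_benchmark_client.py", *client_args], check=True)
    finally:
        # Runs even if the client fails, so no server is left behind.
        runner([python, "server.py", "stop"], check=True)
```

Calling `run_benchmark_once()` with no arguments runs the client with its default options.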

`run_all.sh` sweeps different model sizes and number of clients.
`run_all_vllm.sh` runs the same benchmark for VLLM.
These script saves the log in the directory named `logs.[BENCHMARK_PARAMETERS]`.

The `run_all.sh` script sweeps the benchmark across model sizes and numbers of clients. To run the same benchmark for vLLM, use the `run_all_vllm.sh` script. Both scripts save their logs in a directory named `logs.[BENCHMARK_PARAMETERS]`.

## Analyze the Benchmark Results

We used these scripts to plot the results in our blog.
Set the root directory of log directories to `--log_dir`.

- `plot_th_lat.py`: Plot throughput and latency for different model sizes and number of clients
- `plot_effective_throughput.py`: Plot effective throughput
- `plot_latency_percentile.py`: Plot P50/P90/P95 latency

The scripts below were used to generate the plots in our blog. Pass the root directory that contains the log directories with `--log_dir`.

- `plot_th_lat.py`: This script generates charts for throughput and latency across different model sizes and client counts.
- `plot_effective_throughput.py`: Use this to chart effective throughput.
- `plot_latency_percentile.py`: This script will plot the 50th, 90th, and 95th percentile latencies.
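
Before plotting, it can help to check which log directories sit under the root you intend to pass as `--log_dir`. The sketch below is illustrative only: `find_log_dirs` is a hypothetical helper, and it assumes the `logs.[BENCHMARK_PARAMETERS]` directories are direct children of the root.

```python
from pathlib import Path


def find_log_dirs(root):
    """Return sub-directories of `root` whose names start with 'logs.', sorted by path."""
    root = Path(root)
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.name.startswith("logs.")
    )
```

For example, `[p.name for p in find_log_dirs(".")]` lists the matching directory names in the current directory.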
