Unverified commit aeaafc0d, authored by Eduardo Bonet, committed by GitLab

Add TPS values for mistral models

Parent 223f03a2
@@ -31,18 +31,12 @@ For more information on:
- vLLM supported models, see the [vLLM Supported Models documentation](https://docs.vllm.ai/en/latest/models/supported_models.html).
- Available options when using vLLM to run a model, see the [vLLM documentation on engine arguments](https://docs.vllm.ai/en/stable/usage/engine_args.html).
- The hardware needed for the models, see the [Supported models and Hardware requirements documentation](supported_llm_serving_platforms.md).
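As an illustration of those engine arguments, a two-GPU Mistral deployment could be launched with a command along these lines (a sketch only, not the documented setup; the flag values are assumptions to adapt to your hardware and vLLM version):

```shell
# Serve Mistral-7B-Instruct-v0.3 behind vLLM's OpenAI-compatible API.
# --tensor-parallel-size shards the weights across both GPUs;
# --gpu-memory-utilization caps how much vRAM vLLM preallocates.
vllm serve mistralai/Mistral-7B-Instruct-v0.3 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.9 \
  --port 8000
```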
Examples:

#### Mistral-7B-Instruct-v0.2
Mistral-7B-Instruct-v0.3 requires at least:

- 55 GB of disk space for storage
- 35 GB of GPU vRAM for serving

With an `a2-highgpu-2g` machine on GCP or equivalent (2x Nvidia A100 40GB - 150 GB vRAM), the model is expected to infer requests at the rate of 250 tokens per second.
1. Download the model from HuggingFace:

   ```shell
@@ -63,13 +57,6 @@ With a `a2-highgpu-2g` machine on GCP or equivalent (2x Nvidia A100 40GB - 150 GB vRAM), the model is expected to infer requests at the rate of 250 tokens per second.
#### Mixtral-8x7B-Instruct-v0.1
Mixtral-8x7B-Instruct-v0.1 requires at least:

- 355 GB of disk space for storage
- 210 GB of GPU vRAM for serving

You should use at least an `a2-highgpu-4g` machine on GCP or equivalent (4x Nvidia A100 40GB - 340 GB vRAM). With this configuration, the model is expected to infer requests at the rate of 25 tokens per second.
1. Download the model from HuggingFace:

   ```shell
...
@@ -73,12 +73,55 @@ The following hardware specifications are the minimum requirements for running s
### GPU requirements by model size
| Model size                                 | Minimum GPU configuration | Minimum VRAM required |
|--------------------------------------------|---------------------------|-----------------------|
| 7B models<br>(for example, Mistral 7B)     | 1x NVIDIA A100 (40GB)     | 35 GB                 |
| 22B models<br>(for example, Codestral 22B) | 2x NVIDIA A100 (80GB)     | 110 GB                |
| Mixtral 8x7B                               | 2x NVIDIA A100 (80GB)     | 220 GB                |
| Mixtral 8x22B                              | 8x NVIDIA A100 (80GB)     | 526 GB                |
Use [Hugging Face's memory utility](https://huggingface.co/spaces/hf-accelerate/model-memory-usage) to verify memory requirements.
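As a rough sanity check on those figures, the weights-only footprint can be estimated from the parameter count. A minimal sketch, assuming 16-bit weights and an illustrative 20% overhead factor (vLLM additionally preallocates KV-cache memory, so real serving needs more, as the table above shows):

```python
def estimate_vram_gb(n_params_billion, bytes_per_param=2, overhead=1.2):
    """Weights-only lower bound for 16-bit serving, plus ~20% for
    activations (assumed factor; vLLM's KV-cache preallocation is extra)."""
    return n_params_billion * bytes_per_param * overhead

# Mistral 7B (~7.3B parameters) in fp16:
print(round(estimate_vram_gb(7.3), 1))  # ~17.5 GB before KV cache
```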
### Response time by model size and GPU
#### Small machine
With an `a2-highgpu-2g` (2x Nvidia A100 40 GB - 150 GB vRAM) machine on GCP or equivalent:
| Model name               | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|--------------------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
| Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
| Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
#### Medium machine
With an `a2-ultragpu-4g` (4x Nvidia A100 40 GB - 340 GB vRAM) machine on GCP or equivalent:
| Model name                 | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|----------------------------|--------------------|--------------------------------|----------------------------|---------------------------------------|-------------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
| Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
| Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |
#### Large machine
With an `a2-ultragpu-8g` (8x NVIDIA A100 80 GB - 1360 GB vRAM) machine on GCP or equivalent:
| Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
|-----------------------------|--------------------|------------------------------|----------------------------|---------------------------------------|-----------------------------|-----------|
| Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
| Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
| Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
| Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
| Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
| Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
| Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
| Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
| Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |
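The throughput columns in these tables can be reproduced from raw per-request timings. A minimal sketch of the assumed aggregation (field names and methodology are assumptions, not the actual benchmark script): per-request TPS is each response's token count over its own latency, averaged, while Total TPS divides all generated tokens by the wall-clock time of the whole batch.

```python
def throughput_metrics(request_times, request_tokens, wall_clock_seconds):
    """Aggregate per-request benchmark samples into the table's columns."""
    n = len(request_times)
    avg_time = sum(request_times) / n        # Average time per request (sec)
    avg_tokens = sum(request_tokens) / n     # Average tokens in response
    # Per-request decode speed, averaged across requests.
    avg_tps = sum(t / s for t, s in zip(request_tokens, request_times)) / n
    # Aggregate throughput: all tokens over wall-clock time; with
    # concurrent requests this exceeds any single request's rate.
    total_tps = sum(request_tokens) / wall_clock_seconds
    return avg_time, avg_tokens, avg_tps, total_tps

# Toy run: two concurrent requests completing within 4 s of wall-clock time.
print(throughput_metrics([4.0, 2.0], [400, 300], wall_clock_seconds=4.0))
# (3.0, 350.0, 125.0, 175.0)
```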
### AI Gateway Hardware Requirements
...