Commit 78c3ccb (parent 75cf8e9)

Ying1123 and Michaelvll authored

Add SGLang example for Sky Serve (#3126)

* add sglang
* Update llm/sglang/README.md (five follow-up edits)
* Update llm/sglang/sglang.yaml

Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>

File tree: 2 files changed (+132, −0)

llm/sglang/README.md

# SGLang: Fast and Expressive LLM Inference with RadixAttention for 5x Throughput

This README contains instructions to run a demo for SGLang, an open-source library for fast and expressive LLM inference and serving that reports up to **5x** higher throughput than comparable systems.

* [Repo](https://github.com/sgl-project/sglang)
* [Blog](https://lmsys.org/blog/2024-01-17-sglang)
## Prerequisites

Install the latest SkyPilot and check your setup of the cloud credentials:

```bash
pip install "skypilot-nightly[all]"
sky check
```
## Serving Llama-2 with SGLang using SkyServe

1. Create a [SkyServe service YAML](https://skypilot.readthedocs.io/en/latest/serving/service-yaml-spec.html) with a `service` section:

   ```yaml
   service:
     # Path of the endpoint to probe for service readiness.
     readiness_probe: /health
     # How many replicas to manage.
     replicas: 2
   ```

   The complete service YAML can be found in [sglang.yaml](sglang.yaml).
2. Start serving with the [SkyServe](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html) CLI:

   ```bash
   sky serve up -n sglang sglang.yaml
   ```
3. Use `sky serve status` to check the status of the service:

   ```bash
   sky serve status sglang
   ```

   You should see output similar to the following:

   ```console
   Services
   NAME    VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
   sglang  1        8m 16s  READY   2/2       34.32.43.41:30001

   Service Replicas
   SERVICE_NAME  ID  VERSION  IP              LAUNCHED     RESOURCES          STATUS  REGION
   sglang        1   1        34.85.154.76    16 mins ago  1x GCP({'L4': 1})  READY   us-east4
   sglang        2   1        34.145.195.253  16 mins ago  1x GCP({'L4': 1})  READY   us-east4
   ```
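   For scripting, the endpoint column can also be pulled out of this status table with standard shell tools. This is a minimal sketch against the sample output above; in real scripts, `sky serve status --endpoint sglang` is the supported way to get the endpoint:

   ```shell
   # Extract the ENDPOINT column for the 'sglang' service from sample
   # `sky serve status` output. Illustrative only: prefer
   # `sky serve status --endpoint sglang` in real scripts.
   status_output='NAME    VERSION  UPTIME  STATUS  REPLICAS  ENDPOINT
   sglang  1        8m 16s  READY   2/2       34.32.43.41:30001'
   # The endpoint is the last whitespace-separated field on the service row.
   endpoint=$(echo "$status_output" | awk '$1 == "sglang" {print $NF}')
   echo "$endpoint"
   ```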
4. Fetch the endpoint of the service:

   ```bash
   ENDPOINT=$(sky serve status --endpoint sglang)
   ```
5. Once its status is `READY`, you can use the endpoint to interact with the model:

   ```bash
   curl -L $ENDPOINT/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "meta-llama/Llama-2-7b-chat-hf",
       "messages": [
         {
           "role": "system",
           "content": "You are a helpful assistant."
         },
         {
           "role": "user",
           "content": "Who are you?"
         }
       ]
     }'
   ```
   You should get a response similar to the following:

   ```console
   {
     "id": "cmpl-879a58992d704caf80771b4651ff8cb6",
     "object": "chat.completion",
     "created": 1692650569,
     "model": "meta-llama/Llama-2-7b-chat-hf",
     "choices": [{
       "index": 0,
       "message": {
         "role": "assistant",
         "content": " Hello! I'm just an AI assistant, here to help you"
       },
       "finish_reason": "length"
     }],
     "usage": {
       "prompt_tokens": 31,
       "total_tokens": 47,
       "completion_tokens": 16
     }
   }
   ```
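   The response is standard OpenAI-style JSON, so the assistant's reply can be extracted with any JSON tool. A sketch using Python's stdlib `json` module on a trimmed, hypothetical sample response (in practice, pipe the output of the `curl` command into the one-liner):

   ```shell
   # Parse a chat-completion response and print the assistant's message.
   # The JSON below is a trimmed, made-up sample of the response shape.
   response='{"object": "chat.completion", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello from the assistant"}, "finish_reason": "length"}]}'
   content=$(echo "$response" | python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
   echo "$content"   # prints: Hello from the assistant
   ```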

llm/sglang/sglang.yaml

```yaml
service:
  # Path of the endpoint to probe for service readiness.
  readiness_probe: /health
  # How many replicas to manage.
  replicas: 2

envs:
  MODEL_NAME: meta-llama/Llama-2-7b-chat-hf
  HF_TOKEN: <your-huggingface-token>  # Change to your own huggingface token.

resources:
  accelerators: {L4:1, A10G:1, A10:1, A100:1, A100-80GB:1}
  ports:
    - 8000

setup: |
  conda activate sglang
  if [ $? -ne 0 ]; then
    conda create -n sglang python=3.10 -y
    conda activate sglang
  fi

  pip list | grep sglang || pip install "sglang[all]"
  pip list | grep transformers || pip install transformers==4.37.2

  python -c "import huggingface_hub; huggingface_hub.login('${HF_TOKEN}')"

run: |
  conda activate sglang
  echo 'Starting sglang openai api server...'
  export PATH=$PATH:/sbin/
  python -m sglang.launch_server --model-path $MODEL_NAME --host 0.0.0.0 --port 8000
```
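For reference, newer SkyPilot releases also support autoscaling in place of a fixed replica count. The fragment below is a sketch only, not part of this commit: the `replica_policy` field names follow the SkyServe documentation, and the numbers are illustrative, so verify them against your installed SkyPilot version.

```yaml
service:
  readiness_probe: /health
  # Autoscaling sketch: would replace the fixed `replicas: 2` above.
  replica_policy:
    min_replicas: 2
    max_replicas: 4
    target_qps_per_replica: 2.5
```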
