Add SGLang inference commands to README (#7)
Commit: bc8d397781e5ac79e2521ecf4bb1aac099a6db43
Co-authored-by: Netanel Haber <[email protected]>
README.md CHANGED

@@ -89,6 +89,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 **Runtime Engine(s):**
 * [vLLM] <br>
 * [TRT-LLM] <br>
+* [SGLang] <br>
 
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA L40S <br>*
@@ -346,6 +347,29 @@ vllm serve nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 --trust-remote-code --quant
 vllm serve nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD --trust-remote-code --quantization modelopt_fp4 --video-pruning-rate 0
 ```
 
+#### Inference with SGLang
+Support is verified on SGLang **main**:
+
+```bash
+pip install "git+https://github.com/sgl-project/sglang.git@main#subdirectory=python"
+```
+
+**BF16**
+```bash
+sglang serve --trust-remote-code --model-path nvidia/Nemotron-Nano-12B-v2-VL-BF16 --max-mamba-cache-size 256  # adjust '--max-mamba-cache-size' as needed to fit in memory
+```
+
+**FP8**
+```bash
+sglang serve --trust-remote-code --model-path nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-FP8
+```
+
+**FP4**
+```bash
+sglang serve --trust-remote-code --model-path nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD --quantization modelopt_fp4
+```
+
+
 ## Training, Testing, and Evaluation Datasets:
 
 ### Training Datasets:
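The hunk above installs SGLang from source rather than from a tagged release. As a quick sanity check that the source build actually landed, the installed version can be printed; a minimal sketch, assuming the `sglang` package exposes `__version__` as current releases do:

```bash
# Sanity check after the source install (assumes the sglang package
# exposes __version__, as recent releases do).
python -c "import sglang; print(sglang.__version__)"
```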
@@ -498,6 +522,7 @@ Evaluation benchmarks scores: <br>
 # Inference: <br>
 **Acceleration Engine:** vLLM <br>
 **Acceleration Engine:** TRT-LLM <br>
+**Acceleration Engine:** SGLang <br>
 
 **Test Hardware:** <br>
 * NVIDIA L40S <br>
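Once one of the `sglang serve` commands added by this diff is running, a request can be sent to the server's OpenAI-compatible endpoint. A minimal smoke test, assuming SGLang's default port 30000, the FP8 checkpoint name from the README, and a placeholder image URL (swap in whichever variant you actually launched):

```bash
# Minimal smoke test against the SGLang server's OpenAI-compatible
# chat endpoint. Assumptions: default port 30000, placeholder image
# URL, and the FP8 checkpoint name from the README.
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-FP8",
    "messages": [
      {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        {"type": "text", "text": "Describe this image in one sentence."}
      ]}
    ],
    "max_tokens": 128
  }'
```

The same request shape should also work against the `vllm serve` endpoints shown elsewhere in the README, since vLLM's OpenAI-compatible server defaults to port 8000.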