Search Results

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > model_executor > kernels > linear > mxfp8 > Mxfp8 Linear Kernel

Mxfp8 Linear Kernel

2+ week, 4+ day ago (46+ words) Base class for MXFP8 quantized linear kernels. Each subclass implements a specific GEMM backend (FlashInfer CUTLASS, Marlin, emulation). Configuration for an MXFP8 linear layer. All MXFP8 layers share the same structure: FP8-E4M3 weights with uint8 (E8M0) per-block scales at block size 32....

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > model_executor > kernels > linear > mxfp8 > emulation

emulation

2+ week, 4+ day ago (10+ words) Software emulation fallback for MXFP8 (dequant to BF16)....
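The dequantization step named in this snippet can be illustrated with a minimal NumPy sketch. It is not vLLM's implementation. It assumes the standard bit layouts (FP8-E4M3 with exponent bias 7 and a single NaN encoding, E8M0 as a power-of-two exponent with bias 127), a hypothetical [N, K] weight layout with scales running along K, and float32 output because NumPy has no BF16 type. The function names are illustrative only.

```python
import numpy as np

BLOCK_SIZE = 32  # per-block scale granularity stated in the MXFP8 description


def decode_e4m3(bits):
    """Decode uint8 bit patterns as FP8-E4M3 (finite-only variant) to float32."""
    bits = np.asarray(bits, dtype=np.uint8)
    sign = np.where(bits & 0x80, -1.0, 1.0)
    exp = ((bits >> 3) & 0x0F).astype(np.int32)
    man = (bits & 0x07).astype(np.float64)
    normal = np.ldexp(1.0 + man / 8.0, exp - 7)  # exponent bias is 7
    subnormal = np.ldexp(man / 8.0, -6)          # exponent field 0: no implicit 1
    value = sign * np.where(exp == 0, subnormal, normal)
    is_nan = (exp == 15) & (man == 7)            # the only NaN encoding
    return np.where(is_nan, np.nan, value).astype(np.float32)


def decode_e8m0(bits):
    """Decode uint8 E8M0 scales: 2**(e - 127), with 255 reserved for NaN."""
    exp = np.asarray(bits, dtype=np.uint8).astype(np.int32)
    value = np.ldexp(1.0, np.where(exp == 255, 127, exp) - 127)
    return np.where(exp == 255, np.nan, value).astype(np.float32)


def dequantize_mxfp8(weight_bits, scale_bits):
    """Assumed layout: weight_bits [N, K] uint8, scale_bits [N, K // 32] uint8."""
    n, k = weight_bits.shape
    assert k % BLOCK_SIZE == 0, "K must be a multiple of the block size"
    blocks = k // BLOCK_SIZE
    w = decode_e4m3(weight_bits).reshape(n, blocks, BLOCK_SIZE)
    s = decode_e8m0(scale_bits).reshape(n, blocks, 1)  # one scale per block
    return (w * s).reshape(n, k)
```

An emulated linear layer would then compute `x @ dequantize_mxfp8(w, s).T` in a higher-precision dtype, trading speed for portability.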

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > model_executor > kernels > linear > mxfp8

mxfp8 - vLLM

2+ week, 4+ day ago (28+ words) Configuration for an MXFP8 linear layer. All MXFP8 layers share the same structure: FP8-E4M3 weights with uint8 (E8M0) per-block scales at block size 32....

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > transformers_utils > processors > fireredlid

fireredlid

2+ week, 4+ day ago (73+ words) FireRedLID feature extractor and processor. - Raw waveform → 80-dim log-mel filterbank (via kaldi_native_fbank) The Processor wraps the Feature Extractor and a tokenizer. Extracts 80-dim log-mel filterbank features from raw waveforms, applies CMVN, and returns padded feature tensors…
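The last two steps in this snippet (CMVN, then padding) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the FireRedLID processor: the function names are hypothetical, the mean and standard deviation are passed in because the snippet does not say whether the statistics are global or per utterance, and the filterbank extraction itself is left out.

```python
import numpy as np


def apply_cmvn(feats, mean, std):
    """Normalize [frames, dim] features with per-dimension mean and std."""
    return (feats - mean) / std


def pad_features(batch):
    """Zero-pad a list of [frames_i, dim] arrays to [batch, max_frames, dim].

    Also returns the original frame counts so padding can be masked later.
    """
    lengths = np.array([f.shape[0] for f in batch])
    dim = batch[0].shape[1]
    padded = np.zeros((len(batch), int(lengths.max()), dim), dtype=np.float32)
    for i, f in enumerate(batch):
        padded[i, : f.shape[0]] = f
    return padded, lengths
```

Returning the lengths alongside the padded tensor is a common design choice, since downstream attention masks usually need to know which frames are real.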

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > model_executor > kernels > linear > mixed_precision > triton_w4a16

triton_w4a16

2+ week, 4+ day ago (169+ words) Triton-based W4A16 GEMM kernel for ROCm (MI300 and newer). Supports GPTQ-format int4 weights (uint4b8 symmetric, uint4 asymmetric) with grouped quantization. Weight tensors are transposed from the compressed-tensors checkpoint layout to the kernel's [K, N//8] layout. Fused W4A16 GEMM using GPTQ-packed int4 weights. Activation matrix [M, K], float16 or…
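The [K, N//8] layout mentioned here follows from packing eight 4-bit values into one 32-bit word. A minimal NumPy sketch of that packing and of symmetric grouped dequantization is given below. It is not the Triton kernel: the nibble order (lowest nibble first), the [K // group_size, N] scale shape, and all function names are assumptions made for illustration, and only the symmetric uint4b8 case (stored value minus a bias of 8) is covered.

```python
import numpy as np

SHIFTS = np.arange(8, dtype=np.uint32) * 4  # bit offset of each nibble in a word


def pack_int4(q):
    """Pack [K, N] values in 0..15 into [K, N // 8] uint32 words."""
    k, n = q.shape
    assert n % 8 == 0, "N must be a multiple of 8"
    nibbles = q.astype(np.uint32).reshape(k, n // 8, 8)
    return np.bitwise_or.reduce(nibbles << SHIFTS, axis=-1)


def unpack_int4(packed):
    """Inverse of pack_int4: [K, N // 8] uint32 -> [K, N] uint8 in 0..15."""
    k, words = packed.shape
    nibbles = (packed[:, :, None] >> SHIFTS) & np.uint32(0xF)
    return nibbles.reshape(k, words * 8).astype(np.uint8)


def dequantize_w4(packed, scales, group_size):
    """Symmetric dequantization: (stored - 8) * scale, one scale per K-group."""
    q = unpack_int4(packed).astype(np.int32) - 8       # remove the bias of 8
    s = np.repeat(scales, group_size, axis=0)          # [K // g, N] -> [K, N]
    return q.astype(np.float32) * s
```

A reference result for checking a fused kernel would then be `x @ dequantize_w4(packed, scales, group_size)` for an activation matrix `x` of shape [M, K].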

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > model_executor > kernels > linear > nvfp4 > base

base - vLLM

2+ week, 5+ day ago (133+ words) Base class for NVFP4 quantized linear kernels. Each subclass implements a specific GEMM backend (CUTLASS, Marlin, etc). The kernel selection mechanism iterates over registered subclasses in priority order, calling is_supported and can_implement to find the best match for the current hardware. Run the…...
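The priority-ordered selection described in this snippet can be sketched as below. Only the method names is_supported and can_implement come from the snippet. The class names, the argument lists (a compute-capability integer and a group size), the boolean return values, and the select_kernel helper are hypothetical and do not reflect vLLM's actual signatures.

```python
class LinearKernel:
    """Hypothetical base class; only the two method names come from the docs."""

    @classmethod
    def is_supported(cls, compute_capability):
        """Can this backend run on the current hardware at all?"""
        raise NotImplementedError

    @classmethod
    def can_implement(cls, group_size):
        """Can this backend handle this particular layer configuration?"""
        raise NotImplementedError


class FastKernel(LinearKernel):
    """Illustrative backend that needs newer hardware and a fixed group size."""

    @classmethod
    def is_supported(cls, compute_capability):
        return compute_capability >= 100

    @classmethod
    def can_implement(cls, group_size):
        return group_size == 16


class FallbackKernel(LinearKernel):
    """Illustrative backend that accepts everything."""

    @classmethod
    def is_supported(cls, compute_capability):
        return True

    @classmethod
    def can_implement(cls, group_size):
        return True


def select_kernel(candidates, compute_capability, group_size):
    """Return the first candidate, in priority order, that passes both checks."""
    for kernel in candidates:
        if not kernel.is_supported(compute_capability):
            continue  # hardware check failed; try the next backend
        if not kernel.can_implement(group_size):
            continue  # layer configuration not supported by this backend
        return kernel
    raise ValueError("no registered kernel supports this hardware and config")
```

Splitting the hardware check from the configuration check lets a caller report why each backend was skipped, which is useful when no backend matches.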

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > entrypoints > pooling > pooling > io_processor

io_processor

2+ week, 5+ day ago (23+ words) IO Processor plugins are a feature that allows pre- and post-processing of the model input and output for pooling models....

vLLM Docs
docs.vllm.ai > en > latest > api > vllm > model_executor > kernels > linear > nvfp4 > flashinfer

flashinfer

2+ week, 5+ day ago (21+ words) NVFP4 GEMM via FlashInfer's cuDNN wrapper. NVFP4 GEMM via FlashInfer's CUTLASS wrapper. NVFP4 GEMM via FlashInfer's TensorRT-LLM wrapper....

vLLM Docs
docs.vllm.ai > projects > ascend > zh-cn > v0.18.0 > faqs.html

FAQs - vllm-ascend

3+ week, 1+ day ago (1048+ words) [v0.17.0rc1] FAQ & Feedback [v0.13.0] FAQ & Feedback Atlas A2 (Atlas 800T A2, Atlas 900 A2 PoD, Atlas 200T A2 Box16, Atlas 300T A2) Atlas 800I A2 (Atlas 800I A2) Atlas A3 Training series (Atlas 800T A3, Atlas 900 A3 SuperPoD, Atlas 9000 A3 SuperPoD) Atlas 800I A3 Inference series (Atlas 800I A3) [Experimental] Atlas 300I Inference series…

vLLM Docs
docs.vllm.ai > projects > ascend > zh-cn > v0.18.0 > developer_guide > feature_guide > add_custom_aclnn_op.html

Adding a custom aclnn operation

3+ week, 1+ day ago (137+ words) This document describes how to add a custom aclnn operation to vllm-ascend. Custom aclnn operations are built and installed into the vllm_ascend/cann_ops_custom directory during the vllm-ascend build process. The aclnn operators are then bound to the torch.ops._C_ascend module, enabling users to…