SparVNM: Efficient vector-wise N:M sparsity implementation for GPGPU
Author(s)
Cong Ma | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Xiaowen Huang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Xu Zhang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Jintao Meng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Abstract
Deep Neural Networks (DNNs) are widely used in image processing tasks such as image classification, object detection, and semantic segmentation, but large models incur significant computational overhead that limits their deployment on resource-constrained devices. Reducing model size and computational cost without sacrificing accuracy is therefore crucial. Weight pruning has been extensively studied as a way to shrink models and accelerate inference; pruning techniques are classified into structured, unstructured, and the more recent semi-structured N:M sparsity. Unstructured pruning introduces irregular computation and memory access patterns, making practical GPU acceleration challenging, while structured pruning often struggles to balance model accuracy and speedup on these platforms. N:M sparsity has emerged as a promising alternative, enforcing a constraint that retains N nonzero weights out of every M consecutive weights, though hardware Sparse Tensor Core support is currently limited to 2:4 sparsity. To address this, nmSPARSE, a GPU library of sparse matrix multiplication (SpMM) kernels, was developed to support general N:M sparsity. We further propose SparVNM, an efficient vector-wise N:M sparsity implementation that uses multi-layer tiling and double buffering to optimize memory access. Experiments show that SparVNM achieves a 2.1x speedup over nmSPARSE and a 1.4x to 6.3x speedup over dense cuBLAS, approaching the ideal speedup attainable from sparsity.
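The N:M constraint described above is easy to state concretely: within every group of M consecutive weights, at most N may remain nonzero, so the ideal compute speedup is M/N (e.g., 2x for 2:4). Below is a minimal NumPy sketch of magnitude-based N:M pruning; it is illustrative only, not the paper's method. The function name prune_n_m and the magnitude-based selection criterion are assumptions, and SparVNM's vector-wise grouping, tiling, and GPU kernels are not reproduced here.

import numpy as np

def prune_n_m(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    # Zero all but the n largest-magnitude entries in every
    # contiguous group of m weights (one common, assumed criterion).
    assert weights.size % m == 0, "weight count must be divisible by m"
    w = weights.copy().reshape(-1, m)
    # Column indices of the (m - n) smallest-magnitude entries per group.
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    w[np.arange(w.shape[0])[:, None], drop] = 0.0
    return w.reshape(weights.shape)

# 2:4 sparsity (the pattern current Sparse Tensor Cores accelerate):
# at most 2 nonzeros remain in every group of 4 consecutive weights.
dense = np.random.randn(4, 8).astype(np.float32)
sparse = prune_n_m(dense, n=2, m=4)
assert ((sparse.reshape(-1, 4) != 0).sum(axis=1) <= 2).all()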
Description
Date and Location: 2/3/2025 | 03:50 PM - 04:10 PM | Regency A
Primary Session Chair:
Yuankai Huo | Vanderbilt University
Paper Number: HPCI-175