SparVNM: Efficient vector-wise N:M sparsity implementation for GPGPU
Author(s)
Cong Ma | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Xiaowen Huang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Xu Zhang | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Jintao Meng | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Abstract
Deep Neural Networks (DNNs) are widely used in image processing tasks such as image classification, object detection, and semantic segmentation, but large models incur significant computational overhead that limits their deployment on resource-constrained devices. Reducing model size and computational cost without sacrificing accuracy is therefore crucial. Weight pruning has been extensively studied as a way to shrink models and accelerate inference; pruning techniques are classified into structured, unstructured, and the more recent semi-structured N:M sparsity. Unstructured pruning introduces irregular computation and memory access patterns, making practical GPU acceleration challenging, while structured pruning often struggles to balance model accuracy and speedup on these platforms. N:M sparsity has emerged as a promising alternative, enforcing a constraint that retains N nonzero weights out of every M consecutive weights, though hardware Sparse Tensor Core support is currently limited to 2:4 sparsity. To address this, nmSPARSE, a GPU library of sparse matrix multiplication (SpMM) kernels, was developed to support general N:M sparsity. We further propose SparVNM, an efficient vector-wise N:M sparsity implementation that uses multi-layer tiling and double buffering to optimize memory access. Experiments show that SparVNM achieves a 2.1x speedup over nmSPARSE and a 1.4x to 6.3x speedup over dense cuBLAS, approaching the ideal speedup attainable from sparsity.
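The N:M constraint described above is easy to state concretely: within every group of M consecutive weights, at most N may remain nonzero, so the ideal compute speedup is M/N (e.g., 2x for 2:4). Below is a minimal NumPy sketch of magnitude-based N:M pruning; it is illustrative only, not the paper's method. The function name prune_n_m and the magnitude-based selection criterion are assumptions, and SparVNM's vector-wise grouping, tiling, and GPU kernels are not reproduced here.

import numpy as np

def prune_n_m(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    # Zero all but the n largest-magnitude entries in every
    # contiguous group of m weights (one common, assumed criterion).
    assert weights.size % m == 0, "weight count must be divisible by m"
    w = weights.copy().reshape(-1, m)
    # Column indices of the (m - n) smallest-magnitude entries per group.
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    w[np.arange(w.shape[0])[:, None], drop] = 0.0
    return w.reshape(weights.shape)

# 2:4 sparsity (the pattern current Sparse Tensor Cores accelerate):
# at most 2 nonzeros remain in every group of 4 consecutive weights.
dense = np.random.randn(4, 8).astype(np.float32)
sparse = prune_n_m(dense, n=2, m=4)
assert ((sparse.reshape(-1, 4) != 0).sum(axis=1) <= 2).all()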
Description
Date and Location: 2/3/2025 | 03:50 PM - 04:10 PM | Regency A
Primary Session Chair:
Yuankai Huo | Vanderbilt University
Paper Number: HPCI-175