This repo aims to provide a collection of efficient Triton-based implementations of state-of-the-art linear attention models. All implementations are written purely in PyTorch and Triton, making ...
Abstract: Approximate computing, along with quantized low-precision computing, has gained significant interest in today’s neural network (NN) implementations. This paper proposes a library of VLSI ...