Download the version of the app for your platform from the Releases page. It has so far been tested only on Intel Macs and Windows 10; builds for other platforms are coming soon.
flash-attention-with-sink implements an attention variant used in GPT-OSS 20B that integrates a "sink" step into FlashAttention. This repo focuses on the forward path and provides an experimental ...