Support transformer models #16

Merged
merged 13 commits into main from support_transformer_models on Oct 17, 2023
Conversation

@dacorvo (Collaborator) commented Oct 17, 2023

This adds support for Transformer models:

  • quantize LayerNorm,
  • add QTensor dispatches for typical transformer operations (per-head split/merge, sketched below),
  • add a text-classification example (SST2).
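To illustrate the per-head split/merge dispatches, here is a minimal sketch under the assumption that an activation tensor is represented by int8 data plus a single per-tensor scale. `ToyQTensor`, `split_heads` and `merge_heads` are hypothetical names for illustration only, not quanto's API: the point is that these operations only rearrange quantized values, so the scale is carried through unchanged.

```python
import torch
from collections import namedtuple

# Toy stand-in for a quantized activation: int8 data plus a per-tensor scale.
ToyQTensor = namedtuple("ToyQTensor", ["data", "scale"])

def split_heads(qt: ToyQTensor, num_heads: int) -> ToyQTensor:
    """Reshape (batch, seq, hidden) -> (batch, heads, seq, head_dim).

    Only moves int8 values around, so the scale is unchanged and no
    dequantization is needed.
    """
    b, s, h = qt.data.shape
    data = qt.data.view(b, s, num_heads, h // num_heads).transpose(1, 2)
    return ToyQTensor(data, qt.scale)

def merge_heads(qt: ToyQTensor) -> ToyQTensor:
    """Inverse reshape (batch, heads, seq, head_dim) -> (batch, seq, hidden)."""
    b, n, s, d = qt.data.shape
    data = qt.data.transpose(1, 2).reshape(b, s, n * d)
    return ToyQTensor(data, qt.scale)

# Fake int8 activation with a single float scale.
qt = ToyQTensor(torch.randint(-128, 127, (2, 16, 64), dtype=torch.int8),
                torch.tensor(0.05))
heads = split_heads(qt, num_heads=8)
assert heads.data.shape == (2, 8, 16, 8)
assert merge_heads(heads).data.shape == (2, 16, 64)
```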

Instead of raising an error, we dequantize before adding QTensors with
different scales.
This is required to support attention blocks that add activation tensors
coming from two different branches.
In attention blocks, the float output of the addition will in general be
immediately requantized by the next QLinear.
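A minimal sketch of this fallback, assuming each operand is an int8 tensor with a per-tensor scale; `qadd` is an illustrative helper, not the actual dispatch. When the scales differ, both operands are dequantized and a plain float tensor is returned, to be requantized by the next QLinear.

```python
import torch

def qadd(a_data, a_scale, b_data, b_scale):
    """Illustrative addition of two int8 activations given as (data, scale) pairs."""
    if torch.equal(a_scale, b_scale):
        # Same scale: the sum can stay quantized (naively clamped to int8 range).
        summed = a_data.to(torch.int16) + b_data.to(torch.int16)
        return summed.clamp(-128, 127).to(torch.int8), a_scale
    # Different scales: dequantize both operands and return a float tensor
    # instead of raising an error.
    return a_data.float() * a_scale + b_data.float() * b_scale, None

# Two branches quantized with different scales, e.g. residual + attention output.
x = torch.randint(-128, 127, (4,), dtype=torch.int8)
y = torch.randint(-128, 127, (4,), dtype=torch.int8)
out, scale = qadd(x, torch.tensor(0.02), y, torch.tensor(0.07))
assert scale is None and out.dtype == torch.float32
```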
This adds two default unary dispatches:

- one for type-agnostic operations, where we can simply apply the
operation on the underlying data without changing the scale,
- one for unsupported operations, where we dequantize to apply the
operation on float values.
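A minimal sketch of what these two defaults do, with hypothetical `dispatch_type_agnostic` and `dispatch_unsupported` helpers standing in for the real dispatch machinery:

```python
import torch

def dispatch_type_agnostic(op, data, scale):
    """Default for operations that only move or select values (view,
    transpose, slice, ...): apply the op to the quantized data and keep
    the scale untouched."""
    return op(data), scale

def dispatch_unsupported(op, data, scale):
    """Fallback for operations with no quantized implementation (softmax,
    gelu, ...): dequantize, apply the op on float values, and let the next
    quantized layer requantize the result."""
    return op(data.float() * scale)

data = torch.randint(-128, 127, (2, 8), dtype=torch.int8)
scale = torch.tensor(0.1)

# Transposing is type-agnostic: still int8, same scale.
t_data, t_scale = dispatch_type_agnostic(lambda t: t.transpose(0, 1), data, scale)
assert t_data.dtype == torch.int8 and t_scale is scale

# Softmax has no quantized rule here: the result comes back as float.
probs = dispatch_unsupported(lambda t: torch.softmax(t, dim=-1), data, scale)
assert probs.dtype == torch.float32
```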
@dacorvo merged commit d76820c into main Oct 17, 2023
@dacorvo deleted the support_transformer_models branch October 17, 2023 10:05