
Fixed Sometimes the dtype of the model is incorrect #301

Closed
wants to merge 7 commits into from

Conversation


@balala8 balala8 commented Sep 1, 2024

  1. Fixed an issue where the dtype of the model is sometimes incorrect. This bug often occurs when loading weights from a local file.
  2. Fixed the issue that QuantizedTransformersModel does not have a __call__ method, which caused an error when using the T5Encoder model in Flux.

What does this PR do?

  1. Fixes an issue where the dtype of the model is sometimes incorrect.
  2. Fixes the issue that QuantizedTransformersModel does not have a __call__ method, which caused an error when using the T5Encoder model in quantized Flux (a minimal sketch of the missing delegation follows below).
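For illustration only, here is a minimal sketch of the kind of __call__ delegation the fix adds. QuantizedModelWrapper is a stand-in for QuantizedTransformersModel, its constructor is hypothetical, and the _wrapped attribute mirrors the forward() lines quoted in the review threads below:

class QuantizedModelWrapper:
    # Hypothetical stand-in for QuantizedTransformersModel; the real class
    # builds its wrapped model differently.
    def __init__(self, model):
        self._wrapped = model

    def forward(self, *args, **kwargs):
        return self._wrapped.forward(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        # Without __call__, code that invokes the wrapper directly (as the
        # Flux pipeline does with its text encoder) raises
        # TypeError: 'QuantizedModelWrapper' object is not callable.
        return self.forward(*args, **kwargs)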

Before submitting

  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you run all tests locally and make sure they pass?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@balala8 balala8 requested a review from dacorvo as a code owner September 1, 2024 16:36
optimum/quanto/models/diffusers_models.py (outdated)
@@ -133,6 +137,8 @@ def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike],
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
with init_empty_weights():
model = cls.auto_class.from_config(config)
dtype = kwargs.get("torch_dtype", torch.float16)
Collaborator

same thing here.

return self._wrapped.forward(*args, **kwargs)

def forward(self, *args, **kwargs):
return self.model.forward(*args, **kwargs)
Collaborator

This reverts the change I just made to fix a bug ...

Author

Sorry, I will change it.

change default dtype from float16 to float32
change forward method

@balala8 balala8 left a comment


  1. change default dtype from float16 to float32


Comment on lines +160 to +161
dtype = kwargs.get("torch_dtype", torch.float32)
model = model.to(dtype=dtype)
Collaborator

This is incorrect, as it will force the model to torch.float32 if torch_dtype is not specified.
I think the correct fix would be:

if "torch_dtype" in kwargs:
    model = model.to(kwargs["torch_dtype"])
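
For context, a hedged sketch of how that conditional cast would sit next to the lines from the diff hunk quoted above; everything except the torch_dtype check is copied from that hunk, not from the full source file:

config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
with init_empty_weights():
    model = cls.auto_class.from_config(config)
# Cast only when the caller explicitly passed torch_dtype, so models loaded
# without it keep whatever dtype from_config produced.
if "torch_dtype" in kwargs:
    model = model.to(kwargs["torch_dtype"])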

Author

I think this parameter must be provided. I quantized the Flux transformer model and the T5 encoder model. After saving them locally and reloading, the bias of a convolutional layer in the transformer model was always in fp32, and the entire T5 encoder model was in fp32, so the acceleration advantage was lost even though the code ran without error. Forcing the model to fp16 avoids this problem.
Here is the code I am using.
Quantize and save:

import torch
from diffusers import FluxTransformer2DModel
from transformers import T5EncoderModel
from optimum.quanto import qfloat8, QuantizedDiffusersModel, QuantizedTransformersModel

class QuantizedFluxTransformer2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class QuantizedT5EncoderModelForCausalLM(QuantizedTransformersModel):
    auto_class = T5EncoderModel
    auto_class.from_config = auto_class._from_config

if __name__ == "__main__":
    bfl_repo = "black-forest-labs/FLUX.1-dev"
    dtype = torch.float16

    # Quantize the Flux transformer to float8 weights and save it locally.
    transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=dtype)
    q_transformer = QuantizedFluxTransformer2DModel.quantize(transformer, weights=qfloat8)
    q_transformer.save_pretrained("flux_transforemer_fp8_quanto")

    # Quantize the T5 text encoder the same way and save it locally.
    t5encoder = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", torch_dtype=dtype)
    t5encoder = QuantizedT5EncoderModelForCausalLM.quantize(t5encoder, weights=qfloat8)
    t5encoder.save_pretrained("flux_T5Encoder_fp8_quanto")

Here is the load and inference code:

import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize, QuantizedTransformersModel, QuantizedDiffusersModel
from huggingface_hub import login

bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.float16

class QuantizedFluxTransformer2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class QuantizedT5EncoderModelForCausalLM(QuantizedTransformersModel):
    auto_class = T5EncoderModel
    auto_class.from_config = auto_class._from_config

# Reload the quantized modules that were saved locally above.
transformer = QuantizedFluxTransformer2DModel.from_pretrained("./flux_transforemer_fp8_quanto", torch_dtype=dtype)
transformer.to(device="cuda")
print("transformer.dtype:", transformer.dtype)
text_encoder_2 = QuantizedT5EncoderModelForCausalLM.from_pretrained("./flux_T5Encoder_fp8_quanto", torch_dtype=dtype)
text_encoder_2.to(device="cuda")
print("text_encoder_2.dtype:", text_encoder_2.dtype)

# Build the pipeline without the two quantized components, then attach them.
pipe = FluxPipeline.from_pretrained(bfl_repo,
                                    transformer=None,
                                    text_encoder_2=None,
                                    torch_dtype=dtype)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2
pipe = pipe.to(device="cuda")

pipe.enable_model_cpu_offload()

prompt = "cookie monster, yarn art style"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-fp8-dev.png")
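
As a side note, a hypothetical check (not part of the original report) that makes the symptom visible is to count parameter dtypes after reloading; the transformer._wrapped access assumes the wrapper stores the underlying diffusers module in _wrapped, as the earlier diff suggests:

import collections

import torch

def dtype_histogram(module: torch.nn.Module) -> dict:
    # A stray torch.float32 entry here reveals the bug described above,
    # e.g. a convolution bias left in fp32 after reloading.
    return dict(collections.Counter(p.dtype for p in module.parameters()))

print(dtype_histogram(transformer._wrapped))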


github-actions bot commented Oct 4, 2024

This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 4, 2024

This PR was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this Oct 10, 2024