How to combine existing models into a new model

Reuse an existing class: from transformers import VisionEncoderDecoderModel

https://discuss.huggingface.co/t/custom-vlm-swapping-a-vision-encoder-from-a-vlm/146421/2

Scenario 1: build your own vision encoder and attach it to an existing LLM to form a new VLM.

custom_vlm = VisionEncoderDecoderModel(encoder=CustomVisionEncoder(), decoder=language_model)

Scenario 1.2: how to save and load a checkpoint of the combined model

# Save the custom VLM

custom_vlm.save_pretrained("path_to_save")

# Load the custom VLM

loaded_model = VisionEncoderDecoderModel.from_pretrained("path_to_save")
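
The build, save, and load steps above can be tried end to end with a minimal sketch like the one below. It makes several assumptions that are not in these notes: a tiny, randomly initialised ViTModel stands in for the custom vision encoder, a tiny BertLMHeadModel stands in for the LLM, and all sizes are arbitrary so that nothing has to be downloaded. VisionEncoderDecoderModel connects the two parts through cross-attention, so the decoder config here sets is_decoder=True and add_cross_attention=True; a decoder that does not implement cross-attention would probably need a different approach.

```python
import tempfile

import torch
from transformers import (
    BertConfig,
    BertLMHeadModel,
    ViTConfig,
    ViTModel,
    VisionEncoderDecoderModel,
)

# Stand-in vision encoder (assumption): a tiny, randomly initialised ViT.
encoder = ViTModel(
    ViTConfig(
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
        image_size=32,
        patch_size=16,
    )
)

# Stand-in language model (assumption): a tiny BERT decoder with cross-attention.
decoder = BertLMHeadModel(
    BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
        is_decoder=True,
        add_cross_attention=True,
    )
)

# Combine the two parts into one model.
custom_vlm = VisionEncoderDecoderModel(encoder=encoder, decoder=decoder)
custom_vlm.eval()  # disable dropout so outputs are reproducible

pixel_values = torch.randn(1, 3, 32, 32)         # one fake 32x32 RGB image
decoder_input_ids = torch.tensor([[1, 2, 3, 4]])  # four fake token ids

with torch.no_grad():
    logits = custom_vlm(
        pixel_values=pixel_values, decoder_input_ids=decoder_input_ids
    ).logits

# Save the combined model, load it back, and run the same inputs again.
with tempfile.TemporaryDirectory() as path_to_save:
    custom_vlm.save_pretrained(path_to_save)
    loaded_model = VisionEncoderDecoderModel.from_pretrained(path_to_save)
    loaded_model.eval()
    with torch.no_grad():
        reloaded_logits = loaded_model(
            pixel_values=pixel_values, decoder_input_ids=decoder_input_ids
        ).logits
```

Both parts here are stock transformers classes, so from_pretrained can rebuild them from the saved config. A truly custom encoder class may need extra registration before it can be reloaded this way; that step is not covered in this sketch.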

Scenario 2: how to build an arbitrary combination of models. For example, I want to build a new omni model that uses CLIP as the vision encoder, a diffusion model for image generation, and Qwen3 as the LLM.

The idea: define a new class that inherits from an existing class and reuses that class's modules. You only need to implement the parts you want to change, and then define forward.
https://discuss.huggingface.co/t/how-to-implement-custom-vision-encoder-decoder/45848

__init__ method

super().__init__(...)
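
A minimal sketch of this subclassing pattern, under stated assumptions: the class name AdaptedVisionEncoderDecoder, the extra visual_adapter layer, and the tiny ViT and BERT stand-ins are all hypothetical and chosen only for illustration. It does not build the CLIP + diffusion + Qwen3 omni model described above. It only shows calling super().__init__(...) to reuse the parent's encoder and decoder, adding one new module, and redefining forward.

```python
import torch
from torch import nn
from transformers import (
    BertConfig,
    BertLMHeadModel,
    ViTConfig,
    ViTModel,
    VisionEncoderDecoderModel,
)


class AdaptedVisionEncoderDecoder(VisionEncoderDecoderModel):
    """Hypothetical subclass: reuses the parent's modules, adds one layer."""

    def __init__(self, config=None, encoder=None, decoder=None):
        # Reuse everything the parent class already sets up.
        super().__init__(config=config, encoder=encoder, decoder=decoder)
        # The only new part: a linear adapter applied to the image features.
        hidden = self.config.encoder.hidden_size
        self.visual_adapter = nn.Linear(hidden, hidden)

    def forward(self, pixel_values=None, decoder_input_ids=None, **kwargs):
        # Encode the image with the inherited encoder.
        image_features = self.encoder(pixel_values=pixel_values).last_hidden_state
        # Apply the new module.
        adapted = self.visual_adapter(image_features)
        # Decode with the inherited decoder, attending to the adapted features.
        return self.decoder(
            input_ids=decoder_input_ids,
            encoder_hidden_states=adapted,
            **kwargs,
        )


# Tiny, randomly initialised stand-ins (assumptions) so nothing is downloaded.
encoder = ViTModel(
    ViTConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
              intermediate_size=64, image_size=32, patch_size=16)
)
decoder = BertLMHeadModel(
    BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
               num_attention_heads=2, intermediate_size=64,
               is_decoder=True, add_cross_attention=True)
)

model = AdaptedVisionEncoderDecoder(encoder=encoder, decoder=decoder)
model.eval()

with torch.no_grad():
    out = model(
        pixel_values=torch.randn(1, 3, 32, 32),
        decoder_input_ids=torch.tensor([[1, 2, 3, 4]]),
    )
```

The same shape should extend to more components: register each extra module in __init__ after the super().__init__(...) call, then route tensors between them in forward.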
