ComfyUI Wan 2.1 Text to Video Native Workflow

Workflow Name
wan-2.1-t2v-native
Workflow Description
This workflow is a basic Wan 2.1 text-to-video workflow built from ComfyUI native nodes; the only custom node it requires is VideoHelperSuite, which assembles the output video file.
Default Workflow Dependencies
Download model “wan2.1_t2v_1.3B_bf16.safetensors”, save in ComfyUI\models\diffusion_models:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
Download text encoder “umt5_xxl_fp8_e4m3fn_scaled.safetensors”, save in ComfyUI\models\text_encoders:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders
Download VAE “wan_2.1_vae.safetensors”, save in ComfyUI\models\vae:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Install the “ComfyUI-VideoHelperSuite” custom node via ComfyUI Manager.
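If you prefer to script the model downloads, the following is a minimal sketch using the huggingface_hub Python package (assumed installed via pip install huggingface_hub); it also assumes ComfyUI lives at ./ComfyUI, so adjust COMFYUI_DIR for your install.

    # Sketch: fetch the three Wan 2.1 files and copy them into the ComfyUI
    # model folders listed above. COMFYUI_DIR is an assumption; change it.
    import shutil
    from pathlib import Path
    from huggingface_hub import hf_hub_download

    COMFYUI_DIR = Path("ComfyUI")
    REPO = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
    FILES = {
        "split_files/diffusion_models/wan2.1_t2v_1.3B_bf16.safetensors": "models/diffusion_models",
        "split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
        "split_files/vae/wan_2.1_vae.safetensors": "models/vae",
    }

    for repo_path, dest_dir in FILES.items():
        cached = hf_hub_download(repo_id=REPO, filename=repo_path)  # lands in the HF cache
        dest = COMFYUI_DIR / dest_dir / Path(repo_path).name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(cached, dest)
        print(f"placed {dest}")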
Workflow Details
CLIPLoader
Purpose: This node loads the text encoder. (The node is named CLIPLoader, but for Wan it loads the UMT5-XXL encoder rather than an actual CLIP model.) The encoder turns text prompts into a format the diffusion model can understand.
Customizable Settings:
clip_name: This setting selects the text encoder model file. Changing this alters how text prompts are interpreted and encoded, which directly affects how closely the generated video follows the prompt.
type: This setting selects the model family the encoder is paired with; here it is set to “wan” so the encoder output matches what the Wan diffusion model expects.
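For reference, here is roughly how the node looks in ComfyUI’s API-format JSON, written as a Python dict. The node IDs used in this and the following sketches (“38”, “37”, and so on) are arbitrary assumptions, not the IDs in the actual workflow file.

    clip_loader = {
        "38": {
            "class_type": "CLIPLoader",
            "inputs": {
                "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
                "type": "wan",
            },
        }
    }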
UNETLoader
Purpose: This node loads the diffusion model, the core model responsible for iteratively denoising the latent. (The node is named UNETLoader for historical reasons; Wan 2.1 is a diffusion transformer rather than a U-Net.)
Customizable Settings:
unet_name: This setting selects the diffusion model file. Changing this determines the base model used for generation, affecting the style, quality, and subject matter of the generated video.
weight_dtype: This setting selects the weight data type used when loading the model; here it is set to “default”, which keeps the precision stored in the checkpoint, while the fp8 options reduce VRAM use at some cost to quality.
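The corresponding API-format sketch (node ID assumed, as before):

    unet_loader = {
        "37": {
            "class_type": "UNETLoader",
            "inputs": {
                "unet_name": "wan2.1_t2v_1.3B_bf16.safetensors",
                "weight_dtype": "default",
            },
        }
    }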
VAELoader
Purpose:
This node loads a VAE (Variational Autoencoder) model, which converts between latent-space representations and pixel-space images: it decodes latents into frames and can encode frames back into latents.
Customizable Settings:
vae_name: This setting allows the user to select the VAE model file to load. Changing this determines which VAE is used for decoding the latents, affecting the visual style and quality of the decoded frames.
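And the VAE loader in the same sketch format:

    vae_loader = {
        "39": {
            "class_type": "VAELoader",
            "inputs": {"vae_name": "wan_2.1_vae.safetensors"},
        }
    }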
CLIPTextEncode (Positive Prompt)
Purpose: This node encodes the positive text prompt into conditioning information that guides the diffusion process.
Customizable Settings:
text: This setting allows the user to input the positive text prompt. The text entered here will influence the content and style of the generated video.
CLIPTextEncode (Negative Prompt)
Purpose: This node encodes the negative text prompt into conditioning information that tells the diffusion model what to avoid in the generated video.
Customizable Settings:
text: This setting allows the user to input the negative text prompt. The text entered here will help refine the output by preventing unwanted artifacts or styles.
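Both text-encode nodes take their clip input from the CLIPLoader; in API-format JSON a connection is written as [node_id, output_index]. The prompt strings below are placeholder examples, not values from the workflow file.

    text_encoders = {
        "6": {  # positive prompt
            "class_type": "CLIPTextEncode",
            "inputs": {
                "text": "a fox running through a snowy forest, cinematic lighting",
                "clip": ["38", 0],  # output 0 of the CLIPLoader
            },
        },
        "7": {  # negative prompt
            "class_type": "CLIPTextEncode",
            "inputs": {
                "text": "blurry, low quality, watermark, text overlay",
                "clip": ["38", 0],
            },
        },
    }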
EmptyHunyuanLatentVideo
Purpose: This node creates an empty latent space representation for a video, providing the starting point for the diffusion process.
Customizable Settings:
width: This setting determines the width of the generated video frames in pixels.
height: This setting determines the height of the generated video frames in pixels.
length: This setting sets the number of frames to generate. Frame counts of the form 4n+1 (for example 33 or 81) fit the Wan VAE’s temporal compression cleanly.
batch_size: This setting determines the number of latent samples processed at once.
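Below is a sketch of this node’s inputs, plus a helper showing how the settings map to the latent tensor. The helper assumes the Wan 2.1 VAE’s usual compression factors (8x spatial, 4x temporal, 16 latent channels), which is also why 4n+1 frame counts fit cleanly; treat these numbers as assumptions and check the model card. The 832x480, 33-frame values are typical for the 1.3B model rather than guaranteed defaults.

    empty_latent = {
        "40": {
            "class_type": "EmptyHunyuanLatentVideo",
            "inputs": {"width": 832, "height": 480, "length": 33, "batch_size": 1},
        }
    }

    def wan_latent_shape(width=832, height=480, length=33, batch_size=1):
        # The first frame is kept, then each group of 4 frames becomes one
        # latent frame, hence lengths of 33, 81, ... (4n+1).
        latent_frames = (length - 1) // 4 + 1
        return (batch_size, 16, latent_frames, height // 8, width // 8)

    print(wan_latent_shape())  # (1, 16, 9, 60, 104)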
KSampler
Purpose: This node performs the iterative denoising process, using the loaded UNET model and the encoded prompts to generate the latent space representation of the video.
Customizable Settings:
seed: This setting controls the random seed used to initialize the noise. Changing it generates a different variation of the video.
control_after_generate: This determines how the seed changes after each generation: “fixed” keeps it, “increment”/“decrement” step it by one, and “randomize” picks a new seed for every run.
steps: This setting determines the number of denoising steps. Higher values generally result in higher quality output but take longer to generate.
cfg: This setting controls the classifier-free guidance scale, which influences how closely the generated video follows the prompt. Higher values result in stronger adherence to the prompt but can reduce diversity.
sampler_name: This setting selects the sampling algorithm used for denoising. Different samplers offer varying trade-offs between speed and quality.
scheduler: This setting selects the scheduler used during the sampling process. Different schedulers can affect the speed and stability of the generation.
denoise: Controls the denoising strength. Leave it at 1.0 for text-to-video (a full denoise from pure noise); lower values preserve part of the input latent and are mainly useful for video-to-video workflows.
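A sketch of the KSampler wired to the nodes above. The sampler settings shown (30 steps, cfg 6.0, uni_pc/simple) are common starting points for Wan 2.1 rather than required values. Note that control_after_generate is a UI convenience and does not appear among the API-format inputs.

    ksampler = {
        "3": {
            "class_type": "KSampler",
            "inputs": {
                "seed": 42,                 # any integer; change for a new variation
                "steps": 30,
                "cfg": 6.0,
                "sampler_name": "uni_pc",
                "scheduler": "simple",
                "denoise": 1.0,             # full denoise for text-to-video
                "model": ["37", 0],         # UNETLoader
                "positive": ["6", 0],       # positive CLIPTextEncode
                "negative": ["7", 0],       # negative CLIPTextEncode
                "latent_image": ["40", 0],  # EmptyHunyuanLatentVideo
            },
        }
    }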
VAEDecode
Purpose: This node decodes the latent representation generated by the KSampler into pixel-space video frames.
Customizable Settings: There are no customizable settings in this node.
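The decode step in the same sketch format, taking the sampled latent and the loaded VAE:

    vae_decode = {
        "8": {
            "class_type": "VAEDecode",
            "inputs": {"samples": ["3", 0], "vae": ["39", 0]},
        }
    }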
VHS_VideoCombine
Purpose: This node combines the generated image frames into a video file.
Customizable Settings:
frame_rate: This setting determines the frame rate of the output video.
loop_count: This setting controls how many times the video will loop. 0 means it will not loop.
filename_prefix: This setting allows the user to set the prefix for the output video file name.
format: This setting selects the output video format (container and codec).
pix_fmt: This setting selects the pixel format passed to the encoder; “yuv420p” is the most widely compatible choice.
crf: This setting controls the Constant Rate Factor, which influences the video quality and file size. Lower values result in higher quality but larger files.
save_metadata: This setting determines whether to save metadata with the video file.
trim_to_audio: This setting determines whether to trim the video to the length of the audio.
pingpong: This setting enables ping-pong looping of the video.
save_output: This setting controls whether the video is saved to the output folder.
videopreview: This setting controls the in-node video preview (whether it is hidden or paused, and its playback parameters); it does not affect the saved file.
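A sketch of the VHS_VideoCombine inputs follows; the format string, 16 fps frame rate, and exact values are assumptions based on common VideoHelperSuite setups, so pick real values from the node’s dropdowns. After it, a short snippet shows how the node dicts sketched throughout this page can be merged into one graph (it assumes they are defined in the same script) and queued through ComfyUI’s standard /prompt HTTP endpoint.

    video_combine = {
        "50": {
            "class_type": "VHS_VideoCombine",
            "inputs": {
                "images": ["8", 0],          # frames from VAEDecode
                "frame_rate": 16,            # Wan 2.1 is commonly run at 16 fps
                "loop_count": 0,
                "filename_prefix": "wan21_t2v",
                "format": "video/h264-mp4",  # assumption; pick from the node's dropdown
                "pix_fmt": "yuv420p",
                "crf": 19,
                "save_metadata": True,
                "trim_to_audio": False,
                "pingpong": False,
                "save_output": True,
            },
        }
    }

    import json
    import urllib.request

    # Merge the sketches above into one API-format graph and queue it.
    workflow = {**unet_loader, **clip_loader, **vae_loader, **text_encoders,
                **empty_latent, **ksampler, **vae_decode, **video_combine}
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=json.dumps({"prompt": workflow}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())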
Credits
This workflow has been modified from the original ComfyUI Wan “text_to_video_wan.json” workflow example:
https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/example%20workflows_Wan2.1