Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search