Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search