Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation