City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning