SF-CRL: Speech-Facial Contrastive Representation Learning for Speaker Feature Extraction