Multimodal recommender systems (MMRSs) leverage images, text, and interaction signals to enrich item representations. However, recent alignment-based MMRSs that enforce a unified embedding space often blur modality-specific structures and exacerbate ID dominance. To address this, we propose AnchorRec, a multimodal recommendation framework that performs indirect, anchor-based alignment in a lightweight projection domain. By decoupling alignment from representation learning, AnchorRec preserves each modality’s native structure while maintaining cross-modal consistency and avoiding positional collapse. Experiments on four Amazon datasets show that AnchorRec achieves competitive top-N recommendation accuracy, while qualitative analyses demonstrate improved multimodal expressiveness and coherence. The codebase of AnchorRec is currently available at
https://github.com/hun9008/AnchorRec.