Beyond Reward: Offline Preference-guided Policy Optimization