Although SPT identifies candidate modules, there is no guarantee that these modules are unique (Alibaba’s paper, at least, provides no experimental support for uniqueness). Sycophancy is defined as tailoring responses to please humans without respecting facts. We therefore hypothesize that reducing sycophancy requires the LLM to be informed of factual information, which may exceed the scope of its internal generation ability and call for an external source of truth. Accordingly, we propose introducing RAG (retrieval-augmented generation) [2] into SPT/SFT.
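To make the proposal more concrete, the following is a minimal sketch of how retrieved evidence could be placed in front of a sycophancy-inducing prompt, whether that prompt is used as an SPT/SFT training input or at inference time. It assumes a small in-memory corpus of factual snippets and uses bag-of-words cosine similarity as a stand-in retriever. The names `retrieve` and `build_rag_prompt`, the prompt wording, and the example corpus are hypothetical illustrations, not part of SPT, SFT, or any specific RAG library.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus snippets most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_rag_prompt(question: str, user_claim: str, corpus: list[str], k: int = 2) -> str:
    """Prepend retrieved evidence to a prompt containing the user's (possibly false) claim.

    Hypothetical prompt format: the instruction asks the model to side with the
    evidence rather than with the user when the two conflict.
    """
    evidence = retrieve(question + " " + user_claim, corpus, k)
    lines = ["Answer using the evidence below. If the user's claim conflicts with it, say so."]
    lines += [f"[{i + 1}] {doc}" for i, doc in enumerate(evidence)]
    lines.append(f"User: {question} I think {user_claim}")
    lines.append("Assistant:")
    return "\n".join(lines)


if __name__ == "__main__":
    corpus = [
        "The capital of Australia is Canberra.",
        "Sydney is the largest city in Australia.",
        "Bananas are rich in potassium.",
    ]
    print(build_rag_prompt("What is the capital of Australia?", "the capital is Sydney", corpus))
```

In practice the toy retriever would be replaced by a dense or BM25 retriever over a curated knowledge source; the sketch only illustrates that the model sees external evidence alongside the user's claim, which is the ingredient the hypothesis above says is missing from SPT/SFT alone.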
Prior sycophancy research has focused on LLMs without considering recently advanced SLMs (small language models) [3]. Because SLMs may see widespread application owing to their computational advantages for local deployment, we also include an SLM as a baseline. This lets us assess whether SFT is sensitive to model scale, and it may indicate whether RAG is a more viable approach to reducing sycophancy.
Question-answer pairs are extracted from the dialogue content and then evaluated by an LLM (one that has not been fine-tuned), which scores their confidence and truthfulness. We’ll use confidence as an example.
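The extraction-and-scoring step could be sketched as follows, assuming dialogues are stored as one `User:` or `Assistant:` turn per line and that the non-fine-tuned evaluator LLM is exposed as a plain text-in, text-out callable. The names `extract_qa_pairs`, `parse_confidence`, and `score_confidence`, the 0 to 100 scale, and the prompt template are hypothetical choices for illustration; a truthfulness score could be obtained the same way with a different template.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QAPair:
    question: str
    answer: str


def extract_qa_pairs(dialogue: str) -> list[QAPair]:
    """Pair each 'User:' line with the 'Assistant:' line that immediately follows it."""
    # Assumption: one turn per line, prefixed by its speaker role.
    turns = re.findall(r"^(User|Assistant):[ \t]*(.*)$", dialogue, flags=re.MULTILINE)
    pairs = []
    for (role_a, text_a), (role_b, text_b) in zip(turns, turns[1:]):
        if role_a == "User" and role_b == "Assistant":
            pairs.append(QAPair(question=text_a.strip(), answer=text_b.strip()))
    return pairs


# Hypothetical evaluator prompt; the 0-100 scale is an illustrative choice.
CONFIDENCE_TEMPLATE = (
    "Rate how confident the assistant is in its answer, from 0 (very unsure) "
    "to 100 (fully certain). Reply as 'Confidence: <number>'.\n"
    "Question: {question}\n"
    "Answer: {answer}"
)


def parse_confidence(reply: str) -> Optional[float]:
    """Extract the numeric score from the evaluator's reply, clamped to [0, 100]."""
    match = re.search(r"Confidence:\s*(\d+(?:\.\d+)?)", reply)
    if match is None:
        return None  # evaluator did not follow the requested format
    return min(max(float(match.group(1)), 0.0), 100.0)


def score_confidence(pairs: list[QAPair], judge: Callable[[str], str]) -> list[Optional[float]]:
    """Ask the (non-fine-tuned) evaluator LLM to score each QA pair."""
    return [
        parse_confidence(judge(CONFIDENCE_TEMPLATE.format(question=p.question, answer=p.answer)))
        for p in pairs
    ]


if __name__ == "__main__":
    dialogue = (
        "User: Is 17 a prime number?\n"
        "Assistant: Yes, 17 is prime.\n"
        "User: Are you sure? I think it is divisible by 3.\n"
        "Assistant: You're right, my mistake.\n"
    )

    # Stub standing in for a real evaluator LLM call (hypothetical).
    def stub_judge(prompt: str) -> str:
        return "Confidence: 90" if "Yes, 17 is prime." in prompt else "Confidence: 35"

    pairs = extract_qa_pairs(dialogue)
    for pair, score in zip(pairs, score_confidence(pairs, stub_judge)):
        print(score, pair.question)
```

Passing the evaluator as a callable keeps the sketch independent of any particular model API; the stub only demonstrates the expected reply format. The example dialogue shows the kind of sycophantic reversal the confidence score is meant to surface.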