It is well known that reinforcement learning algorithms can converge to the optimal strategy for impartial combinatorial games such as Nim. We study convergence conditions for societies of agents in which Q-learning agents are mixed with a few agents that play the optimal strategy. The broader societal question we would like to address is: can an individual learning strategy make a difference to the welfare of the society as a whole? Abstracting the learning objective as the discovery of "truths", we investigate how misinformation can be overcome through seeding techniques and through interaction designs for a society of agents.
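To illustrate the kind of convergence referred to above, the following is a minimal sketch of tabular Q-learning via self-play on a simplified single-pile Nim variant (take 1 to 3 stones per turn; taking the last stone wins), not the multi-pile game nor the multi-agent society studied here. All names and parameter values are ours, chosen for illustration.

```python
import random

def q_learning_nim(n=10, max_take=3, episodes=50000,
                   alpha=0.5, eps=0.2, seed=0):
    """Tabular Q-learning by self-play on single-pile Nim.
    Players alternate removing 1..max_take stones; whoever takes
    the last stone wins. Optimal play leaves the opponent a
    multiple of (max_take + 1) stones."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(1, n + 1)
         for a in range(1, min(max_take, s) + 1)}

    def acts(s):
        return range(1, min(max_take, s) + 1)

    for _ in range(episodes):
        s = n
        while s > 0:
            # epsilon-greedy action selection
            a = (rng.choice(list(acts(s))) if rng.random() < eps
                 else max(acts(s), key=lambda a: Q[(s, a)]))
            s2 = s - a
            # negamax-style target for alternating play: a winning
            # move gets +1; otherwise the position is worth minus
            # the opponent's best attainable value
            target = 1.0 if s2 == 0 else -max(Q[(s2, b)] for b in acts(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2

    # greedy policy: how many stones to take in each state
    return {s: max(acts(s), key=lambda a: Q[(s, a)])
            for s in range(1, n + 1)}
```

At convergence the greedy policy takes `s % 4` stones in every winning position (positions that are multiples of 4 are losing regardless of the action taken), matching the known optimal strategy for this variant.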