Publications
* = joint first author; † = senior author
Pre-Prints
Chiu, Y. Y., Lee, M. S., Calcott, R., Handoko, B., de Font-Reaulx, P., Rodriguez, P., ... & Levine, S.† (2025). MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More than Outcomes.
Levine, S., Franklin, M., Zhi-Xuan, T., Yanik Guyot, S., Wong, L., Kilov, D., Choi, Y., Tenenbaum, J., Goodman, N., Lazar, S., and Gabriel, I. (Under review). Resource Rational Contractualism Should Guide AI Alignment.
Chiu, Y. Y., Wang, Z., Maiya, S., Choi, Y., Fish, K., Levine, S., & Hubinger, E. (Under review). Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas.
Moore, J., Choi, Y., and Levine, S.† (Under review). Perceptions of compromise: Comparing consequentialist and contractualist accounts.
Kwon, J., Tenenbaum, J., and Levine, S.† When it is not out of line to get out of line: The role of universalization and outcome-based reasoning in rule-breaking judgments.
Publications (including archival conference papers)
Kim, H., Sclar, M., Zhi-Xuan, T., Ying, L., Levine, S., Liu, Y., ... & Choi, Y. (2025). Hypothesis-driven theory-of-mind reasoning for large language models. Conference on Language Modeling (COLM).
Li, J., Pyatkin, V., Kleiman-Weiner, M., Jiang, L., Dziri, N., Collins, A., Schaich Borg, J., Sap, M., Choi, Y., and Levine, S.† (2025). SafetyAnalyst: Interpretable, Transparent, and Steerable LLM Safety Moderation. International Conference on Machine Learning (ICML).
Jiang, L., Sorensen, T., Levine, S., & Choi, Y. (2025). Can Language Models Reason about Individualistic Human Values and Preferences? Association for Computational Linguistics (ACL).
Jin, Z., Kleiman-Weiner, M., Piatti, G., Levine, S., Liu, J., Gonzalez, F., Ortu, F., Strausz, A., Sachan, M., Mihalcea, R., Choi, Y., & Schölkopf, B. (2025). Language Model Alignment in Multilingual Trolley Problems. International Conference on Learning Representations (ICLR). ✨Spotlight✨
Jiang, L., Hwang, J. D., Bhagavatula, C., Bras, R. L., Liang, J., Levine, S., ... & Choi, Y. (2025). Can machines learn morality? The Delphi Experiment. Nature Machine Intelligence.
Trujillo Jiménez, D.*, Zhang, M.*, Zhi-Xuan, T., Tenenbaum, J., & Levine, S.† (2025). Resource-Rational Virtual Bargaining for Moral Judgment: Towards a Probabilistic Cognitive Model. Topics in Cognitive Science. (Special issue in honor of Nick Chater's Rumelhart Prize.)
Levine, S., Chater, N., Tenenbaum, J., and Cushman, F. (2024). Resource-rational contractualism: A triple theory of moral cognition. Behavioral and Brain Sciences (target article). ✨Commentary process in progress✨
Levine, S., Kleiman-Weiner, M., Chater, N., Cushman, F., and Tenenbaum, J. (2024). When rules are over-ruled: Virtual bargaining as a contractualist method of moral judgment. Cognition, 250: 105790.
Awad, E., Levine, S., Loreggia, A., Mattei, N., Rahwan, I., Rossi, F., Talamadupula, K., Tenenbaum, J., and Kleiman-Weiner, M. (2024). When Is It Acceptable to Break the Rules? Knowledge Representation of Moral Judgement Based on Empirical Data. Autonomous Agents and Multi-Agent Systems, 38(2), 35. (Special issue on Citizen-Centric AI Systems.)
Sorensen, T., Jiang, L., Hwang, J., Levine, S., Pyatkin, V., West, P., ... & Choi, Y. (2024). Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties. Association for the Advancement of Artificial Intelligence (AAAI).
Levine, S. and Leslie, A. (2022). Preschoolers use the means-ends structure of intention to make moral judgments. Journal of Experimental Psychology: General, 151(11), 2893–2909.
Awad, E.*, Levine, S.*, Anderson, A., Anderson, A., Conitzer, V., Crockett, M., Everett, J., Evgeniou, T., Gopnik, A., Jamison, J., Kim, T.W., Liao, S.M., Lin, P., Meyer, M., Mikhail, J., Opoku-Agyemang, K., Schaich Borg, J., Schroeder, J., Sinnott-Armstrong, W., Slavkovik, M., and Tenenbaum, J. (2022). Computational Ethics. Trends in Cognitive Sciences, 26(5), 388-405.
Jin, Z.*, Levine, S.*, Gonzalez, F.*, Kamal, O., Sap, M., Sachan, M., Mihalcea, R., Tenenbaum, J., and Schölkopf, B. (2022). When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment. In Advances in Neural Information Processing Systems 35. (Selected for oral presentation, <1% acceptance rate.)
Levine, S., Kleiman-Weiner, M., Schulz, L., Cushman, F. and Tenenbaum, J. (2020). The logic of universalization guides moral judgment. Proceedings of the National Academy of Sciences, 117(42), 26158-26169.
Epstein, Z., Levine, S., Rand, D., and Rahwan, I. (2020). Who gets credit for AI-generated art? iScience (special issue on Machine Behaviour), 23(9), 101515.
Levine, S., Rottman, J., Davis, T., O’Neill, E., Stich, S., and Machery, E. (2020). Religion’s impact on conceptions of the moral domain. Social Cognition, 39(1), 139-165.
Awad, E.*, Levine, S.*, Kleiman-Weiner, M., Dsouza, S., Tenenbaum, J., Shariff, A., Bonnefon, J.F., and Rahwan, I. (2019). Drivers are blamed more than their automated cars when both make mistakes. Nature Human Behaviour, 4(2), 134-143.
Levine, S., Mikhail, J. and Leslie, A. (2018). Presumed Innocent? How Tacit Assumptions of Intentional Structure Shape Moral Judgment. Journal of Experimental Psychology: General, 147(11), 1728.
Levine, S., Leslie, A., and Mikhail, J. (2018). The Mental Representation of Human Action. Cognitive Science, 42(4), 1229–1264.
Conference and Workshop Proceedings (non-archival)
Li, J.-J., Mire, J., Fleisig, E., Pyatkin, V., Sap, M., and Levine, S.† (2025). PluriHarms: Benchmarking the full spectrum of human judgments on AI harm. CogInterp Workshop, NeurIPS.
Le Pargneux, A., Levine, S., Tenenbaum, J. B., & Cushman, F. (2025). The trade-off between rule-based thinking and mutual benefit in tacit coordination. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 47).
Li, J., Pyatkin, V., Kleiman-Weiner, M., Jiang, L., Dziri, N., Collins, A., Schaich Borg, J., Sap, M., Choi, Y., and Levine, S.† (2024). SafetyAnalyst: Interpretable, Transparent, and Steerable LLM Safety Moderation. Socially Responsible Language Modeling Research, NeurIPS Workshop.
Jiang, L., Sorensen, T., Levine, S., & Choi, Y. (2024). Can Language Models Reason about Individualistic Human Values and Preferences? Value Pluralistic Alignment, NeurIPS Workshop.
Jin, Z., Kleiman-Weiner, M., Piatti, G., Levine, S., Liu, J., Gonzalez, F., Ortu, F., Strausz, A., Sachan, M., Mihalcea, R., Choi, Y., & Schölkopf, B. (2024). Language Model Alignment in Multilingual Trolley Problems. Value Pluralistic Alignment, NeurIPS Workshop. ✨Best paper award✨
Moore, J., Choi, Y., and Levine, S.† (2024). Perceptions of compromise: Comparing consequentialist and contractualist accounts. Value Pluralistic Alignment, NeurIPS Workshop.
White, J., Bhui, R., Cushman, F., Tenenbaum, J., and Levine, S.† (2024). Moral flexibility in applying queuing norms can be explained by contractualist principles in children and adults. Proceedings of the 46th Annual Conference of the Cognitive Science Society.
Wu, S., Ren, X., Choi, Y., and Levine, S.† (2024). Resource-rational moral judgment. Proceedings of the 46th Annual Conference of the Cognitive Science Society.
Kwon, J., Tenenbaum, J., and Levine, S.† (2024). Neuro-Symbolic Models of Human Moral Judgment. Proceedings of the 46th Annual Conference of the Cognitive Science Society.
Wu, S., Ren, X., and Levine, S.† (2023). Resource-rational moral judgment. AI Meets Moral Psychology and Moral Philosophy, Workshop at NeurIPS 2023, New Orleans, LA.
Kwon, J., Tenenbaum, J., and Levine, S.† (2023). When it is not out of line to get out of line: The role of universalization and outcome-based reasoning in rule-breaking judgments. Proceedings of the 45th Annual Conference of the Cognitive Science Society.
Kwon, J., Tenenbaum, J., and Levine, S.† (2023). Neuro-Symbolic Models of Human Moral Judgment: LLMs as Automatic Feature Extractors. Presented at 4 conferences:
- Social Intelligence in Humans and Robots. Workshop at RSS (Robotics: Science and Systems).
- Challenges of Deploying Generative AI. Workshop at ICML (International Conference on Machine Learning).
- Counterfactuals in Minds and Machines. Workshop at ICML.
- Artificial Intelligence and Human-Computer Interaction. Workshop at ICML.
Kwon, J., Tenenbaum, J., and Levine, S. (2022). Flexibility in moral cognition: When is it okay to break the rules? Proceedings of the 44th Annual Conference of the Cognitive Science Society.
Levine, S. and Jin, Z. (2022). Competing perspectives on building ethical AI: psychological, philosophical, and computational approaches. Proceedings of the 44th Annual Conference of the Cognitive Science Society. (Symposium, winner of the conference-wide Disciplinary Diversity and Integration Award.)
Awad, E., Kleiman-Weiner, M., Levine, S., Loreggia, A., Mattei, N., Rahwan, I., Rossi, F., Talamadupula, K., and Tenenbaum, J. (2020). When Is It Morally Acceptable to Break the Rules? A Preference-Based Approach. 12th Multidisciplinary Workshop on Advances in Preference Handling (MPREF), held at the European Conference on Artificial Intelligence (ECAI).
Levine, S., Kleiman-Weiner, M., Schulz, L., Cushman, F., and Tenenbaum, J. (2019). Universalization as a Mechanism of Moral Decision-Making. Proceedings of the 41st Annual Conference of the Cognitive Science Society.
Levine, S., Kleiman-Weiner, M., Chater, N., Cushman, F., and Tenenbaum, J. (2018). The Cognitive Mechanisms of Contractualist Moral Decision-Making. Proceedings of the 40th Annual Conference of the Cognitive Science Society.
Kleiman-Weiner, M., Gerstenberg, T., Levine, S., & Tenenbaum, J. B. (2015). Inference of intention and permissibility in moral decision making. In Proceedings of the 37th Annual Conference of the Cognitive Science Society.
Popular Writing
Rethinking the ethics of AI – and ours. (May 7, 2022). The Business Times. With Edmond Awad and Theos Evgeniou.
Why We Should Crowdsource AI Ethics (and How to Do So Responsibly). (Sept 7, 2020). Behavioral Scientist. With Edmond Awad.