AI Safety Gridworlds. Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, Shane Legg. November 2017. (arXiv, blog post, code)

Reinforcement Learning with a Corrupted Reward Channel. Tom Everitt, Victoria Krakovna, Laurent Orseau, Marcus Hutter, Shane Legg. IJCAI AI and Autonomy track, May 2017. (arXiv, demo, code)

Building Interpretable Models: From Bayesian Networks to Neural Networks. Viktoriya Krakovna (PhD thesis). September 2016.

Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models. Viktoriya Krakovna, Finale Doshi-Velez. International Conference on Machine Learning (ICML) Workshop on Human Interpretability in Machine Learning (WHI), June 2016 (arXiv). Neural Information Processing Systems (NIPS) Workshop on Interpretable Machine Learning for Complex Systems, Dec 2016 (arXiv, poster).

A Minimalistic Approach to Sum-Product Network Learning for Real Applications. Viktoriya Krakovna, Moshe Looks. International Conference on Learning Representations (ICLR) workshop track, May 2016. (arXiv, OpenReview, poster)

Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests. Viktoriya Krakovna, Jiong Du, Jun S. Liu. New England Statistics Symposium (NESS), April 2015. (arXiv, poster, R package, code)

A generalized-zero-preserving method for compact encoding of concept lattices. Matthew Skala, Victoria Krakovna, Janos Kramar, Gerald Penn. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1512–1521, Uppsala, Sweden, July 2010.
Interpretability for AI safety (slides, video). NIPS Interpretable ML symposium, December 2017, Los Angeles, CA.