I'm the NLP Subject Matter Lead at statworx, lecturer for NLP at DHBW and visiting researcher at ESMT Berlin, where I completed my PhD.
I help leading companies keep up with the state of the art in AI and NLP and create value when standardized solutions are no longer enough.
Fields
I am interested in NLP, Graph Neural Networks, Agents, Network Science and Causality.
Research in Applied Machine Learning and Methods
Essays on Networks, Deep Learning, and Leadership
2022
Text analysis and deep learning: A network approach
2021
Abstract:
Much information available to applied researchers is contained within written language or spoken text. Deep language models such as BERT have achieved unprecedented success in many applications of computational linguistics. However, much less is known about how these models can be used to analyze existing text. We propose a novel method that combines transformer models with network analysis to form a self-referential representation of language use within a corpus of interest. Our approach produces linguistic relations strongly consistent with the underlying model as well as mathematically well-defined operations on them, while reducing the number of discretionary choices of representation and distance measures. It represents, to the best of our knowledge, the first unsupervised method to extract semantic networks directly from deep language models. We illustrate our approach in a semantic analysis of the term "founder". Using the entire corpus of Harvard Business Review from 1980 to 2020, we find that ties in our network track the semantics of discourse over time, and across contexts, identifying and relating clusters of semantic and syntactic relations. Finally, we discuss how this method can also complement and inform analyses of the behavior of deep learning models.
Random Forest Consensus Clustering for Regression and Classification
2021, with Ebru Koca Marquart
Download the Python package here
Abstract:
Random forests are invariant and robust estimators that can fit complex interactions between input data of different types and binary, categorical, or continuous outcome variables, including those with multiple dimensions. In addition to these desirable properties, random forests impose a structure on the observations from which researchers and data analysts can infer clusters or groups of interest. These clusters not only provide a structure to the data at hand, they also can be used to elucidate new patterns, define subgroups for further analysis, derive prototypical observations, identify outlier observations, catch mislabeled data, and evaluate the performance of the estimation model in more detail.
We present a novel clustering algorithm called Random Forest Consensus Clustering and implement it in the Scikit-Learn / SciPy data science ecosystem. This algorithm differs from prior approaches by making use of the entire tree structure. Observations become proximate if they follow similar decision paths across trees of a random forest. We illustrate why this approach improves the resolution and robustness of clustering and why it is especially suited to hierarchical approaches.
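The idea that observations become proximate when they follow similar decision paths can be illustrated with a minimal sketch. It is not the RFCC package's implementation or API: the function names, the cosine-style proximity measure, and the average-linkage clustering are assumptions chosen for illustration, built only on scikit-learn's decision_path and SciPy's hierarchical clustering routines.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier


def path_proximity(forest, X):
    """Similarity of two observations based on shared decision-path nodes.

    Hypothetical helper: uses every node on every path (not only the leaves)
    and normalizes the overlap like a cosine similarity.
    """
    indicator, _ = forest.decision_path(X)  # sparse, shape (n_samples, total_nodes)
    indicator = indicator.astype(np.float64)
    shared = (indicator @ indicator.T).toarray()  # nodes visited by both i and j
    path_len = np.sqrt(np.diag(shared))  # sqrt of nodes visited by each observation
    return shared / np.outer(path_len, path_len)


def cluster_from_forest(forest, X, n_clusters):
    """Hierarchical clustering on the path-based distances (assumed: average linkage)."""
    distance = 1.0 - path_proximity(forest, X)
    distance = np.clip((distance + distance.T) / 2.0, 0.0, None)  # symmetric, non-negative
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data, purely for illustration.
X, y = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
proximity = path_proximity(forest, X)
labels = cluster_from_forest(forest, X, n_clusters=3)
```

Because the proximity counts shared internal nodes as well as shared leaves, two observations that part ways only deep in a tree remain closer than two that part ways at the root, which is one way to read the claim about improved resolution.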
Graph Embedding on Hierarchical Manifolds (WIP)
Work in Progress. More Information coming soon!
Transformers and training variance: How stable are BERT's predictions? (WIP)
Work in Progress. More Information coming soon!
Global Targets: Stable and Isotropic Embeddings in Transformer Language Models (WIP)
Work in Progress. More Information coming soon!
Unsupervised Semantic Networks (WIP)
Work in Progress. More Information coming soon!
Research in Organizational Economics and Networks
Using Semantic Networks to Identify the Meanings of Leadership
2021, with Nghi Truong and Matthew Bothner
YouTube Presentation: Using Semantic Networks to Identify the Meanings of Leadership
Abstract:
We develop a novel method that integrates techniques from machine learning with canonical concepts from network analysis in order to examine how the meaning of leadership has evolved over time. Using articles in Harvard Business Review from 1990 through 2019, we induce yearly semantic networks comprised of roles structurally equivalent to the role of leader. Such roles, from which leader derives meaning, vary in content from coach and colleague to commander and dictator. Yearly shifts in the structural equivalence of leader to clusters of thematically-linked roles reveal a decline in the degree to which leadership is associated with consultative activities and a corresponding rise in the extent to which a leader is understood to occupy a hierarchical position. Our analyses further reveal that the role of leader comes to eclipse the role of manager, measured through changes in PageRank centrality as well as Betweenness centrality over the course of our panel. Implications for new research on leadership, culture, and networks are discussed.
Semantic Decision Networks
2021, with Matthew Bothner
Abstract:
When would members of an organization interpret a choice made by its main decision-maker as ambiguous, important, irrelevant, surprising, or symbolic? To address this question, we develop a formal model of networks of choices. Extending research on natural language processing, these networks mirror networks of words and enable us to identify the meaning of choices, as well as the level of ambiguity surrounding these meanings. Using our model, we uncover latent relationships between choices and the contexts in which a focal decision-maker makes these choices. Our contributions are methodological and theoretical. Our method involves a mixture of a state-of-the-art deep learning model with classical network analysis, which enables us to present a series of network-based measures characterizing choices. Our primary theoretical contribution is to cast new light on how audiences interpret choices as a function of their organizational context.
When does catalyzing social comparisons cause growth?
2020, with Nghi Truong, Richard Haynes and Matthew Bothner
Abstract:
When does a manager’s choice to activate social comparisons among employees prompt organizational growth? When should a manager instead allow employees to form aspirations and exert effort in relative autonomy, based on their own past performance? To address these questions, we develop an agent-based model that examines the growth-related effects of these two contrasting approaches. Our analyses reveal that activating social comparisons can be either beneficial or corrupting depending on three features of organizational context drawn from performance feedback theory: (i) employees’ goal adaptation rates, (ii) employees’ tendencies to engage in self-improvement, self-assessment, or self-enhancement, and (iii) the skewness of the distribution of their initial goals. We find that whether this distribution is right-skewed (the highly ambitious constitute the right tail) or left-skewed (the un-ambitious comprise the left tail) acts as the governing contextual moderator. Under right skew, social comparisons promote growth. Under left-skew, this effect reverses, but not if employees self-improve or adapt slowly: Slow adaptation “purifies” the intrafirm monitoring network of otherwise corrupting stimuli and thus restores the link between social comparisons and growth. Implications for research in performance feedback theory and organizational design are discussed.
Social Norms in Attention Networks (WIP)
with Nghi Truong, Richard Haynes and Matthew Bothner
Work in Progress. More Information coming soon!
Applications
How to Manage ‘Invisible Transitions’ in Leadership (MIT Sloan Management Review)
2021, with Nora Grasselli and Gianluca Carnabuci
Taking on a substantial new role without a change in title or authority is hard, but there are ways to manage this transition.
Read here!
Threshold Spectral Community Detection for NetworkX
A Python Package for Community Detection with NetworkX
Community detection for NetworkX, based on the algorithm proposed in Guzzi et al. 2013 (*).
Developed for semantic similarity networks, this algorithm specifically targets weighted and directed graphs. This implementation adds a couple of options to the algorithm proposed in the paper, such as passing an arbitrary community detection function (e.g. python-louvain). Similarity networks are typically dense, weighted and difficult to cluster. Experience shows that algorithms such as python-louvain have difficulty finding outliers and smaller partitions.
Given a networkx.DiGraph object, threshold-clustering tries to remove insignificant ties according to a local threshold. The threshold is refined until the network breaks into distinct components, yielding a sparse, undirected network.
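The pruning loop described above can be sketched in a few lines. This is a simplified illustration, not the package's API: the function names, the cutoff of mean plus k standard deviations of a node's outgoing weights, and the step schedule are assumptions, and a plain connected-components check stands in for the spectral criterion of the original paper.

```python
import statistics

import networkx as nx


def prune_by_local_threshold(graph, k):
    """Keep a tie u->v only if its weight reaches mean + k * std of u's outgoing weights."""
    pruned = nx.Graph()  # the result is undirected
    pruned.add_nodes_from(graph.nodes)
    for u in graph.nodes:
        ties = list(graph.out_edges(u, data="weight", default=1.0))
        if not ties:
            continue
        weights = [w for _, _, w in ties]
        cutoff = statistics.mean(weights) + k * statistics.pstdev(weights)
        for _, v, w in ties:
            if w >= cutoff:
                pruned.add_edge(u, v, weight=w)
    return pruned


def threshold_components(graph, start_k=-1.0, step=0.25, n_steps=16):
    """Raise k step by step until the pruned network splits into several components."""
    for i in range(n_steps + 1):
        k = start_k + i * step
        pruned = prune_by_local_threshold(graph, k)
        components = list(nx.connected_components(pruned))
        if len(components) > 1:
            return k, components
    return None, [set(graph.nodes)]  # no split found within the schedule


# Toy similarity network: strong ties within two groups, weak ties between them.
groups = [["a", "b", "c"], ["x", "y", "z"]]
graph = nx.DiGraph()
for group in groups:
    for u in group:
        for v in group:
            if u != v:
                graph.add_edge(u, v, weight=0.9)
for u in groups[0]:
    for v in groups[1]:
        graph.add_edge(u, v, weight=0.1)
        graph.add_edge(v, u, weight=0.1)

k, components = threshold_components(graph)
```

In this toy network every node is tied to every other node, so the unpruned graph is a single dense component. Once the cutoff rises above the weak between-group weights, the two groups separate.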
More information: on GitHub
Python RFCC - Data understanding, clustering and outlier detection for regression and classification tasks
Python Package, joint with Ebru Marquart
Companion package to "Random Forest Consensus Clustering for Regression and Classification (2021)"
More information: on GitHub
ESMT Leadership Transition Assistant
Cloud-based app to help define the most important challenges for your upcoming leadership transition
Creates an individual profile of 7 challenges that leaders typically face during transitions based on user input.
text2network
A Python Package for extracting (contextual) semantic similarity networks from Transformer models
Companion package to "Text analysis and deep learning: A network approach"
More information: on GitHub