My research focuses on the intersection between the formal study of causation and the fields of artificial intelligence (AI) and statistics in general, and machine learning in particular. This research area can broadly be structured into two intertwined threads: (I) causal inference for machine learning and (II) machine learning for causal inference.
Below, I outline the motivation and implications of each thread and contextualize my prior work within this framework.
Figure 1: Causal Inference Tasks Involved in My Research. Shaded/white nodes and ? denote observed/latent variables and quantities of interest, respectively. (a) Causal reasoning leverages known causal structure to predict effects of unseen interventions or distribution shifts, typically from purely observational data [13, 47]. (b) Causal discovery aims to infer a causal graph over observed variables [21, 48]. (c) In causal representation learning, the causal variables of interest are not directly observed and only accessible via low-level measurements such as pixels [49]. My work uses (a) to improve the robustness and explainability of AI and leverages machine learning methods for (b) and (c) from heterogeneous and high-dimensional data. Figure adapted from [50, 51].
Despite remarkable progress, machine learning-based AI systems still face challenges that limit their use in high-stakes or safety-critical settings. On the one hand, they lack transparency and often reproduce or amplify historical biases in the data. On the other hand, they lack robustness and tend to generalize poorly to new domains. Our work on (I) seeks to address these challenges through causal reasoning (see Fig. 1a), which provides a powerful language for formalizing counterfactual notions such as fairness and explanations, and for analysing the effects of distribution shifts. Our work also highlights how viewing the underlying data-generating process from a causal perspective can help inform appropriate algorithm selection and inspire new assumptions and methods.
In AI-assisted consequential decision-making, algorithmic recourse [1] provides individuals who were unfavourably classified (e.g., not approved for a loan) with actionable recommendations on how to improve their situation. As this involves reasoning about the effect of interventions (e.g., lifestyle changes), it is fundamentally a causal problem [2]. We developed methods to identify low-cost interventions that achieve recourse with high probability [3–5] and proposed an individualized fairness criterion [6], which extends non-causal group-based notions of fair recourse [7] and is inspired by counterfactual fairness [8]. Counterfactual reasoning is also at the heart of providing explanations for why something occurred [9, 10]. Traditionally, such counterfactuals have been interpreted in interventional terms [11–13]. We formalized an alternative backtracking semantics [14] (Best Paper Award at CLeaR 2023), which is better suited for diagnostic reasoning tasks such as root cause analysis [15], and developed a practical implementation for causal systems consisting of deep generative models [16]. In a more applied project [17], we took a critical look at the inner workings of trajectory prediction methods by leveraging causal feature attribution methods [18, 19] and links to Granger causality [20].
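To make the causal nature of recourse concrete, the problem can be sketched as a constrained intervention-selection task (the notation below is illustrative rather than taken verbatim from [3–5]): given a factual input $x^{\mathrm{F}}$ that received an unfavourable decision from a classifier $h$, find
\[
\min_{a \in \mathcal{A}} \; \mathrm{cost}(a; x^{\mathrm{F}}) \quad \text{subject to} \quad \mathbb{P}\big(h(X^{\mathrm{SCF}}(a)) = 1\big) \;\geq\; 1 - \epsilon,
\]
where $\mathcal{A}$ is the set of feasible actions, $X^{\mathrm{SCF}}(a)$ denotes the counterfactual features that would result from performing action $a$ under the assumed causal model, and $\epsilon$ is a tolerated failure probability. Because downstream features respond to intervened-upon causes, solving this problem requires a causal, rather than purely predictive, model of the features.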
Causal mechanisms do not inform each other and tend to remain invariant when other parts of the system change [21]. The causal structure underlying a problem thus profoundly influences the effectiveness of machine learning approaches [22]. We studied such implications for covariate shift adaptation [23], semi-supervised learning [24], and natural language processing tasks [25], highlighting that cause and effect features should be treated differently, especially when learning from unlabelled data. When multiple source domains are available and the goal is to learn a predictor that performs well on new test domains, existing methods typically either optimise for average performance at the cost of reduced robustness [26, 27] or for worst-case performance by seeking invariant predictors that rely only on causal features [28, 29]. We have developed methods to interpolate between these objectives by learning predictors that generalize with high probability [30] and
to use invariant predictions as pseudo-labels to safely harness certain spurious (non-causal) features [31].
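One schematic way to express this interpolation (a simplified summary, not the exact objective of [30]): treating training domains $e$ as draws from a distribution over environments with per-domain risks $R_e(f)$, one can minimize an $\alpha$-quantile of the risk distribution,
\[
\min_{f} \; Q_\alpha\big( R_e(f) \big), \qquad \alpha \in (0, 1),
\]
so that the learned predictor attains risk below the optimized threshold on a fraction $\alpha$ of new domains; letting $\alpha \to 1$ approaches the worst-case, invariance-seeking objective, while smaller $\alpha$ trades some robustness for average-case performance.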
In unsupervised representation learning, identifiability (i.e., provably inverting the data-generating process) requires restricting the model class [32, 33]. This often involves constraining the mixing function that maps latent variables to observations [34–36]. Inspired by the principle of independent causal mechanisms [21], we proposed an orthogonality condition on the columns of the Jacobian of the mixing function [37, 38] (sketched below), which facilitates blind source separation [39] and dimension reduction [40] while also explaining the effectiveness of VAEs [41] in representation learning [42]. For object-centric representation learning, we have drawn on causal generative scene models to develop new methods [43, 44] and proposed assumptions that likewise impose specific structure on the Jacobian [45] or on higher-order derivatives of the mixing function [46].
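In simplified form (generic notation, not exactly that of [37, 38]), the condition requires the columns of the Jacobian of the mixing function $f$ to be orthogonal at every point; for an invertible, dimension-preserving $f$,
\[
J_f(z)^\top J_f(z) \;\text{ diagonal for all } z
\quad \Longleftrightarrow \quad
\log \big| \det J_f(z) \big| \;=\; \sum_{i} \log \Big\| \frac{\partial f}{\partial z_i}(z) \Big\|,
\]
capturing the idea that the different latent sources influence the observations along locally orthogonal directions. Deviations from this equality, which is the equality case of Hadamard's inequality, can then serve as a regularizer or model-selection criterion.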
Inferring causal structure or estimating causal effects from data often involves tasks such as regression, density estimation, conditional independence testing, or uncertainty quantification, which can be challenging in high-dimensional, nonlinear settings. Our work on (II) leverages machine learning to extend and improve existing causal inference methods and to develop new ones for such scenarios. Beyond classical causal discovery (see Fig. 1b), a particular focus has been on the theoretical foundations of causal representation learning (see Fig. 1c).
In the age of big data, effectively integrating knowledge from different sources is promising but non-trivial [52]. To this end, we developed an optimisation-based approach for merging causal insights from datasets sharing a subset of variables [53] and a biased but lower-variance causal effect estimator that combines observational and interventional data [54], inspired by ideas in shrinkage estimation [55, 56].
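The intuition behind the combined estimator can be illustrated with a simple shrinkage sketch (the symbols here are illustrative; the precise estimator and weighting are developed in [54]): given an unbiased but high-variance estimate $\hat\tau_{\mathrm{int}}$ from a small interventional sample and a possibly confounded, lower-variance estimate $\hat\tau_{\mathrm{obs}}$ from observational data, consider
\[
\hat\tau_\lambda \;=\; (1-\lambda)\,\hat\tau_{\mathrm{int}} + \lambda\,\hat\tau_{\mathrm{obs}}, \qquad \lambda \in [0, 1],
\]
whose mean squared error (for independent samples, with observational bias $b$) is $\lambda^2 b^2 + (1-\lambda)^2 \sigma_{\mathrm{int}}^2 + \lambda^2 \sigma_{\mathrm{obs}}^2$ and is therefore minimized at some $\lambda^* > 0$ whenever $\sigma_{\mathrm{int}}^2 > 0$: accepting a small amount of bias can reduce the overall estimation error.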
Since correlation does not imply causation, discerning causal structure from observational data is notoriously difficult and often requires additional assumptions or heterogeneous (non-i.i.d.) observations. Given non-identically distributed datasets, we proposed seeking causal graphs that provide parsimonious explanations by minimizing the number of mechanism shifts across domains [57]. This yields a principled method with theoretical guarantees [58] and flexible estimators, e.g., via kernel-based conditional independence tests [59], which we also investigated for non-stationary or functional data [60, 61]. Beyond mere heterogeneity, data resulting from direct experimentation is even more valuable for causal discovery but typically also costly to collect, motivating careful design strategies. To this end, we developed a Bayesian active learning framework, in which the current beliefs about the underlying causal model are used to inform which subsequent intervention would provide the most information about the causal query of interest (e.g., a particular treatment effect or edge in the graph) [62, 63]. To facilitate tractable approximate inference, this framework leverages probabilistic machine learning methods such as Gaussian process regression [64], Bayesian optimization [65], and variational inference [66].
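The core selection rule in this framework can be summarized as follows (schematic notation, not taken verbatim from [62, 63]): writing $q$ for the causal query of interest, $D$ for the data collected so far, and $Y$ for the outcome of a candidate experiment $\xi$, the next intervention is chosen to maximize the expected information gain about $q$,
\[
\xi^* \;=\; \arg\max_{\xi}\; \mathbb{E}_{y \sim p(y \mid \xi, D)}\Big[ H\big(p(q \mid D)\big) - H\big(p(q \mid D, \xi, y)\big) \Big] \;=\; \arg\max_{\xi}\; I(q;\, Y \mid \xi, D),
\]
where the required posterior and predictive distributions are intractable in general and are approximated using the probabilistic machine learning tools mentioned above.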
To overcome the obstacle that high-level variables of interest (e.g., properties of objects in a scene) are often not directly accessible, causal representation learning aims to integrate causal discovery with unsupervised learning [67, 68] to infer latent variable models endowed with causal structure. To understand under which assumptions and up to which ambiguities such latent causal models can be learned, identifiability studies are crucial. In our work, we have characterized sufficient conditions for identifiability across several settings [69], including when learning from multiple non-independent views sharing some of the same latent variables, arising, e.g., from data augmentation [70, 71] or different modalities [72]; multiple non-identically distributed datasets, related via interventions in a shared causal model [51, 73]; or i.i.d. data, subject to linearity and sparsity constraints [74]. These theoretical results often also suggest suitable estimation procedures, e.g., based on self-supervised learning [75] or deep generative models such as normalizing flows [76], endowed with latent causal structure.
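As a representative example (stated informally and in generic notation rather than that of [70–72]), consider two views that share a latent content block $c$ but differ in view-specific latents,
\[
x_1 = f_1(c, s_1), \qquad x_2 = f_2(c, s_2).
\]
Under suitable assumptions on the mixing functions and the latent distribution, encoders that correctly model the joint view distribution (or, in the self-supervised variant, align the two views without collapsing) recover the shared content $c$ up to an invertible reparameterization, i.e., the content block is identifiable.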
During the Covid-19 pandemic, we carried out a causal mediation analysis of case fatality rates across different countries by leveraging demographic data to separate age-specific and other effects [77] and compared counterfactual vaccination strategies in Israel by combining causal models with epidemiological simulations and literature estimates of key parameters [78]. Further, in collaboration with researchers from Imperial College London, we sought to better understand the interlinkages between the UN sustainable development goals and climate change [79, 80].
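The core of the mediation analysis in [77] can be sketched via a simple standardization formula (notation simplified): a country's overall case fatality rate is an average of age-specific rates weighted by its case demographic,
\[
\mathrm{CFR}(\text{country}) \;=\; \sum_{a} P(\text{age} = a \mid \text{country}) \cdot \mathrm{CFR}(\text{age} = a,\ \text{country}),
\]
so counterfactual rates obtained by pairing one country's age-specific fatality rates with another country's case demographic separate the effect mediated by age from the direct effect of other, country-specific factors.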
My previous work demonstrates the potential synergies of integrating causality with machine learning, offering new insights and tools for both fields. In my current and future research, I intend to continue studying theoretical and methodological questions at this intersection to further enhance the transparency and domain-shift robustness of AI systems and extend the applicability of causal inference methods to more complex real-world data. Additionally, I am eager to apply these methods to scientific challenges in other disciplines, with a particular focus on single-cell biology—a field that I believe presents unique and exciting opportunities for causal machine learning.