Recent years have seen a tremendous surge in the availability of socioeconomic data characterised by vast complexity and high dimensionality. However, the methods most commonly used to inform practitioners and policy makers still focus on small to medium-scale datasets. Consequently, crucial transmission channels are easily overlooked and the corresponding inference often suffers from omitted variable bias. This calls for novel methods that enable researchers to fully exploit the ever-increasing amount of data.
In this project, we investigate how the largely separate research streams of Bayesian econometrics, statistical model checking, and machine learning can be combined to create innovative and powerful tools for the analysis of big data in economics and other social sciences. In doing so, we pay special attention to properly incorporating relevant sources of uncertainty. Although crucial for thorough empirical analyses, this aspect is often overlooked in traditional machine learning techniques, which have mainly centred on producing point forecasts for key quantities of interest. In contrast, Bayesian statistics and econometrics are based on designing algorithms to carry out exact posterior inference, which in turn allows for density forecasts. Our contributions are twofold. From a methodological perspective, we develop cutting-edge methods that enable fully probabilistic inference in dynamic models of vast dimensions. In terms of empirical advances, we apply these methods to highly complex datasets in which the number of observations, the number of potential time series, and/or the number of included variables is large. More specifically, the empirical applications centre on four topical issues in the realm of sustainable development and socioeconomic policy to answer questions such as:
- How do market and economic uncertainty affect income inequality?
- What are the relationships between greenhouse gas emissions and macroeconomic indicators?
- What role do tweets play in the evolution of cryptocurrency prices?
- Which policy measures are most effective in fostering sustainable urban mobility patterns?
In these applications, we focus on probabilistic forecasting using real-time data to validate models efficiently. Moreover, we address structural inference. As policy makers are typically interested in evaluating their policies quantitatively, robust econometric tools are crucial for counterfactual simulations. In light of the increasing complexity of the economy, however, large information sets need to be exploited to appropriately recover the underlying causal structures and to provide a rich picture of the potential transmission channels of policy interventions.