A structured comparison of causal machine learning methods to assess heterogeneous treatment effects in spatial data
Day 2 (PM session) – Foundations of Data Science II (Tuesday 6th December)
14.00-17.00: Dr. Kevin Credit, National Centre for Geocomputation, Maynooth University, and Prof Chris Brunsdon, Maynooth University. A structured comparison of causal machine learning methods to assess heterogeneous treatment effects in spatial data
Brief synopsis: This session explains the key ideas of spatial statistics and provides a number of practical examples using R, in particular the ‘mgcv’ package for analysis and the ‘tmap’ and ‘sf’ packages to manipulate and visualise geographical data.
1. Working with geographical data in R – including how to read in spatial data and use it to create maps via ‘sf’ and ‘tmap’
2. Spatial statistical models with area-based data – including the use of Markov random fields to model broadband uptake in Ireland, based on census data for Irish electoral divisions
The development of the “causal” forest by Wager and Athey (2018) represents a significant advance in the area of explanatory/causal machine learning. However, this approach has not yet been widely applied to geographically-referenced data, which present some unique issues. One is that the random split into test and training sets in the typical causal forest design fractures the spatial fabric of geographic data. To help address this, we use a simulated dataset with known properties to compare the performance of causal forest models across different definitions of the test/train split. We also develop a new “predicted counterfactual” model that can be implemented using predictive methods like random forest to provide estimates of heterogeneous treatment effects across all units. We then apply the preferred model to analyse the treatment effect of the construction of the Valley Metro light rail (tram) system on on-road CO2 emissions per capita at the block group level in Maricopa County, Arizona. We find that the neighbourhoods most likely to benefit from treatment are those with higher pre-treatment proportions of transit and pedestrian commuting and lower proportions of auto commuting.
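To illustrate the test/train issue raised in the abstract, the sketch below contrasts a unit-level random split with a split that holds out whole grid blocks so that neighbouring units stay together. It is only one possible spatially aware definition and is not necessarily one of those compared in the talk. The lattice of units, the `block_size` and `test_frac` parameters, and both function names are hypothetical, and Python is used purely for illustration (the session itself works in R).

```python
import random

def random_split(units, test_frac=0.25, seed=0):
    """Unit-level random split: location plays no role in the assignment."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def block_split(units, block_size=2.0, test_frac=0.25, seed=0):
    """Hold out whole grid blocks so that neighbouring units stay together."""
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        # Assumed blocking rule: a regular square grid over the coordinates.
        key = (int(u["x"] // block_size), int(u["y"] // block_size))
        blocks.setdefault(key, []).append(u)
    keys = sorted(blocks)
    rng.shuffle(keys)
    n_test = max(1, round(len(keys) * test_frac))
    test_keys = set(keys[:n_test])
    train = [u for k in keys if k not in test_keys for u in blocks[k]]
    test = [u for k in sorted(test_keys) for u in blocks[k]]
    return train, test

# Hypothetical 8 x 8 lattice of spatial units (illustrative data only).
units = [{"id": i * 8 + j, "x": float(i), "y": float(j)}
         for i in range(8) for j in range(8)]

train_r, test_r = random_split(units)
train_b, test_b = block_split(units)
```

With the random split, held-out units are scattered among their training neighbours; with the block split, every 2 x 2 block falls entirely in either the training or the test set.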
Speaker biography: Dr. Kevin Credit is an Assistant Professor at the National Centre for Geocomputation at Maynooth University. Broadly, his research focuses on using spatial econometric, causal, and machine learning (ML) approaches to answer questions related to transportation, public health, economic development, and spatial patterns of inequality in urban areas. Methodologically, he is particularly interested in issues around the use and development of large open spatial datasets and how ML and deep learning (DL) methods can be designed to: 1) more explicitly integrate spatial information and spatial ways of thinking, 2) assess problems of causal inference, and 3) provide better insight into the explanatory relationships driving model results. Kevin is currently working on a range of projects and proposals in this area, including the identification and characterisation of ‘startup neighbourhoods’ in Europe, the development of an integrated health + environment spatial data dashboard for the Dublin 8 neighbourhood, a structured comparison of causal ML methods using spatial data, methods to identify built environment and business characteristics from street view imagery, and a large-scale analysis of open policing data in the US. He is an Academic Collaborator at the ADAPT Centre for Digital Media Technology (2022), a Fellow of the Center for Spatial Data Science at the University of Chicago (2021), and received his PhD in Geography from Michigan State University in 2018.
Speaker biography: Prof Chris Brunsdon is Professor of Geocomputation and Director of the National Centre for Geocomputation at Maynooth University. Prior to this he was Professor of Human Geography at the University of Liverpool in the UK, and before that he worked at the Universities of Leicester, Glamorgan and Newcastle, all in the UK. He has degrees from Durham University (BSc Mathematics) and Newcastle University (MSc Medical Statistics, PhD in Geography).
Please note: 0.5 of a training day will be deducted from the annual training allowance for each Enterprise Alliance attendee.
The SFI Centre for Research Training in Foundations of Data Science will train a cohort of PhD students with world-class foundational understanding in the horizontal themes of Applied Mathematics, Statistics, and Machine Learning.