Workshops for Ukraine
Feedback on past workshops (if you want to learn how to make wordclouds, check out the Text Data Analysis workshop below)
You can learn and support Ukraine at the same time! You can both register for the upcoming workshops AND donate to get the recordings and all of the materials of the previous workshops.
If you want to get email updates about our future workshops, please fill in this form.
The next workshop will take place on December 12th and will cover Big data made small: Harness the power of SQL via Python (through DuckDB).
You can find information on how to register below.
If you experience any difficulties registering or have any questions, you can email me at dariia.mykhailyshyn2@unibo.it. Please check the FAQ section at the end of this page before emailing, as many popular questions are answered there.
Big data made small: Harness the power of SQL via Python (through DuckDB)
Date: Thursday, December 12th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Alessandro Martinello is an Italian expat who has been living in Copenhagen for the past 15 years. He earned a PhD in economics at the University of Copenhagen, but after a few years in academia slowly drifted towards industry and finance: first to the Danish central bank, where he led the Data Science team, and then to Danske Bank Group as Head of Data and Analytics for its mortgage bank (Realkredit Danmark). Data is, after all, a universal skill, and the ability to quickly extract insights from (large) data using limited resources is very valuable indeed.
Description: This workshop is not only aimed at academics wanting to speed up their data transformation steps in their coding, but also at under-/graduate students potentially interested in pursuing a career in the private sector, where SQL reigns supreme.
There are good reasons why it does. In the workshop you will learn, without leaving the comfort of a Jupyter notebook, to: master some basic SQL syntax (SELECT / FROM / JOIN / WHERE / GROUP BY / ORDER BY); speed up and streamline your data processing pipeline for analysis/research; and easily juggle larger-than-RAM datasets on your laptop, or whatever machine you will be using.
We will be using DuckDB, a fast OLAP database that can be installed in a few seconds via pip on any machine. The workshop is also relevant for R users: besides the Python interface, the core content of the workshop is about SQL, and DuckDB is also available in R.
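To give a flavour of the interface, here is a minimal sketch in R (the workshop itself works through Python; the table name here is illustrative):

```r
library(DBI)
library(duckdb)

# An in-memory DuckDB database; passing a file path would make it persistent
con <- dbConnect(duckdb())

# Register an existing data frame as a virtual table and query it with SQL
duckdb_register(con, "cars", mtcars)
dbGetQuery(con, "
  SELECT cyl, AVG(mpg) AS mean_mpg, COUNT(*) AS n
  FROM cars
  GROUP BY cyl
  ORDER BY cyl
")

dbDisconnect(con, shutdown = TRUE)
```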
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R
Date: Thursday, December 19th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Riva Quiroga is a linguist and educator based in Valparaíso, Chile. She is a Software Sustainability Institute Fellow, part of the R-Ladies Global Leadership Team, and a Women Techmakers Ambassador.
Description: In this workshop, we will cover the process of creating a fully customized and reproducible PDF report using Quarto and Typst, a modern typesetting and markup language designed for creating high-quality PDFs that offers a more user-friendly alternative to LaTeX. After walking participants through the building blocks of document layout, the workshop will focus on Quarto’s ability to translate CSS properties into Typst properties, a feature that expands the possibilities for customizing a document's appearance.
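For a taste of the toolchain, a Quarto document that requests the Typst PDF engine in its YAML header can be rendered straight from R; a minimal sketch (the file name is hypothetical):

```r
# Render a Quarto document whose YAML header sets `format: typst`
# (e.g. in report.qmd); the file name here is illustrative
quarto::quarto_render("report.qmd")
```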
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Introduction to generalized linear models in R
Date: Thursday, January 9th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Bodo Winter is Professor of Linguistics at the Dept. of Linguistics and Communication, University of Birmingham, and a UKRI Future Leaders Fellow for the project “Making numbers meaningful”. He uses data science-driven methods to study gesture, iconicity, and numerical communication in language. Bodo has authored Statistics for Linguists: An Introduction Using R and co-founded the Birmingham Statistics for Linguists Summer School.
Description: In this talk, you’ll learn about the fundamentals of generalized linear models, a powerful extension of the general linear model/multiple regression. We will discuss different distributions that can be used to model a diverse range of data-generating processes and how to interpret models that use different link functions. In the hands-on part of the workshop, we’ll work through a dataset for which we are going to use a mixed Poisson regression model, implemented with the package brms. Materials for the hands-on session will be distributed a couple of days prior to the workshop.
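For orientation, a mixed Poisson regression of the kind we will fit might look like this in brms (data and variable names are hypothetical):

```r
library(brms)

# Poisson mixed model: counts predicted by a fixed effect,
# with by-participant random intercepts (names are illustrative)
fit <- brm(
  count ~ condition + (1 | participant),
  data   = my_data,
  family = poisson()
)
summary(fit)
```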
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Latent Growth Curve Models using the Lavaan Package in R
Date: Thursday, January 16th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Rogier Kievit is Professor of Developmental Neuroscience at the Donders Institute in Nijmegen, where he leads the Lifespan Cognitive Dynamics Lab (https://lifespancognitivedynamics.com/). He studies changes in cognitive abilities across the lifespan using multivariate techniques including factor analysis, growth curve models, mixture models, and time series analysis. He uses R almost every day, especially lavaan and ggplot, and has contributed to multiple packages (e.g. ggrain, regsem, iced). If you send him exciting longitudinal data there is a real risk he may abandon other, more urgent tasks.
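A minimal sketch of the kind of model the workshop covers: a linear latent growth curve fitted with lavaan, using the package's built-in Demo.growth data (four repeated measures t1 to t4):

```r
library(lavaan)

# Intercept and slope factors with fixed loadings define linear growth
model <- "
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
"
fit <- growth(model, data = Demo.growth)
summary(fit, fit.measures = TRUE)
```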
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Satellite mapping of surface waters in R
Date: Thursday, January 23rd, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Lawrence Vulis is a senior hazard scientist at CoreLogic working on modelling climate impacts on natural hazards and property risk. He regularly works with statistical methods, numerical modelling, and geographic information systems (GIS) to interrogate natural hazard, property/building, and climate data. His background is in hydrology and geomorphology, with prior experience in satellite-imagery-based monitoring and classification of coastal landscapes and surface water systems such as beaches, river deltas, and lakes.
Description: Surface waters such as rivers, streams, lakes, and reservoirs are an important source of freshwater and economic activity. Mapping such waters and their seasonal changes is crucial for understanding water resource availability and geomorphic activity. This workshop focuses on the interrogation of optical and multispectral satellite imagery for surface water mapping using R. We will examine different types of satellite imagery and how to extract surface water features. Some familiarity with geographic information systems (GIS) and image processing, and possibly some background in hydrology, geography, or earth science, is recommended but not required.
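As an illustration of the kind of computation involved, water pixels are often highlighted with a normalized difference water index (NDWI) from the green and near-infrared bands; a sketch with the terra package (the file name and band positions are hypothetical and depend on the sensor/product):

```r
library(terra)

# Load a multispectral scene; band order depends on the sensor
img   <- rast("scene.tif")
green <- img[[3]]   # illustrative band positions
nir   <- img[[5]]

# NDWI = (green - NIR) / (green + NIR); values > 0 often indicate water
ndwi  <- (green - nir) / (green + nir)
water <- ndwi > 0
plot(water)
```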
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Spatial modelling with GAMs in R
Date: Thursday, January 30th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Sophie Lee is a statistician and educator who teaches a range of statistics and R coding courses to non-statisticians. Her goal is to provide accessible, engaging training to prove that statistics does not need to be scary! She has a PhD in Spatio-temporal Epidemiology from LSHTM and is a Fellow of the Higher Education Academy. Her research interests lie in spatial data analysis, planetary health, and Bayesian modelling.
Description: When modelling spatial data we are generally unable to use traditional modelling approaches, such as generalised linear models (GLMs), as the assumption that observations are independent of one another may be invalid. This is due to underlying similarities, including unobservable behaviours, climate, and other characteristics, that are shared between observations close to one another. There are extensions of GLMs that can be used to overcome this lack of independence, often through the inclusion of structured random effects that try to take account of the underlying spatial relationships. The issue arises when deciding how to structure these spatial random effects: how close is close enough to consider observations no longer independent?
This workshop introduces generalised additive models (GAMs) as a method for generating the underlying spatial structure needed to define spatially structured random effects. We will see how penalised smoothing splines can be applied to coordinates to generate a spatial plane with minimal user assumptions. This ensures the spatial model is relevant and unique to the setting being studied. Using the mgcv package in R, we will apply this approach to real-world data, incorporating the flexible spatial structure into a random effects model, which then can be interpreted similarly to any other spatial model.
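For orientation, the core model looks something like this in mgcv (data and variable names are hypothetical):

```r
library(mgcv)

# A penalised 2-D smooth over coordinates captures spatial structure
# with minimal user assumptions; the family depends on the outcome type
fit <- gam(
  cases ~ s(longitude, latitude, k = 60) + covariate,
  family = poisson(),
  data   = spatial_data
)
summary(fit)
plot(fit, scheme = 2)  # visualise the fitted spatial surface
```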
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Structural Equation Models and the Do-Operator in PyMC
Date: Thursday, February 6th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Nathaniel Forde is a staff data scientist currently working in People Analytics at Personio (an HR intelligence platform). He has more than 10 years of experience in data science, working across a range of industries: insurance, gaming, and e-commerce. He is an active contributor to the PyMC ecosystem with a focus on causal inference. His academic background covers degrees in philosophy and mathematical logic.
Description: In this talk we’ll present the relationship between structural equation modelling (SEM) and Judea Pearl’s structural causal models (SCM). In particular, we will focus on how to articulate Bayesian structural equation models in PyMC and interrogate their implications using PyMC’s do-operator. We will showcase how to assess various model fit characteristics and evaluate their robustness under prior sensitivity analysis.
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Tabular ML in R: an overview of tidymodels in R for tabularized data
Date: Thursday, February 20th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Frank Hull is currently Director of Analytics at ACES. Frank oversees ACES’ Data Science department, which works directly with Portfolio Strategy, Portfolio Modeling, Transmission, Resource Planning, Fundamentals, and Trading & Operations. Frank leads & advises various initiatives such as weather-driven stochastics (WDS), long-term load forecasting (LTLF), peak prediction services (PPS), dark calm (DC) and extreme weather event (EWE) analyses. Frank also hosts internal R meetings for programmers at ACES. Prior to his current role, Frank held various roles related to data science, systems, modeling, and quantitative analysis at AES & ACES. Frank holds a degree in physics with a concentration in engineering physics.
Description: In this workshop, we will 1) discuss what we mean by tabular ML in R, 2) why it’s important, 3) when it can be applied, and 4) how to set up a robust pipeline for iterative machine learning workflows. We will start by defining tabular data and discussing its prevalence across sectors, then explore the data to understand and interpret known relationships in our example dataset. Lastly, we will establish key practices within the tidymodels ecosystem to create a predictive framework and benchmark various ML engines.
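A minimal sketch of the kind of pipeline the workshop builds (the data and model here are illustrative):

```r
library(tidymodels)

# Split, preprocess, model, and fit in one declarative workflow
split <- initial_split(mtcars)
rec   <- recipe(mpg ~ ., data = training(split)) |>
  step_normalize(all_numeric_predictors())

wf <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg())

fitted <- fit(wf, data = training(split))
predict(fitted, new_data = testing(split))
```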
Minimal registration fee: 20 euro (or 20 USD or 800 UAH)
To register, please donate at least 20 euro here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form. You can also sponsor a student's participation in the workshop by donating at least 20 euro per student here, here, or here, saving the confirmation of payment that will be emailed to you, and filling in this form. Students who are unable to afford the registration fee can join the waiting list here. Please note that the number of sponsored places is very limited, so please register directly if you can afford it. Ukrainians who are unable to afford the fee can participate for a smaller donation to the same organisations, or without making a donation, by filling in this form.
Previous workshops
NEW: You can now view the workshops by category! Click on the category below to view workshops in that category.
How can I get access to the recordings and materials of past workshops? Please donate at least 20 euro per workshop here, here, or here; enter your email on the donation page; screenshot the confirmation of payment that will be emailed to you; and attach it to this form.
Workshops for beginners in R
Introduction to R with Tidyverse
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: This workshop will cover an introduction to R and the tidyverse, including basic data loading, cleaning, manipulation, and data visualization techniques.
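To give a flavour, a typical first tidyverse pipeline looks like this:

```r
library(tidyverse)

# Filter rows, create a new column, and summarise by group
mtcars |>
  filter(hp > 100) |>
  mutate(kpl = mpg * 0.425) |>
  group_by(cyl) |>
  summarise(mean_kpl = mean(kpl))
```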
Introduction to R Programming for Data Analysis
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop, we will cover some of the basics of R programming essential for anyone who intends to do any type of data analysis in R. We will discuss how vectors, matrices, and lists work in R and how to do operations with them. We will also cover how to iterate the same functions and apply them to many data points simultaneously, both using loops and using the special ‘apply’ and ‘map’ functions. The workshop will also cover how to create your own functions. Finally, we will examine how to work with two special data types: dates/times and factors.
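For example, the same per-column computation can be written as a loop, an apply-family call, or a purrr map:

```r
library(purrr)

# Three equivalent ways to compute the column means of a data frame
means <- numeric(ncol(mtcars))
for (i in seq_along(mtcars)) means[i] <- mean(mtcars[[i]])

sapply(mtcars, mean)    # base R apply-family version
map_dbl(mtcars, mean)   # purrr version, always returns a double vector
```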
Cleaning Data in R with tidyr and janitor
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop, we will study what makes a dataset ‘tidy’. We will learn how to clean different types of messy datasets using the tidyr and janitor packages. In particular, we will study how to reshape a dataset between ‘wide’ and ‘long’ formats, split and unite columns, and nest and unnest a dataset. We will learn how to detect and work with missing data, and we will look at many other steps of the data cleaning process, such as cleaning the names of the variables in a dataset, removing duplicated observations or variables, and checking whether datasets are compatible for matching, among other things. In addition, we will study simple and quick ways of converting datasets created for Excel into a ‘tidy’ format that is convenient to work with in R. Finally, we will look at a convenient way of creating summary statistics tables.
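A small sketch of two of these steps, cleaning variable names and reshaping wide to long (the data is illustrative):

```r
library(tidyr)
library(janitor)

# A messy wide table with inconsistent column names
messy <- data.frame(`First Name` = c("Ana", "Bo"),
                    `Score 2021` = c(10, 12),
                    `Score 2022` = c(14, 11),
                    check.names = FALSE)

messy |>
  clean_names() |>                   # first_name, score_2021, score_2022
  pivot_longer(starts_with("score"),
               names_to = "year",
               names_prefix = "score_",
               values_to = "score")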
Working with Strings in R
Speaker: Harald Puhr, PhD in international business and assistant professor at the University of Innsbruck. His research and teaching focus on global strategy, international finance, and data science/methods, primarily with R. As part of his research, Harald developed the globaltrends package (available on CRAN) to handle large-scale downloads from Google Trends.
Description: In this seminar, we will talk about handling strings in R, one of R's basic data types. We will proceed in three steps: (1) we will discuss why strings are a special data type and explore the string-handling functions available in the stringr package; (2) we will learn about pattern matching and the opportunities (or pain, depending on your perspective) of regular expressions; (3) we will conclude the seminar with an introduction to string distances.
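A small sketch of the three steps (the example strings are illustrative):

```r
library(stringr)

x <- c("apple pie", "Apple Watch", "grape")

str_detect(x, "apple")                               # (1) basic string handling
str_extract(x, regex("^apple", ignore_case = TRUE))  # (2) regular expressions
adist("apple", "apply")                              # (3) a simple string (edit) distance
```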
RMarkdown and Quarto - Mastering the Basics
Speaker: Indrek Seppo, a seasoned R programming expert, brings over 20 years of experience from the academic, private, and public sectors to the table. With more than a decade of teaching R under his belt, Indrek's passionate teaching style has consistently led his courses to top the student feedback charts and has inspired hundreds of upcoming data analysts to embrace R (and Baby Shark).
Description: Discover the power of RMarkdown and its next-generation counterpart, Quarto, to create stunning reports, slides, dashboards, and even entire books—all within the RStudio environment. This session will cover the fundamentals of markdown, guiding you through the process of formatting documents and incorporating R code, tables, and graphs seamlessly. If you've never explored these tools before, prepare to be amazed by their capabilities. Learn how to generate reproducible reports and research with ease, enhancing your productivity and efficiency in the world of data analysis.
Creating R Functions
Speaker: Simisani Ndaba is a Teaching Assistant in the Department of Computer Science at the University of Botswana. Her research interests are in Data Science and Machine Learning.
She is the founder and co-organiser of R-Ladies Gaborone, an occasional blogger and enjoys creating data visualisation.
Description: This is a beginner-friendly workshop on creating R functions, and on testing, error handling, and documenting them. Functions let you perform repetitive operations with a single command, and in this lesson we’ll learn how to write our own.
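A small example of the pattern we will practise, a documented function with basic input checking:

```r
# Convert temperatures from Fahrenheit to Celsius;
# errors early if the input is not numeric
fahr_to_celsius <- function(temp) {
  stopifnot(is.numeric(temp))
  (temp - 32) * 5 / 9
}

fahr_to_celsius(c(32, 212))  # 0 100
```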
Data visualization in R
Data Visualization with ggplot
Speaker: Dr. James Murray has 15 years of experience using R and teaches undergraduate and MBA courses in data analysis and econometrics. He does academic research in macroeconomics, fiscal and monetary policy, and the scholarship of teaching and learning.
Description: This is an introductory workshop on creating data visualizations using ggplot2 in R. Workshop participants will get the most out of the session if they have RStudio installed and working on their computers, some previous experience using R (even if it is limited), and some basic knowledge of inferential statistics. We'll discuss some best practices for data visualization, some of the philosophy of the grammar of graphics, and how we use this framework for creating visualizations. We'll start with the fundamentals to build some simple, not-very-pretty visuals and build upon them to make visually attractive plots that can be used for effective communication.
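The grammar-of-graphics pattern the workshop builds on, in its simplest form:

```r
library(ggplot2)

# Data + aesthetic mappings + geometry = a plot
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
```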
Advanced data visualization in R with ggplot
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop we will focus on three things: 1) how to make data visualization with ggplot easier; 2) how to make different types of plots that are difficult or impossible to make with ggplot by itself (such as animated plots, alluvial plots, pie charts, donut plots, and lollipop plots); 3) how to customize the look of your ggplot plots. To do so, we will work with a number of additional packages that extend the original ggplot package. It is recommended to have basic knowledge of ggplot before attending. If you have never used ggplot before, check out our previous workshop on data visualization with ggplot in R.
Color Palette Choice and Customization in R and ggplot2
Speaker: Dr. Cédric Scherer is an independent data visualization designer, consultant, and instructor, and a graduated computational ecologist, based in Berlin, Germany. Cédric has created visualizations across all disciplines, purposes, and styles and regularly teaches data visualization principles, the R programming language, and ggplot2. Thanks to regular participation in social data challenges such as #TidyTuesday, he is now well known for complex and visually appealing figures, entirely made with ggplot2.
Description: In this workshop, we will first cover the basics of color usage in data visualization. Afterward, we will explore different color palettes that are available in R, discuss which extension packages are exceptional in terms of palettes and functionality, and learn how to customize palettes and scales in ggplot2.
Dataviz with R and ggplot: Using colour and annotations for effective story telling
Speaker: Cara Thompson is a freelance data consultant with an academic background, specialising in dataviz and in "enhanced" reproducible outputs. She lives in Edinburgh, Scotland, and is passionate about maximising the impact of other people's expertise.
Description: If we're passionate about our data and the patterns we've found, a key part of our job is to find effective ways of communicating what we've discovered. Intuitive and compelling data visualisations are a great way to draw attention to our main story, and illustrate some of the details.
In this workshop, we'll talk about how we can make use of colour, fonts and a few other tricks to make it easier for readers to understand and remember our main story and make our plots publication-ready. We'll be using R and ggplot to create, modify and annotate the plots we discuss, but the principles apply regardless of the tools you use to plot your data.
Attendees are encouraged to bring along a plot of their own (which doesn't need to be made with ggplot!) so that they can think about how best to apply the principles to their own context, and for a chance to get some live feedback during our Q&A session.
Designing Beautiful Tables in R
Speaker: Tanya Shapiro is a freelance data consultant, helping businesses make better use of their data with bespoke analytical services. She is passionate about data visualization and design, and fell in love with the online R community via #TidyTuesday. When she’s not working on data projects, you can find her cycling or exploring downtown St. Petersburg, Florida.
Description: When we think about data visualization, bar charts and line charts are often top of mind - but what about tables? Tables are a great way to summarize and display different metrics across many records. In this workshop, we will learn how to design visually engaging tables in R and how to enhance them with HTML/CSS techniques. From sparklines, to heatmaps, to embedded images, we'll cover a variety of tricks to help elevate your tables!
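For a flavour of the approach, here is a small sketch using the gt package (one popular option for HTML-styled tables in R; the package choice here is an assumption, not necessarily the exact tool used in the workshop):

```r
library(gt)
library(dplyr)

# A summary table with a header and basic number formatting
mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), n = n()) |>
  gt() |>
  tab_header(title = "Fuel efficiency by cylinder count") |>
  fmt_number(columns = mean_mpg, decimals = 1)
```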
Visualizing Regression Results in R
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop, we will look at how you can use ggplot and other packages to visualize regression results. We will explore different types of plots. First, we will plot regression lines for both bivariate and multivariate regressions. We will also explore different ways of plotting regression coefficients, looking at how to visualize the coefficients and standard errors of multiple variables from a single regression, of a single variable from multiple regressions, and of multiple variables from multiple regressions. We will also learn how to plot other regression outputs, such as marginal effects, odds ratios, and predicted values. In the process, we will learn how to tidy the output of a regression model, convert it to a data frame, and automate the process of running regressions using a loop.
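A minimal sketch of one technique covered: tidying a model into a data frame and plotting coefficients with confidence intervals:

```r
library(broom)
library(ggplot2)

fit <- lm(mpg ~ wt + hp + am, data = mtcars)

# Turn the model into a tidy data frame of estimates and intervals
coefs <- tidy(fit, conf.int = TRUE)

ggplot(subset(coefs, term != "(Intercept)"),
       aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
  geom_pointrange()
```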
Visualising connection in R
Speaker: Rita Giordano is a freelance data visualisation consultant and scientific illustrator based in the UK. By training, she is a physicist who holds a PhD in statistics applied to structural biology. She has extensive experience in research and data science. Furthermore, she has over fourteen years of professional experience working with R. She is also a LinkedIn instructor. You can find her course “Build Advanced Charts with R” on LinkedIn Learning.
Description: How to show connection? It depends on the connection we want to visualise. We could use a network, chord, or Sankey diagram. The workshop will focus on how to visualise connections using chord diagrams. We will explore how to create a chord diagram with the {circlize} package. In the final part of the workshop, I will briefly mention how to create a Sankey diagram with networkD3.
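A minimal chord diagram with {circlize} (the flow matrix is illustrative):

```r
library(circlize)

# Rows are origins, columns are destinations; cell values are flow sizes
flows <- matrix(c(10, 4, 6,
                   2, 8, 3,
                   5, 1, 9),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "B", "C"), c("X", "Y", "Z")))

chordDiagram(flows)
```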
Effective Visual Communication with R
Speaker: Claus Wilke is a data scientist and computational biologist at The University of Texas at Austin. He is known for his work on popular R packages for data visualization, such as cowplot, ggridges, and ggtext, as well as his contributions to the package ggplot2. He is also the author of the book Fundamentals of Data Visualization, published in 2019, which provides a concise introduction to effectively visualizing many different types of data sets.
Description: In the first half of this workshop, Wilke will provide a high-level perspective on how to make good visualizations and how to use them effectively to communicate and reason about data. The second half will be more hands-on and will address how to use R to make interactive plots, deal with overplotting, and make compound figures.
Getting creative with ggplot2
Date: Thursday, September 12th, 18:00 - 20:00 CEST (Rome, Berlin, Paris timezone)
Speaker: Georgios Karamanis is a data visualization designer, psychiatrist and researcher, based in Uppsala, Sweden. With a strong background in visual arts and design, he uses almost exclusively R and ggplot2 to make elegant and creative data visualizations.
Description: Creative data visualizations stand out and can help get your message across more easily. But how do you achieve this? In this hands-on workshop, we will look at examples and explore ways to use ggplot2 and related packages to make your visualizations more eye-catching and personal.
Visualizing Variance with Sankey diagrams/Riverplots using R: An Illustration with Longitudinal Multi-level Modeling
Speaker: Daniel P. Moriarity, PhD is a clinical psychologist with a particular interest in immunopsychiatry, psychiatric phenotyping, and methods reform in biological psychiatry. He currently works as a Postdoctoral Fellow in the UCLA Laboratory for Stress Assessment and Research with Dr. George Slavich. Starting January 2025, he will join the University of Pennsylvania's Psychology Department as an Assistant Professor of Clinical Psychology.
Description: This workshop will illustrate how to create Sankey diagrams/Riverplots with a focus on longitudinal multilevel modeling to separately visualize between-person and within-person variance. However, the technique can be applied to many other visualizations of different sources of variance (e.g., different variables, random vs. fixed effects). Data + code templates will be provided to follow along with.
Specialized topics in R
Text data analysis with R
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop, we will explore how you can analyze text data using R. In particular, we will look at the “bag of words” approach to text data analysis, how to plot wordclouds from a given text, and how to perform sentiment analysis and topic modeling. In addition, we will look at analyzing text at the level of “n-grams” instead of just individual words.
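A sketch of the bag-of-words starting point: tokenising text, counting words, and feeding the counts into a wordcloud (the example text is illustrative):

```r
library(dplyr)
library(tidytext)
library(wordcloud)

docs <- data.frame(text = c("R makes text analysis fun",
                            "text analysis with R is fun"))

word_counts <- docs |>
  unnest_tokens(word, text) |>          # one row per word ("bag of words")
  anti_join(stop_words, by = "word") |> # drop common filler words
  count(word, sort = TRUE)

wordcloud(word_counts$word, word_counts$n, min.freq = 1)
```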
Web Scraping with R
Speaker: Oleksii Hamaniuk, PhD student in Economics at the University of Bonn. Previously worked at Kyiv School of Economics and a Ukrainian think tank Centre for Economic Strategy.
Description: Web scraping is the process of quickly and automatically extracting data from the Internet. In this workshop, we will parse data from the website https://books.toscrape.com/, which was created especially for people who want to practice web scraping. We will consider the R package “rvest” and different tools for extracting information from a website's HTML code. It is recommended to have at least basic prior experience with R.
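For a flavour of the workflow, here is a sketch that pulls book titles from that site with rvest (the CSS selectors reflect the site's current layout and may need adjusting):

```r
library(rvest)

page <- read_html("https://books.toscrape.com/")

# Book titles are stored in the title attribute of links inside <h3> tags
page |>
  html_elements("h3 a") |>
  html_attr("title")
```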
Intermediate Web Scraping and API Harvesting using R
Speaker: Felix Lennert is a second-year Ph.D. student in Sociology at the CREST, ENSAE, Institut Polytechnique de Paris. He is a co-organizer of the Summer Institute of Computational Social Science in Paris. His research interests lie in the formation and polarization of political opinions which he tackles using a toolbox consisting of "classic" quantitative as well as text-as-data methods.
Description: Digital trace data are an integral element of CSS (cool social scientific) research. This course will show you how this is done on an intermediate level. This implies that we will not cover the fundamentals of selecting and downloading things from static web pages on the one hand, but also not go as far as firing up RSelenium to scrape dynamic web pages on the other. We will start with a brief revision of CSS selectors, then we move on to rvest to simulate a browser session, fill forms, and click buttons. In the second half of the session, APIs and how to make requests to them will be covered. Tangible examples for API queries will be shown. In the end, exemplary workflows will be introduced to provide a scaffolding for students’ future research projects.
Introduction to Spatial Analysis in R
Speaker: David Zuchowski, PhD student in Economics at the University of Duisburg-Essen and a Researcher at RWI - Leibniz Institute for Economic Research, one of the leading centers for economic research and evidence-based policy advice in Germany.
Description: This workshop will provide an introduction to the analysis of geospatial data with R. It will give an overview of different types of spatial data, as well as basics of importing, wrangling, and visualization of such data. The workshop will focus on geospatial data visualization, which allows for quick exploration of spatial data in R. In particular, you will learn how to create maps with tmap and ggplot2.
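A minimal sketch of a thematic map with tmap, using its built-in World dataset:

```r
library(tmap)
data(World)

# Choropleth of the Happy Planet Index, one of the example attributes
tm_shape(World) +
  tm_polygons("HPI")
```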
Spatial Data Wrangling with R: A Comprehensive Guide
Speaker: Long Nguyen is a PhD student at SOEP RegioHub at Bielefeld University. He likes to make pretty maps.
Description: This workshop is designed to provide a solid foundation for working with spatial data in R. Starting with fundamental concepts of spatial data types and structures, the workshop provides a systematic overview of techniques for manipulating spatial data, such as spatial aggregation, spatial joins, spatial geometry transformations, and distance calculations. With this focus, the workshop's aim is to give participants a skill set that is easily extendable and transferable to new data and tools. The data wrangling techniques presented will be accompanied by instructions on creating maps – both static and interactive – to quickly explore and present the results of the operations performed.
Fundamentals of Exploratory and Inferential Spatial Data Analysis in R
Speaker: Denys Dukhovnov, Ph.D. student in Demography at University of California, Berkeley. His research revolves around small-area estimation and geographic inequalities in mortality in the United States. He holds a previous M.A. degree in Data Analytics and Applied Social Research, held multiple research positions in social science fields, and currently works as a researcher at the Human Mortality Database (HMD).
Description: This workshop will provide a hands-on overview of the exploratory and inferential spatial data analysis in R. The attendees will become familiar with statistical concepts of spatial adjacency and dependence and with various methods of measuring it (using such indicators as Moran's I, Geary's C, LISA/ELSA plots, etc.), as well as with statistical challenges of working with spatial data (e.g. modifiable areal unit problem or MAUP). In addition, the workshop will provide a foundational overview of inferential spatial analysis, specifically through the application of the basic types of spatial econometric regression models (SAR, SLX, SEM models). An emphasis will be made on the interpretation and reporting of the model performance and results. Prior familiarity with spatial data types and OLS regression is helpful, but not necessary.
Survey Data and Missing Data in R
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop, we will look at how to work with survey data and missing data in R. We will first discuss how survey data is collected and how to account for the survey design, in particular survey weights, when performing data analysis in R. We will then discuss the types of missing data and ways to explore them. Finally, we will look at different types of missing data imputation, including multiple imputation, and see how they can be performed in R.
Working with ChatGPT in R
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at the Ukrainian think tank Centre for Economic Strategy.
Description: In this workshop we will learn how you can fully harness the power of ChatGPT to improve your R coding. We will learn how to access ChatGPT directly from R; how to make it write R code, including fairly long and complicated commands; debug its (and your) code; translate code from one coding language to another; comment your code; make it more efficient; and more! We will also explore some of the drawbacks of ChatGPT and examine when and why you can’t always rely on it.
Introduction to R Shiny
Speaker: Agabek Kabdullin, MSc in data science and political science PhD candidate at the University of Rochester. He does data analysis work for The Children’s Agenda, a Rochester NGO. His research interests include online censorship and repression in autocracies.
Description: In this seminar, we will cover an introduction to Shiny, an R package that allows users to create interactive web apps. We will (a) learn how to set up basic statistical simulations, (b) learn how to create data-based applications, and (c) cover the nuts and bolts of the user interface in Shiny.
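The minimal skeleton of every Shiny app: a UI definition paired with a server function.

```r
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

server <- function(input, output) {
  # Re-runs automatically whenever input$n changes
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

shinyApp(ui, server)
```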
Code-driven Publishing: An Introduction to Quarto
Speaker: Arthur Small is an economist and data scientist specializing in applications to energy, environment, weather and climate. He has held faculty positions at Columbia Business School, Columbia School of International and Public Affairs, the Penn State College of Earth and Mineral Sciences, and (visiting) the Dyson School of Applied Economics and Management at Cornell University, and has worked as a commercial data scientist. He has published in venues including Journal of Political Economy, Review of Economics and Statistics, and Journal of Environmental Economics and Management. Small’s research has been supported by the U.S. National Science Foundation (NSF), the U.S. Environmental Protection Agency, and other entities. He has served on review panels for the U.S. National Academy of Sciences, the NSF, and others. His research has been recognized by the Quality of Research Discovery Award from the Agricultural and Applied Economics Association. His training includes an A.B. in Mathematics from Columbia University, M.S. in Mathematics from Cornell University, and a Ph.D. in Agricultural and Resource Economics from the University of California, Berkeley. He currently serves as Lecturer in the School of Engineering and Applied Sciences at the University of Virginia.
Description: The workshop will provide an introduction to creating and publishing documents using Quarto, a modern platform for creating professional articles, slide decks, websites, and other publications. By way of an introductory example, participants will be walked through the process of crafting and publishing their own personal professional website.
TidyFinance: Empirical asset pricing in R
Speaker: Patrick Weiss, PhD, CFA, is a postdoctoral researcher at Vienna University of Economics and Business. Jointly with Christoph Scheuch and Stefan Voigt, Patrick wrote the open-source book www.tidy-finance.org, which serves as the basis for this workshop. Visit his webpage for additional information.
Description: This workshop explores empirical asset pricing and combines explanations of theoretical concepts with practical implementations. The course relies on material available on www.tidy-finance.org and proceeds in three steps: (1) We dive into the most used data sources and show how to work with data from WRDS, forming the basis for the analysis. We also briefly introduce some other possible sources of financial data. (2) We show how to implement the capital asset pricing model in rolling-window regressions. (3) We introduce the widely used method of portfolio sorts in empirical asset pricing. During the workshop, we will combine some theoretical insights with hands-on implementations in R.
Note: to get a recording of this workshop, we ask you to donate to the Leleka foundation (here).
TidyFinance: Financial Data in R
Speaker: Patrick Weiss, PhD, CFA, is a postdoctoral researcher at Vienna University of Economics and Business. Jointly with Christoph Scheuch and Stefan Voigt, Patrick wrote the open-source book www.tidy-finance.org, which serves as the basis for this workshop. Visit his webpage for additional information.
Description: This workshop explores financial data available for research and practical applications in financial economics. The course relies on material available on www.tidy-finance.org and covers: (1) How to access freely available data from Yahoo!Finance and other vendors. (2) Where to find the data most commonly used in academic research. This main part covers data from CRSP, Compustat, and TRACE. (3) How to store and access data for your research project efficiently. (4) What other data providers are available and how to access their services within R.
Note: to get a recording of this workshop, we ask you to donate to the Leleka foundation (here).
Classification modelling for profitable decisions: Hands on practice in R
Speaker: Ágoston Reguly is a Postdoctoral Fellow at the Financial Services and Innovation Lab of Scheller College of Business, Georgia Institute of Technology. His research is focused on causal machine learning methods and their application in corporate finance. He obtained his Ph.D. degree from Central European University (CEU), where he has taught multiple courses such as data analysis, coding, and mathematics. Before CEU he worked for more than three years at the Hungarian Government Debt Management Agency.
Description: This workshop will implement methods of probability prediction and classification analysis for the binary target variable. This workshop is a follow-up to Gábor Békés’s workshop on the key concepts and (theoretical) methods for the same subject. We will use R via RStudio to apply probability prediction, classification threshold, loss function, classification, confusion table, expected loss, the ROC curve, AUC, and more. We will use linear probability models, logit models as well as random forests to predict probabilities and classify. In the workshop, we follow the case study on firm defaults using a dataset on financial and management features of firms. The workshop material is based on a chapter and a case study from the textbook of Gábor Békés and Gábor Kézdi (2021): Data Analysis for Business, Economics, and Policy, Cambridge University Press. The workshop will not only implement the key concepts, but the focus will be on data wrangling and modeling decisions we make for a real-life problem.
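A small sketch of the core loop: predict probabilities with a logit model, pick a classification threshold, and tabulate the confusion table (the data and the 0.5 threshold are illustrative; in practice the threshold comes from the loss function):

```r
# Logit model for a binary outcome (am: automatic vs manual transmission)
fit   <- glm(am ~ mpg + hp, data = mtcars, family = binomial())
probs <- predict(fit, type = "response")

# Classify with a chosen threshold, then cross-tabulate
pred  <- ifelse(probs > 0.5, 1, 0)
table(predicted = pred, actual = mtcars$am)   # confusion table
```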
Introduction to efficiency analysis in R
Speaker: Olha Halytsia, PhD Economics student at the Technical University of Munich. She has previous research experience within a World Bank project and also worked at the National Bank of Ukraine.
Description: In this workshop, we will cover all steps of efficiency analysis using production data. First, we will introduce the notion of efficiency, with a special focus on technical efficiency, and briefly discuss parametric (stochastic frontier model) and non-parametric (data envelopment analysis) approaches to efficiency estimation. Subsequently, with the help of the "Benchmarking" and "frontier" R packages, we will get estimates of technical efficiency and discuss the implications of our analysis. This workshop may be useful for beginners who are interested in working with input-output data and want to learn how R can be used for econometric production analysis.
A Gentle and Applied Introduction to Rcpp
Speaker: Dirk Eddelbuettel is involved with many R packages on CRAN; co-creator of the Rocker Project providing R Docker containers; the Debian/Ubuntu maintainer for R, many CRAN packages, and some other quantitative software; behind several initiatives to make binary packages more easily available, ranging from Quantian to the more recent r2u Project; an elected board member of the R Foundation; an adjunct Clinical Professor at the University of Illinois Urbana-Champaign; an editor at the Journal of Statistical Software; and a Principal Software Engineer at TileDB. He holds an MA and PhD in Mathematical Economics from EHESS in France, and an MSc in Industrial Engineering from KIT in Germany.
Description: R has become the lingua franca of statistical research and applications. It provides an open and extensible system for which the Rcpp package has become the most widely used package for extending R via native code. This talk aims to gently introduce going to compiled code without fear, thanks to the sophisticated tooling R and Rcpp provide, which makes the otherwise complicated and sometimes feared steps of compiling, linking, loading, and launching compiled code a relative breeze, accessible directly from R and relying on built-in converters to facilitate exchange to and from R for all key data types. The talk will highlight key aspects, and motivations, of using Rcpp, and will also warn of a few common pitfalls. The second half will be centered around a complete worked example of a package using RcppArmadillo that we will build from scratch. Pointers for further study as well as to additional examples will also be provided.
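To illustrate how accessible this is, a minimal sketch: a small C++ function compiled, loaded, and called from R in one step.

```r
library(Rcpp)

# Compiles, links, and loads the C++ code, then exposes it as an R function
cppFunction("
  double sumC(NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); i++) total += x[i];
    return total;
  }
")

sumC(c(1, 2, 3.5))  # 6.5
```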
Working with image data in R
Speaker: Wolfgang Huber is the author of several R packages for statistical analysis of “omics” data and a co-founder of the Bioconductor project. He co-authored the textbook Modern Statistics for Modern Biology with Susan Holmes. He has worked on cellular phenotyping from genetic and chemical screens and is a co-author of the EBImage package. He is a senior group leader at the European Molecular Biology Laboratory, where he co-directs the Molecular Medicine Partnership Unit and the Theory Transversal Theme. His scientific homepage is here.
Description: Images are a rich source of data. In this workshop, we will see how quantitative information can be extracted from images. We will use segmentation to identify objects, measure their properties such as size, intensity distribution moments, and shape and morphology descriptors, and explore statistical models to describe spatial relationships between them. The workshop includes a hands-on demonstration of the EBImage package for R, which provides many functions for feature extraction and visualization. Application examples will be taken from biological imaging of cells and tissues, but the methods should also be applicable to other types of data.
Introduction to Deep Learning with R
Speaker: Eran Raviv is an expert researcher at APG Asset Management, working for the Digitalization & Innovation department. His academic papers are published in top-tier journals. In his present role, Dr. Raviv helps the organization develop its Data Science capabilities and he is engaged in both strategic planning and leading bottom-up initiatives.
Description: The purpose of this workshop is to offer an introductory understanding of deep learning, regardless of your prior experience. It is important to note that this workshop is tailored to those who are absolute beginners in the field. We therefore begin with a few necessary fundamental concepts, after which we cover the basics of deep learning, including topics such as what is actually being learned in deep learning, what makes it "deep," and why it is such a popular field. We will also cover how you can estimate deep learning models in R using the neuralnet package. You should attend this workshop if you have heard about deep learning and would like to know more about it.
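For orientation, fitting a small network with neuralnet looks like this (the formula and architecture are illustrative):

```r
library(neuralnet)

# Two hidden layers with 4 and 2 neurons, predicting a numeric outcome
fit <- neuralnet(mpg ~ wt + hp + disp, data = mtcars, hidden = c(4, 2))
plot(fit)                      # visualise the trained network
predict(fit, mtcars[1:3, ])    # predictions for the first three cars
```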
Automating RMarkdown/Quarto reports
Speaker: Indrek Seppo, a seasoned R programming expert, brings over 20 years of experience from the academic, private, and public sectors to the table. With more than a decade of teaching R under his belt, Indrek's passionate teaching style has consistently led his courses to top the student feedback charts and has inspired hundreds of upcoming data analysts to embrace R (and Baby Shark).
Description: For those who already know the basics of RMarkdown/Quarto, I invite you to delve into the world of report automation to streamline your workflow and enhance efficiency. This session will introduce the use of parameters, among other techniques, to create dynamic and customizable reports without repetitive manual work. Learn how to harness the power of R to generate tailored content for diverse audiences, effortlessly updating data, analyses, and visualizations with just a few clicks.
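The core idea in one call: the same report rendered with different parameter values (the file and parameter names are hypothetical):

```r
# report.Rmd declares a `region` entry under `params:` in its YAML header;
# each call produces a customised document without touching the source
for (r in c("North", "South", "West")) {
  rmarkdown::render("report.Rmd",
                    params      = list(region = r),
                    output_file = paste0("report-", r, ".html"))
}
```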
A toolbox for debugging and refactoring in R
Speaker: Antoine Fabri is a data scientist, R consultant, R developer, and teacher/coach at Cynkra in Zurich, Switzerland. Through his experience in the open source community, as an employee in various industries, and as a consultant, he has had the opportunity to witness a lot of R code of varying context and quality, and to write a fair amount along the way. The recurrent issues he faced led him to grow a collection of tools that he shares with the community.
Description: Debugging and refactoring are some of the most time-consuming tasks a developer or data scientist encounters. They're also not always pleasant. However, with the right approach and the right tools, the experience can be much improved, and I intend to show that we can soothe the pain a little, and hopefully even make it fun sometimes. I'd like to first talk a bit about my general approach to debugging and refactoring and review common issues and general tools, then show a glimpse of how some of my own packages can be used to, respectively, understand and visualize the logic of your code (using {flow}), inspect the structure of your objects (using {constructive}), and monitor the execution of your code (using {boomer}).
Building reproducible analytical pipelines in R
Speaker: Bruno Rodrigues. Bruno is currently employed as the head of the statistics department at the Ministry of Higher education and Research in Luxembourg. Before joining the public sector, Bruno worked as a data science consultant in one of the big four accounting companies, and before that as a teaching and research assistant. Bruno discovered tools such as Git and software carpentry techniques while working on his PhD. These tools and techniques served him well for the past decade, and Bruno has been consistently sharing his knowledge on his blog during that time.
Description: This workshop will present some of the tools and techniques that you can use to build reproducible analytical pipelines, and we will learn how to apply them in R. Making sure that your work is reproducible is extremely important, because making a pipeline reproducible ensures that the code is of high quality, well documented, and tested by design. This way, you won't have problems communicating results, and your collaborators, or future you, will have no trouble understanding the project and re-building it if an update is necessary. In case the project needs to be audited, setting it up as a reproducible analytical pipeline will also make auditing easy!
It doesn't matter if you work in research or in the private sector, or whether you train complex machine learning models or write reports that focus on descriptive statistics: any type of project in any sector can benefit from the ideas presented in this workshop. We will learn about functional and literate programming and how to apply these in R, using {renv}, {targets}, and Docker. A minimal {targets} sketch follows below.
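A minimal sketch of the {targets} idea (not the workshop's exact pipeline; the file and step names are hypothetical): a _targets.R file declares steps that are only re-run when their inputs change.

```r
# _targets.R
library(targets)

list(
  tar_target(data_file, "data.csv", format = "file"),  # file watched for changes
  tar_target(raw,   read.csv(data_file)),              # depends on data_file
  tar_target(model, lm(value ~ group, data = raw))     # depends on raw
)
# Then run the pipeline from the console with targets::tar_make()
```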
Introduction to Supervised Text Classification in R
Speaker: Marius Sältzer is a Professor of Digital Social Science at the University of Oldenburg. He works on party competition and political communication, in particular on social media, using large-scale data collection and text analysis methods.
Description: Text analysis is a staple method in the social sciences and is increasingly influencing other areas in times of large language models. While transformers are very effective, they are computationally expensive and lack interpretability. This course introduces supervised machine learning on textual data, including the theoretical foundations of human annotation, distributional semantics, and word embeddings. Based on quanteda, we apply text classification methods like Naïve Bayes and neural networks in R on a CPU-only laptop.
Machine Learning workflow with tidymodels in R
Speaker: Kelly Bodwin is an Associate Professor of Statistics and Data Science at California Polytechnic State University (Cal Poly, San Luis Obispo). She received a BS in Statistics from Harvard University followed by a PhD in Statistics and Operations Research from the University of North Carolina at Chapel Hill. Her current research focus is software development of tools for education and for unsupervised machine learning. Kelly also works on collaborative data science projects in the biological sciences, social sciences, agriculture, and the humanities. She teaches upper- and lower-division courses in statistics and data science, usually with a focus on computing and/or predictive modeling and machine learning.
Description: This workshop will give a first introduction to the {tidymodels} packages and framework. If you've been wondering what the buzz is about, but you haven't found the time to take your first steps on your own, this workshop is for you! We'll mainly focus on the core workflow of a {tidymodels} analysis, using a few simple models as examples. Then we'll practice automatic cross-validation, tuning, and model selection using {tidymodels} shortcuts, through a real data example. Attendees are expected to have a basic working knowledge of R and tidyverse syntax; as well as a basic understanding of predictive modeling, e.g. linear regression and decision trees.
An Open Source Framework for Choice Based Conjoint Experiments in R
Speaker: John Paul (JP) Helveston is an Assistant Professor at George Washington University in the Department of Engineering Management and Systems Engineering. His research focuses on understanding how consumer preferences, market dynamics, and policy affect the emergence and adoption of low-carbon technologies, such as electric vehicles and renewable energy technologies. He also studies the critical relationship between the US and China in developing and mass producing these technologies. He has expertise in discrete choice modeling, conjoint analysis, exploratory data analysis, interview-based research methods, and the R programming language. He speaks fluent Mandarin Chinese and has conducted extensive fieldwork in China. He is also an accomplished violinist and swing dancer. John holds a Ph.D. and M.S. in Engineering and Public Policy from Carnegie Mellon University and a B.S. in Engineering Science and Mechanics (ESM) from Virginia Tech.
Description: Choice based conjoint (CBC) experiments are a critical tool for measuring preferences, yet most practitioners rely on closed source enterprise software to design and implement their survey experiments. This presentation will demonstrate an open source framework for implementing CBC experiments in R. The framework includes designing and testing the experiment with the cbcTools package, implementing the survey with the formr.org survey platform, and modeling results with the logitr package. Combined, the three tools offer a free and fully open source approach to the entire CBC experiment workflow. The framework is also quite flexible and can be integrated into workflows that use enterprise software with relative ease.
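For orientation, here is a minimal sketch of the design step with cbcTools; the attributes, levels, and design settings are illustrative assumptions, not the workshop's materials.

```r
library(cbcTools)

# define attribute levels and generate a choice-based conjoint design
profiles <- cbc_profiles(
  price = c(1, 1.5, 2),
  type  = c("Fuji", "Gala", "Honeycrisp")
)
design <- cbc_design(profiles = profiles, n_resp = 100, n_alts = 3, n_q = 6)
head(design)
# once responses are collected (e.g., via formr.org), a multinomial logit
# can be estimated on the choice data with the logitr package
```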
Building Websites in R with Distill
Speaker: Jenny Sloane is a postdoctoral fellow in Health Services Research and Development at the Center for Innovations in Quality, Effectiveness and Safety, which is affiliated with the Houston VA and Baylor College of Medicine. She received her PhD in cognitive psychology from the University of New South Wales. Her research interests include improving diagnostic decision-making, reducing errors in medicine, and studying the effects of interruptions and time-pressure on decision-making.
Description: This will be an interactive webinar where we will build a website from scratch in R using the distill package. By the end of the webinar, you will have a fully functioning and live website. I will also show you some cool tips and tricks that I have learned through my experiences building websites in R.
Additional information: If you wish to follow along and build your own website, please make sure to have R, RStudio, and Git installed and please have a GitHub account set up ahead of time.
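To show how little code a starter site needs, here is a minimal sketch using distill's scaffolding; the directory name and title are illustrative assumptions.

```r
library(distill)

create_website(dir = "my-site", title = "My Site")   # scaffold a new site
rmarkdown::render_site("my-site")                    # build it (also available via RStudio's Build pane)
```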
Preparing Data for Modeling Using the Recipes R Package
Speaker: Max Kuhn is a software engineer at Posit (née RStudio). He works on improving R's modeling capabilities and maintains about 30 packages, including caret. Previously, he was a Senior Director of Nonclinical Statistics at Pfizer, where he applied models in the pharmaceutical and diagnostic industries for over 18 years. Max has a Ph.D. in Biostatistics. He and Kjell Johnson wrote the book Applied Predictive Modeling, which won the Ziegel award from the American Statistical Association. Their second book, Feature Engineering and Selection, was published in 2019, and his book Tidy Models with R was published in 2022.
Description: This workshop will illustrate how the recipes package (part of the tidymodels ecosystem) can be used to prepare your data for modeling. Recipes are part model.matrix() and part dplyr; they can sequentially execute pre-processing steps to create the best representation of the predictor data for a model.
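A minimal sketch of that sequential idea, assuming the mtcars data purely for illustration:

```r
library(tidymodels)   # attaches recipes

rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%   # center and scale predictors
  step_dummy(all_nominal_predictors())           # a no-op for mtcars; shown for illustration

baked <- rec %>% prep(training = mtcars) %>% bake(new_data = NULL)
head(baked)
```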
Deep Learning with torch in R
Speaker: Daniel Falbel is a software engineer at Posit and maintains the 'torch' R package and its ecosystem. He previously maintained the TensorFlow and Keras packages.
Description: Deep learning has grown exponentially in recent years and has powered breakthroughs in fields such as computer vision and natural language processing. In this workshop you will learn the basics of torch and its ecosystem and build and train deep learning models with torch.
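For a first impression, here is a minimal sketch of a small network and one training step in torch; the toy data and layer sizes are illustrative assumptions.

```r
library(torch)

# a small feed-forward network and a single training step
net <- nn_sequential(
  nn_linear(4, 16),
  nn_relu(),
  nn_linear(16, 1)
)
opt <- optim_adam(net$parameters, lr = 0.01)

x <- torch_randn(8, 4)   # toy inputs
y <- torch_randn(8, 1)   # toy targets

opt$zero_grad()
loss <- nnf_mse_loss(net(x), y)
loss$backward()
opt$step()
```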
Introduction to Qualitative Comparative Analysis (QCA) using R
Speaker: Ingo Rohlfing is Professor of Methods of Empirical Social Research at the University of Passau. He does research on social science methods with a focus on qualitative methods (case studies and process tracing), Qualitative Comparative Analysis, multimethod research, and research integrity.
Description: What are the conditions that produce stable coalition governments? What conditions are necessary for married couples not to get divorced? If you are interested in research questions like these or similar ones, QCA should be one of the first methods to consider for answering them. QCA is the go-to method for analyzing set relationships using any number of cases (small, medium, large) and with any kind of data (micro, meso, macro).
Participants in this course are introduced to the fundamentals of set relations and QCA, and to the workflow of a QCA study using R. You will be introduced to the basic principles and requirements of coherent QCA designs and learn how to implement them using R. We cover all fundamental steps of a QCA study, including calibration, a (potential) necessity analysis, truth table formation, truth table minimization, and the interpretation of results.
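For a sense of the tooling, here is a minimal sketch using the QCA package's bundled Lipset data; the outcome, inclusion cut-off, and options are illustrative assumptions, not the workshop's materials.

```r
library(QCA)

data(LF)   # Lipset's fuzzy-set data, bundled with the QCA package
tt <- truthTable(LF, outcome = "SURV", incl.cut = 0.8, show.cases = TRUE)
minimize(tt, details = TRUE)   # Boolean minimization of the truth table
```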
Network Analysis with R
Speaker: David Schoch is the team lead for "Transparent Social Analytics" in the Department of Computational Social Science at GESIS in Cologne. Before joining GESIS, David was a Presidential Fellow in the Department of Sociology at the University of Manchester, affiliated with the "Mitchell Centre for Social Network Analysis". He has a PhD in Computer Science from the University of Konstanz and is the creator and maintainer of several network analysis R packages.
Description: Network analysis is a multidisciplinary field that delves into the intricate web of connections and interactions among entities, whether they are individuals, organizations, or nodes in a complex system. By employing graph theory, statistical methods, and computational tools, network analysis unveils the hidden patterns, structures, and dynamics that underlie these relationships.
In this workshop, I will introduce the package ecosystem for network analysis in R. I will give an overview of the key packages for conducting network studies and discuss some practical aspects. No network theory will be introduced, but I will provide pointers for those interested in learning more beyond the practical skills.
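As a starting point, here is a minimal sketch with igraph, one of the key packages in that ecosystem; the random graph and measures are illustrative assumptions.

```r
library(igraph)

set.seed(1)
g <- sample_gnp(n = 50, p = 0.1)    # a random (Erdős-Rényi) graph
head(degree(g))                     # node degrees
head(betweenness(g))                # a classic centrality measure
plot(g)
```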
Using Spatial Data with R Shiny
Speaker: Michael C. Rubin is an Engineer, MIT Data Scientist, and Co-Founder of Open Digital Agriculture (formerly ODAPES), a start-up with the mission of democratizing digital agriculture. Open Digital Agriculture leverages R-Shiny, along with GIS technology and artificial intelligence, to include the overlooked 540 million smallholder farmers in the digital transformation. Michael was a two-time speaker at the global R-Shiny conference.
Description: This workshop is about how to use R-Shiny in the context of geographic information systems (GIS). We will initially cover the R Leaflet package and learn how geographic information, from points to raster files, can be displayed in an R-Shiny app. Along the way, we will develop an R-Shiny app that allows us not only to display but also to manipulate GIS-related data, and we will touch on some interesting geostatistical concepts. Knowledge of R is required to follow the course; previous exposure to R-Shiny and some GIS techniques would be helpful, but you can follow the course even without it.
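A minimal sketch of the Leaflet-inside-Shiny pattern the workshop builds on; the marker location is an illustrative assumption.

```r
library(shiny)
library(leaflet)

ui <- fluidPage(leafletOutput("map"))

server <- function(input, output, session) {
  output$map <- renderLeaflet({
    leaflet() %>%
      addTiles() %>%                                      # base map
      addMarkers(lng = 30.52, lat = 50.45, popup = "Kyiv")
  })
}

shinyApp(ui, server)
```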
Factor Analysis in R
Date: Thursday, February 1st, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Gagan Atreya is a quantitative social scientist and data science consultant based in Los Angeles, California. He has graduate degrees in Experimental Psychology and Quantitative Political Science from The College of William & Mary in Virginia and the University of Minnesota, respectively. He has multiple years of experience in data analysis and visualization in the social sciences, both as a researcher and as a consultant with faculty and researchers around the world. You can find him on Bluesky at @gaganatreya.bsky.social.
Description: This workshop will go through the basics of Exploratory and Confirmatory Factor Analysis in the R programming language. Factor Analysis is a valuable statistical technique, widely used in Psychology, Economics, Political Science, and related disciplines, that allows us to uncover the underlying structure of our data by reducing it to coherent factors. The workshop will heavily (but not exclusively) utilize the "psych" and "lavaan" packages in R. Although open to everyone, beginner-level familiarity with R and some background or interest in survey data analysis will be ideal to make the most of this workshop.
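For orientation, here is a minimal EFA/CFA sketch with those two packages; the Holzinger & Swineford data (shipped with lavaan) and the factor structure are illustrative assumptions.

```r
library(psych)
library(lavaan)

# EFA on the nine Holzinger & Swineford test scores
hs <- HolzingerSwineford1939[, paste0("x", 1:9)]
efa_fit <- fa(hs, nfactors = 3, rotate = "varimax")

# CFA with the classic three-factor structure
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
cfa_fit <- cfa(model, data = HolzingerSwineford1939)
summary(cfa_fit, fit.measures = TRUE)
```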
Creating R packages for data analysis and reproducible research
Date: Thursday, February 29th, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Fred Boehm is a biostatistics and translational medicine researcher living in Michigan, USA. His research focuses on statistical questions that arise in human genetics studies and their applications to clinical medicine and public health. He has extensive teaching experience as a statistics lecturer at the University of Wisconsin-Madison (https://www.wisc.edu) and as a workshop instructor for The Carpentries (https://carpentries.org/index.html). He enjoys spending time with his nieces and nephews and his two dogs. He also blogs (occasionally) at https://fboehm.us/blog/.
Description: Participants will learn to use functions from several R packages, including `devtools` and `rrtools`, while adhering to practices that promote reproducible research. They will learn to create their own R packages for software development or data analysis. We will also motivate the need for reproducible research practices and discuss strategies and open-source tools.
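A minimal sketch of the package-creation loop, assuming the package name and file names purely for illustration:

```r
library(usethis)
library(devtools)

create_package("~/mypkg")   # scaffold a new package
use_r("clean_data")         # add an R source file for a function
# (run the next two inside the new package directory)
document()                  # generate man pages from roxygen comments
check()                     # run R CMD check
# rrtools::use_compendium() similarly scaffolds a research compendium
```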
AI Use Cases for R Enthusiasts
Speaker: Dr. Albert Rapp is a mathematician with a fascination for the blend of Data Analytics, Web Development, and Visualization. He applies his expertise as a business analyst, focusing on AI, cloud computing, and data analysis. Outside of his professional pursuits, Albert enjoys engaging with the community by sharing his insights and knowledge on platforms like LinkedIn, YouTube, and through his video courses.
Description: Everyone is talking about AI. And for good reason: it's a powerful tool that can enhance your productivity as a programmer and help you with automated data processing tasks. In this workshop, I share R-specific and general AI tools and workflows that I use for my programming, blogging, and video projects. By the end of this session, participants will be equipped with fresh ideas and practical strategies for using AI in their own endeavors.
Conducting Simulation Studies in R
Speaker: Greg Faletto is a statistician and data scientist at VideoAmp, where he works on causal inference. Greg completed his Ph.D. in statistics at the University of Southern California in 2023. His research, focused on developing machine learning methods, has been published in venues such as the International Conference on Machine Learning and the Proceedings of the National Academy of Sciences. Greg has taught classes at USC on data science and communicating insights from data, and he has previously presented his research and led workshops at venues including USC, the University of California San Francisco, the University of Copenhagen, Data Con LA, and IM Data Conference.
Description: In simulation studies (also known as Monte Carlo simulations or synthetic data experiments), we generate data sets according to a prespecified model, perform some calculations on each data set, and analyze the results. Simulation studies are useful for testing whether a methodology will work in a given setting, assessing whether a model “works” and diagnosing problems, evaluating theoretical claims, and more. In this workshop, I’ll walk through how you can use the R simulator package to conduct simple, reproducible simulation studies. You’ll learn how to carry out the full process, including making plots or tables of your results.
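To make the loop concrete, here is a generic base-R sketch of a simulation study (the workshop itself uses the simulator package; this is not its API, and the data-generating model is an illustrative assumption):

```r
set.seed(1)

# one simulation replicate: generate data, fit a model, return the estimate
one_rep <- function(n = 100) {
  x <- rnorm(n)
  y <- 2 * x + rnorm(n)            # true slope is 2
  coef(lm(y ~ x))[["x"]]
}

estimates <- replicate(1000, one_rep())
c(mean = mean(estimates), sd = sd(estimates))   # bias and sampling variability
```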
Cluster Analysis in R
Speaker: Sejal Davla is a neuroscientist and data scientist who works with industry and government clients on projects at the intersection of science, data, and policy. She received her PhD in neuroscience from McGill University in Canada, where her research identified new pathways in brain development and sleep. She is an advocate for open science and reproducibility and runs R programming workshops to promote best data practices.
Description: Some datasets are unlabeled, with no obvious categories. Unsupervised machine learning methods, such as clustering, make it possible to find patterns and homogeneous subgroups in unlabeled data. This workshop will cover the basics of cluster analysis and how to perform clustering using k-means and hierarchical clustering methods. The goal of the workshop is to help participants identify datasets suited to clustering, learn to visualize and interpret models, validate clusters, and highlight practical issues.
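A minimal sketch of both methods in base R, assuming the iris data purely for illustration:

```r
set.seed(42)
dat <- scale(iris[, 1:4])                   # standardize features first

km <- kmeans(dat, centers = 3, nstart = 25) # k-means with 3 clusters
table(km$cluster, iris$Species)             # compare clusters with known labels

hc <- hclust(dist(dat), method = "ward.D2") # hierarchical clustering
plot(hc)                                    # dendrogram
```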
Making tables in R with the gt package
Speaker: Rich Iannone, Rich is a software engineer who enjoys working with R and Python. He likes to create packages that help people accomplish things. While Rich very clearly digs programming, he enjoys other things as well! Examples include: playing and listening to music, reading books, watching films, meeting up with friends, and wandering through the many valleys and ravines of the Greater Toronto Area.
Description: The goal of the {gt} package is to make building tables for publication a hassle-free process while giving you the freedom to be creative. If you’re familiar with {ggplot2}, the feel of working with {gt} isn’t too far off. Join this workshop for background on the goals of {gt} and an extensive tour of its features by the package developer. There are a lot of functions in this package but we’ll go through the most important ones and learn how to make beautiful tables!
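A minimal sketch of a {gt} table, assuming the mtcars data and the header text purely for illustration:

```r
library(gt)
library(dplyr)

mtcars %>%
  head() %>%
  gt(rownames_to_stub = TRUE) %>%
  tab_header(title = "Motor Trend cars", subtitle = "First six rows") %>%
  fmt_number(columns = c(mpg, wt), decimals = 1)   # format selected columns
```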
Polytomous Latent Class Analysis and Regression in R
Speaker: Lana Bojanić is a research associate and PhD candidate at the University of Manchester. With over 7 years of experience using R, she is also a co-founder of the R user group at the University of Manchester and R Ladies Zagreb, Croatia. Lana is passionate about introducing people to R and supporting them during their transition to full-time R users.
Description: Polytomous (multi-category) data is common in many fields that utilise surveys, tests, or assessments. This workshop will cover latent class analysis and latent class regression for this data type using the poLCA package. We will also cover the necessary data preparation, model specification, and the calculation and extraction of fit values. Finally, we will look at different ways of plotting the results.
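For a flavor of the syntax, here is a minimal sketch with poLCA's bundled example data; the two-class model is an illustrative assumption.

```r
library(poLCA)

data(values)                        # example survey data bundled with poLCA
f <- cbind(A, B, C, D) ~ 1          # four categorical indicators, no covariates
lc2 <- poLCA(f, data = values, nclass = 2)
c(lc2$aic, lc2$bic)                 # compare fit across different nclass values
```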
Meta-Analysis in R
Speaker: Matthew B. Jané is a graduate student in quantitative psychology at the University of Connecticut. His interests involve data visualization and statistical methods for meta-analysis and psychometric measurement. He is affiliated with the Systematic Health Action Research Program where he is advised by Dr. Blair T. Johnson.
Description: In this workshop, we will learn how to conduct meta-analysis in R using real data sets. We will first discuss how effect sizes such as standardized mean differences, correlations, and odds ratios are calculated. Then we will discuss three types of meta-analytic models (i.e., common, fixed, and random effects models) and how we can fit each model in R. Finally, we will visualize how study characteristics can moderate effect sizes with the help of meta-regression.
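A minimal sketch of that sequence with the metafor package, assuming its bundled BCG-vaccine data purely for illustration:

```r
library(metafor)

dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
              data = dat.bcg)       # compute log risk ratios and variances
res <- rma(yi, vi, data = dat)      # random-effects model
summary(res)
forest(res)                         # forest plot of the studies
```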
Introduction to Bayesian Structural Equation Modeling in R
Speaker: Esteban Montenegro-Montenegro serves as a professor and researcher at California State University, Stanislaus. He holds a doctoral degree in Educational Psychology with a concentration in Research, Methods, Statistics, and Evaluation from Texas Tech University. Currently, Dr. Montenegro devotes his time to teaching foundational topics in statistics using R. Moreover, he is actively engaged in learning and instructing advanced concepts in Bayesian inference and latent variable models.
Description: The workshop is designed to offer an introductory overview of Structural Equation Modeling (SEM) in R, followed by a simplified explanation of Bayesian inference through various examples. In the latter part of the workshop, participants will learn to estimate a Bayesian SEM model using the blavaan package in R. This workshop is ideal for those seeking a user-friendly introduction to SEM and Bayesian inference in R. Basic skills in R, such as opening datasets, understanding objects, functions, and loops, are assumed due to time constraints. Comprehensive materials and additional examples will be provided for further practice at home.
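As a taste of the blavaan syntax, here is a minimal sketch; the one-factor model and the Holzinger & Swineford data (shipped with lavaan) are illustrative assumptions.

```r
library(blavaan)

model <- ' visual =~ x1 + x2 + x3 '                # a one-factor measurement model
fit <- bcfa(model, data = HolzingerSwineford1939)  # Bayesian CFA (MCMC; may take a while)
summary(fit)
```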
Probabilistic Network Inference and Analysis in R and Python
Speaker: Guillermo de Anda Jáuregui is a Mexican researcher at the National Institute of Genomic Medicine. His work focuses on using complex systems and data science to address biomedical and public health questions. Guillermo is a member of the Researchers for Mexico program, the National Researcher System, and a collaborator at the Center for Complexity Sciences at the Universidad Nacional Autónoma de México. With a strong interdisciplinary approach, his research bridges the gap between data science and complex biological phenomena.
Description: Complex systems with many interacting elements often exhibit non-trivial patterns of statistical dependencies. Modeling these systems as complex networks provides a powerful framework to characterize them at macro, meso, and microscale levels. In this workshop, we will explore the fundamentals of reconstructing such networks, analyze the significance of different structures, and apply these concepts to the use case of gene regulation. Participants will gain hands-on experience in understanding and comparing network structures and learn how these models can be used to uncover the dynamics of complex systems.
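To make "reconstructing a network" concrete, here is a minimal R sketch that infers a network from pairwise correlations; the simulated data and threshold are illustrative assumptions, not the workshop's pipeline.

```r
library(igraph)

set.seed(1)
mat <- matrix(rnorm(100 * 10), ncol = 10)   # 100 observations of 10 variables (e.g., genes)
colnames(mat) <- paste0("g", 1:10)

adj <- (abs(cor(mat)) > 0.3) * 1            # keep dependencies above a chosen threshold
diag(adj) <- 0
g <- graph_from_adjacency_matrix(adj, mode = "undirected")
plot(g)
```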
Introduction to Interpretable Machine Learning in R
Speaker: Andreas Hofheinz, Andreas is a Data Analytics Consultant at Munich Re, holding a master’s degree in statistics from LMU Munich. In his role, he focuses on designing, implementing, and managing data analytics and AI use cases across the company, as well as delivering international Data & AI training sessions. Before joining Munich Re, Andreas worked in consulting, primarily on digital transformation projects across various industries. He has a keen interest in open source programming and is co-author of the leafdown and counterfactuals R packages.
Description: Interpretable machine learning (IML) methods are crucial for ensuring model trust, accountability, regulatory compliance, and enhancing model performance. This course provides an introduction to key concepts and methods in IML.
We start by exploring the differences between interpretable models, such as linear regression and decision trees, and black box models, like random forests and gradient boosting.
The focus then shifts to interpreting black box models using model-agnostic methods, which separate explanation from the model itself. You'll learn about global model-agnostic methods like Partial Dependence Plots (PDP) and Permutation Feature Importance, which describe average model behavior, and local model-agnostic methods like Individual Conditional Expectation (ICE) plots and Local Surrogate Models (LIME) for individual predictions. The hands-on, code-first approach ensures you gain practical experience with several IML methods using R.
It is recommended to have at least basic machine learning knowledge and some programming experience (ideally in R).
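For orientation, here is a minimal sketch of two of these model-agnostic methods using the iml package; the random forest on mtcars is an illustrative assumption.

```r
library(iml)
library(randomForest)

rf <- randomForest(mpg ~ ., data = mtcars)            # a black box model
pred <- Predictor$new(rf, data = mtcars[, -1], y = mtcars$mpg)

imp <- FeatureImp$new(pred, loss = "mae")             # permutation feature importance
eff <- FeatureEffect$new(pred, feature = "wt",
                         method = "pdp+ice")          # PDP with ICE curves
plot(imp); plot(eff)
```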
Analyzing Time Series at Scale with Cluster Analysis in R
Speaker: Rami Krispin is a data science and engineering manager who mainly focuses on time series analysis, forecasting, and MLOps applications.
He is passionate about open source, working with data, machine learning, and putting stuff into production. He creates content about MLOps, recently released a course, Data Pipeline Automation with GitHub Actions Using R and Python, on LinkedIn Learning, and is the author of Hands-On Time Series Analysis with R.
Description: One of the challenges in traditional time series analysis is scalability. Most of the analysis methods were designed to handle a single time series at a time. In this workshop, we will explore methods for analyzing time series at scale. We will demonstrate how to apply unsupervised methods such as cluster analysis and PCA to analyze and extract insights from multiple time series simultaneously. This workshop is based on Prof. Rob J Hyndman's paper about feature-based time series analysis.
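A minimal base-R sketch of the feature-then-cluster idea (hand-rolled features rather than a dedicated package; the example series are illustrative assumptions):

```r
# featurize several example series by hand, then cluster the feature vectors
series <- list(AirPassengers, USAccDeaths, lynx, nottem)
feats <- t(sapply(series, function(s) c(
  mean = mean(s),
  sd   = sd(s),
  acf1 = acf(s, plot = FALSE)$acf[2]   # lag-1 autocorrelation
)))
km <- kmeans(scale(feats), centers = 2)
km$cluster                             # cluster assignment per series
```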
Standardising R Projects with the ProjectTemplate package
Speaker: Michael Rasmussen is a data analyst based in Melbourne, Australia. Michael is passionate about using data visualization and machine learning to explore data, answer questions, and provide insights for decision making. He has a rich background of work experience, with strengths developed as both a psychologist and a data scientist, a strong theoretical grounding in statistics, and experience in machine learning.
Description: The aim of this workshop is to help attendees understand why standardised R projects are beneficial for the user, colleagues, and the wider organisation. Attendees will then be introduced to ProjectTemplate, a package that supports R project workflows by automating data import, preparation, and final analysis in a reproducible, effortless manner. Attendees will also be shown how the structure of the project files and workflows can be modified to suit their needs.
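A minimal sketch of the core loop; the project name is an illustrative assumption.

```r
library(ProjectTemplate)

create.project("my-analysis")   # standard directory skeleton (data/, munge/, src/, ...)
setwd("my-analysis")
load.project()                  # runs data import, munging, and library loading
```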
Econometrics in R
Introduction to Regression Analysis with R
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at a Ukrainian think tank Centre of Economic Strategy.
Description: In this workshop, we will discuss the basics of regression analysis (what it is, why and when it should be used, and how to implement it in R). In particular, we will cover ordinary least squares (OLS) regression, different types of variables, interaction terms, multicollinearity, and heteroskedasticity, and how to produce regression tables and interpret results. This workshop may be useful both to those who have never learned regression analysis and to those who have but want to learn how to implement it in R.
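A minimal sketch of an OLS regression with an interaction term in base R, assuming the mtcars data purely for illustration:

```r
fit <- lm(mpg ~ wt * factor(cyl), data = mtcars)   # OLS with an interaction term
summary(fit)                                        # coefficients, SEs, R^2
```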
Introduction to Time Series and Panel Data with R
Speaker: Olha Halytsia, PhD Economics student at the Technical University of Munich. She has previous research experience within a World Bank project and has also worked at the National Bank of Ukraine.
Description: In this workshop, we will briefly discuss the basics of the time series and panel data analysis. In particular, we will cover time series visualization in R, simple autoregressive models, and linear panel models (fixed and random-effects models). This workshop may be useful for those who have never learned time series and panel data analysis and those who have very basic knowledge but want to learn how to implement it in R.
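For a flavor of the panel-data syntax, here is a minimal sketch with the plm package and its bundled Grunfeld data (illustrative assumptions, not the workshop's materials):

```r
library(plm)

data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")  # fixed effects
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")  # random effects
phtest(fe, re)   # Hausman test to choose between them
```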
Introduction to Causal Inference with R
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at a Ukrainian think tank Centre of Economic Strategy.
Description: In this workshop, we will cover the basics of causal inference, a set of approaches economists use to establish the cause of a certain phenomenon. In particular, we will cover instrumental variables, regression discontinuity design, difference-in-differences, and synthetic controls, and will discuss how to implement each of these approaches in R. It is recommended to have at least basic knowledge of econometrics prior to the workshop.
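As one example, here is a minimal instrumental-variables sketch with the AER package; the dataset and the deliberately simplified specification are illustrative assumptions.

```r
library(AER)

data("CigarettesSW", package = "AER")
# instrument the endogenous price with the sales tax (simplified for illustration)
iv <- ivreg(log(packs) ~ log(price) | tax, data = CigarettesSW)
summary(iv)
```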
Bayesian multilevel modeling in R with brms
Speaker: Paul Bürkner is a statistician currently working as a Junior Research Group Leader at the Cluster of Excellence SimTech at the University of Stuttgart (Germany). He is interested in a wide range of research topics most of which involve the development, evaluation, implementation, or application of Bayesian methods. He is the author of the R package brms and member of the Stan Development Team. Previously, Paul studied Psychology and Mathematics at the Universities of Münster and Hagen (Germany) and did his PhD in Münster about optimal design and Bayesian data analysis. He has also worked as a Postdoctoral researcher at the Department of Computer Science at Aalto University (Finland).
Description: The workshop will be about Bayesian multilevel models and their implementation in R using the package brms. At the start, there will be a short introduction to multilevel modeling and to Bayesian statistics in general, followed by an introduction to Stan, an incredibly flexible language for fitting open-ended Bayesian models. I will then explain how to access Stan using just basic R formula syntax via the brms package. brms supports a wide range of response distributions and modeling options such as splines, autocorrelation, and censoring, all in a multilevel context. A lot of post-processing and plotting methods are implemented as well. Some examples from Psychology and Medicine will be discussed.
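A minimal sketch of that formula syntax, using the epilepsy example data that ships with brms (fitting runs MCMC and takes a few minutes):

```r
library(brms)

# Poisson multilevel model with random intercepts per patient
fit <- brm(count ~ zAge + zBase * Trt + (1 | patient),
           data = epilepsy, family = poisson())
summary(fit)
```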
Structural Equation Modeling in R with the Lavaan package
Speaker: Nino Gugushvili is a postdoctoral researcher at the Department of Work and Social Psychology at Maastricht University.
Description: In this workshop, we will go over the basics of structural equation modelling (SEM). We will talk about what SEM is and cover the essential steps of SEM. Next, we will learn path analysis (SEM with observed variables), confirmatory factor analysis, and full SEM (SEM with latent variables + observed variables). Along the way, we will also talk about revising our models and interpreting the results, and we'll do all this in R using the lavaan package.
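A minimal sketch of lavaan's model syntax, using the PoliticalDemocracy data shipped with the package (a simplified version of its classic example):

```r
library(lavaan)

model <- '
  # measurement model
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  # structural (regression) part
  dem60 ~ ind60
'
fit <- sem(model, data = PoliticalDemocracy)
summary(fit, standardized = TRUE)
```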
Generalized Additive Models in R
Speaker: Gavin Simpson, Gavin is a statistical ecologist and freshwater ecologist/palaeoecologist. He has a B.Sc. in Environmental Geography and a Ph.D. in Geography from University College London (UCL), UK. After submitting his Ph.D. thesis in 2001, Gavin worked as an environmental consultant and research scientist in the Department of Geography, UCL, before moving, in 2013, to a research position at the Institute of Environmental Change and Society, University of Regina, Canada. Gavin moved back to Europe in 2021 and is now Assistant Professor of Applied Statistics in the Department of Animal and Veterinary Sciences at Aarhus University, Denmark. Gavin's research broadly concerns how populations and ecosystems change over time and respond to disturbance, at time scales from minutes and hours, to centuries and millennia. Gavin has developed several R packages, including gratia, analogue, and cocorresp, he helps maintain the vegan package, and can often be found answering R- and GAM-related questions on StackOverflow and CrossValidated.
Description: Generalized Additive Models (GAMs) were introduced as an extension to linear and generalized linear models, where the relationships between the response and covariates are not specified up-front by the analyst but are learned from the data themselves. This learning is achieved by representing the effect of a covariate on the response as a smooth function, rather than following a fixed form (linear, quadratic, etc). GAMs are a large and flexible class of models that are widely used in applied research because of their flexibility and interpretability.
The workshop will explain what a GAM is and how penalized splines and automatic smoothness selection methods work, before focusing on the practical aspects of fitting GAMs to data using the mgcv R package, and will be most useful to people who already have some familiarity with linear and generalized linear models.
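For a first look at the mgcv syntax, here is a minimal sketch on simulated data (gamSim ships with mgcv; the model terms are illustrative assumptions):

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 200)                     # simulated example data from mgcv
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
           data = dat, method = "REML")       # REML smoothness selection
summary(fit)
plot(fit, pages = 1)                          # estimated smooth functions
```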
Customizing Regression & Descriptive Tables in R with stargazer and starpolishr
Speaker: Dariia Mykhailyshyna, PhD Economics student at the University of Bologna. Previously worked at a Ukrainian think tank Centre of Economic Strategy.
Description: In this workshop, we will look at how to make and customize regression and summary statistics tables in R using the stargazer and starpolishr packages. We will start by making basic tables and then will use a variety of tools that these packages offer to customize them. In particular, we will add or remove different statistics in the tables, change the labels of variables, and save the output in different formats. We will also look at how to make tables in the style of particular academic journals, as well as somewhat more complicated tables, such as tables with several panels and sideways tables.
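A minimal sketch of a customized stargazer table; the models and labels are illustrative assumptions.

```r
library(stargazer)

m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
stargazer(m1, m2, type = "text",                      # use type = "latex" for papers
          covariate.labels = c("Weight", "Horsepower"),
          dep.var.labels = "Miles per gallon")
```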
Introduction to Mixed-effects Models in R
Speaker: Philip Leftwich is an Associate Professor of Genetics and Data Science at the University of East Anglia, Norwich, UK. He teaches R programming and statistics on various modules and workshops at undergraduate and postgraduate levels. His research interests include genetics, genomics and synthetic biology as tools to help combat agricultural and disease-carrying insect pests.
Description: Mixed-effects models are indispensable in analyzing data with hierarchical or nested structures. Unlike traditional linear regression models, mixed-effects models account for both fixed effects (applying to the entire population) and random effects (varying across groups). This unique capability allows researchers to examine how individual and group-level factors work together simultaneously, providing a comprehensive understanding of the data. In fields like social sciences, education, biology, and economics, where hierarchical data is prevalent, mixed-effects models significantly enhance the precision and reliability of statistical analyses. Mastering these models empowers researchers to extract valuable insights from complex datasets effectively.
In this introductory workshop, we will cover the basics of analyzing hierarchical data. Participants will learn about the difference between fixed and random effects, model formulation, estimation, and interpretation. We will discuss assumptions, model comparison and selection, practical implementation with R, and model validation. We will work through real-world examples to showcase the applications and benefits of mixed-effects models in various fields.
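A minimal sketch of the fixed-plus-random-effects formula syntax with lme4, using its bundled sleepstudy data:

```r
library(lme4)

# fixed effect of Days; random intercept and slope for each subject
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit)
```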
Introduction to Propensity Score Analysis with R
Speaker: Dr. Jason Bryer is currently an Assistant Professor and Associate Director in the Data Science and Information Systems department at the City University of New York. He is currently the Principal Investigator of the FIPSE ($3 million #P116F150077) and IES funded ($3.8 million R305A210269) Diagnostic Assessment and Achievement of College Skills (DAACS), which is a suite of technological and social supports designed to optimize student learning. Dr. Bryer’s other research interests include quasi-experimental designs with an emphasis on propensity score analysis, data systems to support formative assessment, and the use of open source software for conducting reproducible research. He is the author of over a dozen R packages, including three related to conducting propensity score analyses. When not crunching numbers, Jason is a wedding photographer and proud dad to three boys.
Description: The use of propensity score methods (Rosenbaum & Rubin, 1983) for estimating causal effects in observational studies or certain kinds of quasi-experiments has been increasing in the social sciences (Thoemmes & Kim, 2011) and in medical research (Austin, 2008) in the last decade. Propensity score analysis (PSA) attempts to adjust for the selection bias that occurs due to the lack of randomization. Analysis is typically conducted in two phases: in phase I, the probability of placement in the treatment is estimated to identify matched pairs or clusters, so that in phase II, comparisons on the dependent variable can be made between matched pairs or within clusters. R (R Core Team, 2012) is ideal for conducting PSA given its wide availability of the most current statistical methods vis-à-vis add-on packages as well as its superior graphics capabilities.
This workshop will provide participants with a theoretical overview of propensity score methods as well as illustrations and discussion of PSA applications. Methods used in phase I of PSA (i.e. models or methods for estimating propensity scores) include logistic regression, classification trees, and matching. Discussions on appropriate comparisons and estimations of effect size and confidence intervals in phase II will also be covered. The use of graphics for diagnosing covariate balance as well as summarizing overall results will be emphasized.
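A minimal sketch of the two-phase logic using the MatchIt package and its bundled lalonde data (the covariates and the simple unweighted comparison are illustrative assumptions):

```r
library(MatchIt)

data("lalonde", package = "MatchIt")
# phase I: estimate propensity scores and form matched pairs
m <- matchit(treat ~ age + educ + race + married + re74,
             data = lalonde, method = "nearest")
summary(m)                       # check covariate balance
matched <- match.data(m)
# phase II: a simple (unweighted) outcome comparison, for illustration only
t.test(re78 ~ treat, data = matched)
```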
Introduction to mixed frequency data models in R
Speaker: Jonas Striaukas is an assistant professor of statistics and finance and a Marie Skłodowska-Curie Action fellow at the Copenhagen Business School, Department of Finance. His main research interests are econometrics/statistics and applications of machine learning methods to financial and macro econometrics. In particular, his research interests are regularized regression models for mixed frequency data and factor-augmented sparse regression models. Before joining the Copenhagen Business School in 2022, he was a research fellow at the Fonds de la Recherche Scientifique (FNRS) and Université Catholique de Louvain, where he carried out his PhD under the supervision of Prof. Andrii Babii (UNC Chapel Hill) and Prof. Eric Ghysels (UNC Chapel Hill).
Description: The course will cover statistical models for mixed frequency data analysis and their applications using R statistical software. First, we will look into classical mixed frequency (MIDAS) regression models and their applications to nowcasting. We will then cover multivariate models such as vector autoregressions (VAR) and their application in mixed frequency data settings. Lastly, we will cover regularized MIDAS regressions and their extension to the factor-augmented regression case.
Introduction to Causal Machine Learning estimators in R
Speaker: Michael Knaus is Assistant Professor of “Data Science in Economics” at the University of Tübingen. He is working at the intersection of causal inference and machine learning for policy evaluation and recommendation.
Description: You want to learn about Double Machine Learning and/or Causal Forests for causal effect estimation but are hesitant to start because of the heavy formulas involved? Or you are already using them and curious to (better) understand what happens under the hood? In this course, we take a code-first, formulas-second approach. You will see how to manually replicate the output of the powerful DoubleML and grf packages using at most five lines of code and nothing more than OLS. After seeing that everything boils down to simple recipes, the involved formulas will look more friendly. The course therefore establishes how things work and gives references for further understanding of why things work.
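For reference, here is a minimal sketch of the grf output the course replicates by hand; the simulated treatment-effect setup is an illustrative assumption.

```r
library(grf)

set.seed(1)
n <- 2000; p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.5)                      # randomized binary treatment
Y <- W * pmax(X[, 1], 0) + rnorm(n)         # heterogeneous treatment effect

cf <- causal_forest(X, Y, W)
average_treatment_effect(cf)                # doubly robust ATE estimate
```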
Optimal policy learning based on causal machine learning in R
Speaker: Martin Huber earned his Ph.D. in Economics and Finance with a specialization in econometrics from the University of St. Gallen in 2010. Following this, he served as an Assistant Professor of Quantitative Methods in Economics at the same institution. He undertook a visiting appointment at Harvard University in 2011–2012 before joining the University of Fribourg as a Professor of Applied Econometrics in 2014. His research encompasses methodological and applied contributions across various fields, including causal analysis and policy evaluation, machine learning, statistics, econometrics, and empirical economics. Martin Huber's work has been published in academic journals such as the Journal of the American Statistical Association, the Journal of the Royal Statistical Society B, the Journal of Econometrics, the Review of Economics and Statistics, the Journal of Business and Economic Statistics, and the Econometrics Journal, among others. He is also the author of the book "Causal Analysis: Impact Evaluation and Causal Machine Learning with Applications in R."
Description: Causal analysis aims to assess the causal effect of a treatment, such as a training program for jobseekers, on an outcome of interest, such as employment. This assessment requires ensuring comparability between groups receiving and not receiving the treatment in terms of outcome-relevant background characteristics (e.g., education or experience). Causal machine learning serves two primary purposes: (1) generating comparable groups in a data-driven manner by detecting and controlling for characteristics that significantly affect the treatment and outcome, and (2) assessing the heterogeneity of treatment effects across groups differing in observed characteristics. Closely related to effect heterogeneity analysis is optimal policy learning, which seeks to optimally target specific subgroups with treatment based on their observed characteristics to maximize treatment effectiveness. This workshop introduces optimal policy learning based on causal machine learning, facilitating (1) data-driven segmentation of a sample into subgroups and (2) optimal treatment assignment across subgroups to maximize effectiveness. The workshop also explores applications of this method using the statistical software R and its interface RStudio.
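One common R route for this is the policytree package on top of grf; here is a minimal sketch, with the simulated data as an illustrative assumption (not necessarily the workshop's approach).

```r
library(grf)
library(policytree)

set.seed(1)
n <- 2000
X <- matrix(rnorm(n * 5), n, 5)
W <- rbinom(n, 1, 0.5)
Y <- W * pmax(X[, 1], 0) + rnorm(n)

cf <- causal_forest(X, Y, W)
Gamma <- double_robust_scores(cf)          # doubly robust scores per treatment arm
tree <- policy_tree(X, Gamma, depth = 2)   # depth-2 treatment assignment rule
plot(tree)
```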
Structural and Predictive Macro Analyses using the R Package bsvars
Speaker: Tomasz Wozniak, Tomasz is an econometrician who is developing new methods for empirical macroeconomic analyses. He codes these algorithms in C++ for R applications using Rcpp and authors the R package bsvars for Bayesian estimation of structural vector autoregressions. He is a senior lecturer at the University of Melbourne and co-organises the annual Melbourne Bayesian Econometrics Workshop.
Description: Quantifying the dynamic effects of well-isolated shocks on macro and financial aggregates is essential for governing institutions, academia, and business. This workshop presents a complete workflow for such analyses and focuses on various methods that facilitate interpretations and visualisations of data insights. It briefly introduces the necessary background on Bayesian Structural VARs. All this is complemented by a series of exercises, ensuring a hands-on learning experience. Please make sure to install the package following the instructions at https://bsvars.github.io/bsvars/#installation
Advanced Panel Data Analysis in R
Speaker: Tobias Rüttenauer is an Assistant Professor of Quantitative Social Science at University College London. His research focuses on the social aspects of climate change and environmental pollution, as well as quantitative research methods, particularly in spatial and panel data methods.
Description: This course provides a hands-on introduction to advanced panel data methods. It briefly covers the basic concepts of random effects (RE) and fixed effects (FE) estimators. Moving beyond the fundamentals, the workshop offers insights into recent developments and advances in panel data methods, such as the inclusion of individual or group-specific slopes and the identification of time-varying treatment effects via impact functions and novel Diff-in-Diff estimators.
Understanding Difference-in-Differences: Basics and Beyond with Applications in R
Speaker: Tobias Eibinger is a final-year PhD candidate in Economics at the University of Graz, Austria. His research focuses on causal environmental policy evaluation, particularly transport policies and their impact on emission reductions. He specializes in advanced econometric techniques, including Difference-in-Differences, time-series analysis, and macropanels. He has spent time at the Central European University (CEU) in Vienna and the Vrije Universiteit (VU) Amsterdam to deepen and apply his knowledge in causal identification. His work emphasizes the practical application of these methods to analyze real-world policy effects and inform policy recommendations.
Description: This workshop provides a solid introduction to Difference-in-Differences (DiD), covering both the foundational concepts and more advanced techniques needed to address common challenges in applied research. We begin by exploring canonical DiD and two-way fixed effects (TWFE) as a starting point. We then move on to more complex scenarios like staggered adoption and multiple treatments. We discuss the limitations of traditional DiD, particularly the issue of forbidden comparisons, and introduce the Goodman-Bacon (2021) decomposition to break down treatment effects. Dynamic settings are then covered through event studies, allowing us to examine how effects evolve over time. Finally, we discuss modern remedies such as the Callaway and Sant'Anna (2021) approach to better handle heterogeneous treatment timings. Throughout, participants will follow detailed R examples to apply these methods hands-on, gaining practical experience alongside the theoretical insights.
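As a pointer to the modern estimators covered, here is a minimal sketch of the Callaway and Sant'Anna (2021) approach via the did package, using its bundled example panel:

```r
library(did)

data("mpdta", package = "did")   # example county-level panel bundled with did
out <- att_gt(yname = "lemp", tname = "year", idname = "countyreal",
              gname = "first.treat", data = mpdta)   # group-time ATTs
aggte(out, type = "dynamic")     # event-study style aggregation
```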
Python workshops
Introduction to Python
Speaker: Bohdana Kurylo, Ph.D. student in Economics at CERGE-EI. Recently worked as a research and teaching assistant at CERGE-EI.
Description: This workshop will cover an introduction to Python, an introduction to NumPy and pandas, basic data loading, cleaning, manipulation, and data visualization techniques. By the end of this workshop, you will know how to: 1. create and manipulate arrays using NumPy; 2. use pandas to create and analyze data sets; 3. use the matplotlib and seaborn libraries to create beautiful data visualizations.
Machine Learning in Python with sklearn
Speaker: Peleg Shilo, Data Science student at Minerva University. Works in data science at Sift.
Description: Learn how to train machine learning algorithms from start to finish in Python. In this session, we will attempt to train an algorithm to predict whether patients have heart disease. We'll cover data preparation, feature engineering, model comparison, cross-validation and model tuning, as well as getting to know basic machine learning algorithms such as logistic regression and classification trees.
Deep Learning in Python with Tensorflow
Speaker: Peleg Shilo, Data Science student at Minerva University. Works in data science at Sift.
Description: In this workshop, we will learn how to classify images using Deep Learning. We'll cover core deep learning concepts such as how to construct a neural network, different layers, optimization, and regularization. We'll also cover the basics of computer vision, including convolutional neural networks (CNNs) and transfer learning. By the end of this workshop, you should know how to create basic neural networks and classify images using most datasets.
Convolutional Neural Networks in detail (Pytorch)
Speaker: Khrystyna Faryna, PhD Candidate in deep learning for biomedical image analysis at RadboudUMC, Nijmegen, the Netherlands.
Description: During this workshop, we will look in depth at Convolutional Neural Networks (CNNs). Our aim is to give a deep and practical understanding of the constituent elements of a CNN. We will look in detail at various kinds of convolutional layers, normalization, and regularization. We will follow each concept explanation with a clear practical example. The workshop exercises will be done in PyTorch. This workshop will be beneficial both for people starting their journey in deep learning and for those willing to refresh and deepen their practical understanding of the constituent parts of a Convolutional Neural Network (e.g., preparing for job interviews in data science).
Web Scraping in Python
Speaker: Bohdana Kurylo, Ph.D. student in Economics at CERGE-EI. Recently worked as a research and teaching assistant at CERGE-EI.
Description: In this workshop, we will learn how to scrape data from websites by creating our own scraping tool (spider) using Scrapy. First, we will discuss the fundamentals of web scraping. Second, we will explore the fundamentals of XPath and CSS selectors and learn how to locate content in the DOM using XPath. Third, we will learn how to store the data in different formats. Finally, we will build a complete spider to scrape data from websites.
Retrieving and Parsing Data from an API
Date: Thursday, September 1st, 18:00 - 20:00 CEST (Rome, Berlin, Paris timezone)
Speaker: Katya Vasilaky is an Assistant Professor of Economics at California Polytechnic State University in San Luis Obispo, CA. At Cal Poly she teaches computer programming for economics and analytics, machine learning for causal inference, experimental economics, and data science to solve economic and social problems. She's a big supporter of open-source programming tools. https://kvasilaky.github.io/
Description: API stands for "Application Programming Interface," and companies build APIs because they want to make their data available to you. But how do you actually ping an API and then parse the response into a usable CSV file? We'll learn all of that in this workshop. Python knowledge required: dictionaries, lists, indexing, library import statements, loops, functions (and classes if possible).
Image segmentation using deep learning in PyTorch
Speaker: Khrystyna Faryna, PhD Candidate in deep learning for biomedical image analysis at RadboudUMC, Nijmegen, the Netherlands.
Description: In this workshop, we will look at image segmentation in Python with deep learning. We will tackle one of the Kaggle image segmentation challenges. In the process, we will recall basics about convolutional neural networks, discuss generalization and how to improve it, and look at loss functions and metrics. We will also cover how to deal with images in Python, from loading to normalization to preparing the images for further analysis. By the end of this workshop, you should know how to create basic image segmentation pipelines with neural networks. You will also get a curated list of useful resources that should help you further improve your skills. Prerequisites: Basic understanding of Python; if this is your first encounter with deep learning, you can get our previous workshop on "Deep learning with TensorFlow".
Python for R users
Speaker: Dr. Johannes B. Gruber, Post-Doc Researcher at the Department of Communication Science at Vrije Universiteit Amsterdam and open-source developer.
Description: R users sometimes hear about the fabulous advantages of Python for advanced data science and modelling. While these claims are regularly exaggerated, it never hurts to be able to use more tools. This workshop will teach you to use Python together with R in the same project. That way, you can keep using the data science tool chain you already know and like in R (e.g., data processing and plotting), while employing tools from the Python world where needed, for example, for modelling. The workshop will include unsupervised machine learning with scikit-learn and BERTopic. We use the excellent reticulate package in a quarto+RStudio workflow to accomplish this, yet the knowledge is transferable to other tools.
Sentiment analysis with Python
Speaker: Dr. Adam Ross Nelson is a career coach and a data science consultant. As a career coach, he helps others enter and level up in data-related professions. As a data science consultant, he provides research, data science, machine learning, and data governance services. Previously, he was the inaugural data scientist at The Common Application, which provides undergraduate college application platforms for institutions around the world. He holds a PhD from the University of Wisconsin-Madison in Educational Leadership & Policy Analysis. Adam is also a former attorney with a history of working in higher education, teaching all ages, and working as an educational administrator. Adam sees it as important to focus time, energy, and attention on projects that promote access, equity, and integrity in the field of data science. This commitment means he strives to find ways for his work to challenge systemic oppression, injustice, and inequity. He is passionate about connecting with other data professionals in person and online. If you are looking to enter or level up in data science, one of the best places to get started is to visit coaching.adamrossnelson.com.
Description: In this workshop, Dr. Adam Ross Nelson will preview content from an upcoming book, Confident Data Science (Nelson, 2023), to be published by Kogan Page Inc in September. The topic of this chapter is sentiment analysis. Specific workshop topics will include comparative discussions of lexicon-based approaches and machine-learning-based approaches. The tools this workshop will utilize include the usual suspects: Python, Pandas, NumPy, & Seaborn. For sentiment analysis we will use the Natural Language Toolkit (NLTK) and Google Cloud's NLP API. You can see more info here.
An introduction to network analysis in Python
Speaker: Federico Botta is a Senior Lecturer in Data Science at the University of Exeter, and is also a fellow at the Alan Turing Institute, the UK national institute for data science and artificial intelligence. His research aims to provide a deeper understanding of human behaviour, both at the collective and individual level, and society, by using novel data streams. He uses tools from data science, network theory, behavioural and computational social sciences to analyse large data sets and investigate different aspects of human behaviour.
Description: In this workshop, I will first introduce some concepts of network analysis, such as why networks are important, examples of networks across many applications, and basic measures that can be used to analyse fundamental properties of networks. Then, we will see some examples of how to analyse real networks in Python using the popular NetworkX package. The workshop will conclude with an outlook on further topics in network science, with suggested reading and resources.
Spatial Data Analysis in Python
Speaker: Nils Ratnaweera is a research associate at the Zurich University of Applied Sciences and also works as a data scientist at his own firm, http://ratnaweera.xyz. His research and work focus on the analysis of spatial data and its application in environmental contexts such as hydrology, agriculture, wildlife, and vegetation ecology.
Description: In this workshop, you will get a gentle, hands-on introduction to spatial analysis in Python. We will touch on raster and vector data analysis, have a look at some spatial data formats, and of course make some maps! If there is room, we will also have a look at some blazingly fast command-line tools that power most spatial libraries.
An Introduction to Transformers in Python: Train Your Own BERT or GPT
Speaker: Moritz Laurer is a PhD candidate and NLP consultant. For his PhD, Moritz researches how to make machine learning work better with less training data. As an NLP consultant, he helps companies leverage Transformers for their business problems. His models have been downloaded more than 24 million times from the Hugging Face Hub (August 2023, https://huggingface.co/MoritzLaurer).
Description: While Transformer models like BERT and GPTs are becoming more popular, there is a persistent misconception that they are very complicated to use. This workshop will demonstrate that this is not the case anymore. There are amazing open-source packages like Hugging Face Transformers that enable anyone with some programming knowledge to use, train and evaluate Transformers. We will start with an intuitive introduction to transfer learning and discuss its added value as well as limitations. We will then look at the open-source ecosystem and free hardware options to train Transformers. Building upon a high-level explanation of the main components of Transformers in Hugging Face’s implementation, we will then fine-tune a Transformer and discuss important aspects of fine-tuning and evaluation. The code demonstrations will be in Python, but participants without prior knowledge of Python or Transformers are explicitly invited to participate. You will leave the workshop with Jupyter notebooks that enable you to train your own Transformer with your own data for your future research projects.
Note: to get a recording of this workshop, we ask you to donate to the Leleka Foundation (here)
Creating and Deploying Machine Learning Models with Python and streamlit
Speaker: Simeon Ifalore graduated from the Department of Physiology at the University of Benin. Simeon is a dedicated data science instructor at GOMYCODE and a member of the data tech company Statisense. He has over a decade of experience as a tutor, most recently as a growth and operations analyst, and aims to transition into AI research for healthcare and business.
Description: In this workshop, we will study the art of constructing and deploying machine learning models using Python and Streamlit. Whether you're a seasoned data enthusiast or just getting started, this course will teach you how to maximise the power of your models. We'll go over every step of the process, from data pre-processing to creating interactive web applications that showcase your work. Prepare to bridge the coding-deployment gap as we transform your data-driven insights into deployable solutions. Participants are recommended to have basic knowledge of Python for data science.
Introduction to Directed Acyclic Graphs (DAGs) in Python
Speakers: Jermain Kaminski and Paul Hünermund. Jermain Kaminski is an Assistant Professor of Entrepreneurship & Innovation at Maastricht University. In his research, he studies the strategic value of causal inference and machine learning, as well as new methodological concepts that advance entrepreneurship research.
Paul Hünermund is an Assistant Professor of Strategy and Innovation at Copenhagen Business School. In his research, Dr. Hünermund studies how firms can leverage new technologies in the space of machine learning and artificial intelligence for value creation and competitive advantage.
Description: In this workshop, participants will explore both the theory and practical applications of causal inference. First, essential concepts such as Structural Causal Models (SCMs), Directed Acyclic Graphs (DAGs), do-calculus, mediators, and confounders will be introduced. Then, there will be a brief hands-on session using the DoWhy library in Python on Google Colab.
Using Large Language Models for content analysis in Python
Speaker: Indira Sen is a postdoctoral researcher at the University of Konstanz in Germany in the department of Politics and Public Administration. Her research lies at the intersection of Natural Language Processing and Computational Social Science, specifically in developing robust and theory-driven methods for using digital trace data to measure attitudes and behaviors. Substantial topics of her research are assessing hateful, sexist, and political attitudes. She was a PhD candidate at the RWTH Aachen and GESIS and has interned at Snap Inc. and Nokia Bell Labs in the past. You can find more details about her at https://indiiigo.github.io/
Description: In this workshop, we will explore how large language models (LLMs) like ChatGPT and transformer-based models like BERT can be used for textual content analysis. We will specifically look into 'in-context learning' or prompt-based interactions with LLMs and guide them to analyze unlabeled content data without having to train supervised Machine Learning models from scratch. We will also contrast this paradigm with the use of finetuned models like BERT, or semi-automated approaches leveraging sentence embeddings such as SBERT. We will apply these different approaches to analyze digital trace data and investigate social constructs ranging from sexist attitudes to mental health conditions like depression. Aimed at social scientists interested in working with computational methods, this workshop offers an exploration of the latest advancements in natural language processing, providing hands-on experience for participants to incorporate these techniques into their Python-based content analysis workflows.
Multi-class and Multi-label Text Classification in Python
Date: Thursday, February 22nd, 18:00 - 20:00 CET (Rome, Berlin, Paris timezone)
Speaker: Lukas Birkenmaier is a research associate and doctoral candidate at the GESIS - Leibniz Institute for the Social Sciences in Mannheim. His research focuses on the application of Natural Language Processing (NLP) to the analysis of political communication. Additionally, he conducts methodological research aimed at enhancing the validation of computational text-based measures of social science constructs.
Description: This workshop provides a comprehensive introduction to Multi-Class and Multi-Label Classification using Python. During the hands-on session, participants will acquire the knowledge and skills necessary to employ both supervised and semi-supervised machine learning techniques for the purpose of assigning discrete labels to text, covering both mutually exclusive (multi-class) and non-mutually exclusive (multi-label) scenarios. Additionally, attendees will gain valuable insights into the validation of classification results. To apply the workshop's content, participants will utilize their knowledge in two common use cases in the social sciences.
Spatial Data Visualization with Python
Speaker: Marcin Szwagrzyk is a geographer with a Ph.D. in geography from Jagiellonian University in Cracow, Poland. His thesis focused on modeling future land-use changes.
Over the years, he has accumulated extensive experience in various aspects of the geospatial industry, dabbling in flood risk modeling, air quality measurements, and spatial data analytics for financial institutions.
Description: In this workshop, participants will gain a comprehensive understanding of the fundamental principles behind representing the Earth's surface on a flat, two-dimensional plane. Participants will get to know the most popular cartographic projections and their pros and cons. Emphasis will be placed on selecting a cartographic projection tailored to specific objectives.
Equipped with this expertise, attendees will craft two professionally polished, publication-ready maps. The raw data, acquired from freely available sources, will be processed and analyzed using the geopandas Python library. Subsequently, we will employ the matplotlib library to visualize the data, leveraging popular cartographic techniques including graduated colors and proportional symbols.
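A minimal sketch of that workflow, assuming a shapefile downloaded from a free source such as Natural Earth (the file and column names below are placeholders):

    # pip install geopandas matplotlib
    import geopandas as gpd
    import matplotlib.pyplot as plt

    world = gpd.read_file("countries.shp")      # placeholder path to downloaded data

    # Pick a projection suited to the goal, e.g. the equal-area Mollweide
    world = world.to_crs("ESRI:54009")

    # Graduated colors (a choropleth) driven by an attribute column
    fig, ax = plt.subplots(figsize=(10, 6))
    world.plot(column="pop_est", cmap="viridis", legend=True, ax=ax)
    ax.set_axis_off()
    plt.savefig("population_map.png", dpi=300)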
Analyzing Faces with Images as Data in Python
Speaker: Clint Claessen is a Doctoral Researcher at the University of Basel. He mainly studies party leaders using text-as-data methods. He is currently investigating the relationship between party leader visibility and party leader election outcomes, the impact of career experience on gendered party leaders' survival in office, and career motivation among youth party members.
Description: Politicians strive to maintain visibility on social media. Detecting where and when they appear and analyzing their emotions is crucial for understanding political images on user-generated content platforms. In this workshop, we will examine image data of MPs from Canada, Germany, and the United Kingdom. The workshop is structured into three main parts: gathering data, preparing it for analysis with face detection, and applying open-source Python-based face recognition and emotion detection algorithms to analyze the faces of politicians. Basic knowledge of working with data frames is assumed, but no specific knowledge of Python is required. Additionally, we will address validity concerns when using the output of deep learning models.
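The exact toolchain is not specified here; as one example of an open-source option, the deepface library wraps several detection and emotion-recognition models (the image path is a placeholder):

    # pip install deepface
    from deepface import DeepFace

    # analyze() returns one dict per face detected in the image
    results = DeepFace.analyze(img_path="mp_photo.jpg", actions=["emotion"])
    for face in results:
        print(face["dominant_emotion"], face["region"])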
Basic Fourier analysis with SciPy
Speaker: Tomasz Steifer is a theoretical computer scientist affiliated with the Polish Academy of Sciences and Universidad Católica de Chile. He is interested in using mathematical tools to understand how and in what sense intelligent beings can learn and obtain knowledge about the world around them. In more specialized terms, he does research in computational learning theory, the theory of computation, artificial intelligence, and related areas.
Description: Fourier analysis is a fundamental mathematical tool that finds its use in many technologies that constitute our reality. From telecommunication, through image and audio compression, to medical imaging and brain studies: all of these make heavy use of Fourier methods. In this workshop, we will try to get an elementary understanding of Fourier analysis through a series of toy examples, using basic Python packages (SciPy).
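A toy example in the spirit of the workshop: recovering the frequencies of two mixed sine waves with scipy.fft:

    import numpy as np
    from scipy.fft import fft, fftfreq

    # A 5 Hz and a 12 Hz sine wave, sampled at 100 Hz for 2 seconds
    fs = 100
    t = np.arange(0, 2, 1 / fs)
    signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

    spectrum = fft(signal)
    freqs = fftfreq(len(signal), d=1 / fs)

    # Keep positive frequencies and report the dominant components
    mask = freqs > 0
    amplitudes = 2 / len(signal) * np.abs(spectrum[mask])
    print(freqs[mask][amplitudes > 0.2])   # -> [ 5. 12.]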
Other topics
A step-by-step guide to the research process in economics
Speaker: Marc F. Bellemare is Distinguished McKnight University Professor, Distinguished University Teaching Professor, and Northrop Professor in the Department of Applied Economics at the University of Minnesota.
Description: In this workshop you will learn everything from how to identify good research ideas to how to publicize published articles so as to increase citations to your work.
Classification modeling for profitable decisions: Theory and a case study on firm defaults
Speaker: Gábor Békés is an Assistant Professor at the Department of Economics and Business of Central European University, a research fellow at KRTK in Hungary, and a research affiliate at CEPR. His research focuses on international economics, economic geography, and applied IO, and has been published in outlets such as the Global Strategy Journal, the Journal of International Economics, Regional Science and Urban Economics, and Economic Policy; he has also authored commentary on VOXEU.org. His comprehensive textbook, Data Analysis for Business, Economics, and Policy, co-authored with Gábor Kézdi, was published by Cambridge University Press in 2021.
Description: This workshop will introduce the framework and methods of probability prediction and classification analysis for a binary target variable. We will discuss key concepts such as probability prediction, classification threshold, loss function, classification, the confusion table, expected loss, the ROC curve, AUC, and more. We will use logit models as well as random forests to predict probabilities and classify. The workshop will focus on a case study on firm defaults using a dataset of financial and management features of firms. The material is based on a chapter and a case study from my textbook; code in R and Python, along with the data, is available from the GitHub repo. The workshop will introduce key concepts, but the focus will be on the data wrangling and modelling decisions we make for a real-life problem. There will be a follow-up workshop focusing on the coding side of the case study.
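The workshop's own code lives in its GitHub repo; purely as an illustration of the concepts listed above, here is a scikit-learn sketch on synthetic data:

    # pip install scikit-learn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for firm-level features and a rare binary default flag
    X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9], random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

    for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=1)):
        p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]  # predicted default probabilities
        print(type(model).__name__, "AUC:", round(roc_auc_score(y_te, p), 3))
        # Classification = probabilities + a threshold implied by the loss function;
        # with asymmetric losses the optimal threshold is rarely 0.5
        print(confusion_matrix(y_te, p > 0.2))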
Using Google Trends and GDELT datasets to explore societal trends
Speaker: Harald Puhr, PhD in international business and assistant professor at the University of Innsbruck. His research and teaching focus on global strategy, international finance, and data science/methods, primarily with R. As part of his research, Harald developed the globaltrends package (available on CRAN) to handle large-scale downloads from Google Trends.
Description: Researchers and analysts are frequently interested in what topics matter for societies. These insights are applied in research fields ranging from economics to epidemiology to better understand market demand, political change, or the spread of infectious diseases. In this workshop, we consider Google Trends and GDELT (Global Database of Events, Language, and Tone) as two datasets that help us explore what matters for societies and whether these issues matter everywhere. We will use these datasets in R and Google BigQuery to analyze online search volume and media reports, and we will discuss what they can tell us about topics that move societies.
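The workshop itself works in R (via the globaltrends package) and BigQuery; purely for a taste of the Google Trends side in Python, here is a sketch using the unofficial pytrends library (keyword and settings are arbitrary examples):

    # pip install pytrends
    from pytrends.request import TrendReq

    pytrends = TrendReq(hl="en-US", tz=0)
    pytrends.build_payload(["inflation"], timeframe="today 5-y", geo="US")

    # Weekly search interest, normalized to 0-100 within the request
    print(pytrends.interest_over_time().head())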
An Introduction to Bayesian A/B Testing in Stan, R, and Python
Speaker: Jordan Nafa is a Data Scientist at Game Data Pros, Inc. where his work centers around Bayesian A/B Testing, stochastic optimization, and applied causal inference for revenue optimization, promotional pricing, and personalized targeting in video games. He is also a Ph.D. Candidate in Political Science at the University of North Texas where he previously taught undergraduate courses in causal inference, applied statistics, and American political behavior.
Description: This workshop will cover a basic introduction to Bayesian inference, A/B Testing, and decision theory for the analysis of large-scale field experiments in industry settings. After introducing the foundations of the Bayesian approach to A/B Testing, we will work through real-world examples using the probabilistic programming language Stan along with its R and Python interfaces.
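The workshop builds its models in Stan; as a warm-up for the underlying logic, here is a conjugate Beta-Binomial version in plain Python (the conversion counts are made up):

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical experiment: conversions / users in each variant
    conv_a, n_a = 120, 1000
    conv_b, n_b = 150, 1000

    # With a Beta(1, 1) prior, each conversion rate has a Beta posterior
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

    # Decision-relevant quantities follow directly from posterior draws
    print("P(B beats A):", (post_b > post_a).mean())
    print("Expected lift:", (post_b - post_a).mean())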
Survival Analysis with R and Python
Speaker: Christopher Peters is the Principal Data Scientist and ninth employee at Zapier, where the mission is to make automation work for everyone. For the last decade, he has applied survival analysis in R and Python, along with statistics and econometrics, to effect positive change for people. He learned many of his skills through self-study with friends as well as during his education at Louisiana State University, where he completed his terminal degree, a Master of Applied Statistics. There he was privileged to be advised by reliability analysis giant Professor Luis A. Escobar. His committee also included Professor Brian Marx, co-founder of penalized B-splines and co-author of The Joys of P-Splines, as well as Emeritus Professor of Econometrics R. Carter Hill, co-author of Principles of Econometrics. Christopher was recently invited to review the book Statistical Methods for Reliability Data, 2nd Edition, co-authored by Distinguished Professor William Q. Meeker, Professor Luis A. Escobar, and Emeritus Associate Professor Francis G. Pascual. He also recently reviewed Telling Stories with Data by Assistant Professor Rohan Alexander. He loves being in nature, and his interests lie in the interactions of technology and nature, spanning a wide variety of topics related to business, economics, and causal inference. You can find him on Twitter at @statwonk or at http://statwonk.com.
Description: How can we speed up growth? Bring about or prevent important events? Design technology and human processes for high reliability? Survival analysis (time-to-event analysis) lets us answer these questions wisely by accurately and precisely allocating credibility among their possible answers.
Our interest in future events is insatiable, for many serious reasons. Whether the goal is causing, preventing, or simply better understanding important events, time-to-event analysis (also known as survival or reliability analysis) gives us a systematic way to understand the possibilities of future events and how they can be reconfigured for the benefit of people and ourselves.
In this two-hour workshop, I'll give a gentle introduction to industrial and commercial applications of time-to-event analysis in R and Python, side by side. The workshop will focus on how you can best get started with these technologies and begin to answer such questions yourself at a deeper level, for the purpose of innovation. As part of that, I'll share what I've learned over a decade of applying these methods in the SaaS software industry.
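As a minimal illustration on the Python side, here is a Kaplan-Meier fit with the lifelines library (the churn data below are invented):

    # pip install lifelines
    import pandas as pd
    from lifelines import KaplanMeierFitter

    # Months each customer was observed, and whether the event (churn) occurred;
    # 0 means the observation was right-censored (still a customer)
    df = pd.DataFrame({
        "months":  [2, 5, 5, 8, 12, 12, 15, 20],
        "churned": [1, 1, 0, 1, 0, 1, 0, 0],
    })

    kmf = KaplanMeierFitter()
    kmf.fit(durations=df["months"], event_observed=df["churned"])
    print(kmf.survival_function_)        # estimated P(still a customer at time t)
    print(kmf.median_survival_time_)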
Introduction to SQL
Speaker: Mauricio "Pacha" Vargas Sepulveda is a statistician interested in applying statistical methods to address specific policy-relevant questions, particularly in international trade, migration, investments, and theoretically founded empirical work. He received an M.Sc. in Statistics from P. U. Católica de Chile.
Description: We'll use the financial database from http://databases.pacha.dev, which contains 606 successful and 76 unsuccessful loans along with their information and transactions. The goal is for attendees to learn to connect to a SQL database (PostgreSQL) and query information in efficient ways. The only requirement is to install the dedicated SQL client DBeaver before the activity.
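The workshop connects through DBeaver, but the same queries can be run from Python; here is a sketch with psycopg2, where the credentials, table, and column names are placeholders (the real connection details are listed on the database site):

    # pip install psycopg2-binary
    import psycopg2

    conn = psycopg2.connect(
        host="databases.pacha.dev", dbname="financial",
        user="USER", password="PASSWORD",   # placeholders
    )
    with conn.cursor() as cur:
        # A typical workshop-style aggregation: loan counts and totals by status
        cur.execute("""
            SELECT status, COUNT(*) AS n_loans, SUM(amount) AS total_amount
            FROM loans
            GROUP BY status
            ORDER BY n_loans DESC;
        """)
        for row in cur.fetchall():
            print(row)
    conn.close()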
Rigorous Impact Evaluations in Practice: A Short Introduction for Policy Implementers
Speaker: Dr. Alexandra Avdeenko works on impact evaluations at the World Bank. She is Research Affiliate at the Center for Economic Policy Research (CEPR), a Senior Lecturer in Economics at Heidelberg University, and a J-PAL invited researcher. As a Principal Investigator, she has designed and conducted numerous impact evaluations and data collections, among others in Azerbaijan, Ethiopia, India, Indonesia, Montenegro, Sudan, Pakistan, and the Philippines. Her work has been published in the American Political Science Review (awarded Best Article in African Politics in 2015 by the American Political Science Association), the World Bank Research Observer and the European Economic Review, amongst others. She is an academic referee for numerous leading academic journals and implementing agencies. She graduated from the Berlin School of Economics, receiving her PhD in Economics from the University of Hamburg.
Description: The workshop is designed for individuals with little or no prior knowledge of quantitative impact evaluation methods but a strong curiosity about them. This includes anyone who plans or oversees implementation, sets up (log)frameworks, would like to invest in institutional and program learning, or would like to better understand and assess different types of evaluations. After setting out the core methodological foundations, we will discuss the day-to-day challenges faced when integrating impact evaluation methods. The workshop will give insights into how to better plan programs and monitoring systems so that impact evaluation can be embedded naturally. Participants will learn how an impact evaluation can make a difference in the assessment of their work. We will discuss impact evaluation methods, including the need to set up comparison group(s). We will explore how to set up a randomized control trial (RCT). Participants will be presented with the richness of the method and will explore different variations of it and its limitations while discussing several case studies.
Manipulating, Cleaning and Analysis of Data in Stata
Speaker: Olha Halytsia is a PhD student in Economics at the Technical University of Munich. She has previous research experience within a World Bank project and has also worked at the National Bank of Ukraine.
Description: This workshop will enhance your ability to organize your data and prepare it for analysis in Stata. In particular, we will cover the following subtopics: manipulating/reorganizing data, merging and combining data to create larger datasets, and removing data. Once our data is prepared, we will perform a basic analysis. Please note that we are not able to provide Stata licenses to the participants; therefore, if you would like to run the code on your own computer during the workshop, you will need a Stata license from another source.
Working with Big Data with Hadoop and Spark
Speaker: Jannic Cutura is an economist turned data engineer turned software engineer who works as a Python developer in the European Central Bank's Stress Test team. Prior to his current position, he worked as a research analyst/data engineer in the financial stability and monetary policy divisions of the ECB. He holds a master's and a Ph.D. in quantitative economics from Goethe University Frankfurt and has conducted research projects at the BIS, the IMF, and Columbia University.
Description: Big data (datasets that are difficult to handle on standalone retail-grade computers) is rapidly becoming the norm in social science research. This is true both in academia and for policy-oriented research in central banks and similar bodies (let alone industry applications). Yet traditional econometrics (and econometrics training) tells us little about how to work efficiently with large datasets. In practice, any dataset larger than the researcher's computer memory (~20-30 GB) is very challenging to handle: once that barrier is crossed, most data manipulation tasks become painfully slow and prone to failure. The goal of this presentation is to (i) explain what happens under the hood when your computer gets slow and (ii) show how distributed computing (in particular Hadoop/Spark) can help mitigate those issues. By the end, participants will understand the power of distributed computing and how they can use it to tackle both existing data handling challenges and new ones that were previously prohibitively expensive to evaluate on retail-grade computers.
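A minimal PySpark sketch of the idea (the file and column names are placeholders):

    # pip install pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("bigdata-demo").getOrCreate()

    # Spark reads lazily and in partitions, so the file need not fit in RAM
    df = spark.read.parquet("transactions.parquet")

    # Transformations only build a plan; nothing runs until an action like .show()
    (df.groupBy("customer_id")
       .agg(F.sum("amount").alias("total"), F.count("*").alias("n"))
       .orderBy(F.desc("total"))
       .show(10))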
Introduction to Topic Modelling in R and Python
Speaker: Christian Czymara is a postdoctoral fellow at Tel Aviv University. His research focuses on attitudes, immigration, and political communication using quantitative and computational methods as well as natural language processing.
Description: This workshop offers an in-depth exploration of topic models, which allow extracting meaningful insights from extensive text corpora while minimizing the reliance on prior assumptions or annotated data. The workshop will start with the basics of text data preprocessing and progress to a general understanding of the underlying principles of topic modeling. It will cover a range of topic modeling techniques, such as Structural Topic Models, BiTerm, and Keyword Assisted Topic Models in R, and BERTopic in Python. We will explore the cases where each model is particularly promising. Participants will learn about the practical considerations when choosing a topic modeling algorithm, and how to apply these techniques to their own data. The lecture will be of interest to researchers and practitioners who are interested in extracting insights from large volumes of textual data, such as social media, news articles, or scientific publications.
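On the Python side, the BERTopic part of such a pipeline can be as short as the sketch below (using a standard demo corpus as a stand-in for your own documents):

    # pip install bertopic scikit-learn
    from bertopic import BERTopic
    from sklearn.datasets import fetch_20newsgroups

    # Any list of raw text strings works; this demo corpus is just convenient
    docs = fetch_20newsgroups(subset="train",
                              remove=("headers", "footers", "quotes")).data[:2000]

    topic_model = BERTopic(language="english")
    topics, probs = topic_model.fit_transform(docs)

    print(topic_model.get_topic_info().head())   # discovered topics and their sizes
    print(topic_model.get_topic(0))              # top words of one topic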
Using ChatGPT for Exploratory Data Analysis with Python, R and prompting
Speakers: Gábor Békés is an Associate Professor at the Department of Economics and Business of Central European University, a research fellow at KRTK in Hungary, and a research affiliate at CEPR. His research focuses on international economics, economic geography, and applied IO, and has been published in outlets such as the Global Strategy Journal, the Journal of International Economics, Regional Science and Urban Economics, and Economic Policy; he has also authored commentary on VOXEU.org. His comprehensive textbook, Data Analysis for Business, Economics, and Policy, co-authored with Gábor Kézdi, was published by Cambridge University Press in 2021.
Seth Stephens-Davidowitz is a data scientist and New York Times bestselling author. His 2017 book, Everybody Lies, on the secrets revealed in internet data, was a New York Times bestseller; a PBS NewsHour Book of the Year; and an Economist Book of the Year. His 2022 book, Don’t Trust Your Gut, on how people can use data to best achieve their life goals, was excerpted in the New York Times, the Atlantic, and Wired. Seth has worked as a data scientist at Google; a visiting lecturer at the Wharton School of the University of Pennsylvania; and a contributing op-ed writer for the New York Times. Seth has consulted for top companies. He received his BA in philosophy, Phi Beta Kappa, from Stanford, and his PhD in economics from Harvard.
Description: How can GenAI tools like ChatGPT augment and speed up data exploration? Is it true that we no longer need coding skills? Or, instead, does ChatGPT hallucinate too much to be taken seriously? I will do a workshop with live prompting and coding to investigate. I will experiment with two datasets shared ahead of the workshop. The first comes from my Data Analysis textbook and is about football managers. Here I'll see how close working with AI gets to what we have in the textbook, and compare code written by us vs. the machine. Second, I'll work with a dataset I have little or no experience with and see how far it takes me. In this case, we will look at descriptive statistics, make graphs and tables, and work to improve a textual variable. ChatGPT will generate code and reports, and I'll then check them on my laptop to see if they work. The process starts with Python, but then I'll proceed with R.
Seth Stephens-Davidowitz is writing a book in 30 days using ChatGPT's Data Analysis. The book is called Who Makes the NBA? and is a statistical analysis of what it takes to reach the top of basketball. Seth will illustrate his experience with one of the case studies he worked on. The workshop will end with Seth and Gábor chatting about their experiences and what works well.
Customizing slides and documents using Quarto extensions
Speaker: Nicola Rennie is a Lecturer in Health Data Science based within the Centre for Health Informatics, Computing, and Statistics at Lancaster Medical School. Her research interests include applications of statistics and machine learning to healthcare and medicine, communicating data through visualisation, and understanding how we teach statistical concepts. Nicola also has experience in data science consultancy and collaborates closely with external research partners. She can often be found at data science meetups, presenting at conferences, and is the R-Ladies Lancaster chapter organiser.
Description: Quarto is an open-source scientific and technical publishing system that allows you to combine text with code to create fully reproducible documents in a variety of formats. The addition of custom styling to documents can make them look more professional and recognisable. In the first half of this workshop, we'll look at ways to customise HTML outputs (including documents and revealjs slides) using CSS, and ways to customise PDF documents using LaTeX. In the second half, we’ll discuss the use of Quarto extensions as a way of sharing customised templates with others, demonstrate how to install and use extensions, and show the process of building your own custom style extension.
Introduction to Rust
Speaker: Luca Barbato is a long-time open-source developer and a Politecnico di Torino alumnus. He contributes to Linux distributions such as Gentoo and OpenWrt. He is part of the VideoLAN and Xiph organizations, fostering open source and open standards in multimedia, and he is involved in the W3C Web of Things efforts to counter fragmentation in the IoT world. He offers open-source-related services through his company Luminem Srls.
Description: Rust is one of the most loved languages according to the Stack Overflow surveys over the years, and it has found its way into nearly every field: from kernel programming both in Linux and Windows, to web browsers such as Chrome and Firefox, to high-assurance fields such as automotive. Part of its success is that it provides top-notch performance, prevents large classes of common mistakes from happening, and has a toolchain that makes it more straightforward to write and deploy complex software. In data science, the programming is usually done in two kinds of languages: focused languages that shine in their niche, such as R for statistical analysis, and general-purpose languages that provide useful tools for the tasks at hand, such as Python with pandas, numpy, or pytorch. Both approaches usually end up with at least portions of the codebase written in lower-level languages such as C or C++ for performance. Nowadays many argue that Rust is a nicer language even for that purpose, and there are already interesting tools that prove the point, such as polars (a fast dataframe library that you can already use from Python and R), deltalake, candle, and many more. This introduction will cover the language syntax, its fairly unique approach to memory safety, and how to use the ecosystem (toolchain and online resources).
Automating updates to dashboards on Shiny Server
Speaker: Clinton Oyogo David is a data scientist with 7 years of experience, currently working with Oxford Policy Management (OPM) in the Research and Evidence data innovations team. Prior to joining OPM, he worked at World Agroforestry Centre as a junior data scientist in the Spatial Data Science and Applied Learning Lab.
Description: In this workshop, we will talk about the configuration and setup needed to automate updates to R Shiny dashboards deployed on a Shiny server. The talk will touch on GitHub webhooks, an API (Django), and bash scripting. With the setup in place, you will not need to manually update the code on the Shiny server: a push event to GitHub will be enough to have your changes reflected on the dashboard in a matter of seconds.
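The workshop's own setup uses Django and bash; as a stripped-down sketch of the core idea, here is a tiny standard-library listener that pulls the repository whenever GitHub sends a push webhook (the path is a placeholder, and a production setup should verify the webhook signature):

    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    APP_DIR = "/srv/shiny-server/my-dashboard"   # placeholder app directory

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain the request body, then refresh the app's code from GitHub
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            subprocess.run(["git", "-C", APP_DIR, "pull"], check=False)
            self.send_response(200)
            self.end_headers()

    HTTPServer(("", 8080), WebhookHandler).serve_forever()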
Data Analysis through Spatial Technology with GIS
Speaker: Andrea Matranga is a researcher at the economics department of the University of Turin. Prior to that he was an Assistant Professor at Chapman University and the New Economic School. In his research he has used GIS to analyze such diverse topics as the Neolithic Revolution, Russian defense lines against slave raids in the 16th century, and Roman aqueducts.
Description: This course provides an in-depth introduction to Geographic Information Systems (GIS) for professionals in economics, data science, and other social sciences. It covers the significance of GIS in multidisciplinary research, platform selection, technical aspects of spatial data (such as raster/vector datasets and map projections), distance measurement techniques, and data visualization, and addresses econometric and spatial challenges in data analysis. The course equips participants with practical GIS skills for enhanced data analysis and visualization, aiding in producing innovative research and insightful conclusions in their respective fields.
An applied introduction to Mathematica
Speaker: Matteo Broso is a 4th-year PhD student in Economics at the Collegio Carlo Alberto and the University of Turin. He uses game theory to answer questions in political economy, industrial organisation, and labor economics. He has served as a Teaching Assistant for several classes at every university level, from undergraduate to PhD. He holds an MSc in Economics from the University of Bologna and is currently a visiting student at Imperial College Business School.
Description: This workshop is a beginner's guide to Wolfram Mathematica, a program used by mathematicians, physicists, engineers, economists, and various scientific professionals. Mathematica stands out for its exceptional symbolic computation capabilities, enabling precise and intricate mathematical analysis. The workshop will cover practical demonstrations of Mathematica’s fundamental functions in analytical calculations and graphical visualizations, highlighting its ability to handle complex symbolic tasks with ease. Attendees will gain experience with Mathematica's features, which simplify and enhance the exploration of mathematical concepts. No prior experience with Mathematica or programming is required, although a grasp of high school-level math could be advantageous. Interactive participation and questions are encouraged throughout the session, making it an ideal opportunity for beginners eager to learn how Mathematica's computation power can be applied in their respective fields.
Version control with Git
Speaker: Olexandr Konovalov is a lecturer in the School of Computer Science at the University of St Andrews in Scotland, where he leads the Research Software Group. A pure mathematician by training, he moved into this area by starting to contribute to open-source mathematical software after completing his PhD at the Institute of Mathematics of the National Academy of Sciences of Ukraine in Kyiv. Olexandr is an Instructor and a Trainer for The Carpentries, a global volunteer-based organisation whose members teach foundational coding and data science skills to researchers worldwide, and also a Fellow of the Software Sustainability Institute, a national UK facility that cultivates best practices of working with code and data. You can find more details about him here.
Description: Version control systems like Git allow you to track changes in any files (such as code, data, research publications, websites, etc.) and synchronise them across multiple computers. They facilitate collaboration with others by sharing repositories (i.e., projects with complete histories of changes) and are essential tools for developing reliable and sustainable scientific software. This workshop will be based on the Software Carpentry lesson "Version Control with Git" and will cover the basics of using Git: setting it up on your computer, creating a repository, recording changes, viewing the history of changes, interacting with remote GitHub repositories, and collaborating with others. No prior knowledge of Git is required, but we encourage participants to install Git following these instructions and to create a GitHub account as explained here, in order to be able to replicate the instructor's actions on their computers as much as possible. To make this most convenient, we recommend using a wide or an additional screen during the workshop, if possible.
Making good presentations
Date: Thursday, April 4th, 18:00 - 20:00 CEST (Rome, Berlin, Paris timezone)
Speaker: Heather Lanthorn is the Co-Director of the Mercury Project at the Social Science Research Council in the United States. She is also on the Board of Advisors at IDinsight, the Board of Directors at Feedback Labs, and the Advisory Board at the Clarity Foundation. She teaches as adjunct faculty at the University of North Carolina Gillings School of Global Public Health.
Description: No matter how good your research is, it won't sell itself. In this workshop, we will cover key steps in making a presentation that helps audiences understand and remember your research. There will be opportunities throughout to think about an upcoming presentation; attendees are encouraged to come with a concrete upcoming presentation opportunity in mind.
(Pretty) big data wrangling with DuckDB and Polars
Speaker: Grant McDermott is a Principal Economist at Amazon, where he helps lead data-driven projects across different parts of the business. He is an advocate of reproducible and open science and has authored a number of software packages (mostly in R) and other pedagogical tools. Before returning to the private sector, he was a faculty member at the University of Oregon, where he continues his research affiliation as a Courtesy Assistant Professor. You can find out more on his website: https://grantmcdermott.com/
Description: This workshop will introduce you to DuckDB and Polars, two data wrangling libraries at the frontier of high-performance computation. (See benchmarks.) In addition to being extremely fast and portable, both DuckDB and Polars provide user-friendly implementations across multiple languages. This makes them very well suited to production and applied research settings, without the overhead of tools like Spark. We will provide a variety of real-life examples in both R and Python, with the aim of getting participants up and running as quickly as possible. We will learn how to wrangle datasets extending over several hundred million observations in a matter of seconds or less, using only our laptops. And we will learn how to scale to even larger contexts where the data exceeds our computers' RAM capacity. Finally, we will also discuss some complementary tools and how these can be integrated for an efficient end-to-end workflow (data I/O -> wrangling -> analysis). You can find more information about the workshop here.
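To give a flavor of the Python side, here is a minimal sketch of the same aggregation in both engines, assuming a recent Polars version (the file and column names are placeholders):

    # pip install duckdb polars
    import duckdb
    import polars as pl

    # DuckDB: plain SQL directly over a Parquet file, scanned lazily
    print(duckdb.sql("""
        SELECT passenger_count, AVG(fare_amount) AS avg_fare, COUNT(*) AS n
        FROM 'taxi.parquet'
        GROUP BY passenger_count
        ORDER BY n DESC
    """).df())

    # Polars: a lazy query plan, only executed at .collect()
    print(
        pl.scan_parquet("taxi.parquet")
          .group_by("passenger_count")
          .agg(pl.col("fare_amount").mean().alias("avg_fare"), pl.len().alias("n"))
          .sort("n", descending=True)
          .collect()
    )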
Academic and Personal Website Creation: A Quarto Tutorial
Speaker: Brier Gallihugh is an incoming fifth-year doctoral candidate in social psychology at Ohio University. Broadly speaking, Brier's research interests involve prejudice and discrimination against minoritized groups. An advocate for all things open science and statistics, Brier spends countless hours inside RStudio, using R for statistical analyses and document/manuscript generation. Post-PhD (likely Spring 2025), Brier hopes to gain employment as either a data analyst or a data scientist.
Description: In the digital world, having an online presence is anywhere from a strong suggestion to a firm requirement for anyone who wishes to advertise what they do. This is true in academic circles (e.g., lab websites) and industry circles (e.g., portfolios) alike. However, creating a website can often require extensive knowledge of CSS and HTML to put together a professional product. Thankfully, this is where Quarto comes in handy. This workshop will show participants how to quickly create and host a website for professional or personal use, tailored to each participant's individual needs, using Quarto. Participants will need to have the latest versions of both R and RStudio installed prior to the workshop. Further, GitHub and Netlify accounts (used to host the website) are also advised.
An Introduction to creating data visualisations in Tableau
Speaker: Serena Purslow is a data consultant based in London with a background in international relations and quantitative methods. Her passion for data visualisation stems from its ability to bridge the gap for those with non-technical backgrounds, empowering everyone to engage with data regardless of their skill level.
Description: This workshop will cover the very basics of creating a data visualisation in Tableau. After an introduction to Tableau as a BI tool, connecting to data in Tableau, and understanding your data, you'll be taken through creating some basic charts and shown how to bring them together in a dashboard.
Transforming Data into Visual Narratives: A Hands-On Workshop with the Flourish Data Visualization Tool
Speaker: Dr. Jonathan Schwabish is an economist, writer, teacher, and data communications expert. He is considered a leading voice for clarity and accessibility in how analysts, researchers, and scholars communicate their findings. Across four books, he has provided a comprehensive guide to creating, communicating, and distributing data-rich content. Better Presentations coaches people through preparing, designing, and delivering data communication products; Elevate the Debate helps people develop a strategic plan to communicating their work across multiple platforms and channels; and Better Data Visualizations details essential strategies to create more effective data visualizations. His most recent book, Data Visualization in Excel, hit bookshelves in May 2023 and helps readers create better graphs and charts in the Excel software tool. He is on Twitter @jschwabish.
Description: In this two-hour workshop, participants will gain hands-on experience with Flourish, a powerful data visualization tool designed to create interactive and engaging charts, maps, and stories. The session will cover the fundamentals of importing data, selecting appropriate visualization templates, and customizing visual elements to effectively convey complex data insights. Attendees will learn how to leverage Flourish’s intuitive interface to build dynamic visualizations without needing advanced coding skills, and will explore best practices for data storytelling to enhance their presentations and reports. By the end of the workshop, participants will have the skills to transform raw data into compelling visual narratives, improving their ability to communicate data-driven insights.
Introduction to data donations as a digital trace collection method – the why and the how
Speaker: Felicia Loecherbach is an assistant professor in political communication and journalism at the Amsterdam School of Communication Research. Prior to this, she was a postdoctoral fellow at the Center for Social Media and Politics (CSMaP) at NYU and a PhD student in Computational Political Communication Science at the Vrije Universiteit Amsterdam. Her research interests include (the diversity of) online news consumption and using computational methods in the social sciences. Specifically, she uses computational approaches to study when and where users come across different types of news, collecting digital trace data via innovative approaches such as data donations and analyzing different dimensions of diversity of the content and how it affects perceptions and attitudes of users. Apart from this, she has been involved in studying the challenges of different modes of news access, for example via news recommender systems, private messaging, and smart assistants.
Description: In the last few years, access to platform data has become messier than ever before. Application programming interfaces have been closed down or made accessible only to certain organizations, often at high cost. Apart from this, some data, such as private messaging or smart assistant data, has never been accessible via APIs or scraping, while still holding important information for better understanding a variety of phenomena, for example around (mis)information spread and exposure. Since 2018, users have had the right to request and download their own data from platforms and subsequently "donate" it to other entities as they see fit. This workshop introduces different frameworks explaining why data donations are a good way forward for transparent and open engagement with digital trace data, but also discusses the limitations and downsides. In the second part, different (open-source) frameworks and tools are introduced, including some hands-on experience of how to use them to set up a data donation collection for various platforms.
Generative AI for Social Scientists – A Crash Course
Speaker: Miklós Sebők is a research professor at the HUN-REN Centre for Social Sciences and the Artificial Intelligence National Lab in Budapest and the principal investigator of poltextLAB (poltextlab.com). He serves as the convenor of the COMPTEXT conference and is a co-creator of the ParlLawSpeech dataset (parllawspeech.org) and the Hungarian Comparative Agendas Project datasets (cap.tk.hun-ren.hu). His work appeared in, inter alia, Computational Communication Research, European Journal of Political Research, European Political Science, European Political Science Review, International Political Science Review, Journal of Computational Social Science, Policy Studies Journal, Political Analysis and Social Science Computer Review.
Description: Topics will be covered in two parts. Part I: Large Language Models (LLMs) and genAI: capabilities and opportunities; basics of AI and generative AI (how does it work, what does it do, and what does it have access to?); general models (such as ChatGPT) vs. specific solutions; limitations and threats; validity; costs; ethical use; data protection; tips for tools and solutions that can help scientific writing. Part II: Advanced prompt engineering techniques for social science use cases.
Report
As of October 12th, we have held 112 workshops in support of Ukraine. So far, we have raised a total of 83 180 euro. In particular, we raised the following amounts for the following organizations:
Leleka Foundation: 49 948 euro
National Bank of Ukraine Special Account to Raise Funds for Ukraine’s Armed Forces: 22 927 euro
Come Back Alive Foundation: 6 206 euro
Kyiv School of Economics: 1 998 euro
Global Support Fund by the Embassy of Ukraine in the UK: 357 euro
Kárpátaljai Sárkányellátó Alapítvány: 110 euro
Group 35: 75 euro
Marikate Foundation: 75 euro
Prytula Foundation: 60 euro
Volunteer Battalion: 60 euro
Demsokyra: 55 euro
Liberty Ukraine Foundation: 53 euro
Aerorozvidka: 50 euro
DeepStateUA: 50 euro
Open eyes foundation: 50 euro
Hurkit: 25 euro
Dignitas Fund: 25 euro
Boryviter: 25 euro
Fund Khorobrogo: 25 euro
VOL: 25 euro
Reformation NGO: 25 euro
Donor UA: 25 euro
B-52: 25 euro
eppoua.com: 20 euro
SAB UA: 20 euro
Other individual fundraisers: 75 euro
Our workshops were attended by a total of 3760 people. In particular,
2465 people registered directly
286 joined through the special registration form for Ukrainians
520 were able to attend through the waiting list as their registration fee was sponsored by someone else.
489 people purchased access to recordings and materials after the workshop.
Frequently Asked Questions
I have registered for a workshop, but haven't received anything, is that normal?
Yes. I usually send out registration confirmation emails to all registered participants around 1-2 days before the workshop, so if you registered well in advance, you may not receive anything for a while. If you registered after the initial batch of registration confirmation emails has been sent, I will send you a registration confirmation within 1 working day (but before the workshop, so don't worry about missing it!). Please make sure to check your spam folder, as the registration confirmation may end up there. It is recommended to register in advance, so that if there are any issues with your registration, we can solve them before the workshop.
Do workshops take place online or in person?
All workshops take place online via Zoom.
Which language are workshops taught in?
All workshops are in English.
Who can register for the workshop?
Anyone can register for the workshop via the standard registration form, and anyone can sponsor the participation of a student! Only students can apply for the waiting list. Only Ukrainians can sign up via the registration form for Ukrainians.
How does the waitlist work?
Please make sure to use your university email when registering (your email at the university domain) or provide some other proof of student status. Note that signing up for the waiting list does not guarantee your participation: you will only be able to participate if someone decides to sponsor student participation (places from the waitlist are allocated on a first-come, first-served basis, subject to sponsors' preferences on whether to give priority to students from developing countries).
How does the sponsorship form work?
You can sponsor the participation of student(s) by making a donation of 20 euro/USD per student that you want to sponsor and filling in the sponsorship form for a particular workshop. You have the option to name a specific student that you want to sponsor by providing their email, or to allow the sponsored place to be allocated to students from the waitlist. If you choose the option of allocating places to students on the waitlist, you will also have the option of choosing whether we should prioritise students from developing countries when allocating the places. The places are allocated to the students on the waitlist on a first-come, first-served basis, taking into account your preference on whether to prioritise students from developing countries. If more students were sponsored than signed up for the waitlist for a particular workshop, the remaining places are allocated to students who sign up for the waitlist for the next workshop. Please note that sponsoring students does not automatically entitle the sponsor to their own participation in the workshop. If you want to participate in the workshop yourself in addition to sponsoring students, you must make an additional donation of at least 20 euro and fill in the registration form.
I forgot to enter my email when making a donation, or did not receive an email, what should I do?
If you have not received a donation receipt, you can submit an excerpt from your bank statement where the donation amount, recipient and date are clearly visible. If you made a donation via PayPal, you can also attach a screenshot from the PayPal transaction, where the amount, recipient and date or transaction id are clearly visible.
Can I use someone else's credit card for the donation or make a donation for myself and someone else when registering?
Yes, as long as you donate at least 20 euro/USD per person per workshop.
I have registered for a workshop, when will I get the recording?
Usually, I send out the recording the day after the workshop, but sometimes it may take a couple of days to upload and share the recording link. Please check your spam folder, as the emails sometimes end up there. You can email me if you have not received it within a week after the workshop.
How can I help with this initiative?
If you are proficient in R, Python, or other tools for data science, data analysis, or research in general, you can volunteer to teach a workshop. Please email me at dariia.mykhailyshyn2@unibo.it so that we can discuss the details. You can also share information about these workshops on social media or directly with people who may be interested. You can also register for a workshop yourself or sponsor the participation of students.