Create your 23andMe ancestry map using R and Bigsnpr (2025)
This tutorial shows how to estimate genetic ancestry using R. It reproduces the code from Privé et al., “Using the UK Biobank as a global reference of worldwide populations: application to measuring ancestry diversity from GWAS summary statistics.” Bioinformatics (2022). Compared to other ancestry tools (e.g., RFMix), this is a programming-driven approach that uses only R and the bigsnpr package. Neither approach is strictly better, but this one is a good option for those more comfortable with programming languages.
You’ll learn:
- How to estimate ancestry proportions from GWAS summary statistics (using only allele frequencies).
- How to estimate ancestry proportions from individual-level data (requires genotyped data).
- How to perform ancestry grouping using Euclidean distance.
- How to create your 23andMe ancestry map (Bonus!)
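The grouping step listed above can be illustrated with a small sketch: a sample is assigned to the reference population whose center in principal-component (PC) space is nearest by Euclidean distance. This is a Python illustration of the idea, not the notebook's R/bigsnpr code; the population labels, the PC coordinates, and the `assign_ancestry` helper are all hypothetical.

```python
import math

# Hypothetical reference centers in a 2-D PC space (made-up numbers).
REFERENCE_CENTERS = {
    "pop_A": (0.0, 0.0),
    "pop_B": (10.0, 0.0),
    "pop_C": (0.0, 10.0),
}

def assign_ancestry(sample_pcs, centers=REFERENCE_CENTERS, max_dist=None):
    """Return (label, distance) for the nearest reference center.

    If max_dist is given and the nearest center is farther away than that,
    the label is None (the sample is left unassigned).
    """
    label, dist = min(
        ((name, math.dist(sample_pcs, center)) for name, center in centers.items()),
        key=lambda pair: pair[1],
    )
    if max_dist is not None and dist > max_dist:
        return None, dist
    return label, dist

label, dist = assign_ancestry((9.0, 1.0))
print(label, round(dist, 3))
```

A distance cutoff such as `max_dist` is one simple way to leave samples far from every reference group unassigned; the value to use would depend on the data.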
Statistical Genetics Repo -> URL
Ancestry in R Notebook -> URL
#ancestry #precisionmedicine #statisticalgenetics #geospatial

A Primer in Polygenic Risk Scores using PLINK and R (2025)
This tutorial shows how to perform quality control (QC) using PLINK and R. It is an adaptation of Choi et al., “Tutorial: a guide to performing polygenic risk score analyses.” Nat Protoc (2020).
You’ll learn:
- How to perform QC on base and target genomic data.
- How to estimate PRS with different thresholds.
- How to include your PRS in your model.
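As a rough illustration of estimating a PRS at different thresholds, the sketch below sums effect size times effect-allele dosage over the SNPs that pass a p-value cutoff. It assumes the thresholds are p-value thresholds on the base GWAS; the SNP ids, effect sizes, p-values, and the `polygenic_score` helper are made up for illustration and are not taken from the notebooks.

```python
# Hypothetical base (GWAS) summary statistics: SNP id -> (effect size, p-value).
BASE = {
    "rs1": (0.20, 1e-8),
    "rs2": (-0.10, 0.03),
    "rs3": (0.05, 0.40),
}

def polygenic_score(dosages, base=BASE, p_threshold=1.0):
    """Sum effect size x effect-allele dosage over SNPs with p <= p_threshold.

    dosages maps SNP id -> number of effect alleles (0, 1, or 2) for one person.
    SNPs missing from dosages are skipped.
    """
    return sum(
        beta * dosages[snp]
        for snp, (beta, p) in base.items()
        if p <= p_threshold and snp in dosages
    )

# One hypothetical individual scored at three thresholds.
person = {"rs1": 2, "rs2": 1, "rs3": 0}
scores = {t: polygenic_score(person, p_threshold=t) for t in (5e-8, 0.05, 1.0)}
print(scores)
```

Each threshold yields one candidate score per person; the score that enters the final model is typically the one that best predicts the phenotype in the target data.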
Statistical Genetics Repo -> URL
QC in R Notebook -> URL
PRS in R Notebook -> URL
PRS in LDPred -> TBD
#PRS #precisionmedicine #statisticalgenetics

A Primer in GWAS using Hail and Python (2024)

Creating bivariate maps using Kriging Interpolation in R and leaflet (2021)
Disease mapping visualizes geographically indexed data for analysis. Interpolation methods predict values at unsampled locations by giving more weight to nearby points. Kriging accounts for spatial autocorrelation using a semi-variogram, which improves accuracy by factoring in spatial continuity and redundancy. Here I wanted to model two variables (migration vs. HIV prevalence).
You’ll learn:
- How to prepare the spatial data points dataset.
- How to generate an interpolated surface across sampling units via Kriging's semi-variogram.
- How to create a pixel-to-pixel bivariate web map from two interpolations in leaflet.
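The semi-variogram mentioned above can be sketched in a few lines. The empirical version groups pairs of sample points by separation distance and, for each distance bin, takes half the mean squared difference of their values. This is a pure-Python illustration with made-up points, not the notebook's R code; `empirical_semivariogram` is a hypothetical helper.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, bin_width):
    """Empirical semivariance per distance bin.

    points: list of (x, y, value) tuples.
    Bin k holds the pairs whose separation d satisfies
    k * bin_width <= d < (k + 1) * bin_width.
    Returns {bin_index: semivariance}.
    """
    sums, counts = {}, {}
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        k = int(math.hypot(x2 - x1, y2 - y1) // bin_width)
        sums[k] = sums.get(k, 0.0) + (z1 - z2) ** 2
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / (2 * counts[k]) for k in sorted(sums)}

# Made-up samples along a line; values drift apart as distance grows.
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0), (3, 0, 7.0)]
gamma = empirical_semivariogram(samples, bin_width=1.0)
print(gamma)
```

Semivariance that grows with distance is the signature of spatial autocorrelation. Kriging then fits a model curve to these binned values and uses it to derive the interpolation weights.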
R Notebook -> URL
#geospatial #interpolation

Analyzing Complex Survey Data in R (2021)
This tutorial illustrates the analysis of complex survey data in R. My motivation comes from the several challenges I faced in the past with this kind of data, such as lack of documentation, information fragmented across sources (dhsforum, nhanes, stackstats, books), and limited dataset availability. Two-stage sampling is a little tricky at the beginning, so this tutorial tries to shorten the learning curve I went through.
You’ll learn:
- Extract, transform and load datasets (ETL).
- Create your survey design including sampling weights.
- Perform statistical analysis (summary statistics and GLMs).
- Advanced topics (population stratification + raking).
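Raking, named in the advanced topics above, can be sketched as iterative proportional fitting: the sampling weights are rescaled one variable at a time until the weighted totals match known population margins. The respondents, starting weights, and margins below are made up, and `rake` is a hypothetical helper written in Python rather than the notebook's R code.

```python
def rake(rows, weights, margins, n_iter=50):
    """Adjust weights so weighted totals match the given population margins.

    rows: list of dicts holding each respondent's categories.
    weights: starting (design) weights, one per row.
    margins: {variable: {category: population_total}}.
    """
    w = list(weights)
    for _ in range(n_iter):
        for var, targets in margins.items():
            # Current weighted total of each category of this variable.
            totals = {}
            for row, wi in zip(rows, w):
                totals[row[var]] = totals.get(row[var], 0.0) + wi
            # Rescale so each category total hits its target.
            w = [wi * targets[row[var]] / totals[row[var]] for row, wi in zip(rows, w)]
    return w

rows = [
    {"sex": "F", "area": "urban"},
    {"sex": "F", "area": "rural"},
    {"sex": "M", "area": "urban"},
    {"sex": "M", "area": "rural"},
]
margins = {"sex": {"F": 60, "M": 40}, "area": {"urban": 70, "rural": 30}}
raked = rake(rows, [1.0, 1.0, 1.0, 1.0], margins)
print([round(x, 2) for x in raked])
```

A fixed number of passes keeps the sketch short; a practical version would stop once the weighted totals are within a tolerance of the margins.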
R Notebook -> URL
Paper -> URL
#complexsurvey #twostagesampling #appliedstats

Talk: Introduction to GIS and Disease Mapping with R (R Users Group, 2022)
When: Wednesday, Dec. 14, 2022, Noon – 1 pm
About RUG: The R Users Group (RUG) is a series of organized talks by and for anyone at CCHMC who uses the statistical programming language R.
In this talk, I introduced maps in the context of disease mapping.
You’ll learn:
- Reasons you might want to create your own maps
- How to perform spatial and geometry operations
- How to project your geographic data
- Where to find more geospatial variables
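To make the projection topic above concrete, here is a minimal sketch of one common projection: converting longitude/latitude in degrees to Web Mercator (EPSG:3857) metres using the spherical Mercator formulas. Web Mercator is chosen here only as a familiar example and is not necessarily the projection used in the talk, which was given in R; `lonlat_to_web_mercator` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, the sphere radius used by Web Mercator

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to Web Mercator (EPSG:3857) x, y in metres."""
    if abs(lat_deg) >= 90:
        raise ValueError("the Mercator projection is undefined at the poles")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

x, y = lonlat_to_web_mercator(180.0, 45.0)
print(round(x, 2), round(y, 2))
```

In practice a projection library handles this, along with datum shifts and the many other coordinate systems; the point of the sketch is only that projecting means applying a formula to every coordinate.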
Code & Slides -> URL
#geospatial #datascience

Raster processing using Raster imagery in R (2021)
Tutorial for computing annual mean PM2.5 concentrations by county in R.
I have used the North American Regional Estimates (V4.NA.02.MAPLE) for surface PM2.5 [1]. You can get the raw images from here. For the county-level spatial data, I am using the CDC's US-ADM2 map for my analysis [2], but you can use getData boundaries too, as long as they have the proper code column (FIPS, GEOID) for the join.
You’ll learn:
- Key geospatial tasks (map projections, clipping, vector manipulation, zonal statistics, animation) in R.
Python notebook -> URL
[1] Paper: van Donkelaar, A., et al. (2019). Regional Estimates of Chemical Composition of Fine Particulate Matter..., 2019, doi:10.1021/acs.est.8b06392.
[2] CDC's Social Vulnerability Index (SVI) https://svi.cdc.gov/data-and-tools-download.html
How to customize your Albers projection https://github.com/hrbrmstr/rd3albers
#geospatial #datascience

GDAL has been the core geospatial library for years, but its steep learning curve, especially for vector data, makes it challenging. To address this, I’ll share practical geospatial examples in Python 3, adding them incrementally to my playground repo.
You’ll learn:
- GDAL library.
- Key geospatial tasks (NDVI, transformation, clipping, vector manipulation, zonal statistics) in Python.
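Two of the tasks listed above, NDVI and zonal statistics, reduce to simple per-pixel arithmetic once the rasters are loaded. The sketch below uses plain nested lists instead of GDAL so that it runs anywhere; the band values, the zone grid, and the `ndvi` and `zonal_mean` helpers are made up for illustration and are not code from the playground repo.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2-D grids.

    Pixels where NIR + Red is zero are returned as None.
    """
    return [
        [(n - r) / (n + r) if (n + r) != 0 else None for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir, red)
    ]

def zonal_mean(values, zones):
    """Mean of the values inside each zone id, skipping None pixels."""
    sums, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for v, z in zip(value_row, zone_row):
            if v is None:
                continue
            sums[z] = sums.get(z, 0.0) + v
            counts[z] = counts.get(z, 0) + 1
    return {z: sums[z] / counts[z] for z in sums}

# Made-up 2x2 reflectance bands and a zone grid (e.g., two counties).
nir = [[0.8, 0.6], [0.5, 0.0]]
red = [[0.2, 0.2], [0.5, 0.0]]
zones = [[1, 1], [2, 2]]
index = ndvi(nir, red)
means = zonal_mean(index, zones)
print(means)
```

With real imagery the same arithmetic would run on arrays read from the raster bands, and the zone grid would come from rasterizing the polygon layer onto the same pixel grid.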
Python notebook -> URL
For the proper installation of the GDAL environment, you can check this tutorial.
#geospatial #datascience

The Algorithms' book you need
The free time of a summer during my PhD led me to refresh the data structures and algorithms concepts from my early career.
Algorithms are one of those computer science topics that everybody wants to learn but few manage to. They are a tough but necessary topic for every computer scientist and engineer, ranging from array sorting and searching to graph theory and string processing. All the digital technology that we now take for granted has numerous famous algorithms within. For example, the internet and its graph, all databases with their searching and sorting algorithms, or Waze (now part of Google) with its edge-weighted shortest-path strategy. By implementing this book now, as a more senior programmer, I have improved not only at problem-solving but also at code design and data structures. Noteworthy are the several historical quotes about each algorithm; although they are brief, they made me feel like I was reading the Statistical Rethinking book. Just for my learning purposes, I have ported almost the entire book to Python on my GitHub, in case you are interested or stuck.
Repository -> URL
#programming #algorithms

GIS & Public Health ArcMap 10.x (2020)
The goal of this course was to exemplify the role of ArcGIS in the analysis of spatial data in public health applications. In the beginning, we reviewed the fundamentals of the ArcMap interface, basic file structures, and operations. Then, we explored the capabilities for manipulating information in ArcGIS, and finally the use of ArcGIS in real-world scenarios.
Format of the Lab sessions:
- This is a class about the use of ArcGIS in health-related problems.
- Subtitled videos (no audio) with detailed explanations of the topics.
References