Teaching Resources for Workshops

This page supports my students who are aspiring researchers and the people I teach at various workshops. The page is under construction, and I hope to complete it by the end of February. The resources will primarily cover species distribution modelling, biogeography, and avian ecology. I hope to include more topics in the future as I move through different parts of academia.

Visualization of Data in R

At the start of your work in R, set up a working directory where your files are saved. This working directory should be the same folder where you save your R script as a .R file. For example, I have saved my files in the folder "Workshop", within the folder "Teaching", on the "D" drive.

So I'll set my working directory as:

setwd("D:/Teaching/workshop")

Here, setwd() is the function, and we give it the file path ("D:/Teaching/workshop") as an argument. Keep in mind that whenever we write a name such as a file path in an R script, we write it within quotation marks. R treats anything written within quotation marks as a character string.

We have set the working directory. To confirm which directory R is currently using, run:

> getwd()

[1] "D:/Teaching/workshop"

To read your files, you can use functions such as read.csv(), read.table(), and read.xls(). Note that read.xls() is not part of base R and needs an add-on package. Here is an example of reading a csv file.

For csv:

df1 <- read.csv("demodata.csv", header = TRUE)

Here, demodata.csv is the name of the csv file to be read. The logical argument header = TRUE tells R to read the first row of the csv file as the column names.
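The same idea works for a tab-delimited text file with read.table(). Below is a minimal sketch, assuming a hypothetical file named demodata.txt whose columns are separated by tabs. The file is first written to a temporary folder so that the example runs on its own; the three rows and the names txt_path, demo and df2 are used only for this illustration.

```r
# Hypothetical example file: three rows in the style of the demo data, tab-separated
txt_path <- file.path(tempdir(), "demodata.txt")
demo <- data.frame(
  Number.of.Participant = c(1, 3, 5),
  Attendance = c("y", "y", "n"),
  Answer = c("a", "a", "b")
)
write.table(demo, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE reads the first line as column names; sep = "\t" marks tab separators
df2 <- read.table(txt_path, header = TRUE, sep = "\t")
names(df2)
```

With your own data you would skip the write.table() step and pass the name of your txt file directly to read.table().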


If you want to know the names of the columns, run the following code:

> names(df1)

[1] "Number.of.Participant" "Attendance"            "Answer"   


If you want to print the first six rows of the data, run:

> head(df1)

  Number.of.Participant Attendance Answer
1                     1          y      a
2                     3          y      a
3                     5          n      b
4                     4          n      c
5                     6          n      d
6                     7          y      e


To view the complete data set, run:

View(df1)


# A basic statistical summary

summary(df1)


# Dimensions of the data

dim(df1)  # number of rows and columns

nrow(df1)

ncol(df1)  
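Since this section is about visualization, a quick plot is a natural next step. Below is a minimal sketch, assuming the six rows printed by head(df1) above. The data frame is rebuilt by hand here (as df_demo, a name used only for this illustration) so that the code runs without the csv file.

```r
# The six rows shown by head(df1), typed in by hand for a self-contained example
df_demo <- data.frame(
  Number.of.Participant = c(1, 3, 5, 4, 6, 7),
  Attendance = c("y", "y", "n", "n", "n", "y"),
  Answer = c("a", "a", "b", "c", "d", "e")
)

# Count how many rows fall in each attendance category
att_counts <- table(df_demo$Attendance)
att_counts

# Bar plot of the counts
barplot(att_counts, xlab = "Attendance", ylab = "Number of rows",
        main = "Attendance in the demo data")
```

With the real file loaded, the same two calls can be run on df1$Attendance instead.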

For more, download the HTML file.

For today's hands-on assessment, download the data by clicking here.

Biostatistics 

Regression types

Regression analysis is a statistical method used to identify and analyze the relationship between a dependent variable (response variable) and one or more independent variables (predictor variables). It is commonly used to make predictions or to identify the factors that influence an outcome.

There are several types of regression analysis, including:

In each of these examples, simple linear regression can be used to determine if there is a linear relationship between the independent and dependent variables, and to estimate the strength and direction of that relationship.
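As one concrete illustration of simple linear regression, here is a minimal sketch using the cars data set that ships with R. It is chosen only for convenience and is not part of the workshop data.

```r
# cars ships with R: stopping distance (dist) recorded against speed
fit <- lm(dist ~ speed, data = cars)

# The sign and size of the slope give the direction and steepness of the relationship
coef(fit)

# R-squared gives the share of variance in dist explained by speed
summary(fit)$r.squared

# Predict the response for new values of the predictor
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)
pred
```

Here dist is the dependent (response) variable and speed is the independent (predictor) variable.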

Steps of regression

Linear regression:

Multiple linear regression:

Polynomial regression:

Logistic regression:

Time series regression:


Application of GIS and remote sensing in spatial ecology through Species Distribution Modelling (SDM)

Ecological modelers apply remote sensing and GIS in four major fields of ecology: ecosystem monitoring, biogeography, land use/land cover monitoring, and ecological informatics. All four areas ultimately feed into conservation policy-making and urbanization, so in a broader sense these two are integral outcomes of using remote sensing and GIS in ecology. As a researcher, you generally ask the following questions in each field, and answering them requires GIS and remote sensing data:

Ecosystem monitoring:

(a) What are the components of a functioning ecosystem?

(b) How is their abundance changing in different locations?

(c) What are the impacts of the changing components?


Biogeography:

(a) Where and when to find your species of interest?

(b) How is your species of interest distributed: patchy, continuous, seasonal, etc.?

(c) Where can the species go and invade?


Land use/ Land cover monitoring:

(a) How much land is covered by a particular activity?

(b) Which landscapes have conflicting activities?

(c) Where is conservation necessary?


Ecological geospatial Informatics:

(a) What will be the impact of your species of interest, and where?

(b) Where will management be necessary?

(c) Integration of metadata to devise conservation plans


If you think about these questions, you will notice a common pattern. All of them ask for the location of something you are interested in and how that location may change over time. This information about where and when the object, action, or species of interest will occur is called its spatiotemporal distribution. In your research, you may be interested in the spatial distribution, the temporal distribution, or both. In either case, you cannot always go and survey everywhere to obtain distribution data. This is where modelling comes in: you can model the data obtained from surveying some parts of the field at some times, and use the model to predict where the species may be in other parts of the field at other times.
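The idea of modelling surveyed sites and predicting unsurveyed ones can be sketched with a simple logistic regression. This is only an illustration on simulated data: the temperature variable, the coefficients used to simulate presence, and the names survey, sdm and unsurveyed are all assumptions made for this example, not a recommended SDM workflow.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated survey: 200 sites, each with a temperature reading (hypothetical values)
survey <- data.frame(temperature = runif(200, min = 10, max = 35))

# Assumed truth for the simulation: the species is more likely at warmer sites
p_true <- plogis(-8 + 0.4 * survey$temperature)
survey$presence <- rbinom(200, size = 1, prob = p_true)

# Fit a logistic regression of presence/absence on temperature
sdm <- glm(presence ~ temperature, family = binomial, data = survey)

# Predict the probability of occurrence at sites that were never surveyed
unsurveyed <- data.frame(temperature = c(12, 22, 32))
unsurveyed$p_occurrence <- predict(sdm, newdata = unsurveyed, type = "response")
unsurveyed
```

The surveyed sites train the model, and the predict() call fills in the places you could not visit.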

Understanding how your species is distributed over space and time requires an understanding of what resources the species uses and how those resources are available over space and time. For example, chalk, dusters, a blackboard, lights, etc. are the resources I need inside a classroom to teach you a topic. There is plenty of chalk, dusters, and light in the classroom, but only one blackboard, so I tend to stand near the blackboard. In the same way, a species may utilize many resources, but the scarcest resource decides where the species will be located. This concept is known as Liebig's law of the minimum. There can be more than one limiting resource. There can also be cases where a resource is more abundant than the others but is consumed by the species at a much higher rate. Such resources with high consumption rates are also important in deciding the location of the species.
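The classroom example can be put into numbers. Below is a minimal sketch with made-up values: both the amounts available and the amounts required per lesson are hypothetical. Dividing availability by requirement accounts for both scarcity and consumption rate, and the smallest ratio identifies the limiting resource.

```r
# Hypothetical numbers: units available in the classroom and units needed per lesson
available <- c(chalk = 20, duster = 5, light = 6, blackboard = 1)
required  <- c(chalk = 2,  duster = 1, light = 2, blackboard = 1)

# Number of lessons each resource could support on its own
supported <- available / required
supported

# Liebig's law of the minimum: the smallest ratio sets the limit
limiting <- names(which.min(supported))
limiting
```

Note that light is more abundant than dusters here, yet it supports fewer lessons because it is used at a higher rate.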

Therefore, to understand a species' locations, all the limiting resources and their consumption/ utilization rates are important to know. This concept of how many resources and how much of them are utilized by a species is known as the niche of a species. This is why we often refer to species distribution models as environmental niche models. 

There are three concepts of niche in ecology:

(a) Grinnellian niche: The habitat or geographic space that has all the suitable resources for your species. When wildlife scientists, autecologists, and conservation scientists want to predict where these habitats are located and estimate how much of the habitat the species occupies, they use species distribution modelling frameworks known as habitat suitability models and habitat utilization models. These models provide a habitat suitability index and a utilization score.

(b) Eltonian niche: Elton, in 1927, suggested that the niche is the role of a species in a biotic community. The biotic role, such as competitor, cooperator, prey, or predator, may vary over a large geographic scale, and this variation can be random too. This is why conventional SDMs may ignore biotic interactions over large geographic areas as noise (Eltonian noise) in species distribution. I'll discuss the concept of Eltonian noise in detail later in this series. Nevertheless, there are some scenarios where you may need to model the Eltonian niche to understand the species distribution. For example, I modeled both the Eltonian niche and the Grinnellian niche to predict the distribution of the successful nesting sites of Merops philippinus (read Ghosh et al. 2022).

(c) Hutchinsonian niche: If a species utilizes n resources, each in a different amount, plotting the resources against the amounts utilized gives an n-dimensional hypervolume. That n-dimensional hypervolume is Hutchinson's niche. One example of modelling this niche in SDM is predicting the distribution of an endemic community. Integrative species distribution models are a good predictor of Hutchinson's niche.
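To make the n-dimensional idea concrete, here is a rough sketch with n = 3. It approximates the hypervolume by a simple box (the minimum and maximum used along each axis), which is a deliberate simplification: a real niche need not be box-shaped. The variables, the values, and the helper in_niche() are hypothetical.

```r
# Hypothetical conditions at sites where the species was recorded (n = 3 axes)
occupied <- data.frame(
  temperature = c(18, 22, 25, 20),
  rainfall    = c(800, 1200, 1000, 950),
  canopy      = c(40, 70, 55, 60)
)

# Box approximation: row 1 holds the minimum and row 2 the maximum of each axis
niche_range <- sapply(occupied, range)

# A site lies inside the box if its value on every axis falls within the range
in_niche <- function(site) {
  lower <- niche_range[1, names(site)]
  upper <- niche_range[2, names(site)]
  all(site >= lower & site <= upper)
}

in_niche(c(temperature = 21, rainfall = 900, canopy = 50))  # within range on all axes
in_niche(c(temperature = 30, rainfall = 900, canopy = 50))  # temperature out of range
```

Adding another resource simply adds another column, which is what makes the niche n-dimensional.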



Data collection for SDM

Geospatial data is a key component of Species Distribution Modeling (SDM), which is a technique used to predict the potential distribution of a species based on environmental variables. Some of the types of geospatial data used in SDM include:


Ecologists and geostatisticians often collect these datasets using remote sensing.

Remote sensing refers to the process of gathering information about an object or area without physically contacting it. This is typically done using sensors mounted on aircraft, satellites, or other platforms that capture data in the form of electromagnetic radiation. The data can then be processed and analyzed to generate information about the object or area being studied.

There are different types of remote sensing data, including:

Optical data: This is the most common type of remote sensing data and is captured by sensors that detect visible and near-infrared light. This data can be used to generate information about vegetation health, land use and land cover, and other features.

Thermal data: This type of remote sensing data captures information about the temperature of the Earth's surface. It can be used to generate information about areas of high heat flux, such as volcanic activity or wildfires.

Radar data: This type of remote sensing data uses radio waves to detect objects on the Earth's surface. It can penetrate clouds and vegetation, making it useful for generating information about topography, soil moisture, and other features.
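As an example of turning optical data into information about vegetation health, a widely used index is the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red). Below is a minimal sketch on a tiny 2 x 2 block of pixels. The reflectance values are made up for illustration; real bands would be read from satellite imagery.

```r
# Hypothetical reflectance values (between 0 and 1) for a 2 x 2 block of pixels
red <- matrix(c(0.10, 0.08, 0.30, 0.25), nrow = 2)
nir <- matrix(c(0.50, 0.45, 0.35, 0.28), nrow = 2)

# NDVI = (NIR - Red) / (NIR + Red), computed pixel by pixel
# Values closer to 1 suggest denser green vegetation
ndvi <- (nir - red) / (nir + red)
round(ndvi, 2)
```

In this made-up block, the first column of pixels has a much higher NDVI than the second, which would suggest greener cover there.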

The collection process for remote sensing data typically involves the following steps:

Platform selection: A platform, such as a satellite or aircraft, is chosen based on the specific needs of the study.

Sensor selection: A sensor is chosen based on the type of data needed and the specifications of the platform.

Data acquisition: The sensor captures data by scanning the Earth's surface from the platform.

Data transmission: The data is transmitted to a ground station for processing.

Pre-processing: The data is processed to correct for atmospheric effects and other factors that can affect the quality of the data.

Image processing: The data is processed to generate images and other information about the Earth's surface.

Analysis: The data is analyzed to generate information about the object or area being studied, such as land use and land cover, vegetation health, or soil moisture.