Quantifying in the Social Sciences

A refresher on probability is here. Julien Jacques's course on inferential statistics is here.

A very good course on Bayesian statistics is here. A very good course on Markov chains is here.

A course on Bayesian statistics based on Stephen Bates's slides is here. Supplementary material is here.

The application to temporal and spatial hierarchical models is here.

Last year's online exam is here.

This year's projects are here:

Project 1, Project 2, Project 3, Project 4 and Project 5. Some of them were proposed in a different form by Mathieu Ribatet. You can also propose your own project idea if you have explored interesting topics independently.

Last year's projects are below.

Go through the examples in this text: https://plsc-31101.github.io/course/collecting-data-from-the-web.html (it is no longer possible to access the Twitter API, so read carefully and, optionally, look into how to do the same with LinkedIn!)

Go through the examples in this other text: https://plsc-31101.github.io/course/text-analysis.html

Explain what kind of analysis is performed.
Interpret the results.
Try to replicate the analysis with other datasets.

R for Psychology

Psychological methodologists jumped on the R bandwagon early and now have a very wide range of tools to work with: from the social sciences broadly speaking, to psychometrics specifically, to other areas of notable interest such as multivariate analysis, experimental design, meta-analysis and even brain imaging.

Example

I'll start with a latent variable model, since many psychologists use such models to study the underlying constructs they attempt to measure, such as personality traits or clinical diagnoses. The psych package offers a great many tools for exploratory factor analysis, going well beyond the standard way most people approach such an analysis. The example uses the Thurstone correlation matrix included in psych, which has 3 underlying factors. It takes one line each to fit the model, summarize it or visualize it (the plot is not shown), and extracting factor loadings for further processing takes no effort at all.

library(psych)
standardFA = fa(Thurstone, nfactors = 3)  # exploratory factor analysis with 3 factors
summary(standardFA)

## Factor analysis with Call: fa(r = Thurstone, nfactors = 3)
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 12 and the objective function was 0.01
## The root mean square of the residuals (RMSA) is 0.01
## The df corrected root mean square of the residuals is 0.01
## With factor correlations of
##      MR1  MR2  MR3
## MR1 1.00 0.59 0.54
## MR2 0.59 1.00 0.52
## MR3 0.54 0.52 1.00

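fa.diagram(standardFA) # visualize
# Thurstone is a correlation matrix, so factor.scores returns the scoring weights;
# computing actual factor scores requires the raw data
faScores = factor.scores(Thurstone, standardFA)
faLoadings = loadings(standardFA) # extract the loadings
## cor.plot(standardFA)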
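One way to check what an exploratory factor analysis recovers, and to practise replicating it on other data, is to simulate data whose factor structure is known in advance. The following is a minimal sketch under our own assumptions: the loadings (0.8, 0.7, 0.6), the sample size of 500 and the seed are arbitrary choices, and it swaps in base R's stats::factanal (maximum likelihood with a varimax rotation) for psych::fa (minres with an oblimin rotation by default), so its numbers will not match the output above.

```r
# Sketch: simulated data with a known 3-factor structure, fitted with stats::factanal
set.seed(42)
n <- 500

# Hypothetical design: three independent latent factors, three items per factor
eta <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
lambda <- matrix(0, nrow = 9, ncol = 3)
lambda[1:3, 1] <- 0.8
lambda[4:6, 2] <- 0.7
lambda[7:9, 3] <- 0.6

# Unique (error) standard deviations chosen so that every item has unit variance
uniq_sd <- sqrt(1 - rowSums(lambda^2))
X <- eta %*% t(lambda) + matrix(rnorm(n * 9), nrow = n, ncol = 9) %*% diag(uniq_sd)
colnames(X) <- paste0("item", 1:9)

# Maximum-likelihood factor analysis, varimax rotation, regression factor scores
fit <- factanal(X, factors = 3, rotation = "varimax", scores = "regression")
print(fit$loadings, cutoff = 0.3)  # hide small loadings for readability

# Factor on which each item loads most strongly
dominant <- apply(abs(unclass(fit$loadings)), 1, which.max)
dominant
head(fit$scores)  # factor scores: one row per simulated respondent
```

If each group of three items loads mainly on its own factor, the method has recovered the simulated structure. With psych installed, fa(X, nfactors = 3) can be run on the same X for comparison.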

Explain what kind of analysis is performed.
Interpret the results.
Try to replicate the analysis with other datasets.

R for Sociology

Sociologists have plenty of ways to benefit from using R. Along with general social science packages, there are packages for official statistics and survey design, and for graphical modeling, including social networks. I will provide an example of the latter.

Example

For network analysis there are a great many tools in R, such as the sna and igraph packages. As a simple example, we can begin with a data matrix, create from it an adjacency matrix in which a 1 (or more) represents a connection between two observations, and finally plot the network graph. Let's say we have a data set of 10 individuals and 6 variables.

First we take a look.

colnames(snData) = paste0('V',1:6)
snData

##            V1  V2  V3  V4  V5  V6
## Yelberton  No  No  No  No  No  No
## Billy      No  No  No  No  No  No
## Fred       No  No  No  No  No  No
## Henrietta  No Yes  No  No  No  No
## Janie     Yes  No  No  No  No  No
## Bertha     No Yes Yes  No  No  No
## Willie     No  No  No  No  No  No
## Iggy       No  No Yes  No  No  No
## Flozelle   No  No Yes Yes  No  No
## Louise     No  No  No Yes  No Yes
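The text does not show how snData was created. A minimal sketch, assuming the printout above is the complete data set, rebuilds it so that the code in this section can be run; storing it as a data frame of 'Yes'/'No' strings is a guess based on the unquoted printout, and the names people, m and yes_cells are ours.

```r
# Hypothetical reconstruction of snData from the printed table
people <- c("Yelberton", "Billy", "Fred", "Henrietta", "Janie",
            "Bertha", "Willie", "Iggy", "Flozelle", "Louise")
m <- matrix("No", nrow = 10, ncol = 6, dimnames = list(people, paste0("V", 1:6)))

# (row, column) positions of every 'Yes' in the printed table
yes_cells <- rbind(c(4, 2),            # Henrietta: V2
                   c(5, 1),            # Janie: V1
                   c(6, 2), c(6, 3),   # Bertha: V2, V3
                   c(8, 3),            # Iggy: V3
                   c(9, 3), c(9, 4),   # Flozelle: V3, V4
                   c(10, 4), c(10, 6)) # Louise: V4, V6
m[yes_cells] <- "Yes"

snData <- as.data.frame(m, stringsAsFactors = FALSE)
snData
```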

For our purposes, let's say that if two individuals both have 'Yes' for the same variable, they are connected, and the more variables on which this occurs, the stronger the connection. There are various distance metrics we could employ for such data that would take only a single line of code to create an adjacency/distance matrix (e.g. binary distance, Hamming distance, etc.). For demonstration, though, we'll do it ourselves.

Think for a moment about how you would go about this in a standard statistics package. Your thoughts might take you through a double loop to calculate adjacency, then to figuring out where to store or write out the matrix, all as a completely separate enterprise, before bringing the matrix into possibly another, more specialized package that can better handle such things. Traditional statistical packages have only recently begun to offer much in this area, and in any case there would be a number of steps.

In the following, I create an object containing all possible pairwise combinations of the rows 1 through 10. I then make an empty matrix that will eventually become the adjacency matrix. In one line of code I'm able to make all the calculations by defining an anonymous function that sums all occasions on which both inputs are 'Yes'; it is applied to each column of the object holding the pairwise combinations, whose elements serve as row indices into the original data. I then fill in the lower triangle of the matrix with those values and, as a shortcut, use as.dist (which converts to a distance-matrix object) followed by as.matrix to fill in the rest.

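pairwise = combn(nrow(snData), 2)  # all pairs of row indices, one pair per column
# empty matrix that will become the adjacency matrix, labelled with the individuals' names
adjmat = matrix(NA, nrow(snData), nrow(snData), dimnames = list(rownames(snData), rownames(snData)))
# for each pair, count the variables on which both individuals answered 'Yes'
adj = apply(pairwise, 2, function(x) sum(snData[x[1],]=='Yes' & snData[x[2],]=='Yes'))
# combn() lists the pairs in the same column-major order as the lower triangle
adjmat[lower.tri(adjmat)] = adj
# as.dist() keeps the lower triangle; as.matrix() mirrors it and puts 0 on the diagonal
adjmat = as.matrix(as.dist(adjmat))
adjmat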
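As the text notes, R also has one-line alternatives. A sketch, assuming the data are the 10 x 6 'Yes'/'No' table printed earlier (rebuilt here so the block runs on its own): recoding 'Yes' to 1 and 'No' to 0 turns the shared-'Yes' count into a matrix product, because entry (i, j) of B B' is the sum over variables k of B[i, k] * B[j, k]. The built-in dist() methods give the binary and Hamming-type distances mentioned above; these are dissimilarities, so larger values mean less alike. The names yesno, B, adj_fast, d_binary and d_hamming are ours.

```r
# Rebuild the printed yes/no table (hypothetical reconstruction, as a character matrix)
people <- c("Yelberton", "Billy", "Fred", "Henrietta", "Janie",
            "Bertha", "Willie", "Iggy", "Flozelle", "Louise")
yesno <- matrix("No", 10, 6, dimnames = list(people, paste0("V", 1:6)))
yesno[rbind(c(4, 2), c(5, 1), c(6, 2), c(6, 3), c(8, 3),
            c(9, 3), c(9, 4), c(10, 4), c(10, 6))] <- "Yes"

# 0/1 indicator matrix: 1 where the answer is 'Yes' (dimnames are kept)
B <- (yesno == "Yes") * 1

# All pairwise shared-'Yes' counts at once: entry (i, j) = sum_k B[i, k] * B[j, k]
adj_fast <- tcrossprod(B)
diag(adj_fast) <- 0  # the diagonal holds each person's own 'Yes' count; drop it
adj_fast

# One-line distance matrices on the same 0/1 data
d_binary  <- dist(B, method = "binary")     # share of mismatches among variables where at least one is 'Yes'
d_hamming <- dist(B, method = "manhattan")  # on 0/1 data: number of variables where answers differ (Hamming)
round(as.matrix(d_binary), 2)
as.matrix(d_hamming)
```

The matrix product avoids the explicit loop over pairs entirely, which matters little for 10 people but helps for larger networks.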

##           Yelberton Billy Fred Henrietta Janie Bertha Willie Iggy Flozelle Louise
## Yelberton         0     0    0         0     0      0      0    0        0      0
## Billy             0     0    0         0     0      0      0    0        0      0
## Fred              0     0    0         0     0      0      0    0        0      0
## Henrietta         0     0    0         0     0      1      0    0        0      0
## Janie             0     0    0         0     0      0      0    0        0      0
## Bertha            0     0    0         1     0      0      0    1        1      0
## Willie            0     0    0         0     0      0      0    0        0      0
## Iggy              0     0    0         0     0      1      0    0        1      0
## Flozelle          0     0    0         0     0      1      0    1        0      1
## Louise            0     0    0         0     0      0      0    0        1      0

Now we can plot the graph. The plot shown was produced with some additional arguments.

library(igraph)
# the adjacency matrix is symmetric, so build an undirected graph
g = graph_from_adjacency_matrix(adjmat, mode = "undirected")
plot(g)
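The text mentions calculating network metrics. igraph provides functions such as degree() and edge_density() for this on a graph object; as a dependency-free sketch, the same two quantities can be computed directly from the adjacency matrix. Here the matrix is rebuilt from the five nonzero pairs in the printed adjmat (a hypothetical reconstruction, not code from the text), and the names A, deg, dens and isolates are ours.

```r
people <- c("Yelberton", "Billy", "Fred", "Henrietta", "Janie",
            "Bertha", "Willie", "Iggy", "Flozelle", "Louise")

# The five connected pairs visible in the printed adjacency matrix (weight 1 each)
edges <- rbind(c("Henrietta", "Bertha"), c("Bertha", "Iggy"), c("Bertha", "Flozelle"),
               c("Iggy", "Flozelle"), c("Flozelle", "Louise"))
idx <- cbind(match(edges[, 1], people), match(edges[, 2], people))

A <- matrix(0, 10, 10, dimnames = list(people, people))
A[idx] <- 1         # fill (i, j)
A[idx[, 2:1]] <- 1  # mirror to (j, i): the network is undirected

deg <- rowSums(A > 0)                                  # degree: number of neighbours
dens <- sum(A[upper.tri(A)] > 0) / choose(nrow(A), 2)  # observed / possible undirected edges
isolates <- names(deg)[deg == 0]                       # individuals with no connection
sort(deg, decreasing = TRUE)
dens
isolates
```

Here Bertha and Flozelle are the most connected individuals, five of the ten have no connection at all, and the density is 5 of 45 possible edges.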

Note that not only were we able to do this efficiently in our code, we never had to leave the current R environment or use multiple data files or programs, and we have more than enough functionality to calculate network metrics and model the network itself. The code would also be trivial to change for other rules of determining adjacency.
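To illustrate how little has to change when the rule for adjacency changes, here is a sketch that wraps the pairwise computation in a small function taking the rule as an argument. The helper adjacency_from() and the rules shared_yes() and same_answer() are hypothetical names for illustration, and the data are again rebuilt from the printed table.

```r
people <- c("Yelberton", "Billy", "Fred", "Henrietta", "Janie",
            "Bertha", "Willie", "Iggy", "Flozelle", "Louise")
yesno <- matrix("No", 10, 6, dimnames = list(people, paste0("V", 1:6)))
yesno[rbind(c(4, 2), c(5, 1), c(6, 2), c(6, 3), c(8, 3),
            c(9, 3), c(9, 4), c(10, 4), c(10, 6))] <- "Yes"

# Hypothetical helper: weighted adjacency matrix from any rule(row_a, row_b) -> number
adjacency_from <- function(data, rule) {
  n <- nrow(data)
  pairs <- combn(n, 2)  # one column per pair (i, j) with i < j
  w <- apply(pairs, 2, function(p) rule(data[p[1], ], data[p[2], ]))
  adj <- matrix(0, n, n, dimnames = list(rownames(data), rownames(data)))
  adj[t(pairs)] <- w          # fill (i, j)
  adj[t(pairs[2:1, ])] <- w   # mirror to (j, i)
  adj
}

# Rule used in the text: number of variables on which both answered 'Yes'
shared_yes <- function(a, b) sum(a == "Yes" & b == "Yes")
# Alternative rule: number of variables on which both gave the same answer
same_answer <- function(a, b) sum(a == b)

adj_yes  <- adjacency_from(yesno, shared_yes)
adj_same <- adjacency_from(yesno, same_answer)
adj_same["Bertha", ]
```

Only the rule function changes between the two matrices; the pairing, filling and mirroring logic is written once.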

Explain what kind of analysis is performed.
Interpret the results.
Try to replicate the analysis with other datasets.