UGBS 652 Multivariate Data Analysis

EXAMS! EXAMS! VERY URGENT Breaking News 1:

Exam duration: two hours thirty minutes (2.5 hrs)

I have attached the "MPhil 2015 Presentation Guidelines" and, for some of you, the data set. You have 2 weeks to prepare. All the BEST!!! The file is listed below as: MPhil 2015 Presentation Guidelines.docx

COMMANDS TO RUN SALESPANEL DATA

sap=read.delim("clipboard") #Load data set the usual way

head(sap) #show top six observations

y = cbind(sap$sales);x = cbind(sap$price, sap$repairs) #combine DV & IVs. For some of you, it did not work because you did not use the $ sign. Please don't use attach & detach.

require(psych) #load psych package for descriptives

describe(sap) #run descriptives

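require(plm) #load plm package for panel models

pooling = plm(sales ~ price + repairs, data=sap, index=c("firm", "year"), model="pooling") #use the column names so plm keeps them aligned with the firm-year index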

summary(pooling)

pooling <- lm(y~x, data=sap)

summary(pooling)
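The plm "pooling" model and the lm() call above should report the same coefficients, because pooled OLS simply stacks all firm-year rows and ignores the panel structure. A minimal sketch to check this, assuming simulated data: the firm, year, price, repairs and sales values below are made up for illustration and are not the SALESPANEL data.

```r
# Sketch: pooled OLS on a simulated firm-year panel (made-up data, NOT SALESPANEL)
set.seed(652)
firms <- 5
years <- 6
sap <- data.frame(
  firm = rep(1:firms, each = years),
  year = rep(2001:(2000 + years), times = firms)
)
n <- nrow(sap)
sap$price   <- runif(n, 10, 20)
sap$repairs <- rpois(n, 3)
# assumed data-generating process, only for illustration
sap$sales <- 100 - 2 * sap$price + 1.5 * sap$repairs + rnorm(n, sd = 2)

# the course's style: build the DV and IV matrices with the $ sign
y <- cbind(sap$sales)
x <- cbind(sap$price, sap$repairs)
fit_matrix <- lm(y ~ x)

# the same pooled regression written with column names
fit_named <- lm(sales ~ price + repairs, data = sap)

# both stack every firm-year row, so the estimates coincide
print(round(coef(fit_named), 3))
print(round(as.numeric(coef(fit_matrix)), 3))
```

The firm/year index only starts to matter for panel estimators such as plm's fixed-effects (model = "within") and random-effects (model = "random") models.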

VERY URGENT Breaking News 3:

#wagepan or Unions data set in R

#Install pglm (panel generalized linear models in R) the usual way, e.g. install.packages("pglm")

library(pglm) # invoke the pglm package

data(Unions) # load the Unions data set

head(Unions) # view top part of the data

Unions # view the whole data set. This will be too big to read on screen

write.csv(Unions, "C://Users/ohene/Dropbox/SkyDrive/LEGON/UGBS 652 Multivariate Data Analysis/Unions.csv") # export the data set back to a CSV file that opens neatly in Excel. You will have to change the directory path to your own
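A base-R sketch of the export step with two small refinements, both suggestions rather than required practice: file.path() builds the path so you only edit the folder name, and row.names = FALSE stops write.csv() adding an unnamed index column that appears as an extra first column in Excel. Since pglm may not be installed everywhere, the built-in mtcars data stands in for Unions, and R's temporary folder (tempdir()) stands in for your own directory.

```r
# Sketch: export a data frame to CSV, then read it back to check it
# mtcars stands in for Unions; tempdir() stands in for your own folder
out_dir  <- tempdir()
out_file <- file.path(out_dir, "Unions.csv")  # file.path() inserts the "/" separators

# row.names = FALSE avoids an extra unnamed index column in Excel
write.csv(mtcars, out_file, row.names = FALSE)

check <- read.csv(out_file)
print(dim(check))  # same number of rows and columns as the original
```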

Urgent Breaking News:

There is another important free program, PAST, available at http://folk.uio.no/ohammer/past/ and you should download it. PAST does not need to be installed; you can run it even from a folder on your pen drive. Start with the old PAST (version 2.17), found at the bottom of that page, before trying the new version 3.06 (March 2015). Also download the PDF manual at the bottom of the page. It is crucial that you do this. This does not mean we have left our main software, R. We will continue to use R. But more is better!!!

Please bring laptops to lectures each time.

Research Papers:

Berger, A. N., I. Hasan, et al. (2004). "Further Evidence on the Link between Finance and Growth: An International Analysis of Community Banking and Economic Performance." Journal of Financial Services Research 25: 169-202.

Announcements:

  • Urgent Latest News: PRINT OUT & BRING t-TABLE TO CLASS THIS WEEK
  • Recent News: Please urgently download the Brukutu Ventures data set. We shall use it for regression and dummy regression next week.
  • Running News: Please bring laptops to lectures
  • Hot News: New R Guide has been added. Please download it and peruse.
  • Important News: Page under construction
  • News:
    • For those of you interested in researching the performance assessment of firms or organisations, or looking for research topics for your MPhil theses, I suggest you attend my other MPhil lecture, Decision and Risk Analysis, in semester 2. It is rigorous and can help those who want to pursue a PhD.

Downloadables:

All the documents will be password-protected. To download a file, click the downward-pointing arrow to the right of it. Students have been given the passwords.

Slides/lectures to download (more at the end of the page):

UGBS 652 MVA syllabus or outline 2015 students.docx

R LECTURE.pdf

AnovaTable F

Lecture slides below

Key Texts:

  • Hair, J., Black, B., Babin, B., Anderson, R. E. & Tatham, R. L. (2010). Multivariate Data Analysis, 7th ed.
  • Everitt, B. S. & Dunn, G. (2001). Applied Multivariate Data Analysis, 2nd ed.
  • Wooldridge, J. M. (2009). Introductory Econometrics: A Modern Approach, 4th ed.
  • Tabachnick, B. G. & Fidell, L. S. (2001). Using Multivariate Statistics, 4th ed. Allyn and Bacon.
  • Raykov, T. & Marcoulides, G. A. (2008). An Introduction to Applied Multivariate Analysis. Routledge.

Grading:

Class Participation 5%

Assignment 5%

IA 20%

Final Exam 70%

TA:

    • Charles Turkson

FOR PCA: https://www.youtube.com/watch?v=Heh7Nv4qimU

FOR FA: https://www.youtube.com/watch?v=Ilf1XR-K3ps

PCA and FA QUESTION on European Jobs data:

Look at the European jobs data. The data set is from Euromonitor (1979) and can also be found in, e.g., Manly (1986) or Hand et al. (1994): http://www.dm.unibo.it/~simoncin/EuropeanJobs.html

The data concerns the percentage employed in different industries in 26 European countries during 1979 and contains the following variables:

    • Country: Name of country
    • Agr: Percentage employed in agriculture
    • Min: Percentage employed in mining
    • Man: Percentage employed in manufacturing
    • PS: Percentage employed in power supply industries
    • Con: Percentage employed in construction
    • SI: Percentage employed in service industries
    • Fin: Percentage employed in finance
    • SPS: Percentage employed in social and personal services
    • TC: Percentage employed in transport and communications

To use the data set in further analysis, we need to reduce the number of variables; ideally, we also want to be able to interpret the resulting factors.

• Run the head function to see just the top of the data set

• Compute the correlations, correct to 2 d.p.

• Using the rule of thumb in the PCA method, which variables appear to be strongly correlated with:

• Agr? __

• PS? __

• Con? __

• Fin?__

• SPS?__

• Can we proceed to do PCA or FA? Why? Prove it numerically!

• What's the proportion of variance accounted for by the 1st PC and by the 3rd PC (correct to 2 d.p.)?__

• Interpret the 2nd and 5th PCs__

• What's the cumulative proportion of variance accounted for up to the 3rd PC?__

• Find the eigenvalues of the 1st, 2nd, 3rd and 4th PCs__

• How many PCs did the PCA method find that, together, explain the same information as the original variables?__

• What linear combination of the original variables accounts for the largest variance (correct to 2 d.p.)?

• What are the equations of the 2nd, 3rd and 5th PCs (correct to 2 d.p.)?

• How much of the information in the original variables can each new variable (PC) explain? Use any suitable table.__

• How many PCs explain the information that the original variables have? Use the screeplot__

• Using the biplot, how many PCs will you choose, and which variables project onto which PC?__
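The PCA questions above can be worked through with a handful of base-R calls. A hedged sketch follows; because the European jobs file is not shipped with R, it uses the built-in swiss data (Swiss provinces, 1888) purely as a stand-in, so replace dat with your own data frame. For "Can we proceed?", it computes two common checks by hand, Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure; the psych package's cortest.bartlett() and KMO() functions report the same quantities.

```r
# Sketch of the PCA steps; 'swiss' is only a stand-in for the European jobs data
dat <- swiss
head(dat)                            # top of the data set

R <- cor(dat)
print(round(R, 2))                   # correlations correct to 2 d.p.

# Bartlett's test of sphericity: H0 = the correlation matrix is an identity matrix
n <- nrow(dat)
p <- ncol(dat)
bart_chisq <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
bart_df    <- p * (p - 1) / 2
bart_p     <- pchisq(bart_chisq, bart_df, lower.tail = FALSE)

# KMO measure of sampling adequacy, built from the partial correlations
Rinv <- solve(R)
A    <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))  # partial (anti-image) correlations
off  <- row(R) != col(R)                             # off-diagonal cells only
kmo  <- sum(R[off]^2) / (sum(R[off]^2) + sum(A[off]^2))
cat("Bartlett chi-sq =", round(bart_chisq, 2), " df =", bart_df,
    " p =", signif(bart_p, 3), " KMO =", round(kmo, 2), "\n")

pca <- prcomp(dat, scale. = TRUE)    # PCA on the correlation matrix
print(summary(pca))                  # proportion and cumulative proportion of variance
eig <- pca$sdev^2                    # eigenvalues of the PCs
print(round(eig, 2))
print(round(pca$rotation, 2))        # loadings: the coefficients in each PC's equation

screeplot(pca, type = "lines")       # scree plot
biplot(pca)                          # arrows show which variables point along which PC
```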

Can we proceed to do FA? Why? Compute the communalities for Agr, PS, Fin, SPS, and TC

• Which 2 variables might be mostly unique & why?

• Which variables seem to load well on factor 1?

• Write the first factor model

• What percentage of the total variance do factors 1, 2 & 3 account for?

• What does the p-value suggest? Using the p-values, how many factors would you extract? Explain why.
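For the FA questions, R's factanal() fits a maximum-likelihood factor model. The sketch below is a stand-in under stated assumptions: it reuses the ability.cov covariance data from factanal's own help page, since the European jobs file is not bundled with R; with a raw data frame you would call factanal(x = dat, factors = k) instead. Communalities are computed as 1 minus the uniquenesses, and fa$PVAL is the p-value of the test that the chosen number of factors is sufficient.

```r
# Sketch of the FA steps; ability.cov (from factanal's help page) stands in
# for the European jobs data. With raw data use factanal(x = dat, factors = k).
fa <- factanal(factors = 2, covmat = ability.cov)  # ML factor analysis, varimax rotation by default

L <- unclass(fa$loadings)             # plain matrix of loadings (variables x factors)
print(fa$loadings, cutoff = 0.3)      # hide small loadings to see which variables load on each factor

communality <- 1 - fa$uniquenesses    # share of each variable's variance the common factors explain
print(round(communality, 2))          # low communality = variable is mostly unique

prop_var <- colSums(L^2) / nrow(L)    # proportion of total variance per factor
print(round(100 * cumsum(prop_var), 1))  # cumulative percentage of total variance

# Factor model for the first variable: x1 = l11*F1 + l12*F2 + e1
cat(sprintf("%s = %.2f*F1 + %.2f*F2 + e\n", rownames(L)[1], L[1, 1], L[1, 2]))

# H0: 2 factors are sufficient; a small p-value suggests trying more factors
print(fa$PVAL)
```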

Lecture times:

Mon 13:30–15:20; at 1E1

Wed 9:30–11:20; at 1W1

Free important software: R at http://www.r-project.org/

RStudio at http://www.rstudio.com/ide/ & PAST at http://folk.uio.no/ohammer/past/

Download the manuals as well.

RStudio is the premier integrated development environment for R. It is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or over the web with RStudio Server. The RStudio console includes a variety of features intended to make working with R more productive and straightforward.

http://www.andrewheiss.com/blog/2012/04/17/install-r-rstudio-r-commander-windows-osx/

For PAST, you may download the old version 2.17c HERE: http://nhm2.uio.no/norlex/past/download.html

IMPORTING DATA FROM EXCEL TO R

bp<-read.delim('clipboard') ## Copy your data in Excel, come back to the R script here, type the command and voilà!
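Why read.delim() works here: Excel places copied cells on the clipboard as tab-separated text, and read.delim() reads tab-separated text with a header row by default. The sketch below mimics a clipboard copy with a made-up text string so that it runs anywhere; the Mac pipe("pbpaste") line in the comments is a commonly used alternative, given as an assumption rather than something from the course notes.

```r
# Sketch: what read.delim() does with data copied from Excel.
# The string below (made-up numbers) mimics the tab-separated clipboard text.
copied <- "firm\tyear\tsales\n1\t2001\t120\n1\t2002\t135\n2\t2001\t98"

bp <- read.delim(text = copied)  # sep = "\t" and header = TRUE are read.delim's defaults
str(bp)

# On Windows:  bp <- read.delim("clipboard")
# On a Mac (assumed alternative):  bp <- read.delim(pipe("pbpaste"))
```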

For an intro to R, see the YouTube videos below (in fact, you can subscribe to this guy's videos):

http://www.youtube.com/watch?v=WJDrYUqNrHg

http://www.youtube.com/watch?v=U69k3hjDlM0

http://www.youtube.com/watch?v=R5Z22gwnpCk

For R Commander, see:

http://www.youtube.com/watch?v=V52baivx26w

REMEMBER "A little learning is a dangerous thing;

Drink deep, or taste not the Pierian spring". So you either drink multivariate data analysis deep or you taste it not.