Data Sets for Statistics
Data set numbers: 1-14, 16-19, 21-30, 32-45, To, 50, 52-66, 70-72, 74-76, 78, 79, 80 to 84, 85 to 88, 89 to 92
Description
Red Mountain Pre- and Post-Burn Soil Study
A sagebrush site was treated with a prescribed burn to try to increase the browse for elk. The question here is: does burning change nutrients at depth within the mineral soil (and ultimately, soil processes)? There are many variables to look at: different cations, bulk density, rocks and roots, etc.
Wood Decomposition in soils
Wood stakes are buried in the ground and after some time, the amount of decomposition is measured. (This looks like a regression analysis problem.)
The Impact of Potato Virus Y on Russet Burbank Potato Yields
This 3-year study measures the virus level in potatoes vs. the yield at harvest. The question is whether the virus adversely affects the marketable yield. (This is likely another regression analysis problem.)
Awned vs. Awnless Wheat Yield Study
The farmers think that blending two lines (varieties) of wheat together gives better grain yields than growing just one or the other, or both in separate fields. This study offers multiple opportunities to compare two populations.
The Effect of an Agricultural Fungicide on Rat Genetics
Pregnant female rats were exposed to a fungicide. This study looked at whether the effects of the fungicide lingered on to subsequent generations. This is a very large study and can be broken into multiple data sets for multiple students.
How Far Do MSD Faculty and Staff Walk?
MSD faculty and staff kept track of how many steps they took each day for two weeks. Various variables (age, gender, ~weight, building, etc.) could be examined to see if they affect the number of steps. Comparing Week #1 to Week #2 is also possible.
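The Week #1 versus Week #2 comparison mentioned above could be set up as a paired test, since the same people are measured in both weeks. The sketch below is only one possible approach; the step counts and variable names are hypothetical and are not taken from the MSD data.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(week1, week2):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(week1) != len(week2):
        raise ValueError("each participant needs a value for both weeks")
    diffs = [b - a for a, b in zip(week1, week2)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return d_bar, d_bar / se, n - 1

# Hypothetical average daily steps for five participants, NOT the MSD data.
week1 = [6200, 8100, 4300, 10050, 7400]
week2 = [6600, 7900, 5100, 10400, 7800]

d_bar, t_stat, df = paired_t_statistic(week1, week2)
print(f"mean change = {d_bar:.0f} steps, t = {t_stat:.2f} on {df} df")
```

The t statistic would then be compared with a t distribution on the stated degrees of freedom, using a table or statistical software.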
Do Fatty Acid Supplements Change the Chemistry of Breast Milk?
Lactating women are given a regimen of dietary supplements and the chemistry of the breast milk is examined to see if there is any difference. We are studying whether changing one variable affects others.
Do Fatty Acid Supplements Change the Cell Content of Breast Milk?
Lactating women are given a regimen of dietary supplements and the types of cells in the breast milk are examined to see if there is any difference. Again, we are studying whether changing one variable affects others.
Fatty Acid Modifying Enzyme (FAME)
Bacterial strains that survive in abscesses in the host tissue produce FAME. Does sodium phosphate regulate the production of FAME? Once again, we are studying whether changing one variable affects others.
Cation Exchange Capacity of Soils after Different Logging Actions
This is a 5-year study in the Priest River Experimental Forest. Different logging actions and different amounts of soil compaction are analyzed to see if there is any correlation with the cation exchange capacity of the soil. Here we have a combination of variables that, when changed, may cause a property of the soil to change.
Water Quality Assessment Before & After “an Event” Based on Insect Populations
This study measures the insect populations in a river in Northern Idaho before and after “an event” in July 2008. This study is an example of how populations may or may not change over time and whether outside events affect the insect life in a river. This is a large set of data with many possible analyses available: change in means over time, change in proportions over time, change in ratios over time, etc.
Plant Regeneration After a Forest Fire
This study measures the impacts of pre-fire site conditions, burn severity and post-fire treatments on plant regeneration. This data has been provided by a graduate student who is currently working on her degree. This is a large data set that could be divided into a couple of different studies.
The Best Places to Live in America
This is a study that compares 9 variables (such as cost of living, crime, climate, education, etc.) for 329 metropolitan areas. This data can be used to determine “The Best Place in America”.
Tree Seedling Growth
This study measures the growth of tree seedlings in different growing media. This is a good example of comparing two populations.
The Effects of Global Warming on Plant Growth
This is a huge study where multiple climate variables were measured along with multiple tree and plant growth variables. The question here is whether there is any measurable correlation between the climate and plant growth.
Butter vs. Margarine
This study measured the effects of butter vs. margarine in the mother’s diet on breast milk. Dairy products contain trans fatty acids and margarines contain either high or low amounts of partially hydrogenated vegetable oils. The question is whether the diet of the mother (with regard to fat content) affects the milk for the baby.
Fat Intake in Children
This study documents the dietary fat intake of children over a three-day period.
Dietary Effects on Cognition and Memory
This is a study of postmenopausal women that examines potential dietary effects on memory and cognition. The women were divided into 3 groups: soymilk, cow’s milk and isoflavone tablets. Thyroid functions were measured and correlated with the groupings.
Consequences of Cheating in School
This study looks at how the different “rewards” and penalties of cheating in school affect the amount of cheating.
U. of I. Female Faculty and Staff and Campus Climate Issues
This is a survey of female staff and faculty at the University of Idaho regarding participation in Athena (a women’s professional organization) and potential/perceived gender issues on the campus. We are interested in knowing if there is a correlation between things like job title, family structure, etc. and job satisfaction.
Survey of Idaho Residents Regarding Immigrants and Immigration
This is a phone survey conducted to determine Idahoans’ feelings about immigrants and immigration. We will be interested in knowing whether there is any association between political leanings, socio-economic status, income, etc. and opinions regarding immigration.
Guesses of the Population of Turkey and Canada ***
A class of 78 students completed an experiment to see if the information on a data collection form influences the responses. The data will have to be manually retyped into Excel. This is a comparison of two populations using independent samples.
Music Habits of Students ***
A "sample" of 227 students completed a survey about their interest in music. There is a lot of data here which will have to be manually retyped into Excel.
Body Measurements ***
Data on a class of 213 students includes gender (0 = male, 1 = female), GPA, height, weight, left and right arm length, and left and right foot length.
Tree Seedling Growth II
This study measures tree seedling growth based on container size, fertilizer and application of copper to the container. This is a multiple regression analysis, but it could be done as 3 single regressions.
Tree Seedling Growth III
This study evaluates seedling growth in a peat substrate amended with biochar (raw and pelletized). Biochar is basically carbon similar to what might be left on the ground after a forest fire.
Tree Seedling Growth IV
This study measures tree seedling growth as influenced by fertilizer and inoculation levels. Multiple regression seems to be the way to go here.
Tree Seedling Growth V
This study measures the growth of outplanted longleaf pine grown with different nitrogen rates inside a greenhouse or outdoors. Longleaf pine has a “grass stage”, so it may take several years before it starts to grow taller. This is an adaptation to fire, which is common on longleaf pine sites. Seedlings are considered to be out of the grass stage (the condition foresters want) when they are 25 mm in diameter and more than 10 cm tall. This is a study that compares two populations.
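The grass-stage rule quoted above can be turned into a yes/no variable so the greenhouse and outdoor groups can be compared as two proportions. This is a minimal sketch: the field names and seedling records are hypothetical, and reading "25 mm in diameter" as "at least 25 mm" is an assumption.

```python
def out_of_grass_stage(diameter_mm, height_cm):
    """True when a seedling meets both thresholds from the description.

    Assumption: "25 mm in diameter" is read as "at least 25 mm".
    """
    return diameter_mm >= 25 and height_cm > 10

def proportion_out(records, group):
    """Fraction of seedlings in one group that have left the grass stage."""
    rows = [r for r in records if r["grown"] == group]
    hits = sum(out_of_grass_stage(r["diameter_mm"], r["height_cm"]) for r in rows)
    return hits / len(rows)

# Hypothetical records, NOT the study data.
seedlings = [
    {"grown": "greenhouse", "diameter_mm": 27.0, "height_cm": 14.0},
    {"grown": "greenhouse", "diameter_mm": 22.5, "height_cm": 12.0},
    {"grown": "outdoors", "diameter_mm": 26.0, "height_cm": 9.5},
    {"grown": "outdoors", "diameter_mm": 30.0, "height_cm": 18.0},
    {"grown": "outdoors", "diameter_mm": 25.0, "height_cm": 11.0},
]

for group in ("greenhouse", "outdoors"):
    print(group, proportion_out(seedlings, group))
```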
Eco Apples
This data set measures the sale of eco-labeled and regularly labeled apples. We are trying to determine if there is a correlation between the prices of eco-labeled and regular apples.
Getting to the Final 4
This data tries to indicate what factors about a school lead to success in men’s college basketball.
Hourly Wages
What determines how much money you make? Union membership? Experience? Good Looks? This data set examines different variables that influence your hourly wage.
Campus Crime Rates
This data tries to determine which factors have an influence on college campus crime. This is a good regression problem.
CEO Salaries
Do CEOs really earn their keep? This data looks at the variables that influence the salaries of corporate CEOs. This data was collected in the late 1980s and early 1990s.
Hail To The Chief
Can the next presidential election be adequately predicted? This data set looks to predict the election based on a number of variables.
College versus the Labor Market
What factors contribute to the likelihood that a worker stays in the labor market or stays in school? This study looks at gender, race, education level, work experience, test scores and geography.
Hourly Wages 2
This is another survey that tries to determine the factors that influence a worker’s wage. This looks like a good multiple regression problem.
Is Wine Good For You Or Bad For You?
Conventional wisdom says that drinking wine is good for your heart but bad for your liver. This data tests that hypothesis. More regression.
Was Sir Isaac Newton Right?
This data set is from observations of a falling object. Does this data support the null hypothesis that the distance fallen is proportional to $t^2$ (as in $d = \tfrac{1}{2} g t^2$), or do we reject the null hypothesis and accept the alternate hypothesis that the exponent for t is not 2?
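One way to examine the exponent question for the falling-object data is to fit a power law d = c·t^k by least squares on the logarithms and see how far the fitted k is from 2. The sketch below assumes that model form; the measurements and names are hypothetical and are not the course data.

```python
import math

def fit_power_law(times, distances):
    """Least-squares fit of log(d) = log(c) + k*log(t); returns (k, c, se_k)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    log_c = y_bar - k * x_bar
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((y - (log_c + k * x)) ** 2 for x, y in zip(xs, ys))
    se_k = math.sqrt(sse / (n - 2) / sxx)
    return k, math.exp(log_c), se_k

# Hypothetical measurements (seconds, meters), NOT the course data.
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
distances = [1.25, 4.85, 11.1, 19.5, 30.8, 44.0]

k, c, se_k = fit_power_law(times, distances)
t_stat = (k - 2.0) / se_k  # compare with a t distribution on n - 2 df
print(f"exponent k = {k:.3f}, c = {c:.3f}, t statistic for H0: k = 2 is {t_stat:.2f}")
```

A t statistic that is small in absolute value would be consistent with an exponent of 2; a large one would favor the alternate hypothesis.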
Factory Output
How much a factory produces is a function of how much equipment there is (capital input) and how much the workers contribute (labor input). This data set explores that relationship.
Noxious Weeds – Canada Thistle
Canada thistle seeds have evolved to be spread by the wind. The harder the wind blows, the farther the seed flies. A weather station was set up in a weed patch and it recorded the weather, including wind, minute by minute. Use this data to predict in which direction and how far the seeds were dispersed.
Eating Habits within Vietnamese Households
This study looks at the various factors that may affect the eating habits and nutritional intake of Vietnamese people.
Forest Regrowth – Trees and Shrubs
This is a study of areas of forest that were measured in 1966 and restudied in 1996. Some of the areas were left undisturbed, some were clearcut, and some were burned (both prescribed burn and natural forest fire). The study in 1996 measured tree and shrub regrowth in each region. This is a huge study that has 6 distinct sets of data that can be worked on by 6 different students.
Snowmold Effects on Soft Winter Wheat
This study looked at the effects of snowmold on yields at harvest. This also measured the resistance of the wheat to snowmold.
Lodging (blowdown) and Stripe Rust Effects on Hard Red Winter Wheat
This study looked at the effects of lodging and stripe rust on yields at harvest. It also measured the resistance of the wheat to stripe rust.
Predict Your College Costs
This study tries to predict how much your college will cost based on enrollment, private/public, number of applications, number accepted and the acceptance rate.
Pizza Taste Tests
Different pizzas from different shops are rated. Your job is to try to determine what affects ratings the most.
House Prices in Spokane, Washington
This study examines some of the variables that affect the prices of houses that are for sale.
Snake River Water Temperature Monitoring
This data set is the result of monitoring the water temperature of the Snake River at 24 different locations. Correlations between the different locations will be one of the interests.
Snake River Fish Survey
This data set is from a study of the populations of various types of fish in the Snake River during 2002-2003. This data is good for a multiple regression analysis.
Housing Values in Moscow, Idaho
This data set comes from the Latah County Assessor’s office. It compares the sale price of homes in Latah County and cross-references these values with sales data, age of the house, size of the house and house type. This will be a multiple regression-type analysis.
Fatty Worms (C. elegans)
Different genetic mutations in the C. elegans worm produce different amounts of fatty acids in those worms. The amount of fatty acids controls various functions in these worms. This data studies the variations between wild-type worms and mutant-type worms. This data is good for t-tests and multiple regression analysis.
Rabbits & Cineole
Cineole is a chemical that occurs in sagebrush that can be toxic to animals. In the wild, pygmy rabbits eat more sagebrush than cottontail rabbits. We mixed cineole in rabbit pellets at different amounts and examined the amount of food the rabbits would eat. Analyses could include the comparison between species and % cineole, and body mass and food intake.
Grouse Nests
Nests of 2 species of grouse were monitored in eastern Washington. If at least one egg hatched, the nest was "successful"; if not, it was "unsuccessful". The number of eggs and hatched chicks were counted. A variety of habitat measurements were made at each nest and at a paired "non-nest" site nearby as a comparison. Try to determine which factors help or hinder successful grouse nesting.
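The success rule above (at least one egg hatched) can be coded directly, which makes it easy to summarize nest success by species before looking at habitat factors. This is a small sketch with hypothetical records and field names; the species labels "A" and "B" are placeholders.

```python
from collections import defaultdict

def nest_successful(chicks_hatched):
    """A nest counts as successful when at least one egg hatched."""
    return chicks_hatched >= 1

def success_rate_by_species(nests):
    """Map each species to the fraction of its nests that were successful."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for nest in nests:
        totals[nest["species"]] += 1
        successes[nest["species"]] += nest_successful(nest["hatched"])
    return {sp: successes[sp] / totals[sp] for sp in totals}

# Hypothetical nests, NOT the monitoring data.
nests = [
    {"species": "A", "eggs": 8, "hatched": 6},
    {"species": "A", "eggs": 7, "hatched": 0},
    {"species": "B", "eggs": 9, "hatched": 1},
    {"species": "B", "eggs": 6, "hatched": 0},
    {"species": "B", "eggs": 10, "hatched": 10},
]

rates = success_rate_by_species(nests)
print(rates)
```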
Elk Forage
We measured the total biomass of forage available to Roosevelt elk on the Olympic Peninsula within tree stands in different habitats and different months. We also measured the nutritional quality of the plants, including the dry matter digestibility (ddm) and the digestible protein (dp) content (both in %). We then used a model based on the nutritional requirements of elk to estimate the nutritional carrying capacity for elk (NCC), which is the number of elk per hectare per day that could be supported at a minimum nutritional level. We calculated the proportion of the total biomass of plants that could be used by elk as food.
Fawn Growth
We measured growth of mule deer fawns in relation to the diet fed the mother and her nutritional status. The goal is to determine if there is a relationship between a fawn’s growth and a number of other variables related to the doe and her nutrition. This is probably a multiple regression type analysis.
Kahalu’u Bay
This is a study of water conditions in various parts of a bay in Hawaii. There are several things that could be done with the data. There will be differences between the sites, since some are next to fresh-water springs that enter the bay, and one is a pond that is partially separated from the ocean (depending on storms and other conditions). So some sites will have much more variation in pH, temperature, dissolved oxygen, and turbidity. There will also be differences due to season, tides, and storms.
Effects on Wheat Production
We tested 186 wheat lines at different locations over three years with different water applications (drought, irrigated, rainfed). You could analyze locations, water, years, or market class (club or common). You could also look at correlations between traits: for example, did flowering date, presence of awns, or canopy temperature affect bushels per acre, test weight, etc.?
Tree Regrowth after Soil Compaction from Logging
This is just height and diameter data for trees from some study plots near Council, ID. If your students are really enthusiastic they can calculate tree volume based on the height and diameter and see how that differs from the straight numbers.
This will be a multiple regression analysis to see how soil compaction and harvest strategy affect future tree sapling growth.
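The description suggests calculating tree volume from height and diameter but does not say which formula to use. The sketch below assumes a simple cone, which is only one of several possible approximations; the function name, units and sample values are hypothetical.

```python
import math

def cone_volume_m3(diameter_cm, height_m):
    """Volume of a cone with the given base diameter and height, in cubic meters.

    Assumption: the stem is treated as a cone; the study may use another formula.
    """
    radius_m = diameter_cm / 100 / 2
    return math.pi * radius_m ** 2 * height_m / 3

# Hypothetical sapling: 20 cm base diameter, 3 m tall.
volume = cone_volume_m3(20, 3)
print(f"estimated volume = {volume:.4f} cubic meters")
```

Because volume grows with the square of diameter, it can rank trees differently than height or diameter alone, which is the comparison the description has in mind.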
Risk of Stroke
This data set asks the student to use multiple regression to determine the statistical significance of the risk factors for stroke at the 0.05 and 0.01 levels of significance.
Wheat Breeding
This data set has a number of different analyses that could be run. Data was collected on many different breeds of wheat grown at different locations in Washington. Two things that could be analyzed: which breeds give the best yields and which are most disease resistant?
College Admission based on SAT Scores
This data set could be used by college admissions departments. It compares college freshmen’s GPAs with their SAT Math and Reading scores. The goal is to assist admissions departments in selecting good applicants based on their SAT scores. In other words, can we predict college success based on high school SAT scores?
Fish Hatchery Output
This data set allows us to determine what factors affect fish hatchery success. For example, how do various input variables, such as feeding strategies, water flow, temperature, etc. affect fish size, weight, mortality, etc.? Which hatchery is most efficient?
Substance Abuse in College Students
This is a study of first-year college students enrolled in 2- and 4-year colleges. The study is trying to determine whether there is a difference between 2- and 4-year college students in alcohol (and other substance) use, parent support, life stresses and neurocognitive factors that account for links between sleep deprivation and high-risk alcohol use. There are a lot of analysis techniques that can be used here, including hypothesis testing, ANOVA and multiple regression.
Farming Practices and Climate Change
This data set includes two large surveys of Pacific Northwest farmers. The surveys ask about their farming practices, their beliefs about climate change and whether or not they are changing or have changed their farming practices because of climate change. There are many options that you can choose from to study here.
Obesity and Fat Physiology 2
Obesity and the regulation, function and biosynthesis of unsaturated fatty acids in animals (including humans) are modeled in C. elegans. This data shows the relative fatty acid composition of various strains of C. elegans (a microscopic nematode worm used as a model organism for biomedical research). Wild type, a mutant strain and two transgenic strains of nematodes were grown at two different temperatures. These different temperatures may have changed the ratios of the fatty acid composition. T-tests will be the predominant tests used in this data set but other correlations may be found.
Substance Abuse in College Students A-E
This is a study of first-year college students enrolled in 2- and 4-year colleges. The study is trying to determine whether there is a difference between 2- and 4-year college students in alcohol (and other substance) use, parent support, life stresses and neurocognitive factors that account for links between sleep deprivation and high-risk alcohol use. There are a lot of analysis techniques that can be used here, including hypothesis testing, ANOVA and multiple regression. Five different students will look at different aspects of this study.
Power Grid Fluctuations
This data records power grid values from all over the U.S. The goal here is to identify and evaluate “events” or anomalies in power production. Four different students will work on this data, each using a different city.
Epigenetic Effects on Cell Morphology
This study follows four generations of rats after the original generation was exposed to pesticides. The cell morphology of the following generations was studied to determine the lingering effects of the exposure over multiple generations due to epigenetic changes in the original generation. Each of four students will look at the whole 4-generation study for one of the organs. That is, one student will study the testes data, one will study prostate data, one will study the kidney data and one will study the ovary data.