Problem Set 2

Exploratory Data Analysis

Part One: R

The data below are aimed at answering one major question: are cow-calf operations located where they are for a reason, and does available natural forage have anything to do with it? Unlike commercial feedlots, cow-calf producers rely heavily on pasture to graze their cows. In fact, grazing cattle is often the most economical use for land that would otherwise go unused. Grazing also has many benefits: it helps maintain topsoil, promotes plant diversity while reducing competition from nonnative plant species, helps return carbon to the soil and keep it there, and gives cattle producers a relatively cheap feed source.

To analyze this question, we look at county-level cattle inventories (including calf numbers) along with the number of available pasture acres per county, to assess whether the two are correlated. We also look at the amount of precipitation in those counties, which in theory should directly affect both usable pasture acres and cattle inventories, because moisture drives the production of forage and other feedstuffs. Where moisture is low, we expect both cattle inventories and available pasture acres to be lower; where precipitation is higher, we expect the opposite.

Summary Statistics

ggpairs Plot

Processing the Data:

Processing the data in R took a few steps. First, we had to make sure every variable was stored as a numeric value: we are asking R to run a series of calculations, and character values cannot be used in arithmetic. Second, we kept only the counties that had reported data in the data frame, because counties with no information could skew the model. This was especially a problem with the USDA Census data. Not every county reports on every part of the industry, so we organized the data into a format containing only cattle inventories, pasture acres, and precipitation for the valid counties in the relevant states, and removed every other value from the analysis.

The distributions also showed heavy tails (also called fat tails), meaning a few very high or very low values skewed our bell curves in one direction. We therefore log-transformed the data to reduce that skewness. This largely corrected the problem, but a few low values still skewed the data to the left, so we added a constraint on each variable to narrow the range of values while keeping the shape of the bell curves.
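The cleaning steps above can be sketched as follows. This is a Python illustration, not the R code used in the analysis (in R the same steps would typically use as.numeric(), complete.cases(), and log()). The column names, the toy records, and the log_floor thresholds are hypothetical; treating "(D)", the USDA marker for withheld values, and zero values as missing is an assumption about what the raw download looks like.

```python
import math

# Column names are hypothetical stand-ins for the three variables in the analysis.
VARIABLES = ("cattle", "pasture_acres", "precip")


def to_number(value):
    """Return value as a float, or None if it is missing or non-numeric."""
    if value is None:
        return None
    try:
        return float(str(value).replace(",", ""))  # "12,500" -> 12500.0
    except ValueError:
        return None  # e.g. "(D)", which USDA uses for withheld values


def clean_counties(records, log_floor):
    """Coerce to numbers, keep complete counties, log-transform, then trim.

    log_floor maps each variable to a hypothetical lower bound on its
    logged value, standing in for the per-variable constraint in the text.
    """
    cleaned = []
    for row in records:
        values = {v: to_number(row.get(v)) for v in VARIABLES}
        # Keep only counties that reported every variable; zeros are also
        # dropped because log(0) is undefined (an assumption of this sketch).
        if any(x is None or x <= 0 for x in values.values()):
            continue
        # Log each variable to pull in the heavy right tail.
        logged = {v: math.log(x) for v, x in values.items()}
        # Trim the remaining left tail with the per-variable constraint.
        if any(logged[v] < log_floor[v] for v in VARIABLES):
            continue
        cleaned.append({"state": row["state"], "county": row["county"], **logged})
    return cleaned


# Toy records (not real data): county B is withheld, county C falls below the floor.
raw = [
    {"state": "NE", "county": "A", "cattle": "12,500", "pasture_acres": "250,000", "precip": "22.1"},
    {"state": "NE", "county": "B", "cattle": "(D)", "pasture_acres": "80,000", "precip": "20.4"},
    {"state": "KS", "county": "C", "cattle": "3", "pasture_acres": "40", "precip": "30.0"},
]
log_floors = {"cattle": math.log(10), "pasture_acres": math.log(100), "precip": 0.0}
rows = clean_counties(raw, log_floors)
print([r["county"] for r in rows])  # ['A']
```

Applying the constraint after the log transform, rather than before, keeps the thresholds on the same scale as the bell curves being inspected.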

To add structure to our EDA, we then performed a cluster analysis. We first standardized the data by subtracting each variable's mean and dividing by its standard deviation. Standardizing puts every variable on the same scale, so that no single variable dominates the distance calculations that clustering relies on. We then ran the k-means clustering algorithm with three clusters to identify patterns in the data, merged the cluster assignments back with each observation's attributes (such as the state and county it represents), and stored the result in a new data frame. That data frame is the input to our regression model.
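A minimal sketch of the standardize-then-cluster step, again in Python rather than R. It z-scores each column (the same operation as R's scale()), runs k-means with three clusters, and merges the labels back onto hypothetical state and county identifiers. Two details are swapped in for reproducibility: it uses Lloyd's algorithm with a deterministic farthest-point starting rule, whereas R's kmeans() defaults to the Hartigan-Wong algorithm with randomly chosen starting centers. The six toy counties are made up.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column: subtract its mean, then divide by its standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]  # sample SD, as R's sd() uses
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def farthest_point_centers(points, k):
    """Deterministic start: the first point, then repeatedly the point farthest from all chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    centers = farthest_point_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.mean(col) for col in zip(*members)]
    return labels, centers


# Toy logged values (not real data) for six hypothetical counties: cattle, pasture, precip.
counties = [("NE", "A"), ("NE", "B"), ("KS", "C"), ("KS", "D"), ("MO", "E"), ("MO", "F")]
logged = [
    [1.0, 1.0, 1.0], [1.2, 0.9, 1.1],
    [5.0, 5.0, 1.0], [5.1, 4.9, 1.2],
    [1.0, 5.0, 5.0], [0.9, 5.2, 4.8],
]
z = standardize(logged)
labels, _ = kmeans(z, k=3)
# Merge the cluster labels back onto each county's identifying attributes.
merged = [
    {"state": s, "county": c, "cluster": lab + 1}
    for (s, c), lab in zip(counties, labels)
]
for row in merged:
    print(row)
```

Cluster numbers from k-means are arbitrary labels, so "cluster 2" in one run or tool need not match "cluster 2" in another; the interpretation comes from the members, not the number.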

Running the Regression:

Using the processed data, we next ran a regression to identify the relationships among our variables. As mentioned earlier, we expect positive relationships: when cattle inventories are higher, we expect to see more pasture acres and higher precipitation, and vice versa. To test this hypothesis we control for precipitation, which means we estimate the relationship between cattle inventories (including calves) and pasture acres while holding precipitation constant. We estimate this with an ordinary least squares (OLS) model.
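Controlling for precipitation means adding it as a second regressor, so the pasture coefficient measures the association with cattle inventories while precipitation is held fixed. The sketch below fits such a model by solving the OLS normal equations, (X'X)b = X'y, and reports R², the share of variation explained, which is the quantity behind the percentage reported for "My cattle model". It assumes cattle inventory is the dependent variable and pasture acres and precipitation are the regressors; in R the equivalent fit would be lm(cattle ~ pasture_acres + precip, data = counties), with hypothetical column and data frame names. Standard errors and p-values, which the 1% significance test needs, are left out to keep the example short.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, *regressors):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by OLS; return (coefficients, R^2)."""
    n = len(y)
    X = [[1.0, *obs] for obs in zip(*regressors)]  # design matrix with intercept column
    p = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Toy data (not real): y is built exactly from the two regressors.
pasture = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
precip = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
cattle = [2.0 + 0.8 * a - 0.1 * b for a, b in zip(pasture, precip)]
coefs, r2 = ols(cattle, pasture, precip)
print(coefs, r2)  # coefficients close to [2.0, 0.8, -0.1]; R^2 close to 1
```

With real data the fit is not exact, so R² falls below 1; an R² of 0.52 means the regressors account for about half of the variation in the dependent variable.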

As shown in "My cattle model" below, the relationship between pasture acres and cattle inventories including calves is statistically significant at the 1% level. Our precipitation variable, however, did not show a significant relationship with pasture acres. Overall, the model uses 3,006 observations and explains 52% of the variation in the data (an R² of 0.52). This indicates a significant correlation between where cattle are held in the nation and how much pasture land is available. Cow-calf producers in particular often rely heavily on pasture as a forage source, and this model supports the expectation that they are more concentrated in areas with higher densities of pasture land.

Part Two: Tableau

Following input from our instructors, we ran a cluster analysis comparing cattle inventories with pasture acres, and pasture acres with precipitation. These findings add to our previously run regression.

Based on our cluster analysis of pasture acres and cattle inventories, we would want to develop a cow-calf operation in the orange region of the scatter plot, cluster 2. This cluster shows regions with greater cattle inventories relative to accessible pasture acres. Overall, this suggests that, at the county level, we can expect greater numbers of cattle in areas with more available pasture land. This makes intuitive sense, because rangeland and pasture that are not suitable for cropping are often used for grazing cattle. Grazing is the most economical use of such land and also promotes plant health and biodiversity.

Clusters 1 and 3 tell a similar story: cluster 3 represents regions where we would definitely not want to produce cattle, and cluster 1 represents areas with acceptable but not optimal conditions. In cluster 3 especially, we see some higher cattle inventories despite low pasture acres. One reason could be feedlot and stocker operations. Feedlots and stockers feed cattle a specialized ration that allows them to grow. This often happens in concentrated animal feeding operations (CAFOs), which tend to be placed strategically near processing facilities closer to large metropolitan areas. That location gives them better access to infrastructure and labor.

The observed data also show a statistically significant correlation between cattle inventories (including calves) and access to pasture land, and the relationship appears positive (i.e., more pasture acres means more cattle present). The model explains 52% of the variation in the data.

Representation of calf inventories (color) in relation to pasture acres (dot size) for every county in the US, based on each county's pasture acres and calf numbers. The map uses the tooltip function.

Aside from cattle inventories, we also want to understand where precipitation is low and high, since this may help us pinpoint an optimal region for a cow-calf operation. In the scatter plot above, cluster 1 contains areas with higher precipitation levels, but not necessarily the most available pasture land. Comparing this with our prior cluster analysis, much of this region has low cattle inventories, suggesting it may not be the best location. Cluster 2 has moderate precipitation but the fewest recorded pasture acres. Cluster 3, the cluster of most interest, has the most pasture acres but also the lowest precipitation. Referring back to our prior cluster analysis, areas with large amounts of pasture land also have high cattle inventories and low precipitation. This ties back to our earlier discussion of rangeland grazing: most rangeland lies in low-precipitation areas because it is dryland that is often not suitable for crops unless irrigation water is applied.

Looking at both cluster maps, then, we would want to produce calves in the orange counties on the map comparing cattle inventories with pasture acres, and in the red region on the map comparing precipitation with pasture acres.

In contrast to our prior analysis, here we see the opposite, a negative relationship: there is more pasture land where precipitation levels are lower. This relationship is also statistically significant, and the model explains 19% of the variation in the data.
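The 19% figure can be tied back to the strength of the correlation itself. Assuming the trend line is a simple linear fit with an intercept and a single explanatory variable (Tableau also offers logarithmic, exponential, power, and polynomial trend lines, for which this identity does not carry over directly), R² equals the squared Pearson correlation, and the slope carries the correlation's sign regardless of which variable sits on which axis:

```latex
% Simple linear fit with intercept: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = r_{xy}^2,
\qquad
\hat{\beta}_1 = r_{xy}\,\frac{s_y}{s_x}

% Negative slope with R^2 = 0.19:
r_{xy} = -\sqrt{0.19} \approx -0.44
```

Under that assumption, the negative precipitation and pasture relationship corresponds to a correlation of about -0.44. The same shortcut does not apply to the 52% figure from the regression, which has two regressors.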

2. Cow inventories based on calf inventories, shown using the map tooltip function.
