Introduction
An important aspect of working in a laboratory is being able to faithfully keep track of data. There are many ways to keep and store data (see Lab Notebooks). One way to carefully store data and quickly analyze it, in an organized way is to use a computer program like Microsoft Excel. This spreadsheet program allows us to enter data in specific ways that not only make it easy to organize but also allows for many different mathematical operations to be performed in a few keystrokes. Why should we use Microsoft Excel in our workshop? One reason: it's included with our university account (full access to the Microsoft Office suite can be found at this link, using your UofSC username and password: portal.office.com). In this session, we’ll be learning how to use Microsoft Excel by logging data that we collect from cell count data. Which cells are we using? Where can I find more information on Microsoft Excel? Let’s find out.
Microsoft Excel Basics
One can think of spreadsheet software like a notebook, allowing us to input data into cells (cell example: A1; Figure 1). These cells are arranged vertically into alphabetized columns (A-F are shown in Figure 1) and horizontally into rows (1-16 shown in Figure 1). We can easily identify cells by typing their column then row: for example, A1. What can we type in a cell? It could be numbers, text, or mathematical formulas. The following is a list of important hints and tricks for using Microsoft Excel for easy, organized data entry:
1. Label everything.
a. In the moment, it may seem easy to realize what the data means and how it’s organized, yet returning later, we may be confused.
b. It’s crucial to include titles for columns. Some even label each row if they are unsure (or at least include a small key that clearly states what data is in each cell, row, or column).
c. When in doubt, label it!
d. Units are crucial. Ensure that units are clearly stated.
2. Making a list? Work in columns!
a. When collecting/inputting data, work in columns. For example, if we have 5 different cell density values to input from the same treatment, place them all in one single column (Figure 2).
i. This makes mathematical operations and statistics much easier.
3. Navigate with arrow keys or “tab” and “enter”
a. We can click on specific cells to enter data, and often we’d like to quickly move to another cell. This can be done in a variety of ways, the arrow keys, tab, and enter/return are some of the quickest.
4. To input a formula for a mathematical operation, start with “=”.
a. Once we’ve typed “=” Microsoft Excel recognizes this as the start of a formula.
b. There’s a wide variety of things that can be done now, we’ll mention several below when discussing statistics.
i. Example: To find the average of a particular set of data, simply type “=AVERAGE(data start:data end)”. i.e. “=AVERAGE(B3:B7)”
c. Note that once “=” is typed, everything else we type is included in that equation until we hit enter/return or otherwise exit the cell.
d. Tip: we can drag our cursor over data that we want to include in a formula. This works well with sets of data that can be easily viewed on the screen. For extremely large data sets, it’s easier to physically type the range (for example B3:B1898).
5. Statistics in Excel
a. What are statistics? Simply the collection, analysis, interpretation, and presentation of numerical data. Often what occurs is we collect data in the form of a sample: hopefully a representative sample, meaning the data represents the larger population. Statistics is very powerful, allowing us to view trends, differences, and similarities in data. Descriptive statistics summarizes a specific data set. Inferential statistics allows us to make predictions from a data set or data sets (remember we’re hoping our sample is representative of the population). What are some examples? When would we use statistics? How can we do this in Excel? Let’s find out!
b. Some basic descriptive statistics that will be used often include average (mean), median, and mode of the data set.
i. Average is the sum of the data divided by the number data points (for example, if we have a data set of 3,4,4, and 5; the average would be 3+4+4+5=16/4= 4). It can be thought of as a value to show the central tendency of a data set.
1. To find average in excel use the formula: =AVERAGE(datastart:dataend).
ii. Median is the middle value of a set of data ordered from smallest to largest. In our example set, 3,4,4,5; the median would be 4.
1. To find median in excel use the formula: =MEDIAN(datastart:dataend).
iii. Mode is the most frequent number in a data set. For our example set, 3,4,4,5; the mode would be 4. This may or may not be useful for describing our data set.
1. To find the mode in excel use the formula: =MODE(datastart:dataend).
iv. Standard deviation is a measure of variation in a data set. Usually the smaller the standard deviation, relative to the data set, the more similar the values are in the data set.
1. To find the standard deviation in excel use the formula: =STDEV(datastard:dataend).
c. Inferential statistics that we’ll cover is the t-test.
i. T-tests allow us to determine if there’s a statistically significant difference between the average of two data sets.
ii. Note: there are many different inferential statistic tests that can be performed and there’s an entire field devoted to this, we choose to highlight one of the most utilized in molecular biology: the t-test.
iii. To perform a t-test in excel, we first need to see if we have the Data Analysis Toolpack installed.
1. Click on “Data” tab at the top of the window.
2. Look for “Data Analysis” in the Analyze section (usually to the right of the window).
3. Don’t have it? No worries, click on “File” then “Options” and “Addins”. Search for “Analysis Tookpack” click on it then “Ok”.
iv. Select “Data” then “Data Analysis”, scroll down and select “t-test: Two-Sample Assuming Equal Variance” if our variance is similar in the data we’re comparing (Figure 3).
1. For the sake of our example and the data we’ll be working with in our workshop, this is the best option.
v. We’ll now be able to select each variable range and set our alpha level and where the output of the test will display.
1. This is where it pays off to have our data in columns as the t-test will require our data to be presented in column format.
2. What’s an alpha level?
a. The probability of rejecting the null hypothesis.
b. Alpha levels are typically 0.05 but can be set depending on the data and the stringency desired.
3. What is a null hypothesis?
a. For this test, the null hypothesis is the two sets of data we’re comparing have the same mean.
b. In general, a null hypothesis is one of no difference between the data sets we’re observing.
4. For inferential statistics, we are testing hypotheses. The opposite of the null hypothesis is called the alternative hypothesis and for the t-test states that there is a difference between the means of the two data sets being compared.
5. When selecting output, depending on the spacing of the spreadsheet, we may select an entire additional worksheet or simply an output range on the current worksheet.
vi. T-test output gives us several pieces of information.
1. We’re given the mean, variance, number of observations, and several important pieces of information related to statistics.
a. P-value: arguably the most important piece of information given in the test.
i. The p-value tells us, in this case, how likely it is that the average of the two sets of data are the same.
ii. The alpha level sets the p-value level in terms of whether we accept or reject the null hypothesis. Note this is arbitrary to some degree, 0.05 is generally used but sometimes smaller values are used.
iii. In our case, if the p-value is less than 0.05 then we reject the null hypothesis and the difference between the means is statistically significant.
b. One-tail or two-tail p value? Which one should we use? If our alternative hypothesis is about showing a difference between the two then the two tailed p value is appropriate, if we are using a statistical test and hypothesis that implies one is greater than another, we could use the one-tailed p-value.
6. Creating Graphs in Microsoft Excel
a. Data in columns is great, but often we want to visualize our data in a way that appeals to us and potentially, an audience.
b. Luckily Microsoft Excel has many different graphing options that we can use to visualize our data.
c. Please note that we’re going to cover creating a bar graph, yet there’s so many options in excel. If we ever have specific questions, a great resource is the internet! YouTube has numerous tutorials that go through the entire process of creating specific graphs in Microsoft Excel. **Note: not all these tutorials or information online are accurate, proceed with caution. (Hint: websites that include .edu are often fairly accurate).
d. Creating a Bar Graph
i. Click on “Insert” tab in Microsoft Excel. Then click on the icon that looks like a bar graph (see Figure 5). Select “2D Column” and “clustered column”.
ii. A small rectangle will now appear in the worksheet. From here, click on “Select Data” to choose which data will be graphed.
iii. In this pop-up window, we’ll be able to select our data (Figure 6). We’ll want to create separate series for our data. To do this, select the “plus sign” under “Legend Entry: Series”. Then enter the name of the series and for the Y value, select the mean of our cell densities. Complete this step again to add the second set of data, the hydrogen peroxide treated cell densities. Select Ok when ready to graph the data.
iv. The graph that’s produced is generally not the best. We’ll want to make some adjustments to help our graph look better. (Figure 7).
1. Make sure there are titles for both the x and y axis.
a. This can be adjusted under the “Chart Design” tab. Depending on how the data is represented, we may need to adjust a few settings under “Select Data” as well.
2. Units should be included.
3. Labels are needed for all displayed data.
4. A graph title must be included.
5. We can also include error bars and specific labels that designate the statistically significant difference between the graphed data as well.
6. **Most changes that can be made will be found under the “Chart Design” tab.
Microsoft Excel provides a useful program that can not only help us organize and store data but can provide powerful mathematical operations and even statistical analyses. With a few keystrokes and some careful attention to detail we can utilize this program to create nice visual representations of data as well. One thing to note: Microsoft Excel isn’t the only program out there. There are numerous programs devoted to graphing alone that make stellar graphics (we chose Microsoft Excel for its availability and practically ubiquitous implementation).
One of the best ways to learn about Microsoft Excel is to use Microsoft Excel! Below is a file that we can download as a Microsoft Excel to test ourselves!
Figure 1. Spreadsheet example. Note each individual block is called a cell and arranged into vertical columns (A-F shown) and horizontal rows (1-16 displayed).
Figure 2. Example spreadsheet with data from treating HL-60 cells with hydrogen peroxide and no treatment (control). Note how data is arranged in columns and everything is clearly labeled.
Figure 3. Performing a t-test in Microsoft Excel. Click Data, Data Analysis, and then select the appropriate t-test. If variance is similar, we can choose the t-test: Two-Sample Assuming Equal Variance option.
Figure 4. T-test output from our example data. Note it provides several pieces of information including the mean, variance, observations, and p values. In our example, we reject the null hypothesis that the means of our two data sets are the same (which means there is a statistically significant difference between our two data sets).
Figure 5. Creating a Bar Graph in Microsoft Excel. Click Insert, Column Chart, and 2-D Column to create a bar graph. There’s plenty of other graph variations and customizations to select.
Figure 6. Options for creating a bar graph in Microsoft Excel. We can have the program try to create a graph for us or manually enter the information (this is the preferred manner).
Figure 7. A graph made using Microsoft Excel. Note we had to adjust the axis titles, title of the graph, and even the labels of the bar graphs themselves. Plenty of customization options are available under “Chart Design” tab.
I. Communication tips
A. Presenting your data
1. It is important to be able to explain your research project, the results that you obtain, and the importance of these results.
2. This is done in both oral and written formats.
a. oral presentations: lab meetings, research seminars, journal clubs, poster presentations
b. written presentations: journal articles, grant submissions
B. Finding your way in the lab
1. Each lab has its own culture and thus differs in important characteristics. For example, different labs differ in how much time you will spend with the principal investigator (PI), whether there are common solutions or whether each person makes their own solutions, etc.
2. It is important to get along with your fellow lab members. Be considerate of your lab mates. Keep your work area clean and do not leave a mess in common areas. Complete any lab jobs you have been assigned in a timely manner.
3. Don't assume that people can immediately stop what they are doing to help you. Plan ahead and request time with your mentor or other lab members.
C. Lab courtesy
1. Don't use another lab member's reagents or buffers without permission.
2. If you use the last amount of a solution or kit, let someone know.
3. Do not change the location of equipment or common reagents in the lab. If you do need to transiently move/use something, return these items to their original location.
4. Cleanup after performing experiment. Make sure reagents are back in their original location.
II. Oral presentations
A. Types of oral presentations
1. There are many different situations in which you might be discussing your data with others. This might be a one-on-one meeting with your mentor or PI, a more formal lab presentation, an oral talk at Discover UofSC, or a presentation at a scientific meeting. In all of these cases, you need to spend time preparing what you want to say.
2. Preparation
a. Learn as much as possible about the topic to be discussed.
b. It is important to tailor your presentation to your audience. Consider who will be in the audience and how much background information you need to present. Why should the audience care about what your work?
c. Present your data in a clear, logical, and organized manner that is easy for the audience to follow.
d. Practice your talk. First practice on your own. Time your talk and make sure that it is the appropriate length. Make changes and reorganize after you practice it. Then practice it in front of other people (your mentor or friends) and get feedback. Do this far enough ahead of time so that you can make additional changes.
3. Presentation
a. Speak clearly. Do not speak too fast or too slow.
b. Try to eliminate crutch words like "um", "you know", "ok".
c. Be yourself. It is fine to show your own personality.
d. It is ok to be nervous. This is normal.
B. Visual Aids (PowerPoint presentations)
1. Some tips for PowerPoint presentations
a. use sans serif fonts (like Arial, Helvetica)
b. make the font big enough so that it is viewable from the back of the room
c. each slide should make one point; personally, I like this to be stated as the title of the slide
d. figures should be labeled so that they are understandable without much explanation
e. estimate one slide per minute of talk; e.g. a 10min talk consist of 10 slides
f. your talk should include background slides, data slides, conclusion slide, and an acknowledgement slide
B. Posters
1. A poster is a visual communication device. It is used to engage other scientists in conversation about your work.
2. Tips for making an effective poster
a. Use more graphics than text
b. Explain why your work is important, what you did, and what you found
c. Make sure your poster is pleasing to look at
d. Organize your poster into different sections: Introduction, Results, Conclusions
e. Make sure the font is large enough to read from 3 feet away
f. Use dark letters on a light background
g. Arrange in columns that are read left to right
3. Presenting your poster
a. practice your presentation ahead of time
b. as people come up to your poster, ask people if they would like to hear about your work
c. introduce yourself, give some background about the work, describe the research question you are addressing, use the figures to explain your results, and end with your conclusions
III. Written presentations
A. Manuscripts
1. Manuscripts/Papers are how we communicate our results and conclusions with the rest of the research community.
2. Typically, a manuscript includes a Title, Abstract, Introduction, Materials and Methods, Results, Discussion, References, Figures and Tables.
a. Title: what the paper is about; most people decide whether to read a paper based on the title
b. Abstract: brief overview of the manuscript
c. Introduction: brief background of the research area that puts the work in a larger research context
d. Materials and Methods: description of the methodologies used in the work
e. Results: presentation of data in text and in Figures and Tables
f. Discussion: description of what the results mean in a larger context and the significance of the work
3. Typically as an undergraduate student, you might be asked to make a figure or write up the materials and methods that you used. Usually the PI, postdoctoral fellow or PhD student will write the first draft of the manuscript.
Resources
"How to give a great scientific talk"
https://www.nature.com/articles/d41586-018-07780-5
Designing effective scientific presentations: using PowerPoint and structuring your talk
https://www.youtube.com/watch?v=Hp7Id3Yb9XQ
Poster resources
https://www.sc.edu/about/offices_and_divisions/undergraduate_research/documents/poster_resources.pdf
Creating effective poster presentations
https://sites.google.com/ncsu.edu/effective-posters/home?authuser=0
Reference
Barker, K. (2005) At the Bench: A Laboratory Navigator
Remember, we had perfomed a DNA extraction from Arabidopsis leaves and then performed a PCR. We pick up with our PCR products ready for the next step. (Continued from Session 4!)
3. Restriction enzyme digestion (Appendix III)
a. Restriction enzymes are proteins that can cleave the DNA backbone at sequence-specific sites. They are commonly used in molecular biology for DNA cloning. Here, we will use them to distinguish two different DNA sequences.
b. The dCAPS procedure introduces a restriction enzyme site by using primers that contain one or more mismatches to the template DNA (Figure 1,2 from session 4). In this case, the ag-1F primer contains a mismatch (C underlined above) and shown at position 28 in the sequence below, compared to the template DNA. All PCR products will contain a C at this position. Because the ag-1 allele has a single nucleotide change (mutation) of the G at position 32 to an A (boxed below), an AflII site (5'-CTTAAG-3') is created in the PCR product amplified from ag-1 DNA but not in the PCR product amplified from WT DNA.
c. positions 28-33 of the AG sequence shown below for WT and ag-1:
WT: 5'-CTTAGG-3' ag-1: 5'-CTTAAG-3'
3'-GAATCC-5' 3'-GAATTC-5'
d. Digest PCR product with AflII. WT DNA will not be cut with AflII and will be 142 bp in length. ag-1 DNA will be cut with AflII and will be 114 bp in length (Figure 4, session 4).
4. Agarose gel electrophoresis (Appendix IV, see session 6)
a. To distinguish the 142 and 114 bp DNA pieces, we will run the digests on a NuSieve 3:1 agarose gel.
b. An example gel is shown above (Figure 8). Molecular weight standards were loaded in the leftmost lane. Lanes labeled ag-1 and WT are the controls and correspond to digests of PCR products that amplified DNA isolated from ag-1 homozygotes and wild-type plants, respectively. Unknowns 1, 2, 3, and 5 are heterozygous for the ag-1 mutation while unknown 4 is wild type.
c. We'll cover the basics of gel electrophoresis in session 5.
Figure 8. Example Gel of Resulting Restriction Enzyme Digest. Molecular weight standards were loaded in the leftmost lane. Lanes labeled ag-1 and WT are the controls and correspond to digests of PCR products that amplified DNA isolated from ag-1 homozygotes and wild-type plants, respectively. Unknowns 1, 2, 3, and 5 are heterozygous for the ag-1 mutation while unknown 4 is wild type.
A scientist from the Department of Biological Sciences here at USC will join us to discuss their journey to becoming the amazing scientist they are today. Please be courteous of our guests as they have given up part of their weekend to be present and showcase their path to their current career in STEM.