This page gives a number of simple programming exercises which will help you learn to write for loops, while statements, if-else statements, and other basic programming tasks. Most of the exercises are drawn from mathematics. Solutions can be written in any programming language. There are solutions to some of the exercises in a few selected languages, but use them only after you have made a good attempt to solve the exercises yourself. The point is to introduce a minimum amount of syntax for these languages and get you solving small programming problems. This page is not attempting to teach much about any particular programming language and deliberately avoids using specialized commands in favor of simpler but universal programming ideas.
This page was written by Craig L. Zirbel. If you find this page useful and would like to contribute additional exercises or solutions, perhaps in other languages, please contact me at zirbel@bgsu.edu.
The sample programs below generally run on a website called replit.com. REPL means read–eval–print loop; the idea is to make it easy for people to write and run programs. When you click a link from this page, at replit.com, you will need to click Show Files and then click main.py. If you want to edit a program, click the Fork Repl button; like a fork in the road, now you will be on your own path. Make changes and Run. To keep your replit.com programs for the future, click Sign up in the upper right, set up an account with replit.com, and then start making new repls and giving them the names I suggest. Note: the Run button only runs main.py, not other files you make in the same repl. When you start a new program, add a new repl.
To learn about Python and how to install it, see the Python page. To learn about Matlab and Octave, see the Matlab page. To learn about R, see the R page. For JavaScript, see the JavaScript page.
To learn other things like dynamic programming, see my main web page.
You can use letters and strings of letters (without spaces) to name variables. You can set their values and then change their values. Consider this example:
a = 5
b = 3
a = a + b
At first, a is set equal to 5 and b is set equal to 3. But then the value of a is changed. What is the new value of a?
Suppose we continue like this:
a = a + 2*b
a = a*b
What is the value of a after these five steps? Hint: the answer ends with 2.
When you want a program to do something a fixed number of times, use a for loop.
Write a program called hello_world that uses a for loop to print "Hello world!" to the screen 5 times, and then prints "I will work on programming every day." to the screen 8 times. Python | R
Write a program called sum_to_300 that uses a for loop to calculate the sum 1+2+3+...+300. The idea is to create a variable, s, that will store the current value of the sum. Set it to zero, then use a for loop that is indexed by another variable, i, which will run through the numbers 1, 2, 3, ... and add each of these to the current sum. After the for loop, print the value of the sum to the screen. Hint: the correct answer ends with 150. Solution: JavaScript | Matlab/Octave | Python | R
Write a program called sum_to_n which first defines a variable n, and then calculates the sum 1+2+3+...+n using a for loop. Set n equal to 1000, 10000, and 100000 to see what sum you get. You should only need to change the value of n, not change the logic of your program after that. Solution: Matlab/Octave | Python
Write a program called sum_squares to calculate the sum 12 + 22 +32 + ··· + 4002 using a for loop. Check: The sum of the digits in the final answer is 15. Note: You cannot use the notation i^2 in Python; use multiplication to calculate term i of the sum. Suggestion: Print each term that you are adding to the sum using a print statement inside the for loop to make sure you are adding the right thing.
Write a program called sum_products to calculate the sum 1*2+2*3+3*4+ ... + 249*250 using a for loop. Check: The correct answer ends with 250. Hint: Print the two numbers you are multiplying with a print statement in the for loop.
Write a program called factorial_10 to calculate 10! ("10 factorial"), which is defined to be 1*2*3*4*5*6*7*8*9*10. Use a for loop.
Write a program called factorial_n to calculate n! ("n factorial"), where n is a non-negative integer. Use a for loop. Your program should use the same logic for whatever value of n is set in the program. Note that 0! is defined to be 1.
Write a program called factorial_table to print out 0!, 1!, 2!, all the way up to 15!. Do this by writing a for loop over the value of n, and within that for loop, use the code you wrote earlier to compute n!. Then you will have one for loop inside of another for loop, which is called a nested for loop. Solution: Python | R
Challenge: Write a program called sum_taylor_series to calculate 1/0! + 1/1! + 1/2! + 1/3! + ... + 1/10!, which is an approximation of the number called e, which equals 2.71828182845904523536028747135266249775724709369995.... (it's an irrational number). (Note: 1/3! means 1 divided by 3!, not the factorial of the fraction 1/3.) Display the sum after each term is added. Use a for loop to create and sum the 11 terms in this sum. Within the for loop, use another for loop to calculate the factorial. Add more than 11 terms to get a better approximation. How many decimal places can you get and be confident that they are correct? Python
Challenge: Write a program called n_to_power_n to calculate nn by repeated multiplication in a for loop, not by using an exponent operator. That is, use the fact that nn = n*n*...*n, where there are n factors of n. Here, n is a positive integer.
Challenge: Write a program called sum_n_to_power_n to calculate the sum 11 + 22 + 33 + ... + 1010 without using an exponent operator. Do this using nested for loops. I suggest that you print out the values you are calculating for 11 , 22 ,33 , ... 1010 to make sure you are adding the right values. Check: The sum equals 10405071317.
Once you have decided about setting up an account on replit.com, please fill out a very short survey about yourself and your interest in programming. Survey link.
When you want a program to continue doing something as long as some condition is met, use a while loop.
Write a program called square_greater_than_1777 that will figure out the smallest number n whose square is greater than 1777, without using the square root function. The idea is to start a variable n at 1, check if n*n is greater than 1777, and if not, increase the value of n, check again, and continue. Hint: the square is 1849. Python | R
Write a program called sum_hits_one_million that will figure out how many terms in the sum 1+2+3+... it requires for the sum to exceed one million and what the value of the last term is. The idea is to create one variable that will store the current value of the sum, another variable that keeps track of what number you are adding to the sum, and yet another variable to count the number of terms. Use a while loop to add the current number to the sum repeatedly, and increase the count by 1 each time. The while loop should stop once the sum exceeds one million, then you print the value of the last term that was added to the sum and the value of the counter variable. Hint: the sum will be 1000405. Matlab/Octave | Python| R
The sum 1 + 1/2 + 1/3 + 1/4 + ... is called the harmonic series and is known to diverge to infinity; it gets bigger than every target number you have in mind. The first four terms sum to 2.08333333333. How many terms does it take for the sum to be greater than 15, and what is the value of the last term? Write a program called harmonic_series to find the answer. For R and Python, format the value of the sum with %0.30f to see lots of decimal places.
Write a program called Fibonacci that will find and print to the screen all Fibonacci numbers up to the first one that is greater than 10,000, one per line. The Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, ..., where each next number in the sequence is the sum of the previous two. Each line should look like this model: "Position 6 in the Fibonacci sequence is 8." This exercise differs from previous ones because here you need to keep track of both the "previous" and "current" Fibonacci number, and then update them over and over inside the while loop. Hint: 10946.
When you want a program to do different things based on the value of one or more variables, use an if or if-else statement.
Write a program called if_else_order which defines variables a, b, and c (you can use the values 12, 7, and 9 for now) and uses if-else statements and print statements to print them on the screen in increasing order. Your program should work if you redefine the values of a, b, and c so that they need to appear in a different order. Python | R
Write a program called sum_to_300_first_sums which will use a for loop to calculate the sum 1+2+3+...+300 and which will print the value of the current sum if that value is less than 40000, and which will print "The sum is over 40000" if not. Python
Write a program called sum_to_300_updates to calculate the sum 1+2+3+...+300. Display the total after every 20 terms by using an if statement to check if the current number of terms is a multiple of 20. The basic idea is this: calculate i/20 as a decimal and also calculate i/20 rounded to the nearest integer. If they are equal, then i is a multiple of 20, and then you print the sum. Different programming languages use different commands to round to the nearest integer. Python
The game "3N+1" goes like this. If N is odd, multiply it by 3 and add 1. If N is even, divide it by 2. Repeat until N equals 1, if ever. Every value of N that anyone has ever checked eventually leads to 1, but it is an open mathematical problem (known as the Collatz conjecture) whether EVERY value of N eventually leads to 1. You can read about it on Wikipedia. Before you start working on these programs, take a sheet of paper and write out what happens when you start with N=5, N=6, N=7, N=9, to get a feel for it. For example, starting with N=6, the next number is 3, then 10, then 5, then 16, 8, 4, 2, 1 and we are done.
Write a program called three_n that defines a starting value of N and plays the game "3N+1" and prints to the screen the values of N visited along the way. Use a while loop and stop when N=1. Investigate some starting values of N. N=27 is particularly interesting. Python.
Introduce an additional variable to count how many steps are needed to get to 1, and have the program print that counter after it arrives at N=1.
Introduce an additional variable to keep track of the maximum value of N before the program reaches N=1. Here is the logic: Before the while loop, set m=0. Within the while loop, if N > m, set m equal to N. JavaScript.
Above, we have used variables to store individual numbers. Sometimes, you have a bunch of numbers you want to store at once, in a particular order, and for that, a list is the right kind of variable to use. (In some programming languages, these are called vectors, in others, they are called one-dimensional arrays.) For example, my_list = [1,1,2,3,5,8] is a Python list and my_vector = c(1,1,2,3,5,8) is an R vector. The programs below will show different ways of making a list.
Write a program called graph_function which allows you to define and name a function and then graph its values. Python | R
Modify graph_function to define variables a and b for where the graph should start and end, and graph the function on n intervals as x runs from a to b. Plot x^2 as x runs from 1 to 10,000 using 1000 intervals.
Modify graph_function to graph y = x^2 for x running from 1 to 10,000 using a log-log plot. That is, the horizontal and vertical positions on the screen will be log(x) and log(y), respectively. Describe the graph you get.
Return to the program sum_to_300, and make a new program sum_to_300_graph. Now, store each sum, after 1 term, after 2 terms, etc. and graph the sum versus the number of terms. Here you know ahead of time how many entries the list will need to have, so you can pre-allocate the list. Python | R
Return to the program harmonic_series, where you are summing terms that get smaller and smaller and seeing how many terms it takes to get to a sum of 15. Write a program called harmonic_series_graph to graph the terms as they grow. You don't know ahead of time how many terms it will take, so you don't know how many (x,y) pairs you will need to plot, and with this particular series, it could take a lot of terms. It's dangerous to write code that requires a list or vector to grow in size with no clear idea how big it could get; this can run ridiculously slowly. There are two reasonable approaches.
Run the while loop to determine how many terms are needed, then pre-allocate the list or vector to store x and y values, then run a while loop or for loop to fill in the x and y values, then graph. But if the number of x and y values is more than a few thousand, better to use the second approach.
The sum goes from 0 to 15, and you can use that fact to limit the number of y values you plot, and thus the number of x values you plot. Pre-allocate term_counter and sum_values to have 16 values, and use lines like this to store the values:
term_counter[int(s)] = i
sum_values[int(s)] = s
This way, only 16 values will be stored in each list/vector. The graph won't be perfect, but the lists/vectors won't get too big and slow.
Write a program called banded_matrix that uses nested for loops to produce an n by n (square) matrix A with 0's down the main diagonal, 1's in the entries just above and below the main diagonal, 2's above and below that, etc. Solution in Matlab/Octave | JavaScript | Python | R. The matrix should look like this when n = 5:
0 1 2 3 4
1 0 1 2 3
2 1 0 1 2
3 2 1 0 1
4 3 2 1 0
The matrix should look like this when n = 8:
0 1 2 3 4 5 6 7
1 0 1 2 3 4 5 6
2 1 0 1 2 3 4 5
3 2 1 0 1 2 3 4
4 3 2 1 0 1 2 3
5 4 3 2 1 0 1 2
6 5 4 3 2 1 0 1
7 6 5 4 3 2 1 0
Write a program called banded_matrix_2 which creates an n by n matrix with 1 on the main diagonal, 1/2 above and below the main diagonal, 1/3 above and below that, and so on.
Write a program called almost_diagonal_matrix which creates an n by n matrix with 2 on the main diagonal, 1 on the diagonals above and below the main diagonal, and 0 everywhere else. Do this with nested for loops, and also do this with just one for loop.
Write a program called circulant_matrix which creates an n by n circulant matrix with 0 on the main diagonal, 1 above the main diagonal and in the lower left corner, 2 above the 1's, and so on, following the example below. In other words, the first row contains the numbers 0,1, 2, 3, 4, the next row contains the same numbers, but with 4 at the beginning, and so on.
0 1 2 3 4
4 0 1 2 3
3 4 0 1 2
2 3 4 0 1
1 2 3 4 0
Revise circulant_matrix so that it will repeat a given first row vector by cycling its values as above. Use first_row = [0 2 3 1 7 5 6 4], so that you can't just use arithmetic to fill in the values.
Up to this point, we have put all code into a single block of code. But some code would have been nice to have separated out, so it's easier to run it. Here we will simplify some of the code that was defined above. The idea is to mark the code as being part of a function and to tell what values need to be supplied to the function in order for it to do its job.
Above, when we wanted to calculate 10!, we wrote a for loop to calculate it. Now that we know how to write that for loop, it would be nice to set that aside, give it the name "factorial", and be able to just type "factorial(10)" to request the value of 10!. The syntax for doing this is different in different languages, but the idea is always the same. See the factorial_function examples: Python | R. Note that in Matlab, the factorial function is already defined.
Write a function called banded_matrix that creates and returns the banded matrix described above. The input to the function should be the number N, which is the number of rows and columns. Then, you can produce a 7x7 matrix simply by typing banded_matrix(7) in your main block of code. Solutions: Matlab/Octave. Note that, internally, matrix_maker calls the matrix A, but you are free to give it another name once matrix_maker returns it to you. You can store the matrix you get by typing Z = banded_matrix(7).
Change the program to allow two inputs, the first being the number of rows, the second being the number of columns. For example, you could then request banded_matrix_rectangular(5,7).
Change banded_matrix so that if only one input is given, it makes a square matrix.
Consider the numbers 4,9,13,2,5,6,4,11. It's easy enough to spot the maximum and minimum and to notice that the number 4 occurs twice. But if the list is very long, it's harder to do that by eye, and you would like a program to do that.
Write a program called list_max_min_count which will do all of these things without using built-in functions:
Find the maximum element of the list. Hint: make a new variable to keep track of the maximum number found so far. Don't use the built-in min function.
Find the minimum element of the list and tell at what position the minimum occurs. For example, The minimum value of 2 occurred at position 4 in the list. Don't use built in functions, write code to do this yourself.
Count how many elements of the list are strictly greater than 8
Sample program in Python.
Write a program called in_list to check if a given number is in a given list. You want to do this as quickly as possible; if you find the number in the list, you don't want to continue checking the rest of the list. Don't use built-in functions that find the minimum and where it occurs, rather do this yourself with more basic loops and if statements. Python
Write a program called unique_elements which will make a new list called unique_elements containing only the unique elements of the original list, called values. The list unique_elements will start out empty. You will iterate over the elements in values, and each time you encounter a new element, you'll add it to unique_elements. Note: in R, it's generally inefficient to make vectors which grow; we'll see more efficient solutions later. Python
Write a program called graph_shapes which will draw a rectangle with corners at the points (0,0), (1,0), (1,3), (0,3) using red lines. Organize the x-values into one list or vector, and the y-values into another; the plotting command will connect these points with lines.
Add to graph_shapes lines of code which plot 25 points around the circle with center at (1,3) and radius 2 and color green. Do not connect the points with lines. How? Calculate 25 values of the angle theta from 0 to 2*pi, then calculate and store x and y coordinates which are the coordinates of the center plus r*cos(theta) and r*sin(theta). Then plot the x and y values.
Now issue a second plot command to draw lines between the points, outlining the circle, with color black. If the lines cover over the points, issue the plot commands in the other order, so the points are drawn on top of the lines.
Draw a straight blue line between every pair of points; that will fill in many chords in the circle. Only plot each chord once. Save as .pdf, then open with Adobe Reader (not with a browser) and zoom in to see the detail.
Read the article https://www.quantamagazine.org/mathematicians-prove-ringels-graph-theory-conjecture-20200219/ to see animations with such complete graphs.
Write a program called numerical_integration which will do the following: Define a function f(x), for example, f(x) = x^2, define values of a and b to start and end the interval of integration, and define the number n of subdivisions to use. The program will compute the values of f(x) at n+1 evenly spaced points from a to b, then calculate and display, to 20 decimal places:
L, the left-hand endpoint rectangle area
R, the right-hand endpoint rectangle area
T, the trapezoid area, which is the average of L and R
min_sum, the rectangle area approximation you get by taking the minimum value between the left and right hand endpoint
max_sum, the rectangle area approximation you get by taking the maximum value between the left and right hand endpoint
S, Simpson's approximation of the area, provided that n is even; 1-4-2-4-...-2-4-1 pattern of coefficients. Make sure to check that n is even before computing Simpson's approximation
Elaborate on your work by using increasing values of n, until min_sum and max_sum agree to a pre-specified number of decimal places. Use a while loop to do that, and a reasonable strategy for increasing n.
For a fixed value of n, draw vertical lines under the curve and horizontal lines to indicate left-hand-endpoint rectangles. Color these red, while the graph of the function is black.
For a fixed value of n, draw diagonal lines to represent the trapezoidal approximation. Draw these in cyan.
The sample program sort_list defines a variable, values, which has the list of 10 US presidents. The program shows how to print the whole list, how to access individual elements in the list, how to get the length of the list, and how to compare two elements of the list. Your job will be to put the list in alphabetical order using an approach called a bubble sort. The steps are broken down for you. While most programming languages have built-in functions for sorting lists, it is very helpful for you to code a sorting program yourself. Starting programs: Python | R
Write a for loop that has i go from 1 to the end of the list, and which compares element i to element i+1, and if they are out of alphabetical order, replace element i with element i+1, and replace element i+1 with element i. You will need to use a new "temporary" variable to do this. Be careful not to compare the last element of the list with one beyond the end of the list. I'll refer to this for loop as being one pass through bubble sorting. Items "bubble up" to their appropriate place in the list.
Print the list after one pass of bubble sorting. It won't be completely sorted yet, but it should be better. Note how far the name "Wilson" moves in one pass of bubble sorting, and how far the name "Adams" moves.
Write another for loop which will do 10 passes through bubble sorting. That is, the new loop should be "outside" the first loop that you wrote. Print the whole list after each pass, so you can see how the names move.
Note that the list becomes completely sorted before 10 passes are completed. Your program can recognize this. Before each pass of bubble sorting, define a variable called swap_made and set it equal to 0. (That may seem a bit strange, but hold on.) If two elements of the list need to be swapped, then change the value of swap_made to 1. At the end of the for loop, print out a message saying whether or not a swap was made.
Whenever a swap is made, another pass of bubble sorting is called for, but if no swap is made, the list is in order. Instead of a for loop, use an outer while loop so that your program does passes of bubble sorting only if some pair of names was swapped on the previous pass. You can use the variable swap_made to keep track of this, and continue doing passes of bubble sorted "while swap_made == 1". You will need to define swap_made before the while loop, and set it equal to 1 so the while loop makes sure to run the for loop within it. However, just before the for loop, you will need to set swap_made back to 0. This can be confusing, take the time to think it through and to print messages to the screen to make clear how it works.
There are several commonly used approaches to sorting. You can see animations of how they work. That site also discusses the complexity of the algorithm, how many operations they take to run in the worst case. It also has pseudocode for each sorting approach. A good exercise would be to read the pseudocode and then write your own version in whatever language you are learning.
Write a program called generate_list_random to generate a list of 20,000 random numbers and store them in a list (or vector, or one-dimensional array). Python | R
The rest of this section supposes that you have written a sorting procedure.
Adapt your sorting procedure to sort the list of numbers.
Adapt your sorting procedure to count how many passes it takes through the list to get them all sorted. It should be less than 20,000. How much less than 20,000 is it?
Above, I suggested that you pass through the list from 1 to the end of the list. That can sweep one item all the way to the end of the list if that is what is required, but it takes many passes for an item to move from the end to the beginning of the list if it belongs there. Add code to your procedure that follows a "forward" pass with a "backward" pass through the list, from the end of the list to the beginning. Count both forward and backward passes toward the total number of passes. Is this better?
Write a for loop that will run 1000 times. Each time, within the for loop, generate a new list of 20,000 numbers, sort it, and add the number of passes to a list. When the for loop is done, print out the average value of the numbers in the list and also the minimum and the maximum. This gives you some idea of the variability in the runtime of a sorting procedure. Do this with and without a backward pass through the list.
Adapt your sorting procedure to record how many swaps it requires. As above, run the sorting procedure repeatedly and print out the minimum, the maximum, and the average number of swaps required.
To investigate the best and worst possible runtime, do the following tests:
Generate a vector of 20,000 values ordered from 1 to 20,000. How many passes does it take your sorting procedure to sort this vector? How many swaps are performed?
Generate a vector of 20,000 values ordered from 20,000 to 1. How many passes does it take your sorting procedure to sort this vector? How many swaps are performed?
Does a line of text contain a particular word or phrase? We store a line of text in a variable called a string, and we can use a for loop to search to see what characters and substrings it contains. This happens often in biology when searching DNA sequences, so we'll use examples like that.
Write a program called string_contents to count how many times the letter A occurs in the string TTAGGCGGCCACAGCGGTGGGGTTGCCTCCCGTACCCATCCCGAACACGGAAGATAAGCCCACCAGCGTTCCGGGGAGTACTGGAGTGCGCGAGCCTCTGGGAAACCCGGTTCGCCGCCACC. This is the sequence of the 5S ribosomal RNA from H. marismortui. Python | R
What are the positions at which the letter A occurs in the string? Accumulate those as a list or vector. Since the sequence itself is short and the list/vector cannot be longer than the original sequence, you can make a list or vector that grows longer.
Use one for loop and an if-else statement to count the number of A, C, G, and T letters in the string.
Store the letters A, C, G, T in a list/vector and use two for loops to count the number of A, C, G, T letters in the string.
Does the sequence GTTC occur in this string? How many times? Where does it start? Within the program string_contents, write a function called string_contains which has two inputs, the original sequence and a pattern. Use nested for loops to check sequence for pattern, letter by letter, and return all starting points of the sequence. For practice, in R, only use substr to retrieve one character at a time. In Python, check equality only one character at a time.
Above, you have seen a list of names and a list of numbers. Here, you will make a list of fractions, each of which has a numerator, a denominator, and a decimal value. That's a triple: (numerator, denominator, decimal value).
Learn about triples and other tuples by reading list_of_triples: Python | R
Write a program called fractions, which does the following things.
Use for loops to make all possible denominators from 1 to 20, and for each denominator, to make all possible numerators from 1 to denominator minus 1. Print these to the screen.
Organize each fraction into a triple: frac = (numerator,denominator,numerator/denominator)
Append these triples to a list of all fractions. It's OK if you store 1/2 and 2/4 and 3/6, even though they have the same decimal value.
Sort the fractions by decimal value.
At the top of your program, define a new variable called "unique" and set it equal to True. In the for loops, when unique is true, only store a fraction if it cannot be reduced to a lower form. Hint: To tell if fractions a/b and c/d are equal, cross multiply to see if a*d and b*c are equal.
Write a program called graph_vertices_edges which sets up an n by n matrix, starting with all 0's, and then for entry i,j, if i is not equal to j, with probability 1/n it fills in the value 1 in entry i,j and also in entry j,i, to represent there being an undirected edge between vertices i and j. Such a matrix is called an adjacency matrix, because it shows which vertices are adjacent to, or next to, each other.
Draw n points around a circle of radius 1 centered at the origin, to represent the n vertices, and draw a line between all pairs of vertices that have an edge. Label each vertex with text labels from 1 to n, but don't worry too much about exactly where the labels are located. Store the x,y pairs in a list of pairs called vertex_location.
The trouble with displaying the vertices around a circle is that it does not group the vertices together in a natural way. A solution is to move the vertices that are connected by an edge closer to one another, and the vertices that are not connected by an edge away from each other. Here is a scheme that may work. Loop over the n vertices using index i. Set current = vertex_location[i] to represent the current location of vertex i. Now loop over all vertices j not equal to i. Calculate the vector displ = vertex_location[j] - vertex_location[i]. The vector displ points from vertex i to vertex j. Calculate dist = length of displ. If there is an edge between i and j, add 0.1*(dist-1)*displ to current; when dist > 1, this will push vertex toward vertex j, when dist < 1, it will push them apart to, and when dist=1 it will leave them alone. If there is no edge between i and j and the distance between i and j is less than 2, add -0.1*displ to current. That will push the vertices apart. Once done looping over j, save current as new_vertex_location[i]. Once done looping over i, save new_vertex_location as vertex_location, graph the vertices, draw edges with lines, and repeat.
A dictionary is a very useful way to store data. Like a list, it's a way of storing and accessing a bunch of values all in one variable. Unlike a list, the keys do not need to be non-negative integers.
Given a list of numbers, keep track of how many times each number appears in the list in the program dictionary_from_list. Python.
Return to sorting a list of names. Use a dictionary to keep track of how many times each name gets moved in the list, then print this after the sorting procedure is done. Python
Write another program called multiple_three_N that runs the three_N process for N values going from 2 to 400. List to the screen the numbers of steps taken and the maximum values for each starting value of N.
Write a function that contains the while loop that plays 3N+1, so you can easily call it for different starting values of N.
Modify the function three_N to return a vector vals which lists the values visited, from N all the way to 1. For instance , when N = 5, vals = [5, 16, 8, 4, 2, 1].
It is interesting to make a graph of the maximum values as a function of the starting value of N. It is also interesting to plot the maximum value versus the number of steps. However, making graphs differs from one programming language to another. If you reach this point and need help, ask me to write some sample programs.
Keep track of how often each different maximum value is reached. A good way to do this in Python is to use a dictionary.
Keep track of how many times each number 1, 2, 3, ... is visited as N goes from 2 to 400.
One-page handout to lead students through the steps above, with Python code.
Flip a coin twenty times, keeping track of the individual outcomes (1 = heads, 0 = tails). (Use the random number generator in whatever language you are using.)
Record several things in this sequence:
1. the number of heads observed
2. the number of trials where the running count of heads exceeds the running count of tails
3. the number of switches in the sequence (a switch is where one changes from heads to tails or from tails to heads)
Repeat your simulation 1000 times and construct a frequency table of each outcome in part 1, 2, and 3.
In base 7, the digits 0 to 6 are used. A number like 125 means 1*7^2 + 2*7 + 5, just like base 10 but with 10's replaced by 7's. When you count in base 7, you start 0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 16, 20, etc. Write a program called base_7 that will count from 0 to 7^4 in base 7. The program should set up an array or a list of length 5, starting with [0,0,0,0,0] to represent the number 0, and repeatedly add 1 to the rightmost entry, carrying to the next digit to the left when adding 1 to a 6, and then stop when it reaches [1,0,0,0,0]. Print the array to the screen after each number.
A key task in biology is to find the best set of correspondences between two sequences, whether they are DNA sequences made of the letters A, C, G, and T, or RNA sequences made of A, C, G, and U, or protein sequences, made of the 20 letters that are used to designate amino acids. The idea is to list out the two sequences, one above the other, with aligned letters vertically aligned, and "gap characters" like the hyphen - to indicate where one sequence has a letter that the other does not. It is not allowed to change the order of the letters in the sequence to make an alignment. (However, when you want to align whole genomes, like human to mouse, then rearrangements are also necessary, but at a large scale.)
The most computationally efficient way to align sequences is to use a technique called dynamic programming. Print out and work through the activity below for an introduction to dynamic programming. Page 3 gives an example where the result of the activity is that you make an alignment between two DNA sequences.
Next, I highly recommend following the route of the optimal solution of the sequence alignment in the previous activity to rebuild the alignment. You can see the solution at this link. You can read more about dynamic programming at another site I made.
Now you can write your own program called sequence_alignment to do sequence alignment. Then you can check an example solution. Python.
Now, some further refinements.
Scoring inexact matches. For protein alignment, this is essential because there are 20 amino acids, and they are known to substitute for one another based on their sizes and chemical properties.
Scoring long runs of gaps with a smaller penalty than if the same number of gaps came in separate places.
[I should add more explanation here, but I don't have time right now. I should also improve the quality of the graphics in the example grid below.]
See an example grid with most scores filled in. In this example, the score for matching identical letters is +5, the score for starting a new gap is -2, and the score for extending a gap is 0.
See an example program to do sequence alignment with scoring for inexact matches (including BLOSUM62 scoring for protein sequence alignment) plus gap open and gap extension scoring. If you set the variable Local to be True, the program will produce a "local" alignment rather than a "global" alignment, which is better when aligning a short sequence to a longer sequence, for example when searching for a short sequence within a long sequence.
Write a function called generate_discrete which takes as input a vector p of non-negative numbers that sums to 1 and a number n, and returns a list/vector of numbers which are generated randomly with the probabilities in the vector p. Optionally, the function will take a third parameter which is a list of the same length as p which contains the numbers or letters that have each of the corresponding probabilities in p. For each random value being generated, generate a uniformly distributed random number between 0 and 1, then compare to the first element of p, then the sum of the first two elements of p, etc., and when it is less than the sum, you have found the correct value to output. Practice with the vector p=[0.4, 0.3, 0.2, 0.1] and n=100.
Write a function called my_sqrt which takes as input a non-negative number x and returns an approximation of the square root of x. If x > 1, define a=1 and b=x; if x < 1, define a=0 and b=1; if x=1 return 1. Using a while loop, compute the average (a+b)/2. Square this number. If the result is smaller than x, replace a with (a+b)/2. Otherwise replace b with (a+b)/2. Repeat this process until the difference between a and b is small. To avoid saying that the square root of 9 is 2.9999999, round the last value of a to an integer and see if a squared equals x; in that case, return the integer.
Given a sorted list and a new list element, use binary search to find where the element should be inserted. Use this idea to write a new sorting method.
Use a list to accumulate a list of prime numbers, starting with 2. For each number above that, check to see if it is divisible by the prime numbers accumulated so far. If not, the new number is prime. To go faster, you only need to check prime numbers whose square is less than the current number.
To generate a random ordering of the numbers from 1 to n, generate n random numbers and sort them, keeping track of the positions where the first random number ends up in the list, where the second one does, etc. This mapping of original positions to final positions is a random ordering of the numbers from 1 to n.
Make a list of numbers 1, 2, 3, ..., n. Choose one of these randomly and remove it from the list, then choose another randomly, etc. Keep the list of the numbers you removed; this will be a random ordering of the numbers from 1 to n.
Compare the two methods above. Which one runs faster?
Write a function called complementary_DNA which, given a character string of A, C, G, T, will return the complementary DNA string, replacing A with T, G with C, etc.
Write a function called DNA_to_RNA which will turn a DNA string into its complementary RNA string. A is replaced with U, G with C, C with G, T with A.
Write a function called my_string_split which, given a string and a specific character, return a list which is substrings of the original string from one instance of the specific character to the next. Of course, do this without using built-in functions to the extent possible.
Search a long string for a single character, return the indices where it appears
Search a long string for a pattern, return the indices where it appears; nice for biology students searching a genome
Search a long string for a pattern, split the string where it appears.
Read a text file line by line; write a text file
Run one of the algorithms you developed above for different size inputs. For example, sorting a list of n random numbers, for n=100, 1000, 10000, etc. Record the time your code takes to run, then plot the runtime versus the input size. Use a log-log plot to see how the runtime depends on n; often, it will be a polynomial power of n, and then the log-log plot will be a straight line. Compute the slope of that line to determine the exponent.
Given the adjacency matrix of an undirected graph (a symmetric n by n matrix with entries 0 and 1, 1 when there is an edge between two vertices), find all connected components of the graph and return them in decreasing order of size.
SVG is an image file format, short for Scalable Vector Graphics. It tells what text and geometric objects should appear in the image, and then a web browser or other program can render those things, in a way that has unlimited resolution. Write a program that generates a figure by writing text that can be stored as an SVG file, or that can be pasted into a web page. See a tutorial on SVG.
Generate a random graph. Put the vertices of the graph around the a circle, and draw edges between any vertices that have an edge. Write an SVG file that displays the graph.
Given a number n of data points and a dimension d, generate n data points in d dimensions (uniformly is OK, but better to first generate c "center" points and then n/c clusters of data around these center points).
Calculate a symmetric matrix D of Euclidean distances between them, and then implement hierarchical clustering to successively group the n points into n-1 clusters, then n-2 clusters, etc. down to 1 cluster. Use single linkage first, then also code average and complete linkage. Analyze the runtime as a function of n.
Implement k-means clustering for the dataset you generated.
Organize functions with, import or source from other files
I wrote 16 methods to generate sequences of H and T with various different probabilities and statistical dependencies. Each method generated one sequence, but the methods and the sequences are numbered differently. Can you match each sequence to the method that generated it? A 17th sequence that where the characters are independent and identically distributed is also included.
Write a program to read each of the 17 sequences of H and T characters from: https://github.com/clzirbel/Random_Processes/tree/master/R/head_tail_sequences
Write code to analyze each HT sequence to identify the statistical dependence in the sequence, if any, or the change in H and T probability. One idea is to convert H to +1 and T to -1 and graph the running sum of these +1/-1 values. Different sequences will have very different graphs. If the letters in the sequence are independent and identically distributed, the graph will be a random walk.
Calculate the number of (H,H), (H,T), (T,H), (T,T) digrams. One can count trigrams and higher order ones.
Compute a 2x2 matrix of one-step transition probabilities, like in a Markov chain.
Download the 17 sequences and compress them with a file compression program. The more random the sequence, the harder it is to compress. The folder contains two examples of very simple compression methods.
Try to match the sequences to the 16 generation methods in generate.R; the numbers are scrambled between generation methods and sequence labels!
Analyze a long string of characters for statistical dependence
Generate a long string of characters with a given statistical dependence
Make histograms, say 100 histograms from 100 columns of data, save the files with names generated in the program
Given two matrices, check their dimensions to see if they can be multiplied. If so, multiply. If not, notify the user.
Calculate monthly payments and present value of all future payments. C
As soon as you can, find something that you personally want to code, something that solves a problem you have. Then you will have your own goals, and find your own solutions, and then think of additional goals, additional solutions, and off you go. Good luck!
There are many, many places online to learn to program and to find programming exercises. I'll mention some that have been mentioned to me. I'm happy to add more!
To learn R syntax, you can install R and then use the swirl package to learn R syntax interactively.