Assignment 08

Due: Monday, April 6, 2015 at noon. 100 points.

Instructions: You know how to submit; do it in the usual way. You will submit multiple files for this programming project. The cssubmit script will work as usual, picking up all .cpp and .h files in the current directory. So, make sure you have created a separate directory for your hw 8 program. (Do not create sub-directories in the hw8 directory.)

Also, for this assignment, you may work in a team of two persons. It is assumed that you both will code and that neither of you will sponge off the other. Both teammates will contribute to both code and documentation. Submit under only one name, but make sure that both teammates' names are in the comment headers. Work only with someone who has the same instructor. We encourage you to work together with someone; this is not an easy assignment!

Background: The unfortunate aspect of the science of forensics is that forensics workers must sometimes work with dead, decaying and decomposed remains. Eeeeeyuck! (Perhaps this is the reason you are studying engineering? computer science? chemistry?) It is common that the bodies of deceased individuals have to be identified when there isn't much information to go on ... like a wallet with a driver's license. The identification process relies on various technologies, and this assignment will address some of those. One of the more recent developments in that area is DNA testing. If a sample of DNA can be extracted from a cadaver and matched with a known sample (a sample of DNA known to belong to a missing person), then the identification is made. Likewise, if dental information is obtained from a victim and matched with the dental records of a known missing person, then we can assume that we have identified the victim.

I am fully aware of the delicate nature of this topic. Since some of you might be "grossed out"[1] by discussing human remains, I will couch this program in another culture.

Foreground: In the past two years, 5 cattle have gone missing and foul play is suspected in the bovine community! Now, one body has been found (see picture at right) and the job is to see if there is sufficient evidence to match its identity with one of those missing cows. The body is so badly decomposed that gender can no longer be determined, so that criterion will not help. Thus, you're going to run two tests:

first test: A snippet of DNA was obtained from the body. See if you can match it with a piece of DNA from each of the five people ...uh... cows that are missing. Now, since we don't have real data, your program will generate the data at random[1]. First generate the 5 samples of DNA representing the 5 missing cows. Each sample is to be a string of 128 characters that are 'C's, 'T's, 'G's and 'A's only. C's make up 30% of the chars, T's are 20% of the chars, G's are 15% of the chars, and the rest (35%) are A's. Then generate the sample from the dead body. This sample is to be 16 characters long, also made up of C's, T's, G's and A's, but also '-'s. The '-' represents a missing character in the DNA string due to the poor quality of the sample. Follow the same percentages in this string as in the other strings, and make it a 2% chance that any character is missing (a '-'). Now, if 90% or more of the characters in the body DNA match exactly any substring of 16 characters of a missing cow's DNA, then we consider it a matching identification. A match cannot include a '-'. If more than one match is found, take the one with the higher percentage of matching characters.
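The first test can be sketched roughly as follows. This is only one possible reading of the description, not a required design. All names here (randomBase, generateDna, countMatches, bestWindowScore, meetsThreshold) are made up for illustration. Two assumptions are baked in: the 2% roll for a missing character is made first and the base percentages apply to whatever is not missing, and a '-' in the body sample simply never counts as a matching character. Note that with 16 characters, 90% works out to at least 15 matching characters, since 14 of 16 is only 87.5%.

```cpp
#include <cassert>
#include <cstdlib>

const int COW_DNA_LEN = 128;   // length of each missing cow's sample
const int BODY_DNA_LEN = 16;   // length of the sample taken from the body

// Pick one base with the stated odds: C 30%, T 20%, G 15%, A 35%.
char randomBase()
{
    int roll = std::rand() % 100;   // 0..99
    if (roll < 30) return 'C';
    if (roll < 50) return 'T';
    if (roll < 65) return 'G';
    return 'A';
}

// Fill dna[0..len-1] and null-terminate it; dna must hold len + 1 chars.
// Assumption: the 2% '-' roll comes first, then the base percentages.
void generateDna(char dna[], int len, bool allowMissing)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
    {
        if (allowMissing && std::rand() % 100 < 2)
            dna[i] = '-';
        else
            dna[i] = randomBase();
    }
    dna[len] = '\0';
}

// Count the positions where the body sample agrees with one window of cow DNA.
// Assumption: a '-' in the body sample never counts as a match.
int countMatches(const char body[], const char window[], int len)
{
    int matches = 0;
    for (int i = 0; i < len; i++)
        if (body[i] != '-' && body[i] == window[i])
            matches++;
    return matches;
}

// Slide the body sample along the cow sample and return the best score.
// bestStart receives the index where the best window begins.
int bestWindowScore(const char body[], int bodyLen,
                    const char cow[], int cowLen, int& bestStart)
{
    assert(bodyLen > 0 && bodyLen <= cowLen);
    int best = -1;
    bestStart = -1;
    for (int start = 0; start + bodyLen <= cowLen; start++)
    {
        int score = countMatches(body, cow + start, bodyLen);
        if (score > best)
        {
            best = score;
            bestStart = start;
        }
    }
    return best;
}

// True when score out of len is at least 90%, using integer arithmetic.
bool meetsThreshold(int score, int len)
{
    return score * 10 >= len * 9;
}
```

To pick among several cows, you could call bestWindowScore once per cow and keep the cow with the highest score that also passes meetsThreshold. Remember to seed the generator once (for example with srand) before generating any data.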

second test: IF the first test doesn't yield a match, you are going to have to resort to another test: a dental test. Your program is to generate a string of 32 characters to represent the teeth of each of the 5 missing cows, and then a string of 32 characters to represent the teeth of the deceased cow. The characters to include in the arrays are shown in the table. On average, 50% of the teeth are N, and 20% are W; the remainder are equally probable A, G, and L (10% each). NOTE: A match of dental records is a 100% match; all teeth have to match.
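A minimal sketch of the dental test, under the same caveats: the names randomTooth, generateTeeth and teethMatch are hypothetical, and this is just one way to do it. Because a dental match must be 100%, comparing two null-terminated arrays with strcmp is enough.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

const int TEETH = 32;   // teeth per cow, per the description above

// Pick one tooth code with the stated odds:
// N 50%, W 20%, and A, G, L 10% each.
char randomTooth()
{
    int roll = std::rand() % 100;   // 0..99
    if (roll < 50) return 'N';
    if (roll < 70) return 'W';
    if (roll < 80) return 'A';
    if (roll < 90) return 'G';
    return 'L';
}

// Fill teeth[0..TEETH-1] and null-terminate it; teeth must hold TEETH + 1 chars.
void generateTeeth(char teeth[])
{
    assert(teeth != nullptr);
    for (int i = 0; i < TEETH; i++)
        teeth[i] = randomTooth();
    teeth[TEETH] = '\0';
}

// A dental match requires every tooth to agree, so an exact
// c-string comparison does the job.
bool teethMatch(const char a[], const char b[])
{
    return std::strcmp(a, b) == 0;
}
```

With 32 independently generated teeth, an exact match between two random records will be extremely rare, so expect this test to report "no match" most of the time.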

Specifications: Write your program to implement the above description in Foreground. Have your program generate the DNA and dental records (null-terminated character arrays) for the missing cows, then loop the generation of a dead cow's data thrice (as if you had three dead bodies) and see if you get an identification with the data generated for the missing cows. In each case (for each of the three dead cows in the loop), you should output any failure to match, and any match for either test. If you get a match in any test, output that a match is found and announce who the match is for, and the matching 16-char array and 128-char sample. Who? Yeah, who. The 5 missing cows you generated the DNA and dental records for have names. You make them up. Here are some ideas: Betsy, Old_Jumper, Bossy. I'm sure you can come up with some interesting names. And how are you going to keep track of this information? Think about how you gather information into one object.
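As a hint for "gathering information into one object", here is one possible layout, offered as a sketch rather than the expected answer. The struct name Cow, its member names, the constants, and the helper findDentalMatch are all assumptions made for illustration. The name uses std::string, which the note at the end of this handout permits for names only; the DNA and dental records stay as null-terminated character arrays.

```cpp
#include <cassert>
#include <cstring>
#include <string>

const int NUM_MISSING = 5;   // number of missing cows
const int DNA_LEN = 128;     // characters in a missing cow's DNA sample
const int TEETH_LEN = 32;    // characters in a dental record

// One missing cow: its name plus both records, kept together.
struct Cow
{
    std::string name;            // std::string is allowed for names
    char dna[DNA_LEN + 1];       // +1 leaves room for the '\0'
    char teeth[TEETH_LEN + 1];
};

// Return the index of the first cow whose dental record is identical
// to bodyTeeth, or -1 if no cow matches.
int findDentalMatch(const Cow herd[], int herdSize, const char bodyTeeth[])
{
    assert(herdSize >= 0);
    for (int i = 0; i < herdSize; i++)
        if (std::strcmp(herd[i].teeth, bodyTeeth) == 0)
            return i;
    return -1;
}
```

In main you might declare Cow herd[NUM_MISSING], fill it once, and then loop three times over freshly generated dead-cow data, printing herd[i].name whenever a helper like this returns a valid index.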

When you submit: Run the program. You should have just one set of DNA and dental records for the 5 missing cows (all NTCAs, or c-strings), and 3 such sets for the three dead cows to be identified (if possible).

Note: All the arrays of characters in this assignment are to be null-terminated character arrays (a.k.a. c-strings). You are not to use the standard string class, except for the names of the cows if you want. Also, as stated at the beginning of this assignment, you may work in a team of two.

As always, if you have any questions, be sure to ask.

[1] Scientific, very scientific.