Big Data - datasets or compilations of data so large and/or complex that traditional data processing tools can't handle them (ex. a compilation of every Google search query)
Crowdsourced Data - data obtained through crowdsourcing: the process of gathering inputs, opinions, or other data from a large group of people using the internet, physical surveys, or other means. (ex. Mrs. Lane sending out a Google Form to get students' opinions on the next spirit week)
Citizen Science - similar to crowdsourcing, citizen science is scientific research or analysis done completely or partially by distributed individuals, who are not always scientists. They contribute data to research using their own devices, or by conducting controlled experiments or data collection. (ex. the Globe at Night database project, where individuals collected light data at night at their residences to determine the effects of light pollution)
Machine Learning - a subset of artificial intelligence: computer systems/algorithms that use "training data" to learn, adapt, and modify their behavior for a specific task. Machine learning algorithms find patterns and draw inferences from the data they are given, and refine themselves as they receive more data. (ex. for cancer tumor analysis, a machine learning algorithm is given photos and told whether a tumor is present in each one. After many sets of training data, it can determine with high accuracy whether a tumor is present by looking at a new photo)
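The idea of learning from labeled training data can be sketched in a few lines of Python. This is only a toy illustration, not how real tumor-detection systems work: it uses a simple "nearest centroid" method, and the feature names and every number in it are invented assumptions.

```python
# Toy nearest-centroid classifier. All measurements below are invented.

def train(examples):
    """Average the feature vectors of each label to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled training data: (size in mm, irregularity score) -> label
training_data = [
    ((2.0, 0.1), "no tumor"),
    ((3.0, 0.2), "no tumor"),
    ((9.0, 0.8), "tumor"),
    ((11.0, 0.9), "tumor"),
]

model = train(training_data)
print(predict(model, (10.0, 0.7)))  # closest to the "tumor" centroid
print(predict(model, (2.5, 0.3)))   # closest to the "no tumor" centroid
```

Adding more labeled examples moves each centroid closer to what is typical for its label, which is a small-scale version of the algorithm refining itself with more data.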
Malware - software intended to damage a computing system, take partial control of its operation, or steal private information. (ex. keyloggers, viruses, IP stealers)
Data Bias - data that does not accurately reflect the real-world distribution. Can be caused by things like collection algorithm error, sampling bias, or improper randomization. (ex. Amazon's facial recognition technology didn't properly identify people with darker skin, due to a lack of darker-skinned individuals in the training data)
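One way to spot this kind of bias is to compare how often each group appears in the training data with how often it appears in the real population. The sketch below assumes a made-up population and training set; the group names and counts are purely illustrative.

```python
from collections import Counter


def group_shares(records):
    """Return the fraction of records that belong to each group."""
    counts = Counter(records)
    total = len(records)
    return {group: counts[group] / total for group in counts}


# Hypothetical real-world population: evenly split between two groups
population = ["lighter skin"] * 500 + ["darker skin"] * 500
# Hypothetical training set that under-samples one group
training_set = ["lighter skin"] * 90 + ["darker skin"] * 10

population_shares = group_shares(population)
training_shares = group_shares(training_set)

for group, real_share in population_shares.items():
    sample_share = training_shares.get(group, 0.0)
    gap = sample_share - real_share
    print(f"{group}: population {real_share:.0%}, training {sample_share:.0%}, gap {gap:+.0%}")
```

A large gap for any group is a warning sign that a model trained on this data may perform worse for that group.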
Computing Innovation - an innovation that includes a computer or computer code as a central part of its functionality (ex. 5G Cellular/Wireless, the internet, VR and AR)
Keylogging - the use of software or programs that track or record every keystroke made by the user to gain access to passwords or personal information. (ex. a hacker could track key presses when the user is on a bank account website to steal their username and password)
Phishing - a type of malicious attack where the attacker uses emails or other forms of communication to trick the user into downloading disguised malware or revealing private information (passwords, Social Security numbers). (ex. a fake email from someone pretending to be "your bank", asking you to give your username and password to your bank account to fix an "attack")
Virus (Computer) - A type of malware designed to spread between connected devices. It can damage a device, control its functions, or steal its data. Computer viruses are similar to real-life viruses in that they infect their host and spread from host to host. (ex. the WannaCry ransomware attack, which spread by exploiting a Windows vulnerability and affected computers in about 150 countries.)
Symmetric Key Encryption - A type of encryption that uses a single key for both encryption and decryption. The sender and receiver must both have the key and keep it secret from any malicious third parties, which makes sharing the key safely the riskiest part. (ex. a password-protected ZIP file uses symmetric key encryption: the same password is used to encrypt the file and to decrypt it.)
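The "one key for both directions" idea can be shown with a repeating-key XOR cipher. This is a minimal sketch for illustration only: XOR with a short repeating key is easy to break and is not what real tools use, and the key and message below are made up.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key.

    Applying the function a second time with the same key restores the input.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


shared_key = b"spirit-week"   # hypothetical key known to both sender and receiver
message = b"Meet at noon"

ciphertext = xor_with_key(message, shared_key)    # sender encrypts
recovered = xor_with_key(ciphertext, shared_key)  # receiver decrypts with the same key

print(ciphertext != message)  # the scrambled bytes differ from the message
print(recovered == message)   # the same key brings the message back
```

Anyone who learns `shared_key` can read every message, which is why the key has to be exchanged and stored carefully.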
Encryption - the process of encoding messages or information to keep them secret, making sure only authorized parties can read them. Usually uses algorithms or ciphers to encrypt the data. (ex. Apple encrypts your username and password information to make sure hackers can't steal it when you log in)
Decryption - A process that reverses encryption: it takes the encrypted message and, using a specific algorithm or key, converts it back into its original form. (ex. once Apple gets the encrypted login information, it uses its own algorithm to decrypt it, check that it is correct, and let you log in.)
Public Key Encryption - Uses a public key and a private key for encryption and decryption. A receiver has a public key, which anyone can view and use, and a private key, which no one knows except for them. Any sender can encrypt their message with the receiver's public key, but only the receiver can decrypt and read the message using their private key.
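Encryption and decryption as a matched pair can be sketched with a Caesar cipher, one of the simplest classical ciphers. This is an illustration only (a Caesar cipher is trivial to break); the shift value plays the role of the key, and the function names are my own.

```python
def caesar(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces, digits, and punctuation unchanged
    return "".join(out)


def encrypt(plaintext: str, shift: int) -> str:
    return caesar(plaintext, shift)


def decrypt(ciphertext: str, shift: int) -> str:
    # Decryption reverses encryption by shifting the other way
    return caesar(ciphertext, -shift)


secret = encrypt("Hello, World!", 3)
print(secret)              # Khoor, Zruog!
print(decrypt(secret, 3))  # Hello, World!
```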
You can think of it as the receiver having a key to a lock (the private key) and a lock (the public key). The receiver gives everyone the lock, so if they want to send a message, they can use the receiver's lock to lock it without any malicious third-parties being able to unlock it. However, only the receiver has the key to the lock, so only they can unlock the lock and see the message.
Public Key Encryption is good for two reasons:
Anyone can send the receiver a message with confidence that only the receiver can open it
If the receiver encrypts a message with their private key, everyone can be sure it is from them, because a message that decrypts correctly with the receiver's public key could only have been encrypted with the matching private key
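Both uses listed above can be tried out with a toy version of RSA, one well-known public key scheme (the notes above do not name a specific scheme, so RSA is my choice for illustration). The primes here are tiny textbook values and the message is just a number; real keys are hundreds of digits long and real systems add padding, so treat this as a sketch only. It needs Python 3.8 or later for the modular inverse.

```python
# Toy RSA with tiny textbook primes. Illustration only, not secure.
p, q = 61, 53
n = p * q                 # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)       # the "lock" the receiver hands out to everyone
private_key = (d, n)      # the "key" only the receiver keeps


def apply_key(number: int, key: tuple) -> int:
    """Raise the number to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65  # a message encoded as a number smaller than n

# Use 1: anyone encrypts with the public key, only the receiver can decrypt
ciphertext = apply_key(message, public_key)
decrypted = apply_key(ciphertext, private_key)

# Use 2: the receiver signs with the private key, anyone can check with the public key
signature = apply_key(message, private_key)
is_authentic = apply_key(signature, public_key) == message

print(decrypted == message)  # True
print(is_authentic)          # True
```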
Unit 9 Reflection -
Google Trends Activity
The Google Trends activity was a really interesting experience as I got to see how "big data" is actually used. Given the sheer number of Google searches (approximately 5.6 billion daily), I found it really cool that Google was able to compile data from many years and create graphs with it. I also learned how big data could be useful, as our group was able to find many interesting relationships, such as searches for "Unblocked Games for School" spiking at the beginning of school, falling during Winter Break, and spiking again when we came back. The Google Trends activity taught me a lot about the sheer scale of big data, the special ways people represent it, and how we can use big data. I also just thought Google Trends in general was really cool and fun to use to look for patterns you never would've expected.
Measure of America Activity
Similar to the Google Trends activity, the Measure of America activity showed the sheer amount of data that is collected and the innovative ways people make it readable (in this case, an infographic map of the United States). Most importantly, I learned how important big data is for transparency. Anyone can view data on their state and county to see problems or things that need change, whether the government is keeping its promises, and how a place is doing compared to the places around it. My partner and I got to explore the disparity between different places around the country, and create our own slogan to advocate for an issue that we found using the data. We ended up deciding to focus on the lack of education in southern Texas, as we saw that many counties in that area had a much lower education percentage than the northern areas around them. The Measure of America activity taught me a lot about the ways data can be displayed and sorted for ease of use and, more importantly, how big data supports transparency for the public. I personally learned a lot about Virginia, and how our education standards and wealth compare to the rest of the United States.
Machine Learning Slides and Code.org
I really liked the machine learning lesson, and especially the code.org fish machine learning module, as it showed the capabilities of the algorithms we have today and outlined how our society could work in the future. I found it really cool how, like humans, algorithms could learn and practice using multiple prelabeled data sets. It was also cool to see that the more data the program was trained with, the better it was at telling fish from trash. I also liked seeing which factors it used to decide whether an object was fish or trash, such as shape, color, or size. Another important lesson I learned was that even though the algorithm is very good at picking up patterns from the data we give it, human bias and training data bias can greatly influence the algorithm and make it biased as well. For example, when we were training the algorithm to recognize "weird" fish, human bias was involved, as our idea of "weird" might have been very different from someone else's. In the TED-Ed video, we also learned how real-world machine learning can be biased: for example, facial recognition algorithms were not given enough samples of darker-skinned individuals and couldn't identify them properly. I learned a lot about what machine learning is, how the algorithms learn, how bias can be introduced and coded into the algorithms, and how we can prevent it from happening. I personally thought machine learning was really cool, and saw how it could be one of the major innovations for the world in the future, as machine learning could replace a lot of the things that people normally do.
Unit 10 Reflection -
Privacy Policies Investigation
The privacy policies investigation was really eye-opening for me, as I got to see how much data companies really collect on you and how it is used. While most people just check "agree" and don't read the privacy policy, I realized that reading and understanding a company's privacy policy is really important for protecting yourself and keeping information you want private from being spread around. I researched the Amazon privacy policy and found that they keep a lot of your search information, even down to the time you spend on each product page, scroll speed, the product filters you use, and the price range of products you look at. I also learned a lot about how Amazon uses your data, in some cases distributing it to other companies for advertising or using it to optimize the products you see. Finally, I realized that a majority of companies actually allow you to choose what data you want collected, and whether it can be distributed to third parties. The privacy policy investigation taught me a lot about the data that companies collect on you, how your data is used, how your information can be distributed, and what you can do to limit what companies can track.
Security Risks slides and Class Activity
The security risks lesson and activity taught me how dangerous the internet can be if you aren't careful and how cyberattacks are becoming more and more common. One of the things I remember most about that lesson was the bubble infographic, which showed how the number of security breaches has increased exponentially over the years. I also learned about a lot of the different tools that hackers use to infiltrate your devices, and how different they are from the "NIGERIAN PRINCE WANTS TO GIVE YOU MONEY" ads that we associate with viruses and attacks. I researched phishing, and I learned that scam emails are much better disguised and harder to spot than the Nigerian prince example. When I took the Google phishing quiz, I got a 2 out of 6, and I was really surprised that emails that looked so legitimate were actually scams or viruses. I learned that many scam emails use gmail.com for their domain, and that you should check for grammatical mistakes and always be careful with downloads attached to emails before opening them. The security risks lesson taught me how prominent cyberattacks really are, what the different types of breaches are, and different ways to prevent them.
Protecting Passwords Lesson
This was probably one of my favorite lessons for the unit. When we went over how the passwords we use are not strong enough and different ways to increase security, I found it very applicable to myself. For example, I learned that the passwords I use are actually pretty weak, and that I used the same passwords for a lot of different things. I only had 2FA and 3FA enabled for one or two sites, which were not even that important. After the lesson, I ended up changing my passwords for different websites, using a pattern I created for each one. I also enabled 2FA for a lot of my important accounts, such as my Gmail, bank account, College Board, and social media accounts. The lesson was very helpful for me personally. I also learned about what ciphers are, and had a great time decoding the encrypted text. For Science Olympiad, I actually participated in the Codebusters event, where contestants have to decode ciphertext, so I had a lot of fun doing this activity. Finally, I also learned about symmetric and public key encryption, and I thought it was really cool how people were able to figure out a method where information can be sent using a public key that anyone knows and a private key that no one else knows. The demo with the marbles was also really fun and helped me understand a lot more about how my information is kept safe while being sent across the internet. I learned a lot about password security, ciphers, and the encryption methods that we use on the internet today.