We're wrapping up the research for our project and preparing to begin drafting our final presentation. It will be hard to condense a year's worth of research into a single 20-minute presentation, but I'm hopeful we can lean on the presentations we have given so far as a guide for drafting this cumulative one.
For the past week we've been working on our final presentation. We've chosen two objects to focus on that let us discuss a wide segment of what we've studied this year. Tomorrow, we'll be practicing our presentation in front of others in MARC, so we need to get it done today.
I just updated my data table with the newly published Gaia data, and I now have 16 false positive candidates to analyze. I have been making progress going through each one, and have confirmed one false positive so far. I am still looking for additional patterns that may connect to false positives in general, but I am confident in my analysis of this specific set.
I have continued making progress analyzing the 16 Kepler Objects of Interest I have identified. I have analyzed 8 so far and had one of my false positive analyses verified by my mentor. I also gave a quarter-progress presentation on my research so far. Walking through my research and the background behind it is always a challenge, but I have definitely gotten better at it each time I have to do it. I am confident in my research process, and given that I have already made one verified false positive classification, I expect my research to produce worthwhile results.
A lot has happened since the last time I updated this. We are no longer able to come to school in person, so it's been very different continuing with this project from home. I have been meeting regularly with my mentor and further refining my project as I approach the finish line. In my earlier individual research, I identified several patterns that I have since explored further with my mentor and the other students in my group. For my final project, I aim to write up the findings from those patterns in a paper with my group members.
In the next few weeks, I will fully dive into my research. I plan to continue learning as much from my mentor as I can while beginning my own work. To start, I am going to go through the main Kepler KOI and FPWG parameters to see if anything stands out to me. If I discover any false-positive patterns, I will of course pursue them. If no patterns emerge from the obvious parameters, I will go through the table parameter by parameter until one does. I anticipate finding a clear pattern I can dig into within the next several weeks. If that proves unachievable, I will have to rethink my methodology and consult with my mentor about how to improve my process.
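To make that first step concrete, here is a rough sketch of what such a parameter scan could look like in pandas. It assumes the cumulative KOI table has been exported from the NASA Exoplanet Archive as a CSV (the file name is a placeholder, and the column names follow the archive's conventions, so I'd double-check them before relying on this).

```python
import pandas as pd

# Rough sketch of the parameter scan described above. Assumes the
# cumulative KOI table has been exported from the NASA Exoplanet Archive
# as "cumulative_koi.csv" (placeholder name); column names like
# "koi_disposition" follow the archive's conventions.
koi = pd.read_csv("cumulative_koi.csv", comment="#")

# 1 if the pipeline already labels the KOI a false positive, else 0.
koi["is_fp"] = (koi["koi_disposition"] == "FALSE POSITIVE").astype(int)

# Correlate every numeric parameter with the false-positive flag to see
# which parameters are worth a closer look.
numeric = koi.select_dtypes("number")
corr = numeric.corrwith(koi["is_fp"]).drop("is_fp")
print(corr.abs().sort_values(ascending=False).head(20))
```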
I have continued learning about the various aspects of exoplanets, the Kepler Mission, and the Kepler KOI data. I have identified one possible KOI to focus on, along with three other possibilities. When I get back from Winter Break, I will begin analyzing the data sheets for these KOIs.
I analyzed the data sheet for the first of the 4 KOIs I have identified. All 4 are currently planet candidates in the Kepler database, but I am reasonably confident that they are not planets because of their size. Right now, I am going through their data sheets and looking for other distinguishing factors that could point to their status as non-planets.
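As a rough illustration of the size argument, here is what that kind of cut might look like on the same KOI table; the radius threshold is just an example rather than the one I actually used, and the column names again follow the Exoplanet Archive's conventions.

```python
import pandas as pd

# Illustrative size-based cut: KOIs still labeled planet candidates whose
# fitted radius is large enough to suggest an eclipsing binary or other
# stellar companion rather than a planet. The ~22 Earth-radius threshold
# (roughly 2 Jupiter radii) is an example value, not the one I used.
koi = pd.read_csv("cumulative_koi.csv", comment="#")

candidates = koi[koi["koi_disposition"] == "CANDIDATE"]
oversized = candidates[candidates["koi_prad"] > 22]   # koi_prad is in Earth radii

print(oversized[["kepoi_name", "koi_prad", "koi_period"]]
      .sort_values("koi_prad", ascending=False)
      .to_string(index=False))
```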
Over the summer I spent a lot of time thinking about my project, and I came to the realization that I need to change my overall research question and project. I still believe my original project has merit, but I doubt that I have the time, resources, and knowledge to carry it out effectively in the next year. Because of this, I spent much of my summer reading journal articles on other areas of machine learning I could move into. One area I kept coming back to is computer image generation, specifically using GANs. I am now close to settling on a project idea that will be more realistic given my limited experience and resources. By using a pretrained GAN, I will be able to sidestep my limited computing power. By the end of this month, my aim is to have finalized my idea and most of the steps I will need to take to realize it. I also aim to reach out to outside scientists for help with this project.
I'm shifting my project towards false positives in exoplanet transit identification with the Kepler Space Telescope. This is a fairly new area for me, so I am still learning and figuring everything out, but I am confident this will be a good project to pivot towards. Once I am more familiar with the Kepler data structure, I will update this website with my new project information.
To be honest, I didn't really have any key takeaways from the study design workshop. My experimental method is fairly clear to me, at least for the early stages. The Kepler mission already has a clear standard operating procedure for identifying exoplanets and labeling false positives. For my project in particular, which looks at correlations between various data points in the Kepler Objects of Interest dataset and false positive exoplanets, my experimental plan is also clear, at least for now.
For my end-of-year project this year I will be building a residual U-Net for semantic image segmentation. I am using the COCO dataset to train and evaluate the model. I chose this project because it forms the basis for my final MARC project. If I want to be able to look at an image and determine what it is based on its individual components, I need to be able to reliably figure out what those components are. Custom object detection will be the perfect way to accomplish this because it lets me fine-tune my model to detect the specific types of objects that matter to me. For example, if the end goal is to correctly classify a table, I first need to be able to identify a large flat surface and 3 or more table legs supporting it. Then, based on those data points, I can determine what the overall structure represents. Based on the results from this preliminary project, I will adjust the scope and methods of my project next year.
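For reference, here is a rough PyTorch sketch of the residual building block I have in mind for the U-Net's encoder and decoder stages; the channel counts are placeholders rather than the final architecture.

```python
import torch
import torch.nn as nn

# Rough sketch of a residual block for the U-Net; channel counts and the
# exact layer arrangement are placeholders, not the final design.
class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 conv so the skip connection matches the output channel count
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.skip(x))

# Example: one block taking 64-channel features to 128 channels
# (spatial downsampling would be handled separately, e.g. with pooling).
block = ResidualBlock(64, 128)
x = torch.randn(1, 64, 128, 128)
print(block(x).shape)  # torch.Size([1, 128, 128, 128])
```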
After two weeks of work, I've decided to switch the dataset for my final project from COCO to CamVid. This decision comes down to two main factors. First, I just don't have enough time to do this project with COCO, especially not the full dataset. Even with the NVIDIA V100 I've been using, it would take 6 to 12 hours to train each batch of epochs. That would add up fast, especially because I still have to experiment to find the best hyperparameters and number of epochs for the different stages. Second, I've been having a lot of problems getting COCO to work with FastAI. I found a way to translate the COCO masks to FastAI masks, and it seemed to be working. However, I trained for 6 hours only to be greeted with an error message about mismatched tensor lengths. I simply don't have enough time left to keep troubleshooting these problems, and I need to work with something I know will work. CamVid fits this because it's both already included in the standard FastAI datasets and small enough for me to have time to train on it. I know I could use fewer categories of the COCO objects, but that would take time to convert again, and I want to finish my data collection in the next day or so. That's all for now; hopefully my project goes more smoothly with this minor switch.
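For the record, here is roughly what the CamVid setup looks like with the current fastai (v2) API; the 2019 course uses the older v1 API, so the exact calls may differ from what I end up running, but the overall flow is the same: load the bundled dataset, build segmentation dataloaders, and train a U-Net learner.

```python
import numpy as np
from fastai.vision.all import *

# Sketch of a CamVid segmentation setup in fastai v2, closely following
# the library's documented example. CAMVID_TINY is the small bundled
# subset; the full CAMVID download is also available.
path = untar_data(URLs.CAMVID_TINY)
codes = np.loadtxt(path/'codes.txt', dtype=str)

dls = SegmentationDataLoaders.from_label_func(
    path, bs=8,
    fnames=get_image_files(path/'images'),
    # Each label mask shares the image's stem with a "_P" suffix.
    label_func=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes=codes)

learn = unet_learner(dls, resnet34)   # pretrained ResNet-34 encoder
learn.fine_tune(8)                    # epoch count is illustrative
```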
Over Spring Break, I have two main objectives. First, to finish the FastAI 2019 course once and for all; I've set up a specific schedule to get this done that I plan to stick to. Second, to read up more on Faster R-CNN and specific implementations in PyTorch, and, if I have time, to begin writing my own implementation. That's all there is to say for now; I'll just have to see how well I can stick to my plan.
This week, I spent most of my energy building a CNN for CIFAR-100. I was able to reach 50% accuracy with relatively low loss on both my training and validation sets. I was fairly proud of this result, because I went from my initial model's 10% accuracy all the way up to 50%. I could not get higher accuracy with more epochs or a different learning rate without overfitting. What did substantially increase my accuracy was training the lower layers at a lower learning rate. With time, and more experience from the later FastAI lessons, I hope to return and improve on my current model.
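For reference, this is roughly how the lower-layers-at-a-lower-learning-rate idea looks with fastai's slice syntax. This uses the current v2 API, and fastai's bundled CIFAR-10 download stands in for my CIFAR-100 setup; the learning rates are illustrative, not my final values.

```python
from fastai.vision.all import *

# Sketch of discriminative learning rates in fastai v2. CIFAR-10 stands
# in here for the CIFAR-100 setup I actually used.
path = untar_data(URLs.CIFAR)
dls = ImageDataLoaders.from_folder(path, train='train', valid='test')
learn = vision_learner(dls, resnet34, metrics=accuracy)

learn.fit_one_cycle(3, 1e-3)      # train the new head first
learn.unfreeze()                  # then unfreeze the pretrained body
# slice(1e-5, 1e-3): the earliest layer groups get the smallest learning
# rate, later groups get progressively larger ones up to 1e-3.
learn.fit_one_cycle(5, slice(1e-5, 1e-3))
```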
For my end-of-year project this year I've decided to build the first step of my final MARC project: an object detection model. Right now, I'm planning to implement the Faster R-CNN model in PyTorch with the FastAI library, and I think this will work fairly well.
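As a starting point, here's a quick sketch showing that a pretrained Faster R-CNN can be pulled straight out of torchvision for experimentation; whether I end up building on this, wrapping it with FastAI, or writing my own implementation is still open.

```python
import torch
import torchvision

# Load torchvision's reference Faster R-CNN with a ResNet-50 FPN backbone.
# Newer torchvision versions prefer weights="DEFAULT" over pretrained=True.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Dummy input: a list of 3xHxW tensors with values in [0, 1].
images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    predictions = model(images)

# Each prediction is a dict with 'boxes', 'labels', and 'scores' tensors.
print(predictions[0]['boxes'].shape, predictions[0]['scores'][:5])
```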
I also decided to start over with the FastAI course and take the 2019 version. I already like it much better than the 2018 version, because they improved on almost everything about it; I already understand things that I had trouble understanding before. For these reasons, I think this was the right choice. My plan is to do 2 or 3 lessons a week while still practicing on my own, which I think is definitely achievable. On this timeframe, I'll be done with the course by Spring Break, so I can then start to focus more on the specifics of object detection. I still need to decide what kinds of objects I want to be able to detect.
Now that my lesson's over, the next step is just to keep working on the FastAI course. With 4 lessons left, I should be able to finish before Spring Break. I don't know if I'm going to watch all of them, because not all of the subjects pertain to my project, but I probably will. I'm also still figuring out the FastAI 1.0 library. I finally got my first original CNN to work today in class, which was pretty exciting. I was able to get an accuracy of 98.6% with only a single epoch of training, which was surprising. My next step with that test is to experiment with changing learning rates. I think I'm in a good place to start work on my actual main project after Spring Break.
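Since the next step is playing with learning rates, here's roughly what that looks like with the fastai 1.0 library I've been figuring out: run the learning rate finder, read a rate off the loss curve, then train with one-cycle. The MNIST_SAMPLE download stands in for the dataset from my in-class test, and the chosen rate is illustrative.

```python
from fastai.vision import *

# Sketch of the learning-rate experiments I have planned, using the
# fastai 1.0 API. MNIST_SAMPLE is a small bundled dataset standing in
# for my in-class test.
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

learn.lr_find()          # sweep learning rates to see where the loss falls fastest
learn.recorder.plot()    # inspect the curve and pick a rate just before the minimum
learn.fit_one_cycle(2, max_lr=1e-3)
```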
Everything went pretty much according to plan. The only problem was that I noticed halfway through that I had forgotten to make a slide covering my project in particular and how it addresses the problems I was describing. I know my lesson was probably pretty hard to understand, but I think everyone walked out with a basic grasp of what I was trying to describe. That's about all I could hope for, given how difficult it is to explain the subject to someone with no background in it.
I finished my lesson plan for my class on neural networks and deep learning. Deciding what to teach and how to teach it made me realize how much I've already learned and how well I actually understand my research area, even though I often don't feel like it. I was planning to run a test neural network in class, but recent updates to the FastAI library, combined with some technical problems of my own, didn't leave me enough time to get that sorted out. I actually think my presentation will be stronger without it, because I'll have more time to explain the things that actually matter rather than just showing a blank terminal window with 1 or 2 print statements. I'm feeling fairly confident about my presentation tomorrow, and I hope I'm able to explain everything as well as I think I can. I know it will probably be a little confusing at the beginning, but I've planned for that. I won't try to go into too much depth, because I don't have enough time. I plan to go over the Universal Approximation Theorem to explain a bit of why neural networks and deep learning are so practical. Then I'll probably go into gradient descent and its applications. I think that should be enough depth on the theoretical, low-level ideas of neural networks, and I know I can explain it well. Instead of the test neural network, I'm going to have everyone build a dataset together of whatever they want, and then I'll build the neural network at home. In the future, I might show it quickly in class.
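For my own reference while planning the gradient descent part of the lesson, here's the kind of tiny worked example I have in mind: fitting a line to noisy data by repeatedly stepping the parameters against the gradient of the mean squared error. The numbers are just illustrative.

```python
import numpy as np

# Tiny gradient descent demo: recover a slope and intercept from noisy
# linear data by stepping against the gradient of the mean squared error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, 100)   # ground truth: slope 3, intercept 2

a, b = 0.0, 0.0        # initial guesses
lr = 0.1               # learning rate
for step in range(500):
    pred = a * x + b
    grad_a = 2 * np.mean((pred - y) * x)      # d(MSE)/da
    grad_b = 2 * np.mean(pred - y)            # d(MSE)/db
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)            # should land close to 3 and 2
```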
I didn't end up having time to do the 3rd FastAI lesson last weekend. I have to run a 25-minute class during MARC, and preparing for that has taken up all of my time, so I never actually get to work on my project. My computer's also semi-broken, and I need it fixed before I can run my class, because I'm planning to demonstrate a simple convolutional neural network. Hopefully, once this class project is over in a week or so, I'll be able to start working on my actual project again, but I'm worried I'll have forgotten a lot of details from the course by then.
I've continued to stay on track to finish this course on time; I'll do the third lesson tomorrow. I also came up with a few alternative areas I could expand into if my current project doesn't pan out, like generative adversarial networks. Eventually I'm also going to need to decide whether to stick with PyTorch and the FastAI library, move to pure PyTorch, or switch to TensorFlow, but I shouldn't need to decide that soon.
I'm completing the second FastAI lesson as this goes out, so I'm staying on track to finish the course on time. It was hard to find time to do it earlier in the week, but at least I'm doing it now. I also revised my project proposal based on the new knowledge I've gained. I don't have much else to say, simply because my weekly goal is so small: finish one lesson.
I finished the first FastAI lesson and now have a much better idea of how the rest of my project is going to be accomplished. I’m going to keep completing one lesson a week. I also did some experimenting on my own with different small test datasets following the guidelines of the first lesson. My confidence in this project is growing the more I learn, so I do think I will be able to make a sizeable amount of progress on it during the year and a half I have left. That’s all for this week; I should have more to say next week after I finish a more sizeable lesson.