The data we used came from NASA's Exoplanet Archive. The dataset covers more than 5,000 exoplanets discovered between 1992 and 2023. Each record includes information about the exoplanet itself, its discovery, and its host system. Because of the nature of the data, many records had missing fields, which had to be accounted for.
The dataset is available at: https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=PS
Although this assignment involved only limited sketching, I made a few small sketches of the basic charts, and I sketched the UI with some notes before I started coding anything. My User Interface class last semester taught me the value sketching can bring to a project. When I start coding a webpage, I often struggle to decide where everything should go, so sketching helps me work that out ahead of time rather than when I sit down at the computer.
My sketches for the bar charts and the histogram were all fairly similar. I added a few notes to them about ways I could show additional information in the visualizations.
I didn't follow through on all of these notes. Originally I wanted to encode the system, discovery year, or discovery facility as color in some of the bar charts. However, each of those fields had far too many distinct values to tell apart by color.
Other notes were helpful, such as the one for the split bar chart for habitability. I decided to break down the non-habitable planets by whether they fell below or above the habitability threshold.
The sketches above were for the charts other than the bar charts. At first, I planned for the discovery-over-time line chart to be cumulative. After thinking about it more, I realized that what I really cared about was the number of exoplanets discovered each year rather than the running total. I also changed the scatter plot. At first I wanted to use the radius of the circles to encode another variable, but I ended up scrapping that idea.
Finally, I sketched the UI to decide how everything should be laid out on the web page. The sketch on the right was my original UI sketch. I listed each chart and labeled some of them to mark which charts I thought might be related to each other. This sketch is also where I decided to make the discovery-over-time chart non-cumulative. As I noted in the top right of the sketch, I forgot to include one chart (oops).
In the middle, I noted some stretch goals to add to the UI if I had time. I didn't get to any of them because of time constraints, but I would have loved to add the quality-of-life improvements I listed there. Using the finished page, I realized how helpful all of those features would have been.
My final UI sketch is the one on the left. I added the missing chart and reordered the layout to better fit the groupings I had decided on in my first UI sketch. I followed this sketch when laying out my page, and I found the resulting layout very usable.
All of the non-split bar charts look and behave very similarly. On each one, the y-axis shows the number of exoplanets in each group. The charts differ only in their x-axis, which is one of: the number of stars in the exoplanet's system, the number of planets in its system, the star type of the system's star, or the exoplanet's discovery method.
The titles of the star type and discovery method charts link to pages that explain what each star type means and what each discovery method entails.
Every bar chart has a tooltip that gives the exact number of exoplanets in the hovered group, since the exact count is hard to read from the bar alone. Additionally, clicking a bar filters the rest of the visualizations so they show only the exoplanets in that category.
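The counting and click-to-filter behavior described above can be sketched as two small functions. This is a minimal Python illustration of the logic, not the page's actual code; the record fields (`num_stars`, `method`) and the toy rows are hypothetical. Missing values are skipped when counting, since many records in the archive have blank fields.

```python
from collections import Counter
from typing import Any

# Hypothetical record shape: one dict per exoplanet, None for a missing field.
Row = dict[str, Any]

def count_by(rows: list[Row], key: str) -> dict[Any, int]:
    """Bar heights: number of exoplanets per group, ignoring missing values."""
    return dict(Counter(r[key] for r in rows if r.get(key) is not None))

def filter_by(rows: list[Row], key: str, value: Any) -> list[Row]:
    """Rows that remain after the user clicks the bar for `value`."""
    return [r for r in rows if r.get(key) == value]

# Toy rows, made up for illustration.
planets: list[Row] = [
    {"name": "p1", "num_stars": 1, "method": "Transit"},
    {"name": "p2", "num_stars": 1, "method": "Radial Velocity"},
    {"name": "p3", "num_stars": 2, "method": "Transit"},
    {"name": "p4", "num_stars": None, "method": "Imaging"},
]

# Clicking the "1 star" bar: every other chart is recounted from the filtered rows.
selected = filter_by(planets, "num_stars", 1)
print(count_by(planets, "num_stars"))  # {1: 2, 2: 1}
print(count_by(selected, "method"))    # {'Transit': 1, 'Radial Velocity': 1}
```

Because every chart is recounted from the same filtered list, all of the views stay consistent with the current selection.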
The split bar chart shows whether each exoplanet is habitable or non-habitable based on its star type and orbit radius. A planet is "Too Cold" to be habitable if it is below the lower bound of a habitability range that depends on its star type, and "Too Hot" if it is above the upper bound of that range.
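One way to produce the split chart's labels is to classify each planet individually and then count the labels, sketched below in Python. The bounds table `HZ_BOUNDS_AU`, its numbers, and the field names are placeholders, not the thresholds used on the page. The sketch assumes the range is expressed as orbital distance in AU, where an orbit inside the inner edge is the hot side; if the range is defined on another quantity, such as received stellar flux, the direction of the comparisons changes. Labeling row by row also means the split counts can be recomputed from whatever rows the other charts' filters leave.

```python
from collections import Counter
from typing import Any, Optional

# Placeholder habitable-zone bounds (inner edge, outer edge) in AU per star type.
# Illustrative numbers only: not the page's thresholds or vetted limits.
HZ_BOUNDS_AU: dict[str, tuple[float, float]] = {
    "G": (0.95, 1.7),
    "K": (0.4, 0.9),
    "M": (0.05, 0.3),
}

def habitability(star_type: Optional[str], orbit_au: Optional[float]) -> str:
    """Label one planet for the split bar chart."""
    if star_type not in HZ_BOUNDS_AU or orbit_au is None:
        return "Unknown"  # missing star type, no bounds for it, or missing orbit
    inner, outer = HZ_BOUNDS_AU[star_type]
    if orbit_au < inner:
        return "Too Hot"  # orbits closer to its star than the inner edge
    if orbit_au > outer:
        return "Too Cold"  # orbits farther out than the outer edge
    return "Habitable"

def split_counts(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Bar heights, recomputed from whatever rows the current filters keep."""
    return dict(Counter(habitability(r.get("star_type"), r.get("orbit_au")) for r in rows))

# Made-up rows for illustration.
rows = [
    {"star_type": "G", "orbit_au": 1.0},
    {"star_type": None, "orbit_au": 2.0},
    {"star_type": "K", "orbit_au": 0.1},
]
print(split_counts(rows))  # {'Habitable': 1, 'Unknown': 1, 'Too Hot': 1}
```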
I scaled the y-axis logarithmically because the number of exoplanets with no star type (and therefore unknown habitability) was so much larger than any other count that, on a linear scale, it would have flattened the other bars. I assumed no one would be focusing on the blank values in this chart. I also placed this chart next to the star type chart so that the user can compare the number of blank values with the count for any other star type, which should mitigate some of the confusion a log scale can cause.
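A logarithmic axis maps the log of each value, rather than the value itself, linearly onto pixels. The Python sketch below shows that mapping for a bar chart's y-axis; the function name `log_scale` and the chosen domain are illustrative assumptions, not the page's code. It also shows the main catch: a log axis has no position for zero, so the domain must be strictly positive (D3's `scaleLog` has the same restriction, if the page uses D3).

```python
import math
from typing import Callable

def log_scale(domain: tuple[float, float],
              range_px: tuple[float, float]) -> Callable[[float], float]:
    """Map positive values to pixels so equal ratios get equal distances."""
    d0, d1 = domain
    if d0 <= 0 or d1 <= 0:
        raise ValueError("log scale domain must be strictly positive")
    r0, r1 = range_px
    l0, l1 = math.log10(d0), math.log10(d1)

    def scale(value: float) -> float:
        if value <= 0:
            raise ValueError("a non-positive value has no position on a log axis")
        t = (math.log10(value) - l0) / (l1 - l0)  # 0 at d0, 1 at d1
        return r0 + t * (r1 - r0)

    return scale

# Counts from 1 to 10,000 on a 300 px tall chart; screen y grows downward,
# so the largest count maps to y = 0 (the top).
y = log_scale((1, 10_000), (300, 0))
print(y(1), y(100), y(10_000))  # 300.0 150.0 0.0
```

With the domain starting at 1, a bar with a count of 1 has zero height; starting the domain slightly below 1 (for example 0.5) keeps such bars visible.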
This chart also has a tooltip on each bar showing its count. Unfortunately, I was unable to get filtering working on this chart, either when clicking it or when a bar in another chart is selected. I believe this is because of the way I split the data into sub-groups in my code.
The histogram on my page groups exoplanets into 10 bins by their distance from Earth. With the current number of bins, I don't think this visualization is particularly useful: almost all of the exoplanets fall into the first bin. However, it becomes much more useful when the data is filtered, because a smaller range of distances spreads the data points more evenly across the bins.
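Equal-width bins explain why the histogram piles up in its first bin: when a handful of planets are thousands of parsecs away, each of the 10 bins spans a huge distance. The Python sketch below computes equal-width bin counts over the min..max of whatever values it is given, so passing it filtered data produces narrower bins. It also shows one alternative not used on the page, binning on log10(distance). The function names and the toy distances are made up.

```python
import math

def equal_width_bins(values: list[float], n_bins: int = 10) -> list[int]:
    """Counts for n_bins equal-width bins spanning min(values)..max(values).

    `values` must be non-empty.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if every value is equal
    counts = [0] * n_bins
    for v in values:
        # The maximum lands exactly on the right edge; keep it in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

def log_bins(values: list[float], n_bins: int = 10) -> list[int]:
    """Bins equally spaced in log10(value); every value must be positive."""
    return equal_width_bins([math.log10(v) for v in values], n_bins)

# Toy distances in parsecs: mostly nearby, one very distant outlier.
distances = [5.0, 10.0, 20.0, 40.0, 80.0, 150.0, 300.0, 600.0, 1200.0, 8000.0]
print(equal_width_bins(distances))  # [8, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(log_bins(distances))
```

The log-spaced version spreads the same toy points across most of the bins, at the cost of bins that cover very different distance ranges.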
Each histogram bar has a tooltip showing which bin is being hovered and the exact number of data points in that bin.
The line chart on my page shows the number of exoplanets discovered each year from 1992, the earliest year in the data, to 2023, the latest. As I mentioned earlier, I originally wanted this chart to be cumulative, but decided that a non-cumulative chart better fit the information I wanted to convey. The line chart updates when a bar chart bar is selected, but I was unable to get tooltips working for it.
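The difference between the two versions of the discovery chart is only whether the per-year counts are summed as you go. In the minimal Python sketch below, the function names and the years are made up; years with no discoveries are filled with 0, so the line drops to zero for that year instead of skipping over it.

```python
from collections import Counter
from itertools import accumulate

def per_year(years: list[int], first: int, last: int) -> list[tuple[int, int]]:
    """(year, discoveries that year) for every year, using 0 for empty years."""
    counts = Counter(years)
    return [(y, counts.get(y, 0)) for y in range(first, last + 1)]

def cumulative(series: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Running total of a per-year series (the originally planned version)."""
    years = [y for y, _ in series]
    totals = accumulate(c for _, c in series)
    return list(zip(years, totals))

# Toy discovery years, made up for illustration.
series = per_year([1992, 1992, 1995, 1995, 1995], 1992, 1995)
print(series)              # [(1992, 2), (1993, 0), (1994, 0), (1995, 3)]
print(cumulative(series))  # [(1992, 2), (1993, 2), (1994, 2), (1995, 5)]
```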
My scatter plot shows the radius and mass of each exoplanet. The green dots are all of the exoplanets in the dataset, and the red dots are the planets in our solar system. I scaled both mass and radius logarithmically because a few extreme data points squashed most of the data almost onto the x-axis. I found this chart really interesting because a clear line of points runs through the middle of the data. I still haven't figured out what causes it, but it appears with both logarithmic and linear scales (with the linear scales, after I removed the very high masses). Each dot has a tooltip showing the exoplanet's name, mass, and radius. The plot also filters along with the bar charts, which makes the line through the middle of the data even clearer.
The final component of the interface is a table that shows discovery information for each exoplanet. In my sketches I wanted to encode the discovery facility as color, but I decided against it because there were so many different facilities. I still wanted to show this data somehow, so I added it to the table. I included discovery year and method alongside it because I thought they might reveal which facilities were most active at which times and which methods they used.
The table filters along with the bar charts, but because there are so many rows, it is very hard to draw takeaways from it. I wish I had added a search bar or filters on the table itself to make it easier to use.
After using my visualizations for a short time, I learned a few things about the dataset. These findings fall into three categories: assumptions I had that turned out not to hold, guesses that turned out to be right, and things I didn't know enough about to guess beforehand.
Stuff I Got Wrong:
Number of planets and number of stars in a system - I had no scientific reason to expect it, but I was pretty sure that systems with more stars would be bigger and have more planets. This wasn't the case: there were systems with more than 4 planets among those with 1, 2, and 3 stars.
Number of planets and star type - This is similar to the last one: I thought the star type might affect how many planets a system has, but when I explored it with my visualizations I found no correlation.
Stuff I Got Right:
Discovery facility and discovery method - This was kind of low-hanging fruit, but the different discovery facilities do use different discovery methods. I could see this pretty clearly by combining my table with the discovery method bar chart.
They found way more exoplanets after 2010 - What this taught me is that the only findings I got right were slam dunks to begin with. The discovery methods used to find the earliest exoplanets were refined and better understood over time, which led to more discoveries.
Stuff I Never Would've Thought Of:
NOTE: These findings are more specific than the others because I didn't know nearly enough about the data to predict them from the beginning, but I suppose that shows the value of building the visualizations.
Microlensing discovered most of the exoplanets farthest from Earth - Of the discovery methods in the data, microlensing was the only "common" one (making up at least 1% of the data) that detected any exoplanets more than 3,000 parsecs from Earth.
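A finding like this one comes down to two queries: which methods account for at least 1% of the rows, and which of those methods have any planet beyond 3,000 parsecs. The Python sketch below assumes hypothetical fields `method` and `distance_pc` and uses made-up rows, so its output is not the real result. It skips rows with a missing distance rather than treating them as near or far.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]

def common_methods(rows: list[Row], min_share: float = 0.01) -> set[str]:
    """Discovery methods that make up at least `min_share` of all rows."""
    counts = Counter(r["method"] for r in rows)
    return {m for m, c in counts.items() if c / len(rows) >= min_share}

def methods_beyond(rows: list[Row], min_distance_pc: float, methods: set[str]) -> set[str]:
    """Which of `methods` found at least one planet farther than the cutoff."""
    return {
        r["method"]
        for r in rows
        if r["method"] in methods
        and r.get("distance_pc") is not None  # skip missing distances
        and r["distance_pc"] > min_distance_pc
    }

# Made-up rows: 200 planets, one method below the 1% cutoff.
rows: list[Row] = (
    [{"method": "Transit", "distance_pc": 300.0}] * 194
    + [{"method": "Transit", "distance_pc": None}]
    + [{"method": "Microlensing", "distance_pc": 4000.0}] * 4
    + [{"method": "Rare Method", "distance_pc": 5000.0}]
)
common = common_methods(rows)
print(sorted(common))                              # ['Microlensing', 'Transit']
print(sorted(methods_beyond(rows, 3000, common)))  # ['Microlensing']
```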
No exoplanet with star type A was discovered until 2008 - Star type A is the hottest star type included in our dataset, and the first exoplanet around a star of this type was discovered in 2008. No single discovery method was tied to that star type: type A stars had an equal number of transit and imaging discoveries, and both of those methods had existed for more than 4 years before the first exoplanet with star type A was discovered.
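Both parts of this finding, the first discovery year for each star type and the method breakdown for type A, reduce to simple aggregations. The Python sketch below uses hypothetical fields (`star_type`, `disc_year`, `method`) and made-up rows, so its output is not the real result.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]

def first_year_by(rows: list[Row], key: str) -> dict[Any, int]:
    """Earliest discovery year for each value of `key`, skipping missing data."""
    first: dict[Any, int] = {}
    for r in rows:
        group, year = r.get(key), r.get("disc_year")
        if group is None or year is None:
            continue
        if group not in first or year < first[group]:
            first[group] = year
    return first

def methods_for(rows: list[Row], star_type: str) -> Counter:
    """How many planets each discovery method found around one star type."""
    return Counter(r["method"] for r in rows if r.get("star_type") == star_type)

# Made-up rows for illustration.
rows: list[Row] = [
    {"star_type": "G", "disc_year": 1995, "method": "Radial Velocity"},
    {"star_type": "A", "disc_year": 2010, "method": "Transit"},
    {"star_type": "A", "disc_year": 2008, "method": "Imaging"},
    {"star_type": None, "disc_year": 1992, "method": "Pulsar Timing"},
]
print(first_year_by(rows, "star_type"))  # {'G': 1995, 'A': 2008}
print(dict(methods_for(rows, "A")))      # {'Transit': 1, 'Imaging': 1}
```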
There are several parts of this project that I would love to come back to and improve. First, I would fix the visual bugs that slipped past me in the first iteration: I forgot to fix the cell widths in my table, and my discovery-over-time chart has the wrong title. These are simple fixes that would make my page more usable and professional.
After those fixes, I would implement the functionality I couldn't get working originally: tooltips on the discovery-over-time chart and filtering on the habitability chart. These features would round out the "core" functionality of my page, but there are more improvements I would make before I consider the page complete.
Next, I would complete the "A goals" of the assignment. These include a dialog that appears when the user selects a row in the table or a point on the scatter plot and provides more information about that exoplanet's system. They also include adding brushing to the line chart and scatter plot, which would give the user more freedom to filter the data and could lead to additional findings.
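Brushing boils down to turning the dragged rectangle into a range filter. The Python sketch below assumes the brush extent has already been converted from pixels back to data units with the axis scales' inverses (on a log axis, the inverse raises 10 to the interpolated exponent); the `Brush` type and the field names are hypothetical. Leaving one axis as `None` gives the one-dimensional brush a line chart would use, for example a range of discovery years.

```python
from dataclasses import dataclass
from typing import Any, Optional

Row = dict[str, Any]
Bounds = Optional[tuple[float, float]]

@dataclass(frozen=True)
class Brush:
    """Brush extent in data units; None leaves that axis unconstrained."""
    x_range: Bounds = None
    y_range: Bounds = None

def _inside(value: Optional[float], bounds: Bounds) -> bool:
    if bounds is None:
        return True
    if value is None:  # a missing value can't fall inside a brushed range
        return False
    lo, hi = sorted(bounds)  # the user may drag in either direction
    return lo <= value <= hi

def apply_brush(rows: list[Row], brush: Brush, x_key: str, y_key: str) -> list[Row]:
    """Rows whose (x, y) point lies inside the brushed rectangle."""
    return [
        r for r in rows
        if _inside(r.get(x_key), brush.x_range) and _inside(r.get(y_key), brush.y_range)
    ]

# Made-up points: mass and radius in Earth units.
points: list[Row] = [
    {"name": "p1", "mass": 1.0, "radius": 1.0},
    {"name": "p2", "mass": 300.0, "radius": 11.0},
    {"name": "p3", "mass": None, "radius": 2.0},
]
box = Brush(x_range=(0.5, 10.0), y_range=(0.5, 3.0))
print([r["name"] for r in apply_brush(points, box, "mass", "radius")])  # ['p1']
```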
Finally, I would implement some of the ideas from my sketches. One was a set of chips that show the currently applied filters and let the user remove filters one at a time rather than all at once. Alongside the chips, I wanted to create a dialog where the user could specify filters more explicitly. I would also like to make the table improvements I noted earlier, adding filters on discovery facility and year as well as a search bar, to make that component more useful.
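Removable chips need the active filters to be stored individually rather than folded into one combined selection. A minimal Python sketch under that assumption keeps each filter as a predicate keyed by its chip label; `FilterSet`, the labels, and the rows (including the facility names) are made up. A table search box fits the same model as one more predicate.

```python
from typing import Any, Callable

Row = dict[str, Any]
Predicate = Callable[[Row], bool]

class FilterSet:
    """Active filters keyed by chip label, so each chip can be removed alone."""

    def __init__(self) -> None:
        self._filters: dict[str, Predicate] = {}

    def add(self, label: str, predicate: Predicate) -> None:
        self._filters[label] = predicate  # re-using a label replaces that filter

    def remove(self, label: str) -> None:
        self._filters.pop(label, None)  # a chip's close button

    def clear(self) -> None:
        self._filters.clear()  # like a clear-all button

    def chips(self) -> list[str]:
        return list(self._filters)

    def apply(self, rows: list[Row]) -> list[Row]:
        """Rows that pass every active filter (all rows if none are active)."""
        return [r for r in rows if all(p(r) for p in self._filters.values())]

# Made-up rows; the facility names are placeholders.
rows: list[Row] = [
    {"name": "p1", "method": "Transit", "facility": "Alpha Observatory"},
    {"name": "p2", "method": "Transit", "facility": "Beta Telescope"},
    {"name": "p3", "method": "Imaging", "facility": "Alpha Observatory"},
]
filters = FilterSet()
filters.add("Method: Transit", lambda r: r["method"] == "Transit")
filters.add('Search: "alpha"', lambda r: "alpha" in (r.get("facility") or "").lower())
print([r["name"] for r in filters.apply(rows)])  # ['p1']
filters.remove('Search: "alpha"')
print(filters.chips(), [r["name"] for r in filters.apply(rows)])  # ['Method: Transit'] ['p1', 'p2']
```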