2021-03-10 MAR
Hackathon 9:30-11:00
Cancer Research Data Commons
https://datacommons.cancer.gov - you can register in the resource ahead of the 11:00 seminar
Collaborative Dictionary Authoring
[Wendy]
This BioTeam article, which came out yesterday, discusses their “collaborative dictionary authoring” effort, which shares the same idea 😊 https://www.bio-itworld.com/news/2021/03/09/see-your-data-collaborative-dictionary-authoring I think that in addition to documenting raw public data we may also want to document derived data that required computationally intensive processing, e.g. if the raw source is FASTQ files and someone ran alignment and variant calling, we would want to document the exact pipeline and the locations of the BAM and VCF files so that other researchers do not need to repeat the process.
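A minimal sketch of what such a derived-data entry could look like, in Python. Everything here is an assumption for illustration: the `DerivedDataRecord` class, its field names, the pipeline name and version, and the file paths are hypothetical and do not come from the BioTeam article or any existing dictionary schema.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DerivedDataRecord:
    """Hypothetical dictionary entry describing a derived dataset."""
    name: str
    derived_from: list      # identifiers or paths of the raw inputs (e.g. FASTQ)
    pipeline: str           # name of the workflow that produced the outputs
    pipeline_version: str   # exact version or commit, so the run is repeatable
    parameters: dict        # settings used for the run
    outputs: dict = field(default_factory=dict)  # label -> output location

    def record_id(self) -> str:
        """Stable ID: short hash of the canonical JSON form of the record."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Placeholder values only; none of these paths or names are real.
record = DerivedDataRecord(
    name="example-cohort-variants",
    derived_from=["raw/sample1_R1.fastq.gz", "raw/sample1_R2.fastq.gz"],
    pipeline="example-align-and-call",
    pipeline_version="0.0.0-placeholder",
    parameters={"reference": "GRCh38"},
    outputs={"bam": "derived/sample1.bam", "vcf": "derived/sample1.vcf.gz"},
)

print(record.record_id())
print(json.dumps(asdict(record), indent=2))
```

Hashing the canonical JSON gives the same ID for the same inputs, pipeline version, and parameters, so a researcher could check whether a given derivation has already been documented before rerunning it.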
AI review
I'm a Python programmer, apologies if I end up using JavaScript :-D https://youtu.be/AhE8RhPGH1A
Diagramming confluence
The "confluence platform" as the MVP data platform.
Martin :-)
Bhaumik, Hui, Nicole
Firestore > BigQuery > R <today's most pressing topic>
Nicole, Lorena - how can we tell if the process is faithful?
Nicole et al. discussed NoSQL pains in BQ tabular formats. The news that GCP, like Microsoft, is heading down the NoSQL route as well may be interesting: https://www.crn.com/news/applications-os/google-cloud-mongodb-take-their-alliance-to-the-next-level
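One way to think about both the faithfulness question and the tabular pain point is a round-trip check: flatten a nested document into a single row with dotted column names, rebuild it, and compare with the original. The sketch below uses plain Python dicts only; it does not call Firestore or BigQuery, and the helper names (`flatten`, `unflatten`, `is_faithful`) and the sample document are hypothetical.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into one row with dotted column names."""
    row = {}
    for key, value in doc.items():
        column = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-documents.
            row.update(flatten(value, column))
        else:
            # Scalars, lists, and empty dicts are kept as cell values.
            row[column] = value
    return row


def unflatten(row):
    """Rebuild a nested dict from dotted column names."""
    doc = {}
    for column, value in row.items():
        parts = column.split(".")
        node = doc
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return doc


def is_faithful(doc):
    """True if flattening and rebuilding returns the original document."""
    return unflatten(flatten(doc)) == doc


# Hypothetical document, shaped like a nested NoSQL record.
doc = {
    "id": "p1",
    "consent": {"signed": True, "date": "2021-03-10"},
    "modules": {"m1": {"done": False}},
}

print(flatten(doc))
print(is_faithful(doc))          # True
print(is_faithful({"a.b": 1}))   # False: a key containing "." is ambiguous
```

The second check shows the kind of case a fidelity test would need to catch: a field name that already contains the separator cannot be told apart from a nested field once it is flattened.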
[Jonas] The PGS story - we have an API!
[Daniel] update
composite plots - https://plotly.com/javascript/subplots
https://episphere.github.io/qaqc, Lorena
Data Commons - 11:00
> On Wednesday, March 10, from 11:00 a.m.–12:00 p.m. ET, the Data Science Seminar Series will feature a presentation by Matthew Trunnell titled, “Those Awkward Teenage Years: The Maturing of Data Commons.”
> Please join us via Webex.