2021-03-10 MAR

Hackathon 9:30-11:00

Cancer Research Data Commons

https://datacommons.cancer.gov - you can register with the resource ahead of the 11:00 seminar

Collaborative Dictionary Authoring

[Wendy]

This article from BioTeam, which came out yesterday, discusses their “collaborative dictionary authoring” effort, which shares the same idea 😊 https://www.bio-itworld.com/news/2021/03/09/see-your-data-collaborative-dictionary-authoring I think that, in addition to documenting raw public data, we may also want to document derived data that required computationally intensive processing. For example, if the raw source is FASTQ files and someone has run alignment and variant calling, we would want to document the exact pipeline and the locations of the BAM and VCF files so that researchers do not need to repeat the process.
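A rough sketch of what one such derived-data entry could look like. The record layout, field names, and bucket paths are all made up for illustration; nothing here comes from the BioTeam article or an existing dictionary schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DerivedDataRecord:
    """Hypothetical dictionary entry describing one derived dataset."""
    name: str
    source_files: list      # raw inputs, e.g. FASTQ locations
    pipeline_steps: list    # exact steps, in the order they were run
    output_files: dict = field(default_factory=dict)  # format -> location


# Placeholder values only; real entries would carry tool names and versions.
record = DerivedDataRecord(
    name="example-variant-calls",
    source_files=["gs://example-bucket/raw/sample1.fastq.gz"],
    pipeline_steps=[
        "alignment (tool and version go here)",
        "variant calling (tool and version go here)",
    ],
    output_files={
        "BAM": "gs://example-bucket/derived/sample1.bam",
        "VCF": "gs://example-bucket/derived/sample1.vcf.gz",
    },
)

# Serialize so the entry can sit next to the raw-data documentation.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Keeping the steps as an ordered list is the point: someone looking for the BAMs or VCFs can see how they were produced before deciding whether a rerun is needed.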

AI review

I'm a Python programmer, apologies if I end up using JavaScript :-D https://youtu.be/AhE8RhPGH1A

Diagraming confluence

The "confluence platform" as the MVP data platform.

Martin :-)

Indexing conceptIDs --> next week

Bhaumik, Hui, Nicole

Firestore > BigQuery > R <today's most pressing topic>

Nicole, Lorena - how can we tell if the process is faithful

Nicole et al. discussed NoSQL pain points in BigQuery tabular formats. This news that GCP, like Microsoft, is heading down the NoSQL route as well may be interesting: https://www.crn.com/news/applications-os/google-cloud-mongodb-take-their-alliance-to-the-next-level
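On the faithfulness question, one possible check is a round trip: flatten a nested document into dotted column names, rebuild it, and compare with the original. The sketch below is an in-memory stand-in only. It does not call Firestore or BigQuery, and the helper names (`flatten`, `unflatten`, `is_faithful`) and the sample document are invented for illustration.

```python
def flatten(doc, prefix=""):
    """Flatten a nested dict into one level, joining keys with dots."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat):
    """Rebuild a nested dict from dotted keys."""
    doc = {}
    for path, value in flat.items():
        node = doc
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return doc


def is_faithful(doc):
    """True when flattening and rebuilding returns the original document."""
    return unflatten(flatten(doc)) == doc


# Made-up document shaped like a nested NoSQL record.
sample = {"id": "abc", "answers": {"q1": 1, "q2": {"unit": "kg", "value": 70}}}
print(flatten(sample))
print(is_faithful(sample))

# A key that already contains a dot cannot survive the round trip.
print(is_faithful({"a.b": 1}))
```

The second case is the kind of silent mismatch worth looking for: the flattened table looks fine, but the original structure cannot be recovered from it.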

Polygenic Risk Scores --> next week

[Jonas] The PGS story - we have an API!

R cloudRun --> next week

[Daniel] update

Plotly --> next week

composite plots - https://plotly.com/javascript/subplots

BigQuery integration --> next week

https://episphere.github.io/qaqc, Lorena

Data Commons - 11:00

> On Wednesday, March 10, from 11:00 a.m.–12:00 p.m. ET, the Data Science Seminar Series will feature a presentation by Matthew Trunnell titled, “Those Awkward Teenage Years: The Maturing of Data Commons.”

> Please join us via Webex.