2019-07-17 JUL

GCP

[Eric] > we’ll hold down the fort next week on C4B.

Review GCP exploits, plans, future use?

Nicole mentioned feeling restricted by SQL-only queries in GCP's BigQuery. Azure's Cosmos DB might help get around this: https://azure.microsoft.com/en-us/free/cosmos-db/search/?&OCID=AID2000128_SEM_TG6d75e6&MarinID=TG6d75e6_288194345701_cosmos%20db_e_c__46775457819_aud-395027706889:kwd-320108205507&lnkd=Google_Azure_Brand&gclid=Cj0KCQjwjrvpBRC0ARIsAFrFuV_VO0wlEi-Y16xoOrJi6_VUthSfg06jaOS_dzQrNXIKlAMD4U96VDIaAqdrEALw_wcB

... for the right price.

Bhaumik: "It's hard to design our code in functions (i.e. takes lots of dev time)."

App Engine - charges by active instances rather than per request. Costlier, but the shortest time-to-deploy.

(If you want to move to Cloud Functions, start in Cloud Functions.)
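A minimal sketch of what "starting in functions" could look like, not code from the meeting: the core logic is a plain function with no cloud imports, and the HTTP entry point is a thin wrapper around it. The names summarize, handle, and FakeRequest are hypothetical, and the sketch assumes the request object exposes get_json(silent=True) the way Flask request objects do.

```python
import json
from typing import Any, Dict, Tuple

JSON_HEADERS = {"Content-Type": "application/json"}


def summarize(values: list) -> Dict[str, Any]:
    """Core logic (hypothetical task): plain Python, no cloud or web imports."""
    if not values:
        return {"count": 0, "mean": None}
    return {"count": len(values), "mean": sum(values) / len(values)}


def handle(request) -> Tuple[str, int, Dict[str, str]]:
    """Thin HTTP-style entry point around summarize().

    Assumption: `request` has get_json(silent=True), as Flask requests do.
    Returns a (body, status, headers) tuple.
    """
    payload = request.get_json(silent=True) or {}
    values = payload.get("values", [])
    is_numeric_list = isinstance(values, list) and all(
        isinstance(v, (int, float)) for v in values
    )
    if not is_numeric_list:
        error = {"error": "values must be a list of numbers"}
        return json.dumps(error), 400, JSON_HEADERS
    return json.dumps(summarize(values)), 200, JSON_HEADERS


class FakeRequest:
    """Stand-in for a real request object, for local testing only."""

    def __init__(self, payload):
        self._payload = payload

    def get_json(self, silent=False):
        return self._payload


if __name__ == "__main__":
    # Local check without deploying anything.
    print(handle(FakeRequest({"values": [1, 2, 3]})))
```

Keeping the wrapper this thin means the same summarize() could sit behind an App Engine route or a Cloud Functions entry point, so the choice of platform stays a deployment decision rather than a rewrite.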

Community building

[Eric] Do we have a social chat app for DCEG data folks? Is that gitter’s role?

[Jonas] I really think so! More specifically, we could start by circulating https://gitter.im/cloud4bio/community, and if a lot of people join we could split a GCP-specific group out of it. There is an argument here you may want to counter: I think the Cloud is where a data science group at DCEG becomes useful. The Cloud (GCP, STRIDES) is more of a DCEG (NCI? NIH?) specific resource. In contrast, Data Science per se is a component all projects now include, often through external collaborations.

Furthermore, this may be a good way to expose the weekly hackathon and get additional input/contributions. There is some talk of a yearly C4B workshop, maybe by folding it into other things cooking: it is now a sure thing that there will be a yearly hands-on immersive/conference on data science. Back to Eric's question, if all are in favor we could just run the C4B link by STRIDES and let them decide if circulating it to the training venues is something they want to encourage.





FYI R Bioinformatics training

From: Sean Davis <seandavi@gmail.com>

Date: Friday, July 12, 2019 at 1:39 PM

Subject: Learning materials of interest for informatics

I recently co-organized a 2-week course at Cold Spring Harbor Labs on Statistical Methods for Functional Genomics (https://meetings.cshl.edu/courses.aspx?course=C-DATA&year=19). We have made large parts of the materials available online. Links below cover single-cell RNA-seq, DNA methylation, R and Bioconductor, ATAC-seq, machine learning, data visualization, and data integration.

- http://bioconductor.org/packages/release/workflows/html/rnaseqGene.html

- https://seandavi.github.io/ITR

- https://seandavi.github.io/AtacSeqWorkshop

- https://kkorthauer.org/fungeno2019/

I suspect that a number of groups at CCR and DCEG may find the materials useful.

Sean

--

Sean Davis, MD, PhD

Center for Cancer Research

National Cancer Institute

National Institutes of Health

Bethesda, MD 20892

https://seandavi.github.io/

https://twitter.com/seandavis12