2019-06-26

TensorFlow (Marie, Jonas)

https://codelabs.developers.google.com/codelabs/tfjs-training-regression/index.html

https://www.youtube.com/watch?v=IHZwWFHWa-w

https://github.com/episphere/ai

https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/TensorFlow-MNIST/td-p/318708

Andrew Ng's Coursera course has historically been the gold standard for machine learning basics. It's math heavy, but very much worthwhile: https://www.coursera.org/learn/machine-learning


Nvidia is the dominant company for accelerating AI/ML tasks. Their docs used to contain lots of raw code, but after the AI explosion of the last few years they've abstracted most of it away behind high-level APIs (an aside on how fast the field moved: I was coding these by hand only ~4 years ago). This blog post is a good example of what actually training these models looks like from the view of a systems programmer: https://cognitivedemons.wordpress.com/2017/09/02/a-neural-network-in-10-lines-of-cuda-c-code/

The key takeaway is that neural nets do *lots* of computation. Just for the forward step, the number of multiply-add instructions is roughly (training epochs * training examples * sum over layers of (inputs to the layer * neurons in the layer)). Up until ~five years ago, a multiply-add was two CPU instructions, though it's now often fused into one (FMA). On top of that there's backprop to train the model, shuffling data between RAM and the GPU (which has a different memory subsystem), softmax or other normalization, not to mention regularization steps (L1, L2, or both). TensorFlow abstracts all of this away, but I think it's important to understand that these models are expensive to train and run. Knowing what's going on behind the scenes can help you optimize models for efficiency even when using a high-level API.
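A back-of-the-envelope sketch of that multiply-add count. The layer sizes, example count, and epoch count below are hypothetical (MNIST-ish), and the 3x factor for backprop is a common rough estimate, not an exact figure:

```python
# Rough cost estimate for training a small fully connected net.
# All numbers below are illustrative assumptions, not measurements.

def forward_macs(layer_sizes):
    """Multiply-accumulate (MAC) ops for ONE forward pass:
    each layer is a matrix-vector product of shape (outputs x inputs)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layers = [784, 128, 10]           # input -> hidden -> output (hypothetical)
macs = forward_macs(layers)       # 784*128 + 128*10 = 101,632 MACs

examples, epochs = 60_000, 10     # hypothetical training run
# Backprop costs roughly 2x a forward pass, so total is ~3x forward.
total = 3 * macs * examples * epochs

print(f"{macs:,} MACs per forward pass")
print(f"~{total:,} MACs for the whole training run")
```

Even this toy network lands in the hundreds of billions of multiply-adds, which is why the GPU data shuffling mentioned above matters so much in practice.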

Google Genomics API (Eric)

Now in general availability; if Eric is in this session, we'll need to ask him to explain news like this: https://sadasystems.com/blog/google-genomics-api-mainstream-acceptance-ml-upgrades-2018

white paper: https://cloud.google.com/genomics/resources/google-genomics-whitepaper.pdf

SmartSheets (Geeta)

https://youtu.be/4fJOWfEuLIE

Blockchain and other interesting tidbits from NLM EHR Conference (Nicole)

https://biometry.nci.nih.gov/cdas/approved-projects/2180/ <-- funded project to RSI

https://www.boozallen.com/s/insight/blog/7-things-youre-getting-wrong-about-blockchain-technology.html

https://blockchainhealthcaretoday.com/index.php/journal/article/view/13

https://www.boozallen.com/s/insight/thought-leadership/the-artificial-intelligence-primer.html

(Related to blockchain in genomics: https://www.wired.com/story/these-dna-startups-want-to-put-all-of-you-on-the-blockchain/

The biggest entrant in the field is from George Church (the same one who brought us CRISPR and next-gen sequencing): https://nebula.org)

http://arussell.org/research-pub/

https://loinc.org/

https://www.hl7.org/fhir/observation.html

https://www.newyorker.com/culture/cultural-comment/hahaha-vs-hehehe

https://www.ohdsi.org/ (small note from Ben: All of Us is going to be in OMOP format [h/t John Wilbanks])