2019-06-26
TensorFlow (Marie, Jonas)
https://codelabs.developers.google.com/codelabs/tfjs-training-regression/index.html
https://www.youtube.com/watch?v=IHZwWFHWa-w
https://github.com/episphere/ai
https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/TensorFlow-MNIST/td-p/318708
Andrew Ng's Coursera course has historically been the gold-standard for machine learning basics. It's math heavy, but very much worthwhile: https://www.coursera.org/learn/machine-learning
Nvidia is the dominant company for accelerating AI/ML tasks. They used to have lots of raw code in their docs, but they've abstracted most of it away into high-level APIs after the AI explosion of the last few years (aside: the world changed fast, because I was coding these by hand ~4 years ago). This blog post is a good example of what actually training these models looks like from the view of a systems programmer: https://cognitivedemons.wordpress.com/2017/09/02/a-neural-network-in-10-lines-of-cuda-c-code/
The key takeaway is that neural nets are doing *lots* of computation. For a dense network, the number of multiply-add instructions in the forward step alone is roughly (number of training epochs * number of training examples * number of weights in the network). Up until ~five years ago, a multiply-add was two CPU instructions, though it's now often fused into one (FMA). Then there's backprop to train the model, shuffling data between RAM and the GPU (which has a different memory subsystem), performing softmax or other normalization, not to mention regularization steps (L1, L2, or both). All of this gets abstracted away by TensorFlow, but I think it's important to understand that these models are expensive to train and run. Understanding what's going on behind the scenes can help you optimize models for efficiency even when using a high-level API.
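To make that cost concrete, here's a minimal back-of-the-envelope sketch in NumPy-free Python (the layer sizes and training numbers are made up for illustration, not taken from any of the links above). Each dense layer performs (inputs × outputs) multiply-adds per example, so the forward-pass total is just the sum over layers:

```python
# Hypothetical layer sizes for a small dense network: 784 inputs
# (e.g. a flattened 28x28 image), two hidden layers, 10 outputs.
layer_sizes = [784, 128, 64, 10]

def forward_macs(sizes):
    """Multiply-add count for one forward pass of one example:
    each dense layer contributes (inputs * outputs) multiply-adds."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

macs_per_example = forward_macs(layer_sizes)  # 784*128 + 128*64 + 64*10

# Illustrative training run: 10 epochs over 60,000 examples.
epochs = 10
examples = 60_000
total_forward_macs = epochs * examples * macs_per_example

print(f"multiply-adds per example (forward only): {macs_per_example:,}")
print(f"total forward multiply-adds for training: {total_forward_macs:,}")
```

Even this toy network lands in the tens of billions of multiply-adds for the forward pass alone, before counting backprop (roughly another 2x), data movement, or normalization, which is why the GPU matters so much.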
Google Genomics API (Eric)
General availability; if Eric is in the session, we'll need to ask him to explain news like this: https://sadasystems.com/blog/google-genomics-api-mainstream-acceptance-ml-upgrades-2018
white paper: https://cloud.google.com/genomics/resources/google-genomics-whitepaper.pdf
SmartSheets (Geeta)
Blockchain and other interesting tidbits from NLM EHR Conference (Nicole)
https://biometry.nci.nih.gov/cdas/approved-projects/2180/ <-- funded project to RSI
https://blockchainhealthcaretoday.com/index.php/journal/article/view/13
https://www.boozallen.com/s/insight/thought-leadership/the-artificial-intelligence-primer.html
(Related to blockchain in genomics: https://www.wired.com/story/these-dna-startups-want-to-put-all-of-you-on-the-blockchain/
The biggest entrant in the field is from George Church (the same one who brought us CRISPR and next-gen sequencing): https://nebula.org)
http://arussell.org/research-pub/
https://www.hl7.org/fhir/observation.html
https://www.newyorker.com/culture/cultural-comment/hahaha-vs-hehehe
https://www.ohdsi.org/ (small note from Ben: All of Us is going to be in OMOP format [h/t John Wilbanks])