2021-03-03
Episphere Data Science Journal Club 9:30-10:30
Jeya
Maruan Al-Shedivat, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh (2021). An Inferential Perspective on Federated Learning.
Thanks to deep learning, today we can train better machine learning models when given access to massive data. However, the standard, centralized training is impossible in many interesting use-cases—due to the associated data transfer and maintenance costs (most notably in video analytics), privacy concerns (e.g., in healthcare settings), or sensitivity of the proprietary data (e.g., in drug discovery). And yet, different parties that own even a small amount of data want to benefit from access to accurate models. This is where federated learning comes to the rescue!
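The core idea above — parties keep their data local and share only model updates — is usually illustrated with federated averaging (FedAvg): each client trains locally, and a server averages the resulting weights, weighted by local dataset size. A minimal sketch with NumPy and a toy least-squares task (all names and data here are illustrative, not from the paper):

```python
import numpy as np

def federated_averaging(client_weights, client_sizes):
    """Combine client parameters into a global model, weighting
    each client by the size of its local dataset (FedAvg)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps of
    least-squares regression on its private data (illustrative)."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Two clients with private (noiseless, synthetic) data; only the
# model weights ever leave each client.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (30, 70):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

global_w = np.zeros(2)
for _ in range(30):  # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_averaging(local_ws, [len(y) for _, y in clients])
```

After a few dozen communication rounds the global weights approach the true coefficients, without any client's raw data being centralized.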
Suggestion for next week...
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
A classic AI paper and the seminal work on Transformers. Although the paper addresses an NLP task (English -> German and English -> French translation), it has deep implications across data science.
1. Very influential in NLP (gave rise to transformers like GPT and BERT that replaced traditional CNNs and RNNs).
2. Relevant to natural language encoding (e.g., SOCCer bot).
3. Relevant to time-series analysis.
4. Will be relevant to any sequence analysis (e.g., genomics).
5. Starting to revolutionize image analytics.
6. Was critical to solving the protein folding problem (AlphaFold 2).
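The building block the abstract refers to is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, which is what lets the Transformer drop recurrence and convolutions. A minimal NumPy sketch (toy shapes and data are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    (Vaswani et al. 2017, Eq. 1)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy self-attention: 3 tokens, model dimension 4, with Q = K = V = X.
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4))
out, attn = scaled_dot_product_attention(X, X, X)
```

Each output row is a weighted mix of all token representations, with the weights in each row of `attn` summing to 1; stacking several of these heads plus feed-forward layers gives the full architecture.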
https://www.cis.upenn.edu/~mkearns/papers/barbados/heckerman.pdf
http://leaf.cmu.edu - FEMNIST, two datasets - https://s3.amazonaws.com/nist-srd/SD19/by_class.zip and https://s3.amazonaws.com/nist-srd/SD19/by_write.zip
Daniel? 😬
Hackathon 10:30-11:30
d3.js
Lee
https://observablehq.com/collection/@d3/learn-d3
https://observablehq.com/@mbostock/10-years-of-open-source-visualization
FAIR
Jonas
Presentation - Thursday 10:30-11:30
Clinic - Fridays 10:30-11:30 - https://episphere.github.io/fair
DCEG OD Seminar
Date: Thursday, March 4, 2021
Time: 10:30 AM – 11:30 AM
Location: WebEx
Title: FAIR principles in epidemiology
Dr. Montserrat García-Closas
Deputy Director, DCEG
Director, Trans-Divisional Research Program
Senior Investigator
Dr. Jonas Almeida
Chief Data Scientist
Senior Investigator
Trans-Divisional Research Program
Host: Dr. Stephen J. Chanock, Director, DCEG
Discussant: Dr. Amy Berrington, REB, DCEG
JOIN WEBEX MEETING
https://cbiit.webex.com/cbiit/j.php?MTID=mdc970dd1c03ac0a5af16a883c5b83eee
Meeting number (access code): 730 511 678
Meeting password: DCEGseminar1!
JOIN FROM A VIDEO SYSTEM OR APPLICATION
Dial sip:730511678@cbiit.webex.com
You can also dial 173.243.2.68 and enter your meeting number.
JOIN BY PHONE
1-650-479-3207 Call-in toll number (US/Canada)
Quest
Nicole
Concept IDs and Connect Questionnaires