2021-03-03

Episphere Data Science Journal Club 9:30-10:30

Jeya

Maruan Al-Shedivat, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh (2021) An Inferential Perspective on Federated Learning

Thanks to deep learning, today we can train better machine learning models when given access to massive data. However, the standard, centralized training is impossible in many interesting use-cases—due to the associated data transfer and maintenance costs (most notably in video analytics), privacy concerns (e.g., in healthcare settings), or sensitivity of the proprietary data (e.g., in drug discovery). And yet, different parties that own even a small amount of data want to benefit from access to accurate models. This is where federated learning comes to the rescue!
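
For intuition, the canonical algorithm in this space is federated averaging (FedAvg): each client runs a few local training steps on its private data, and a central server only averages the resulting parameters, so raw data never leaves a client. Below is a minimal numpy sketch with a toy least-squares model (illustrative only; the model, client setup, and all names are made up for the example and are not the authors' code):

import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=5):
    # A few steps of least-squares gradient descent on one client's data.
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    # One communication round: broadcast the model, train locally, then
    # average the results weighted by client data size. Only parameters travel.
    updates = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for n in (30, 50, 80, 120):              # four clients with unequal amounts of data
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(20):                       # 20 communication rounds
    w = fedavg_round(w, clients)
print(w)                                  # approaches w_true without pooling raw data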

Suggestion for next week...

Ashish Vaswani et al. (2017) Attention Is All You Need:

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

A classic AI paper and the seminal work on Transformers. Even though it addresses an NLP task (machine translation: English -> German and English -> French), it has deep implications across data science; a sketch of the core attention operation follows the numbered list below.

1. Very influential in NLP (gave rise to Transformer models such as GPT and BERT, which displaced traditional CNNs and RNNs).

2. Relevant to natural language encoding (e.g., SOCCer bot).

3. Relevant to time-series analysis.

4. Will be relevant to any sequence analysis (e.g., genomics).

5. Starting to revolutionize image analytics.

6. Was critical to AlphaFold 2's breakthrough on protein structure prediction.
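
For orientation before next week: the Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Equation 1 in the paper). Here is a minimal numpy sketch (an illustrative reconstruction, not the authors' code; the random matrices stand in for learned projection weights):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of the values

rng = np.random.default_rng(0)
n, d = 4, 8                                 # 4 tokens, 8-dimensional embeddings
X = rng.normal(size=(n, d))
# In the Transformer, Q, K, and V are learned linear projections of the input;
# random matrices play that role in this sketch.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape)                            # (4, 8): each token attends over all tokens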



Daniel? 😬


Hackathon 10:30-11:30

d3.js

Lee

https://observablehq.com/collection/@d3/learn-d3

https://observablehq.com/@mbostock/10-years-of-open-source-visualization


FAIR

Jonas

Presentation - Thursday 10:30-11:30

Clinic - Fridays 10:30-11:30 - https://episphere.github.io/fair


DCEG OD Seminar

Date: Thursday, March 4, 2021

Time: 10:30 AM – 11:30 AM

Location: WebEx


Title: FAIR principles in epidemiology

Dr. Montserrat García-Closas

Deputy Director, DCEG

Director, Trans-Divisional Research Program

Senior Investigator

Dr. Jonas Almeida

Chief Data Scientist

Senior Investigator

Trans-Divisional Research Program


Host: Dr. Stephen J. Chanock, Director, DCEG

Discussant: Dr. Amy Berrington, REB, DCEG

JOIN WEBEX MEETING

https://cbiit.webex.com/cbiit/j.php?MTID=mdc970dd1c03ac0a5af16a883c5b83eee

Meeting number (access code): 730 511 678

Meeting password: DCEGseminar1!

JOIN FROM A VIDEO SYSTEM OR APPLICATION

Dial sip:730511678@cbiit.webex.com

You can also dial 173.243.2.68 and enter your meeting number.

JOIN BY PHONE

1-650-479-3207 Call-in toll number (US/Canada)

Quest

Nicole

Concept IDs and Connect Questionnaires