2024-01-03 JAN
News from NeurIPS 2023, delivering AI Assistants
Journal Club
NeurIPS | 2023 [Jeya]
Thirty-seventh Conference on Neural Information Processing Systems
News / Highlights / paper awards
Ideas relevant to epiVerse
Foundation models are creating a paradigm shift in AI: https://arxiv.org/abs/2108.07258. Recipe by Andrew Ng:
Build prototype using LLM APIs.
If safe, deploy immediately (no testing).
Monitor performance. If you spot a tricky example, add it to your hand-crafted eval dataset. When tuning (including prompt engineering), examine results on the eval set; the eval set can be ~10 examples.
Optional: develop systematic error metrics that are more relevant to your KPI.
Optional: invest in building a large eval set.
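The recipe above can be sketched as a small loop. This is a sketch only: `call_llm` is a hypothetical stand-in for a real LLM API call, and the eval-set shape is invented for illustration.

```python
# Minimal sketch of the "deploy fast, grow the eval set" recipe.
# call_llm is a hypothetical stub; a real system would call an LLM API here.

def call_llm(prompt: str) -> str:
    # Placeholder behaviour so the sketch runs end to end.
    return prompt.split()[-1].upper()

# Hand-crafted eval set: start tiny (~10 examples), grow over time.
eval_set = [
    {"prompt": "echo the last word: hello", "expected": "HELLO"},
]

def add_tricky_example(prompt: str, expected: str) -> None:
    """When monitoring surfaces a hard case, fold it into the eval set."""
    eval_set.append({"prompt": prompt, "expected": expected})

def run_eval() -> float:
    """Re-run after every prompt/model tweak; returns accuracy on the set."""
    hits = sum(call_llm(ex["prompt"]) == ex["expected"] for ex in eval_set)
    return hits / len(eval_set)

add_tricky_example("echo the last word: world", "WORLD")
```

The point is the workflow, not the stub: every tuning step is checked against the growing eval set before redeploying.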
Evaluation packages: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
Design pattern 1: for applications with a clear right answer, measure accuracy.
Design pattern 2: for applications with several good answers or approximate answers, develop an LLM agent to evaluate the original LLM's output. If an approximate gold-standard answer (or reference sources for one) is available, include it in the agent's knowledge base.
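A minimal sketch of design pattern 2. The `judge_llm` stub is hypothetical: a real judge would itself be an LLM call whose prompt contains the question, the candidate answer, and any approximate gold-standard reference.

```python
# Sketch of LLM-as-judge evaluation (design pattern 2).
# judge_llm is a hypothetical stand-in for a second LLM acting as grader.

def judge_llm(question, candidate, reference=None):
    # Stub logic: accept the candidate if it shares a token with the
    # reference answer. A real judge would reason over the texts instead.
    if reference is None:
        return bool(candidate.strip())
    return bool(set(candidate.lower().split()) & set(reference.lower().split()))

def evaluate(question, candidate, reference=None):
    """Grade one output; aggregate over the eval set for a score."""
    return judge_llm(question, candidate, reference)
```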
Exploring open "source" models: versioning, privacy, product vs. model concerns
Mistral: https://mistral.ai/
DeepInfra: https://deepinfra.com/pricing
AnyScale: https://www.anyscale.com/endpoints (mentioned by Yann LeCun)
Agents
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings (https://arxiv.org/abs/2305.11554)
Ideas relevant to Federated Learning
EPFL (https://www.epfl.ch/labs/mlo/) is deeply involved in P2P federated learning. Potential ally in the GDPR zone?
EPFL's efforts on P2P with WebRTC: https://github.com/epfml/disco
EPFL's efforts on efficient P2P learning through "Epidemic" learning: Boosting Decentralized Learning with Randomized Communication https://arxiv.org/abs/2310.01972
Test of time award winner - lessons learned from word2vec
Paper: Distributed Representations of Words and Phrases and their Compositionality (https://arxiv.org/abs/1310.4546)
Semi-supervised objectives + large corpora = key to NLU
Skip gram
CBOW
Next word prediction
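For intuition, here is how skip-gram and CBOW slice the same sentence into training examples. A sketch only: real word2vec adds larger windows, subsampling, and negative sampling on top of this.

```python
# Skip-gram vs. CBOW training examples from one sentence (window = 1).

def skipgram_pairs(tokens, window=1):
    # Skip-gram: predict each context word from the center word.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    # CBOW: predict the center word from the bag of surrounding words.
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            examples.append((context, center))
    return examples
```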
Fast, parallel, weakly-synchronized computation dominates ML
Allows scaling, which gives better results.
A single parameter server distributes model parameters across multiple machines and orchestrates their learning. This led to their next big paper and was motivated by negative sampling.
An aversion to locking and synchronization was the biggest enabler of works like these.
Focus your compute where it really helps improve your learning
Common tokens are easier to learn and less informative, so negative-sample the tokens that are frequent in both inputs and targets.
Make models simpler and faster (parallel) by focussing on the important problems.
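One concrete instance of "focus your compute": word2vec draws negative samples from the unigram distribution raised to the 3/4 power, which damps very frequent tokens relative to plain frequency sampling. A sketch under that assumption; the sampler interface itself is invented for illustration.

```python
import random
from collections import Counter

# Sketch of word2vec-style negative sampling: negatives are drawn in
# proportion to count(w) ** 0.75, so frequent tokens are down-weighted
# relative to their raw frequency.

def negative_sampler(tokens, power=0.75, seed=0):
    counts = Counter(tokens)
    words = list(counts)
    weights = [counts[w] ** power for w in words]
    rng = random.Random(seed)

    def sample(k=5, exclude=()):
        # Draw k negatives, skipping the positive words in `exclude`.
        out = []
        while len(out) < k:
            w = rng.choices(words, weights=weights)[0]
            if w not in exclude:
                out.append(w)
        return out

    return sample
```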
Word2vec > RNNs. Transformers > LSTMs.
Notes from Jeya's presentation:
https://jalammar.github.io/illustrated-word2vec , https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
Tokenization helps solve nuanced problems
Tokenization strategy: which bits of text get a vector. Which bit to focus on?
It can be used for phrase representations. Compound concepts/nouns are represented by multiple words. Bigrams made up of two infrequent words are focussed on more.
Have a flexible strategy on representing words.
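The paper's phrase detection can be sketched as a bigram score, (count(ab) - delta) / (count(a) * count(b)): it is high for bigrams of two individually infrequent words, which then get merged into a single token.

```python
from collections import Counter

# Sketch of word2vec's phrase (bigram) scoring. High-scoring bigrams like
# "new york" are promoted to single tokens; frequent-word pairs are not.

def bigram_scores(tokens, delta=1.0):
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    return {pair: (c - delta) / (unigram[pair[0]] * unigram[pair[1]])
            for pair, c in bigram.items()}
```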
Sub-word tokenization is still used today in Transformers
Treating language as a sequence of dense vectors is powerful.
Representing concepts as dense vectors: operators in that space exploit geometrical relationships.
Rumelhart suggested this in 1985, and neuroscientists have debated it for decades, but these were just conjectures.
Word2vec: syntactic and semantic relationships were represented geometrically (PCA).
By simple addition/subtraction you can solve analogy problems.
Paris - France + Italy = Rome
Sushi - Japan + Germany = Bratwurst
Czech + currency = koruna
French + actress = Juliette Binoche, Vanessa Paradis, Charlotte Gainsbourg
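A toy illustration of solving such analogies by addition/subtraction and nearest-neighbour lookup. The 2-d vectors here are invented for the sketch; real word2vec embeddings are learned and hundreds of dimensions wide.

```python
import math

# Hand-made toy vectors; placed so that paris - france + italy lands on rome.
vecs = {
    "paris":  [2.0, 3.0],
    "france": [2.0, 1.0],
    "italy":  [5.0, 1.0],
    "rome":   [5.0, 3.0],
    "berlin": [8.0, 3.1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Solve a - b + c, returning the nearest word (excluding a, b, c)."""
    q = [va - vb + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    return max((w for w in vecs if w not in {a, b, c}),
               key=lambda w: cosine(q, vecs[w]))
```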
FDA's entry - synthetic data
Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses (https://arxiv.org/abs/2310.18494).
Hackathon
Delivering AI Assistants
Recall last FAIR Friday and the use of https://platform.openai.com/playground; can/should we proxy governance?
Google Cloud
Preprint dynamics
for example https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10593073
DeArraying update
Creating standards, acting on them remotely [Aaron]
Strategic planning
<keep thinking and voicing your opinions; the intersection of federated learning and FAIR has emerged>