Foundation Models

LLM overview - video


Transformers: Surveys, etc.

  • VLP: A Survey on Vision-Language Pre-training 

  • Illustrated Transformer (tutorial)

  • Transformers in vision: A survey

  • A survey on visual transformer

  • Transformers with Learnable Activation Functions (https://www.jmlr.org/papers/volume23/21-0998/21-0998.pdf)
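
Every survey above centers on the same core operation, scaled dot-product attention. For orientation, here is a toy single-head version in NumPy (a minimal sketch, not tied to any one paper; shapes are unbatched for clarity):

    import numpy as np

    def attention(Q, K, V):
        """softmax(Q K^T / sqrt(d)) V for one head, no batching.

        Q: (n_q, d), K: (n_k, d), V: (n_k, d_v).
        """
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                   # (n_q, n_k) similarity logits
        scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # (n_q, d_v) attended values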



Reinforcement Learning at Scale

  • MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge  


MultiModal Models & MultiModal Transfer

  • Emergent World Representations: Exploring a sequence model trained on a synthetic task (blog)

  • UL2: Unifying Language Learning Paradigms

  • Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone 

  • Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language

  • Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

  • VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (comparison of adapters)

  • Multimodal Few-Shot Learning with Frozen Language Models 

  • AdapterDrop: On the Efficiency of Adapters in Transformers

  • AdapterFusion: Non-Destructive Task Composition for Transfer Learning 

  • Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

  • PaLI: A Jointly-Scaled Multilingual Language-Image Model 

  • CoCa: Contrastive Captioners are Image-Text Foundation Models

  • LoRA: Low-Rank Adaptation of Large Language Models (adapter alternative; see the LoRA sketch at the end of this section)

  • Flamingo: a Visual Language Model for Few-Shot Learning  (perceiver resampler)

  • FLAVA: A Foundational Language And Vision Alignment Model 

  • MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning (adapters)

  • Pretrained Transformers As Universal Computation Engines

  • Perceiver: General Perception with Iterative Attention (video: YouTube summary)

  • Zero-Shot Text-to-Image Generation

  • Learning Transferable Visual Models From Natural Language Supervision (CLIP; see the contrastive-loss sketch after the Vision list below)




  • GIT: A Generative Image-to-text Transformer for Vision and Language (generative approach; simple architecture, very strong results): https://arxiv.org/abs/2205.14100


  • Datasets/Tasks to test on:

    • Captioning: Localized Narratives (if we want to get really ambitious, we can try training a MAGMA model that does things like pixel-level classification, bounding boxes, etc.): https://storage.googleapis.com/openimages/web/index.html

    • Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality: https://arxiv.org/abs/2204.03162

    • Bongard-HOI: https://arxiv.org/abs/2205.13803

    • CLEVRER: https://arxiv.org/abs/1910.01442

    • QLEVR: https://arxiv.org/abs/2205.03075

    • Another benchmark: https://arxiv.org/abs/2206.05379

  • Data Generation:

    • Cycle-Consistent Counterfactuals by Latent Transformations: https://arxiv.org/abs/2203.15064

    • Semantic Segmentation with Diffusion (only relevant for later generative portions): https://arxiv.org/abs/2112.03126
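
Several entries above (VL-Adapter, AdapterDrop, AdapterFusion, MAGMA, LoRA) are parameter-efficient finetuning methods. As a concrete reference for the LoRA item, here is a minimal PyTorch sketch, assuming a pretrained nn.Linear to wrap: the frozen weight is augmented with a trainable low-rank update BA, so only r * (d_in + d_out) parameters are trained. The wrapper name LoRALinear is ours, not from the paper's code.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """y = x W^T + (alpha/r) * x A^T B^T, with W frozen and A, B trainable."""

        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():   # freeze the pretrained weights
                p.requires_grad = False
            self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))  # down-projection
            self.B = nn.Parameter(torch.zeros(base.out_features, r))        # up-projection, zero init
            self.scale = alpha / r             # update starts at zero and grows during finetuning

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Usage is just layer = LoRALinear(pretrained_linear); only A and B receive gradients.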


Language

  • Teaching language models to support answers with verified quotes 

  • Evaluating Human-Language Model Interaction

  • Retrieval-Augmented Multimodal Language Modeling 

  • Holistic evaluation of language models 


  • Improving language models by retrieving from trillions of tokens 

  • T5 Explained - Papers With Code 

  • Chain of Thought Prompting Elicits Reasoning in Large Language Models (see the prompt sketch after this list)

  • Teaching Algorithmic Reasoning via In-context Learning

  • Data governance in the age of large-scale data-driven language technology

  • From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough

  • Prompt-Augmented Linear Probing: Scaling Beyond The Limit of Few-shot In-Context Learners

  • Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text

  • GPT-3 paper: Language Models are Few-Shot Learners (video: Paper Explained)
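
Chain-of-thought prompting (listed above) needs no special machinery: the few-shot exemplars simply spell out intermediate reasoning, which nudges the model to produce its own steps before answering. A minimal sketch of such a prompt, with made-up arithmetic content:

    # Few-shot chain-of-thought prompt: the exemplar shows its reasoning,
    # so the model is biased to reason step by step on the new question.
    COT_PROMPT = (
        "Q: A library has 120 books and lends out 45. It then receives 30 more. "
        "How many books does it have?\n"
        "A: It starts with 120 books. Lending out 45 leaves 120 - 45 = 75. "
        "Receiving 30 more gives 75 + 30 = 105. The answer is 105.\n\n"
        "Q: A train travels 60 km in the first hour and 80 km in the second. "
        "How far does it travel in total?\n"
        "A:"  # the model continues with its own step-by-step reasoning
    )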


Vision

  • V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

  • Combined Scaling for Open-Vocabulary Image Classification

  • Emerging Properties in Self-Supervised Vision Transformers

  • A ConvNet for the 2020s

  • Vision-language pre-training: Basics, recent advances, and future trends
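
Several items here and in the multimodal section above (notably CLIP) are trained with a contrastive image-text objective. A minimal sketch of the CLIP-style symmetric loss, assuming hypothetical encoder outputs img_emb and txt_emb and an illustrative temperature value:

    import torch
    import torch.nn.functional as F

    def clip_style_loss(img_emb, txt_emb, temperature=0.07):
        """Symmetric contrastive loss over a batch of matched image-text pairs.

        img_emb, txt_emb: (B, D); pair i is (img_emb[i], txt_emb[i]), so the
        correct targets are the diagonal of the similarity matrix.
        """
        img = F.normalize(img_emb, dim=-1)
        txt = F.normalize(txt_emb, dim=-1)
        logits = img @ txt.T / temperature           # (B, B) cosine-similarity logits
        targets = torch.arange(logits.shape[0])      # i-th image matches i-th text
        loss_i = F.cross_entropy(logits, targets)    # image -> text direction
        loss_t = F.cross_entropy(logits.T, targets)  # text -> image direction
        return (loss_i + loss_t) / 2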


Time-series Transformers

    • Transformers in Time Series: a Survey

    • Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications

    • Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation 

    • Multivariate Time Series Forecasting with Latent Graph Inference

    • SCINet (https://arxiv.org/pdf/2106.09305.pdf)

    • DEPTS (https://openreview.net/forum?id=AJAR-JgNw)

    • S4 (https://srush.github.io/annotated-s4/; see the state-space recurrence sketch after this list)

    • ETSformer (https://arxiv.org/abs/2202.01381)

    • Pyraformer (https://openreview.net/pdf?id=0EXmFzUn5I)

    • Informer (https://arxiv.org/abs/2012.07436)

    • Reformer (https://arxiv.org/pdf/2001.04451.pdf)

    • N-HiTS (https://arxiv.org/pdf/2201.12886.pdf)

    • Autoformer (https://arxiv.org/pdf/2106.13008.pdf)

    • LogTrans (https://arxiv.org/pdf/1907.00235.pdf)

    • GLR: local/global time-series representations (https://arxiv.org/pdf/2202.02262.pdf)

    • TACTiS (https://arxiv.org/pdf/2202.03528.pdf)

    • MQTransformer (https://arxiv.org/pdf/2009.14799.pdf)

    • ProTran (https://proceedings.neurips.cc/paper/2021/file/c68bd9055776bf38d8fc43c0ed283Paper.pdf)

    • Preformer (https://arxiv.org/pdf/2202.11356.pdf)

    • Spacetimeformer (https://arxiv.org/pdf/2109.12218.pdf)
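
Several models above replace or augment attention for long sequences; S4 in particular (see the annotated-S4 link) is built on a discretized linear state-space recurrence, x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k. A minimal NumPy sketch of that recurrence in sequential (RNN-style) mode, ignoring S4's structured parameterization and its convolutional training mode:

    import numpy as np

    def ssm_scan(A_bar, B_bar, C, u):
        """Run a discretized linear SSM over a scalar input sequence.

        A_bar: (N, N), B_bar: (N,), C: (N,), u: (L,)  ->  y: (L,)
        """
        x = np.zeros(A_bar.shape[0])       # hidden state
        y = np.empty(len(u))
        for k, u_k in enumerate(u):        # one update per timestep
            x = A_bar @ x + B_bar * u_k    # state update
            y[k] = C @ x                   # readout
        return y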


Other Modalities

  • Table Pretraining: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks
