Transformers: Surveys, etc.
The Illustrated Transformer (tutorial)
Transformers with Learnable Activation Functions (https://www.jmlr.org/papers/volume23/21-0998/21-0998.pdf)
Reinforcement Learning at Scale
Multimodal Models & Multimodal Transfer
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (blog)
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (comparison of adapters)
AdapterFusion: Non-Destructive Task Composition for Transfer Learning
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
LoRA: Low-Rank Adaptation of Large Language Models (adapter alternative)
Flamingo: a Visual Language Model for Few-Shot Learning (perceiver resampler)
MAGMA – Multimodal Augmentation of Generative Models through Adapter-based Finetuning (adapters)
Perceiver: General Perception with Iterative Attention (YouTube video summary)
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
GIT: A Generative Image-to-text Transformer for Vision and Language (generative approach; simple architecture, very strong results)
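Several of the adapter-style entries above (VL-Adapter, AdapterFusion, LoRA, MAGMA) share one pattern: freeze the pretrained weights and train only a small inserted module. A minimal numpy sketch of the LoRA-style low-rank update (all names, shapes, and the rank/alpha values here are illustrative, not from any of the papers):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 8.0

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight (not updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-initialized

def forward(x):
    # adapted layer: frozen path plus scaled low-rank update (alpha / r) * B A x
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
# zero-init B means the adapted layer starts out identical to the frozen one,
# and only r * (d_in + d_out) = 128 parameters are trained vs. 256 for full W
assert np.allclose(forward(x), x @ W.T)
```

The zero initialization of B is the key trick: fine-tuning starts exactly at the pretrained model and only gradually departs from it.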
Datasets/Tasks to test on:
Captioning:
Localized Narratives (if we want to get really ambitious, we can try training a MAGMA model that does things like pixel-level classification, bounding boxes, etc.)
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Another benchmark:
Data Generation:
Cycle-Consistent Counterfactuals by Latent Transformations
Semantic Segmentation with Diffusion: (only relevant for later generative portions)
Language
Teaching language models to support answers with verified quotes
Improving language models by retrieving from trillions of tokens
Chain of Thought Prompting Elicits Reasoning in Large Language Models
Data governance in the age of large-scale data-driven language technology
From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough
Prompt-Augmented Linear Probing: Scaling Beyond The Limit of Few-shot In-Context Learners
GPT-3: Language Models are Few-Shot Learners (plus the "Paper Explained" video summary)
Vision
V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices
Vision-language pre-training: Basics, recent advances, and future trends
Time-series Transformers
Multivariate Time Series Forecasting with Latent Graph Inference
ETSformer (https://arxiv.org/abs/2202.01381)
Pyraformer (https://openreview.net/pdf?id=0EXmFzUn5I)
Informer (https://arxiv.org/abs/2012.07436)
Reformer (https://arxiv.org/pdf/2001.04451.pdf)
N-HiTS (https://arxiv.org/pdf/2201.12886.pdf)
Autoformer (https://arxiv.org/pdf/2106.13008.pdf)
LogTrans (https://arxiv.org/pdf/1907.00235.pdf)
GLR: local/global time-series representations (https://arxiv.org/pdf/2202.02262.pdf)
TACTiS (https://arxiv.org/pdf/2202.03528.pdf)
MQTransformer (https://arxiv.org/pdf/2009.14799.pdf)
ProTran (https://proceedings.neurips.cc/paper/2021/file/c68bd9055776bf38d8fc43c0ed283Paper.pdf)
Preformer (https://arxiv.org/pdf/2202.11356.pdf)
Spacetimeformer (https://arxiv.org/pdf/2109.12218.pdf)
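All of the forecasters above consume a long multivariate series sliced into (context, horizon) training pairs. A minimal sketch of that shared preprocessing step (the function name and toy shapes are illustrative, not from any of the papers):

```python
import numpy as np

def make_windows(series: np.ndarray, context: int, horizon: int):
    """Slice a (T, n_vars) series into overlapping (context -> horizon) pairs,
    the input/target format used by transformer-based forecasters."""
    X, Y = [], []
    for t in range(len(series) - context - horizon + 1):
        X.append(series[t : t + context])                      # encoder input
        Y.append(series[t + context : t + context + horizon])  # forecast target
    return np.stack(X), np.stack(Y)

series = np.arange(20.0).reshape(10, 2)  # toy series: 10 steps, 2 variables
X, Y = make_windows(series, context=4, horizon=2)
# X has shape (5, 4, 2): five 4-step contexts
# Y has shape (5, 2, 2): the matching 2-step targets
```

Most of the listed models differ in how they attend over the context window (sparse, pyramidal, decomposed, cross-variable), not in this input format.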