Topics in Deep Learning

Adversarial Machine Learning

  1. An introduction to adversarial examples using PyTorch, with a model trained on MNIST (see the FGSM sketch after this list).

  2. A beautifully written article about GANs with an elegant explanation: Click Here

  3. Ian Goodfellow's presentation slides (July 2018).
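A minimal sketch of the Fast Gradient Sign Method (FGSM) in PyTorch, in the spirit of item 1 above. The trained MNIST classifier `model` and the batch `(x, y)` are assumed to exist already, and the epsilon value is only illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    """Fast Gradient Sign Method (Goodfellow et al., 2015): perturb
    each pixel by +/- epsilon in the direction that increases the
    classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # one signed gradient step
    return x_adv.clamp(0, 1).detach()     # keep pixels in [0, 1]

# Usage (assuming `model` is a trained MNIST classifier and x, y
# come from a test DataLoader):
# x_adv = fgsm_attack(model, x, y, epsilon=0.25)
# print((model(x_adv).argmax(1) == y).float().mean())  # accuracy under attack
```

The striking point of the MNIST demo is that even small epsilon values, imperceptible to a human, can collapse the accuracy of an otherwise well-trained classifier.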

We can watch similar tech talks at: https://learning.acm.org/techtalks-archive

Probabilistic and Graphical Models: Christopher Bishop

Lecture series: https://www.youtube.com/watch?v=ju1Grt2hdko&list=PLL0GjJzXhAWTRiW_ynFswMaiLSa0hjCZ3

Trends in Machine Learning since NIPS

A brief history of ML since the founding of NIPS/NeurIPS:

1980s: NIPS (Neural Information Processing Systems)

1990s: BIPS (Bayesian . . . )

2000s: KIPS (Kernel . . . )

2010s: DIPS (Deep . . . )

2020s: ?IPS (??? . . . )

Replies:

  • SSIPS: Self-Supervised... (Yann LeCun)

The Master Algorithm: Pedro Domingos

Limitations of Supervised Learning - Views from Legends

Geoffrey Hinton, a famous professor of machine learning at the University of Toronto (and part-time Googler), has said:

When we’re learning to see, nobody’s telling us what the right answers are — we just look. Every so often, your mother says “that’s a dog”, but that’s very little information. You’d be lucky if you got a few bits of information — even one bit per second — that way. The brain’s visual system has 10^14 neural connections. And you only live for 10^9 seconds. So it’s no use learning one bit per second. You need more like 10^5 bits per second. And there’s only one place you can get that much information: from the input itself. — Geoffrey Hinton, 1996
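A worked version of the arithmetic in the quote, using only the numbers Hinton gives:

$$\frac{10^{14}\ \text{connections}}{10^{9}\ \text{seconds of life}} = 10^{5}\ \text{bits per second}$$

So to constrain every connection within a lifetime, the learning signal must carry on the order of 10^5 bits/second, which only the unlabeled input itself can supply; a supervisor providing a few bits per second falls short by five orders of magnitude.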

Questions to Ponder Over

1. Is the composite function learned by a DNN smooth? Is there any way to figure this out? Is it monotonic? What other properties does it have? (A crude empirical probe is sketched below.)

The advantage of a smooth function is that the rate of change of y with respect to x is not abrupt. Therefore x need not be exact; any point in a small neighborhood around x would also suffice.
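One crude empirical probe of the smoothness question: measure how much the network output moves under small random input perturbations. This is only a sketch, not a proof of smoothness; the trained network `model` and input `x` are assumed to exist, and the result is merely an empirical lower bound on the local Lipschitz constant:

```python
import torch

def local_sensitivity(model, x, sigma=1e-3, n_probes=100):
    """Estimate max ||f(x+d) - f(x)|| / ||d|| over small random
    perturbations d -- a lower bound on the local Lipschitz
    constant of the network around x."""
    with torch.no_grad():
        fx = model(x)
        ratios = [
            ((model(x + d) - fx).norm() / d.norm()).item()
            for d in (sigma * torch.randn_like(x) for _ in range(n_probes))
        ]
    return max(ratios)

# A small ratio that stays stable as sigma shrinks suggests the learned
# function changes gradually around x; a blow-up suggests it does not.
```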

2. Is there any connection between the number of training examples and the total number of parameters? Is this what is meant by low-resource training? (A quick comparison is sketched below.)
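One concrete way to start on question 2 is simply to compare the two counts. A minimal sketch; the small MLP and the MNIST training-set size (60,000 images) are only illustrative:

```python
import torch.nn as nn

def param_count(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hypothetical example: a small MLP vs. MNIST's 60,000 training images.
mlp = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
print(param_count(mlp))           # 203,530 parameters
print(param_count(mlp) / 60_000)  # ~3.4 parameters per training example
```

Even this tiny network has several parameters per training example; classical statistics would call that over-parameterized, yet such models generalize, which is part of what makes the question interesting.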

3. DNNs are function approximators; however, the function that generated the data is often unknown, or assumed to be drawn from some probability distribution. Is the Universal Approximation Theorem sufficient to explain the connection between the real function that generated the data and an approximate function that fits the observed data? Can we measure the closeness between the two? (A synthetic-data sketch follows.)
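For question 3, the gap can at least be measured on synthetic data, where the generating function is known by construction. A minimal sketch; the target function, network size, and training setup are all illustrative assumptions (with real data the true function is unobservable, so this gap cannot be computed):

```python
import torch
import torch.nn as nn

def f_true(x):
    """Known ground-truth generating function (synthetic only)."""
    return torch.sin(3 * x)

# Noisy observations of f_true on [-1, 1].
x_train = torch.rand(256, 1) * 2 - 1
y_train = f_true(x_train) + 0.05 * torch.randn_like(x_train)

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

# Monte Carlo estimate of the L2 distance ||f_true - f_hat|| on the domain.
x_test = torch.rand(10_000, 1) * 2 - 1
with torch.no_grad():
    gap = ((f_true(x_test) - net(x_test)) ** 2).mean().sqrt()
print(f"estimated L2 distance to the true function: {gap:.4f}")
```

Note what the Universal Approximation Theorem does and does not give us here: it guarantees some network can make this gap arbitrarily small, but says nothing about whether training on finite noisy samples will find it.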