Topics in Deep Learning
Adversarial Machine Learning
Introduction using PyTorch with a model trained on MNIST
Beautifully written article about GANs with an elegant explanation: Click Here
Ian Goodfellow's presentation slides (July 2018).
We can watch similar tech talks at: https://learning.acm.org/techtalks-archive
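Since the PyTorch/MNIST introduction itself is not reproduced here, below is a minimal NumPy sketch of the core adversarial-example idea (the Fast Gradient Sign Method) on a toy logistic-regression model; the weights and inputs are made up for illustration, not taken from the MNIST model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy stand-in model: p(y=1|x) = sigmoid(w.x + b); weights are invented.
w = np.array([2.0, -1.0])
b = 0.5

def loss(x, y):
    # Binary cross-entropy for a single example.
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_x(x, y):
    # Gradient of the loss w.r.t. the INPUT (not the weights): (p - y) * w.
    p = sigmoid(w @ x + b)
    return (p - y) * w

def fgsm(x, y, eps):
    # FGSM: nudge each input coordinate by eps in the direction
    # that increases the loss.
    return x + eps * np.sign(grad_x(x, y))

x = np.array([1.0, 1.0])
y = 1.0
x_adv = fgsm(x, y, eps=0.25)
print(loss(x, y), loss(x_adv, y))  # the loss increases after the attack
```

With a real network the only change is that `grad_x` comes from backpropagation (e.g. `x.grad` in PyTorch) instead of a closed-form expression.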
Zero-Shot Learning
Probabilistic and Graphical Models: Christopher Bishop
Lecture Series : https://www.youtube.com/watch?v=ju1Grt2hdko&list=PLL0GjJzXhAWTRiW_ynFswMaiLSa0hjCZ3
Trends in Machine Learning since NIPS
A brief history of ML since the founding of NIPS/NeurIPS:
1980s: NIPS (Neural Information Processing Systems)
1990s: BIPS (Bayesian . . . )
2000s: KIPS (Kernel . . . )
2010s: DIPS (Deep . . . )
2020s: ?IPS (??? . . . )
Replies:
SSIPS: Self-Supervised... (Yann LeCun)
The Master Algorithm: Pedro Domingos
Limitations of Supervised Learning - Views from Legends
Geoffrey Hinton, a famous professor of ML at the University of Toronto (part-time Googler), has said:
When we’re learning to see, nobody’s telling us what the right answers are — we just look. Every so often, your mother says “that’s a dog”, but that’s very little information. You’d be lucky if you got a few bits of information — even one bit per second — that way. The brain’s visual system has 10^14 neural connections. And you only live for 10^9 seconds. So it’s no use learning one bit per second. You need more like 10^5 bits per second. And there’s only one place you can get that much information: from the input itself. — Geoffrey Hinton, 1996
Questions to Ponder Over
1. Is the composite function learned by a DNN smooth? Is there any way to figure that out? Is it monotonic? What other properties does it have?
The advantage of a smooth function is that the rate of change of y w.r.t. x is not abrupt. Therefore, x need not be exact; some neighborhood around x would also suffice.
2. Is there any connection between the number of training examples and the total number of parameters? Is that low-resource training?
3. DNNs are function approximators; however, the function that generated the data is often unknown, or assumed to come from some pdf. Is the Universal Approximation Theorem sufficient to explain the connection between the real function that generated the data and the approximate function that fits the observed data? Can we measure the closeness between the two?
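The smoothness intuition in question 1 can be made concrete with a Lipschitz bound: if the learned function's rate of change is bounded by some constant L, then inputs near x give outputs near f(x). A small sketch, using a made-up smooth function as a stand-in for a trained DNN:

```python
import numpy as np

def f(x):
    # Smooth stand-in for a network: |f'(x)| = 3 / cosh^2(3x) <= 3,
    # so f is Lipschitz with constant L = 3.
    return np.tanh(3.0 * x)

L = 3.0
x, delta = 0.2, 0.01
change = abs(f(x + delta) - f(x))
print(change, L * abs(delta))  # |f(x+d) - f(x)| is bounded by L * |d|
```

So for a smooth function, the output at any point in a small neighborhood of x stays within L times the neighborhood's radius of f(x), which is exactly why x "need not be exact".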
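For question 3, one natural closeness measure is the L2 distance between the true function g and the approximation f under the input distribution, estimated by Monte Carlo. Both g and f below are invented for illustration (in practice g is unknown, so this quantity can only be estimated indirectly, e.g. via held-out test error):

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):
    # Hypothetical "true" data-generating function.
    return np.sin(x)

def f(x):
    # Hypothetical fitted approximation (stand-in for a trained DNN):
    # the cubic Taylor polynomial of sin(x).
    return x - x**3 / 6.0

# Samples from the (assumed) input pdf: uniform on [-1, 1].
xs = rng.uniform(-1.0, 1.0, size=100_000)

# Monte Carlo estimate of the L2 distance E[(f(X) - g(X))^2]^(1/2).
l2 = np.sqrt(np.mean((f(xs) - g(xs)) ** 2))
print(l2)  # small, since the cubic polynomial tracks sin closely on [-1, 1]
```

Note the measure depends on the input distribution: f and g can be close where data is dense yet far apart elsewhere, which is one reason the Universal Approximation Theorem alone does not settle the question.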