Deep Learning


- padding
- n-grams or char-level
- threshold calibration: tune the decision threshold on the validation set and reuse that same threshold for the test set (see the sketch after this list)
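A minimal sketch of the threshold-calibration step, assuming scikit-learn and toy validation scores (all names and data here are illustrative): pick the threshold that maximizes F1 on validation, then freeze it for test.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
val_labels = rng.integers(0, 2, 500)                                  # toy validation labels
val_probs = np.clip(val_labels * 0.4 + rng.random(500) * 0.6, 0, 1)   # toy predicted scores

def calibrate_threshold(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Return the threshold that maximizes F1 on the validation data."""
    scores = [f1_score(labels, probs >= t, zero_division=0) for t in grid]
    return grid[int(np.argmax(scores))]

best_t = calibrate_threshold(val_probs, val_labels)
# At test time, reuse best_t as-is; never re-tune the threshold on test data.
```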



Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning





The newest paper on this is probably "Do Deep Nets Really Need to be Deep?". While its angle is different, the main point is exactly the same: you can train a shallow network to imitate a deep one, but first you need to train the deep network and collect its predictions. Those predictions then become the labels, and the second (shallow) model learns the mapping from inputs to the first model's outputs.
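A minimal sketch of that mimic-learning setup, assuming PyTorch and toy data (the architectures and sizes are placeholders, not the paper's): the student regresses the teacher's logits, which is the L2-on-logits objective the paper uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1024, 20)                       # toy inputs

teacher = nn.Sequential(                        # "deep" teacher (already trained, in practice)
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
student = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 10))

with torch.no_grad():
    soft_targets = teacher(X)                   # the teacher's logits become the labels

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                          # L2 regression on logits
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(student(X), soft_targets)    # student learns the teacher's mapping
    loss.backward()
    opt.step()
```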









Based on their study, neurons work like this: they build a dictionary of bases, and each new item is approximated with a sparse sum of those bases.


Here, out of 64 bases, only 3 are used to approximate the new item:
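A small sketch of that picture, assuming scikit-learn and a synthetic dictionary (OMP is one of several sparse solvers; the original setting may differ): 64 random bases, and a new item encoded with at most 3 of them.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 100))              # dictionary: 64 bases of dimension 100
D /= np.linalg.norm(D, axis=1, keepdims=True)   # normalize rows for the coder

x = 0.5 * D[3] + 1.2 * D[17] - 0.8 * D[42]      # a "new item" built from 3 bases

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=3)
code = coder.transform(x.reshape(1, -1))        # sparse code: at most 3 nonzeros
print(np.flatnonzero(code))                     # indices of the bases actually used
```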



Sparse coding allows learning useful features from unlabeled data, and there is an effectively infinite amount of unlabeled data available, e.g., from the internet.
https://www.youtube.com/watch?v=n1ViNeWhC24

Sparse coding is very closely related to ICA (Independent Component Analysis).
Andrew Ng uses ICA these days rather than sparse coding.
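For a concrete feel of the ICA side, a toy sketch with scikit-learn's FastICA (synthetic signals, not anything from Ng's setup): recover two independent sources from their linear mixtures.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])             # unknown mixing matrix
X = S @ A.T                                        # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                       # recovered sources (up to scale/order)
```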

Or learn feature hierarchies,

where DBN stands for Deep Belief Network.
The same holds when applying the same technique to other data sets and modalities. Letting unsupervised features be learned hierarchically, as below, works much better than hand-engineering nice features (a sketch of the layer-wise recipe follows).
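A hedged sketch of the greedy layer-wise recipe behind DBNs, using scikit-learn's BernoulliRBM as a stand-in and toy binary data: each layer is trained unsupervised on the features produced by the layer below.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = (rng.random((500, 64)) > 0.5).astype(float)    # toy binary "pixels"

layer1 = BernoulliRBM(n_components=32, n_iter=10, random_state=0).fit(X)
H1 = layer1.transform(X)                           # first-layer features

layer2 = BernoulliRBM(n_components=16, n_iter=10, random_state=0).fit(H1)
H2 = layer2.transform(H1)                          # second-layer, more abstract features
```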

 
 

Note how quickly this Stanford feature-learning technique surpassed previous benchmarks in various fields, by large margins:


Technical challenge: Scaling up!






Maxims: it is not who has the best algorithm, it is who has the most data. An utterly complex algorithm loses to an inferior one that has been trained on more and more data. In Silicon Valley you see simple algorithms like logistic regression being used, but their advantage over others is that they see far more data, and hence they outperform.

This applies to supervised algorithms.

For unsupervised learning, it is not how much data you have trained on that matters, but rather how many features you have learned.






To speed up training: divide the neural network across different machines; within each machine, take advantage of multiple cores; and propagate parameter updates asynchronously whenever the accumulated change Δp crosses a threshold (see the sketch below). https://www.youtube.com/watch?v=9TqbZxxzJXI The scheme is also robust to machine failure.
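A single-process sketch of the threshold-triggered propagation idea (illustrative only; the real system runs across machines with parameter servers, and the toy gradient here is random noise): each worker accumulates a local Δp and pushes it to the shared parameters once its norm crosses a threshold.

```python
import threading
import numpy as np

params = np.zeros(10)                       # shared "parameter server" state
lock = threading.Lock()
THRESHOLD = 0.5                             # push when the accumulated update is big enough

def worker(seed, steps=100, lr=0.01):
    rng = np.random.default_rng(seed)
    delta_p = np.zeros_like(params)
    for _ in range(steps):
        grad = rng.standard_normal(params.shape)   # stand-in for a real gradient
        delta_p -= lr * grad                       # accumulate locally
        if np.linalg.norm(delta_p) > THRESHOLD:    # threshold crossed: propagate
            with lock:
                params[:] += delta_p               # in-place update of the shared array
            delta_p[:] = 0.0

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```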














