MLSS-Indo-Lab-Transformer

From Wawan Cenggoro to Everyone: (3:06 PM)

  • 
@rian: I missed reading some parts of this documentation: https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN. Apparently, it accepts variable-length sequences using torch.nn.utils.rnn.pack_sequence().
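For reference, a minimal sketch (not from the session) of feeding variable-length sequences to torch.nn.RNN with torch.nn.utils.rnn.pack_sequence; the shapes and hidden size are made up for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

# Three sequences of different lengths; each step is a 4-dim feature vector.
seqs = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]

# pack_sequence builds a PackedSequence; enforce_sorted=False lets us pass
# the sequences in any order of length.
packed = pack_sequence(seqs, enforce_sorted=False)

rnn = nn.RNN(input_size=4, hidden_size=8)
packed_out, h_n = rnn(packed)          # the RNN consumes the PackedSequence directly

# Unpack to a padded tensor if per-step outputs are needed.
out, lengths = pad_packed_sequence(packed_out)
print(out.shape, lengths)              # torch.Size([5, 3, 8]) tensor([5, 3, 2])
```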


From Anditya Arifianto to Everyone: (3:12 PM)

  • 
Feel free to type your questions in zoom chat or rocket chat


From Robby Hardi to Everyone: (3:12 PM)

  • 
Does J(theta) mean the Jacobian?


From Novri Suhermi to Everyone: (3:14 PM)

  • 
it's the cost function I think
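Not from the session, but for reference: in the language-modelling setting of this lab, J(theta) would usually be the average cross-entropy cost over the target tokens. Assuming the slides use that standard definition, it looks like this:

```latex
% Assumed standard cross-entropy cost over T target tokens w_1, ..., w_T:
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left( w_t \mid w_{<t} \right)
```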


From Robby Hardi to Everyone: (3:17 PM)

  • 
@Novri: Ok. Thanks


From Renny P. Kusumawardani to Everyone: (3:17 PM)

  • 
Pak @Anditya, could you please share the link to the Colab Notebook?


From Anditya Arifianto to Everyone: (3:17 PM)

  • 
it's still the same colab as before
https://colab.research.google.com/drive/1tMQ4b_hf7YJ5qVumDoK_0T3BqKW7SzXE?usp=sharing


From Renny P. Kusumawardani to Everyone: (3:18 PM)

  • 
I see, thanks! Sorry didn’t notice that :)


From Anditya Arifianto to Everyone: (3:18 PM)


👍


From Renny P. Kusumawardani to Everyone: (3:25 PM)

  • 
What is tied_weights for?
I mean, why would you want to share the same parameters between the input and output embeddings' weights?
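For context, a minimal sketch of what a tied_weights flag usually does in this kind of word-level language model. The names and sizes are illustrative and follow the common PyTorch word_language_model pattern; the notebook may differ:

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Illustrative RNN language model with optional input/output weight tying."""
    def __init__(self, vocab_size=1000, emb_size=256, hidden_size=256, tie_weights=True):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, emb_size)   # input embedding
        self.rnn = nn.LSTM(emb_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)    # output projection
        if tie_weights:
            # Reuse one (vocab_size x emb_size) matrix for both roles.
            # Requires hidden_size == emb_size; it saves parameters and forces
            # the input and output representations of each word to agree.
            assert hidden_size == emb_size
            self.decoder.weight = self.encoder.weight

    def forward(self, tokens, hidden=None):
        emb = self.encoder(tokens)
        out, hidden = self.rnn(emb, hidden)
        return self.decoder(out), hidden
```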


From UNTARI NOVIA WISESTY to Everyone: (3:27 PM)

  • 
From your point of view, between LSTM and GRU, which one is better in terms of accuracy and time complexity? Thank you.


From Renny P. Kusumawardani to Everyone: (3:29 PM)

  • 
Great, thanks Genta! Would love to read the paper you refer to, if you don’t mind :)


From Wawan Cenggoro to Everyone: (3:32 PM)

  • 
I might have missed your explanation: why did you reinitialize the encoder and decoder weights in init_weights()?
Isn't a learning rate of 20 too large? It is usually below 1, isn't it?
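A rough sketch, assuming the notebook follows the standard PyTorch word_language_model example: init_weights() just re-draws the embedding and output-projection weights from a narrower uniform range than the framework defaults (and, if I remember correctly, that example does train with plain SGD at an initial learning rate of 20, relying on gradient clipping and learning-rate annealing). The attribute names below are illustrative:

```python
import torch.nn as nn

def init_weights(model, initrange=0.1):
    # Re-initialize only the embedding and the output projection with a small
    # uniform range; the defaults (normal for nn.Embedding, a different uniform
    # range for nn.Linear) are simply replaced by this narrower one.
    nn.init.uniform_(model.encoder.weight, -initrange, initrange)
    nn.init.zeros_(model.decoder.bias)
    nn.init.uniform_(model.decoder.weight, -initrange, initrange)
```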


From Renny P. Kusumawardani to Everyone: (3:34 PM)

  • 
Is there any particular paper that you refer to for this implementation?


From Wawan Cenggoro to Everyone: (3:36 PM)

  • 
It is already initialized by default with a random uniform distribution, I believe.


From Wawan Cenggoro to Everyone: (3:37 PM)

  • 
I see, thanks


From Me to Everyone: (3:38 PM)

  • 
Will you show an example of how to load the saved model?


From Wawan Cenggoro to Everyone: (3:39 PM)

  • 
interesting


From Renny P. Kusumawardani to Everyone: (3:40 PM)

  • 
I see, thanks Genta! :)


From Dedy Rahman Wijaya to Everyone: (3:42 PM)

  • 
thank you for your answer Genta


From Anditya Arifianto to Everyone: (3:42 PM)

  • 
@Teeradaj, about loading the saved model, there is an example in the Evaluation cell (Practical 6),
subsection 'Train from scratch -> Evaluation'.
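For reference, a minimal sketch (not the notebook's exact code) of the two usual ways to save and reload a PyTorch model; the file names and the tiny module are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Option A: pickle and reload the whole module object.
torch.save(model, 'model.pt')
restored = torch.load('model.pt')
restored.eval()

# Option B (more portable): save only the parameters, rebuild the same
# architecture, then load the state dict into it.
torch.save(model.state_dict(), 'model_state.pt')
restored_b = nn.Linear(4, 2)
restored_b.load_state_dict(torch.load('model_state.pt'))
restored_b.eval()
```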


From Me to Everyone: (3:42 PM)

  • 
@Anditya thank you


From Renny P. Kusumawardani to Everyone: (3:52 PM)

  • 
I have always wondered why they are called Key, Value, and Query. Could you comment on what you think is the intuition behind the naming?
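Not an authoritative definition, but a sketch of scaled dot-product attention that makes the naming concrete: each query is compared against every key, like a soft dictionary lookup, and the resulting weights mix the corresponding values. The shapes are made up:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # query-key similarity
    weights = F.softmax(scores, dim=-1)             # how strongly each key "matches"
    return weights @ V                              # weighted sum of the values

Q = torch.randn(1, 3, 8)   # 3 query positions, dimension 8
K = torch.randn(1, 5, 8)   # 5 key positions
V = torch.randn(1, 5, 8)   # one value per key
print(attention(Q, K, V).shape)   # torch.Size([1, 3, 8])
```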


From Wawan Cenggoro to Everyone: (3:58 PM)

  • 
Yes
Yes it can


From Lya Hulliyyatus Suadaa to Everyone: (3:58 PM)

  • 
yes


From Hariyanti Binti Mohd Saleh to Everyone: (3:58 PM)

  • 
why not


From Me to Everyone: (3:58 PM)

  • 
sure !


From Georgios to Everyone: (3:58 PM)

  • 
Yes


From Tisa Siti Saadah to Everyone: (3:58 PM)

  • 
yes


From Wawan Cenggoro to Everyone: (4:01 PM)

  • 
Have you read "Hopfield Networks is All You Need"? It is an interesting paper where they show that Transformer is actually some kind of Hopfield Networks.


From Me to Everyone: (4:02 PM)

  • 
Can you explain more about 'cross-attention' in the Transformer?
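A hedged sketch of the cross-attention idea in the Transformer decoder, using PyTorch's nn.MultiheadAttention: the queries come from the decoder states, while the keys and values come from the encoder output (self-attention would use the same tensor for all three). Sizes are made up:

```python
import torch
import torch.nn as nn

d_model, n_heads = 16, 4
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads)

enc_out = torch.randn(10, 2, d_model)   # (src_len, batch, d_model) from the encoder
dec_in = torch.randn(7, 2, d_model)     # (tgt_len, batch, d_model) decoder states

out, attn_weights = cross_attn(query=dec_in, key=enc_out, value=enc_out)
print(out.shape)   # torch.Size([7, 2, 16])
```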


From Renny P. Kusumawardani to Everyone: (4:02 PM)

  • 
Yes, the intuition
Haha, thanks Genta! It’s just something that irks me a bit :)


From Hariyanti Binti Mohd Saleh to Everyone: (4:18 PM)

  • 
I'm working with image data. Can you please share a bit about the image transformer? Thanks.
Is the way of implementation also the same?
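A hedged sketch of the ViT-style idea, which was not covered in detail here: the Transformer itself is essentially unchanged; an image is just turned into a sequence of patch embeddings first. Patch size and dimensions below are illustrative:

```python
import torch
import torch.nn as nn

patch, d_model = 16, 64
to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)  # patch embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4), num_layers=2)

img = torch.randn(1, 3, 224, 224)
tokens = to_patches(img).flatten(2).transpose(1, 2)   # (1, 196, 64) patch "tokens"
out = encoder(tokens.transpose(0, 1))                 # (seq, batch, d_model) layout
print(out.shape)                                      # torch.Size([196, 1, 64])
```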


From Wawan Cenggoro to Everyone: (4:20 PM)

  • 
Are you using a transformer for speech too?
Yes, sure.
Can you explain a little bit about the low-rank transformer?
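Just a sketch of the general low-rank idea as I understand the question (not Genta's exact implementation): the large projection matrices inside the Transformer are factorized through a rank-r bottleneck, which cuts the parameter count roughly from d_in * d_out to r * (d_in + d_out):

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Rank-r factorization of a d_in x d_out linear layer."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # d_in -> r
        self.up = nn.Linear(rank, d_out)                # r -> d_out

    def forward(self, x):
        return self.up(self.down(x))

full = nn.Linear(512, 512)
low = LowRankLinear(512, 512, rank=64)
print(sum(p.numel() for p in full.parameters()),   # 262656
      sum(p.numel() for p in low.parameters()))    # 66048
```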


From Lya Hulliyyatus Suadaa to Everyone: (4:25 PM)

  • 
Between GPT and BERT, which do you think is better for text generation?


From yusril maulidan to Everyone: (4:29 PM)

  • 
For sentiment analysis (speech and facial recognition), which method do you recommend based on your experience?


From Me to Everyone: (4:31 PM)

  • 
which technique did you use to reduce the dimension?


From Me to Everyone: (4:33 PM)

  • 
i see. thank you very much :)


From Hariyanti Binti Mohd Saleh to Everyone: (4:33 PM)

  • 
Did you share the code for that paper on GitHub?


From yusril maulidan to Everyone: (4:35 PM)

  • 
thank you


From Genta Winata to Everyone: (4:36 PM)

  • 
https://github.com/gentaiscool/end2end-asr-pytorch
https://github.com/audioku/meta-transfer-learning
https://github.com/audioku/cross-accent-maml-asr


From Hariyanti Binti Mohd Saleh to Everyone: (4:37 PM)

  • 
cool
👍


From ade romadhony to Everyone: (4:37 PM)

  • 
Genta is very busy right now; he is currently doing an internship. Thank you for sharing your knowledge at MLSS-Indo :)


From Renny P. Kusumawardani to Everyone: (4:38 PM)

  • 
Thank you, Genta and Pak Anditya! Not easy to cover the breadth of material that you did :)


From Me to Everyone: (4:38 PM)

  • 
thank you


From Georgios to Everyone: (4:38 PM)

  • 
Thank you