MLSS-Indo-Lab-Transformer

From Wawan Cenggoro to Everyone: (3:06 PM)

  • 
@rian: I missed reading some parts of this documentation: https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN. Apparently, it accepts variable-length sequences using torch.nn.utils.rnn.pack_sequence().
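For reference, a minimal sketch (not from the session) of feeding variable-length sequences to torch.nn.RNN with torch.nn.utils.rnn.pack_sequence; the shapes and hidden size are made up for illustration:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

# Three sequences of different lengths; each step is a 4-dim feature vector.
seqs = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]

# pack_sequence builds a PackedSequence; enforce_sorted=False lets us pass
# the sequences in any order of length.
packed = pack_sequence(seqs, enforce_sorted=False)

rnn = nn.RNN(input_size=4, hidden_size=8)
packed_out, h_n = rnn(packed)          # the RNN consumes the PackedSequence directly

# Unpack to a padded tensor if per-step outputs are needed.
out, lengths = pad_packed_sequence(packed_out)
print(out.shape, lengths)              # torch.Size([5, 3, 8]) tensor([5, 3, 2])
```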


From Anditya Arifianto to Everyone: (3:12 PM)

  • 
Feel free to type your questions in zoom chat or rocket chat


From Robby Hardi to Everyone: (3:12 PM)

  • 
Does J(theta) mean the Jacobian?


From Novri Suhermi to Everyone: (3:14 PM)

  • 
it's the cost function I think
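Not from the session, but for reference: in the language-modelling setting of this lab, J(theta) would usually be the average cross-entropy cost over the target tokens. Assuming the slides use that standard definition, it looks like this:

```latex
% Assumed standard cross-entropy cost over T target tokens w_1, ..., w_T:
J(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left( w_t \mid w_{<t} \right)
```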


From Robby Hardi to Everyone: (3:17 PM)

  • 
@Novri: Ok. Thanks


From Renny P. Kusumawardani to Everyone: (3:17 PM)

  • 
Pak @Anditya, could you please share the link to the Colab Notebook?


From Anditya Arifianto to Everyone: (3:17 PM)

  • 
it's still the same colab as before
https://colab.research.google.com/drive/1tMQ4b_hf7YJ5qVumDoK_0T3BqKW7SzXE?usp=sharing


From Renny P. Kusumawardani to Everyone: (3:18 PM)

  • 
I see, thanks! Sorry didn’t notice that :)


From Anditya Arifianto to Everyone: (3:18 PM)


👍


From Renny P. Kusumawardani to Everyone: (3:25 PM)

  • 
What is tied_weights for?
I mean, why would you want to share the same parameters between the input and output embeddings' weights?
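For context, a minimal sketch of what a tied_weights flag usually does in this kind of word-level language model. The names and sizes are illustrative and follow the common PyTorch word_language_model pattern; the notebook may differ:

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Illustrative RNN language model with optional input/output weight tying."""
    def __init__(self, vocab_size=1000, emb_size=256, hidden_size=256, tie_weights=True):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, emb_size)   # input embedding
        self.rnn = nn.LSTM(emb_size, hidden_size)
        self.decoder = nn.Linear(hidden_size, vocab_size)    # output projection
        if tie_weights:
            # Reuse one (vocab_size x emb_size) matrix for both roles.
            # Requires hidden_size == emb_size; it saves parameters and forces
            # the input and output representations of each word to agree.
            assert hidden_size == emb_size
            self.decoder.weight = self.encoder.weight

    def forward(self, tokens, hidden=None):
        emb = self.encoder(tokens)
        out, hidden = self.rnn(emb, hidden)
        return self.decoder(out), hidden
```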


From UNTARI NOVIA WISESTY to Everyone: (3:27 PM)

  • 
From your point of view, between LSTM and GRU, which one is better in terms of accuracy and time complexity? Thank you.


From Renny P. Kusumawardani to Everyone: (3:29 PM)

  • 
Great, thanks Genta! Would love to read the paper you refer to, if you don’t mind :)


From Wawan Cenggoro to Everyone: (3:32 PM)

  • 
I might have missed your explanation: why did you reinitialize the encoder and decoder weights in init_weights()?
Isn't a learning rate of 20 too large? It is usually below 1, isn't it?
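A rough sketch, assuming the notebook follows the standard PyTorch word_language_model example: init_weights() just re-draws the embedding and output-projection weights from a narrower uniform range than the framework defaults (and, if I remember correctly, that example does train with plain SGD at an initial learning rate of 20, relying on gradient clipping and learning-rate annealing). The attribute names below are illustrative:

```python
import torch.nn as nn

def init_weights(model, initrange=0.1):
    # Re-initialize only the embedding and the output projection with a small
    # uniform range; the defaults (normal for nn.Embedding, a different uniform
    # range for nn.Linear) are simply replaced by this narrower one.
    nn.init.uniform_(model.encoder.weight, -initrange, initrange)
    nn.init.zeros_(model.decoder.bias)
    nn.init.uniform_(model.decoder.weight, -initrange, initrange)
```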


From Renny P. Kusumawardani to Everyone: (3:34 PM)

  • 
Is there any particular paper that you refer to for this implementation?


From Wawan Cenggoro to Everyone: (3:36 PM)

  • 
It is already initialized by default with a random uniform distribution, I believe.


From Wawan Cenggoro to Everyone: (3:37 PM)

  • 
I see, thanks


From Me to Everyone: (3:38 PM)

  • 
Will you show an example of how to load the saved model?


From Wawan Cenggoro to Everyone: (3:39 PM)

  • 
interesting


From Renny P. Kusumawardani to Everyone: (3:40 PM)

  • 
I see, thanks Genta! :)


From Dedy Rahman Wijaya to Everyone: (3:42 PM)

  • 
thank you for your answer Genta


From Anditya Arifianto to Everyone: (3:42 PM)

  • 
@Teeradaj, about loading the saved model, there is an example in the Evaluation cell (Practical 6),
subsection 'Train from scratch -> Evaluation'.
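For reference, a minimal sketch (not the notebook's exact code) of the two usual ways to save and reload a PyTorch model; the file names and the tiny module are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Option A: pickle and reload the whole module object.
torch.save(model, 'model.pt')
restored = torch.load('model.pt')
restored.eval()

# Option B (more portable): save only the parameters, rebuild the same
# architecture, then load the state dict into it.
torch.save(model.state_dict(), 'model_state.pt')
restored_b = nn.Linear(4, 2)
restored_b.load_state_dict(torch.load('model_state.pt'))
restored_b.eval()
```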


From Me to Everyone: (3:42 PM)

  • 
@Anditya thank you


From Renny P. Kusumawardani to Everyone: (3:52 PM)

  • 
I have always wondered why they are called Key, Value, and Query. Could you comment on what you think is the intuition behind the naming?
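Not an authoritative definition, but a sketch of scaled dot-product attention that makes the naming concrete: each query is compared against every key, like a soft dictionary lookup, and the resulting weights mix the corresponding values. The shapes are made up:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # query-key similarity
    weights = F.softmax(scores, dim=-1)             # how strongly each key "matches"
    return weights @ V                              # weighted sum of the values

Q = torch.randn(1, 3, 8)   # 3 query positions, dimension 8
K = torch.randn(1, 5, 8)   # 5 key positions
V = torch.randn(1, 5, 8)   # one value per key
print(attention(Q, K, V).shape)   # torch.Size([1, 3, 8])
```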


From Wawan Cenggoro to Everyone: (3:58 PM)

  • 
Yes
Yes it can


From Lya Hulliyyatus Suadaa to Everyone: (3:58 PM)

  • 
yes


From Hariyanti Binti Mohd Saleh to Everyone: (3:58 PM)

  • 
why not


From Me to Everyone: (3:58 PM)

  • 
sure !


From Georgios to Everyone: (3:58 PM)

  • 
Yes


From Tisa Siti Saadah to Everyone: (3:58 PM)

  • 
yes


From Wawan Cenggoro to Everyone: (4:01 PM)

  • 
Have you read "Hopfield Networks is All You Need"? It is an interesting paper where they show that Transformer is actually some kind of Hopfield Networks.


From Me to Everyone: (4:02 PM)

  • 
Can you explain more about 'cross-attention' in the Transformer?
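A hedged sketch of the cross-attention idea in the Transformer decoder, using PyTorch's nn.MultiheadAttention: the queries come from the decoder states, while the keys and values come from the encoder output (self-attention would use the same tensor for all three). Sizes are made up:

```python
import torch
import torch.nn as nn

d_model, n_heads = 16, 4
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads)

enc_out = torch.randn(10, 2, d_model)   # (src_len, batch, d_model) from the encoder
dec_in = torch.randn(7, 2, d_model)     # (tgt_len, batch, d_model) decoder states

out, attn_weights = cross_attn(query=dec_in, key=enc_out, value=enc_out)
print(out.shape)   # torch.Size([7, 2, 16])
```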


From Renny P. Kusumawardani to Everyone: (4:02 PM)

  • 
Yes, the intuition
Haha, thanks Genta! It’s just something that irks me a bit :)


From Hariyanti Binti Mohd Saleh to Everyone: (4:18 PM)

  • 
I'm working with image data. Can you please share a bit about the image transformer? Thanks.
Is the way of implementation also the same?
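A hedged sketch of the ViT-style idea, which was not covered in detail here: the Transformer itself is essentially unchanged; an image is just turned into a sequence of patch embeddings first. Patch size and dimensions below are illustrative:

```python
import torch
import torch.nn as nn

patch, d_model = 16, 64
to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)  # patch embedding
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4), num_layers=2)

img = torch.randn(1, 3, 224, 224)
tokens = to_patches(img).flatten(2).transpose(1, 2)   # (1, 196, 64) patch "tokens"
out = encoder(tokens.transpose(0, 1))                 # (seq, batch, d_model) layout
print(out.shape)                                      # torch.Size([196, 1, 64])
```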


From Wawan Cenggoro to Everyone: (4:20 PM)

  • 
Are you using a transformer for speech too?
Yes, sure.
Can you explain a little bit about the low-rank transformer?
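Just a sketch of the general low-rank idea as I understand the question (not Genta's exact implementation): the large projection matrices inside the Transformer are factorized through a rank-r bottleneck, which cuts the parameter count roughly from d_in * d_out to r * (d_in + d_out):

```python
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Rank-r factorization of a d_in x d_out linear layer."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)   # d_in -> r
        self.up = nn.Linear(rank, d_out)                # r -> d_out

    def forward(self, x):
        return self.up(self.down(x))

full = nn.Linear(512, 512)
low = LowRankLinear(512, 512, rank=64)
print(sum(p.numel() for p in full.parameters()),   # 262656
      sum(p.numel() for p in low.parameters()))    # 66048
```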


From Lya Hulliyyatus Suadaa to Everyone: (4:25 PM)

  • 
Between GPT and BERT, which do you think is better for text generation?


From yusril maulidan to Everyone: (4:29 PM)

  • 
For sentiment analysis (speech and facial recognition), which method do you recommend based on your experience?


From Me to Everyone: (4:31 PM)

  • 
which technique did you use to reduce the dimension?


From Me to Everyone: (4:33 PM)

  • 
i see. thank you very much :)


From Hariyanti Binti Mohd Saleh to Everyone: (4:33 PM)

  • 
Did you share the code for that paper on GitHub?


From yusril maulidan to Everyone: (4:35 PM)

  • 
thank you


From Genta Winata to Everyone: (4:36 PM)

  • 
https://github.com/gentaiscool/end2end-asr-pytorch
https://github.com/audioku/meta-transfer-learning
https://github.com/audioku/cross-accent-maml-asr


From Hariyanti Binti Mohd Saleh to Everyone: (4:37 PM)

  • 
cool
👍


From ade romadhony to Everyone: (4:37 PM)

  • 
Genta is very busy right now; he is currently doing an internship. Thank you for sharing your knowledge at MLSS-Indo :)


From Renny P. Kusumawardani to Everyone: (4:38 PM)

  • 
Thank you, Genta and Pak Anditya! Not easy to cover the breadth of material that you did :)


From Me to Everyone: (4:38 PM)

  • 
thank you


From Georgios to Everyone: (4:38 PM)

  • 
Thank you