MLSS-Indo-Lab-Transformer
From Wawan Cenggoro to Everyone: (3:06 PM)
@rian: I missed reading some parts of this documentation: https://pytorch.org/docs/stable/generated/torch.nn.RNN.html#torch.nn.RNN. Apparently, it accepts variable-length input using torch.nn.utils.rnn.pack_sequence().
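Following up on the linked docs, a minimal sketch of feeding variable-length sequences to `nn.RNN` via `pack_sequence` (sizes are illustrative, not from the lab notebook):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

rnn = nn.RNN(input_size=4, hidden_size=8)

# Three sequences of different lengths; each step is a 4-dim feature vector.
# pack_sequence expects decreasing lengths (or pass enforce_sorted=False).
seqs = [torch.randn(5, 4), torch.randn(3, 4), torch.randn(2, 4)]
packed = pack_sequence(seqs)

packed_out, h_n = rnn(packed)

# Unpack back to a padded tensor plus the original lengths.
out, lengths = pad_packed_sequence(packed_out)
print(out.shape)  # torch.Size([5, 3, 8])  (max_len, batch, hidden)
print(lengths)    # tensor([5, 3, 2])
```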
From Anditya Arifianto to Everyone: (3:12 PM)
Feel free to type your questions in zoom chat or rocket chat
From Robby Hardi to Everyone: (3:12 PM)
J(theta) means jacobian?
From Novri Suhermi to Everyone: (3:14 PM)
it's the cost function I think
From Robby Hardi to Everyone: (3:17 PM)
@Novri: Ok. Thanks
From Renny P. Kusumawardani to Everyone: (3:17 PM)
Pak @Anditya, could you please share the link to the Colab Notebook?
From Anditya Arifianto to Everyone: (3:17 PM)
it's still the same colab as before https://colab.research.google.com/drive/1tMQ4b_hf7YJ5qVumDoK_0T3BqKW7SzXE?usp=sharing
From Renny P. Kusumawardani to Everyone: (3:18 PM)
I see, thanks! Sorry didn’t notice that :)
From Anditya Arifianto to Everyone: (3:18 PM)
👍
From Renny P. Kusumawardani to Everyone: (3:25 PM)
What is the tied_weights for? I mean, why would you want to share the same parameters between the input and output embeddings?
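For context on the tied_weights question: in language models, tying makes the output projection reuse the input embedding matrix, halving those parameters and often improving perplexity. A hedged sketch of the usual PyTorch pattern (the `TiedLM` class is hypothetical, not the notebook's code; it assumes embedding dim equals hidden dim):

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy language model whose output projection shares the
    input embedding matrix (both are vocab_size x emb_dim)."""
    def __init__(self, vocab_size=100, emb_dim=16):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, emb_dim, batch_first=True)
        self.decoder = nn.Linear(emb_dim, vocab_size)
        # Tie the weights: decoder weight *is* the embedding weight.
        self.decoder.weight = self.encoder.weight

    def forward(self, tokens):
        h, _ = self.rnn(self.encoder(tokens))
        return self.decoder(h)

model = TiedLM()
logits = model(torch.randint(0, 100, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 100])
# Both modules now reference the same Parameter object.
assert model.decoder.weight is model.encoder.weight
```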
From UNTARI NOVIA WISESTY to Everyone: (3:27 PM)
From your point of view, which is more powerful between LSTM and GRU, in terms of accuracy and time complexity? Thank you
From Renny P. Kusumawardani to Everyone: (3:29 PM)
Great, thanks Genta! Would love to read the paper you refer to, if you don’t mind :)
From Wawan Cenggoro to Everyone: (3:32 PM)
I might have missed your explanation: why did you reinitialize the encoder and decoder weights in init_weights()? Also, isn't a learning rate of 20 too large? It is usually below 1, isn't it?
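For context on the learning-rate question: this looks like the classic PyTorch word-level language-model recipe, which pairs a large plain-SGD rate (lr=20 by default in that example) with gradient-norm clipping; the clip is what keeps such a large step usable. A sketch of the clip-then-step pattern (model and values are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()

# Rescale all gradients so their global norm is at most 0.25.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.25)

with torch.no_grad():
    for p in model.parameters():
        p -= 20 * p.grad  # plain SGD step; the clip above tames the large lr
```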
From Renny P. Kusumawardani to Everyone: (3:34 PM)
Is there any particular paper that you refer to for this implementation?
From Wawan Cenggoro to Everyone: (3:36 PM)
It is already initialized by default with random uniform I believe
From Wawan Cenggoro to Everyone: (3:37 PM)
I see, thanks
From Me to Everyone: (3:38 PM)
will you show an example of how to load the saved model?
From Wawan Cenggoro to Everyone: (3:39 PM)
interesting
From Renny P. Kusumawardani to Everyone: (3:40 PM)
I see, thanks Genta! :)
From Dedy Rahman Wijaya to Everyone: (3:42 PM)
thank you for your answer Genta
From Anditya Arifianto to Everyone: (3:42 PM)
@Teeradaj, about loading a saved model: there is an example in the Evaluation cell (Practical 6), subsection 'Train from scratch -> Evaluation'
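As a general illustration of the save/load pattern referenced here (not the exact Practical 6 code), a small sketch: save the state dict, rebuild the same architecture, then load the weights back in.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), "model.pt")

# Later: construct the same architecture, then load the weights.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("model.pt"))
restored.eval()  # switch off dropout/batch-norm training behavior

x = torch.randn(3, 4)
assert torch.equal(model(x), restored(x))  # identical outputs
```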
From Me to Everyone: (3:42 PM)
@Anditya thank you
From Renny P. Kusumawardani to Everyone: (3:52 PM)
I have always wondered why they are called Key, Value, and Query. Could you comment on what you think is the intuition behind the naming?
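One common explanation of the naming is a soft dictionary lookup: a query is compared against all keys, and the result is a weighted mix of the associated values, rather than a single exact-match retrieval. A minimal scaled dot-product attention sketch (my own illustration, not the lab's code):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q: (..., Lq, d), k: (..., Lk, d), v: (..., Lk, dv)."""
    # Compare each query against every key.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    # Turn match scores into a distribution over keys.
    weights = torch.softmax(scores, dim=-1)
    # Return the value mix weighted by how well each key matched.
    return weights @ v, weights

q = torch.randn(1, 2, 8)  # 2 queries
k = torch.randn(1, 5, 8)  # 5 keys
v = torch.randn(1, 5, 8)  # 5 values
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 2, 8])
print(w.sum(-1))  # each query's weights sum to 1
```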
From Wawan Cenggoro to Everyone: (3:58 PM)
Yes, yes it can
From Lya Hulliyyatus Suadaa to Everyone: (3:58 PM)
yes
From Hariyanti Binti Mohd Saleh to Everyone: (3:58 PM)
why not
From Me to Everyone: (3:58 PM)
sure !
From Georgios to Everyone: (3:58 PM)
Yes
From Tisa Siti Saadah to Everyone: (3:58 PM)
yes
From Wawan Cenggoro to Everyone: (4:01 PM)
Have you read "Hopfield Networks is All You Need"? It is an interesting paper where they show that the Transformer is actually a kind of Hopfield network.
From Me to Everyone: (4:02 PM)
can you explain more about ‘cross-attention’ in Transformer?
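To make the cross-attention question concrete: in the Transformer decoder, the queries come from the decoder states while the keys and values come from the encoder outputs, so each target position attends over the source sequence. A sketch using `nn.MultiheadAttention` (dimensions chosen arbitrarily for illustration):

```python
import torch
import torch.nn as nn

d_model, heads = 16, 4
cross_attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

enc_out = torch.randn(2, 10, d_model)  # encoder outputs: batch=2, src_len=10
dec_in = torch.randn(2, 6, d_model)    # decoder states:  batch=2, tgt_len=6

# Cross-attention: query = decoder states, key = value = encoder outputs.
out, attn_weights = cross_attn(query=dec_in, key=enc_out, value=enc_out)
print(out.shape)           # torch.Size([2, 6, 16])  one vector per target position
print(attn_weights.shape)  # torch.Size([2, 6, 10])  target attends over source
```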
From Renny P. Kusumawardani to Everyone: (4:02 PM)
Yes, the intuition! Haha, thanks Genta! It’s just something that irks me a bit :)
From Hariyanti Binti Mohd Saleh to Everyone: (4:18 PM)
I'm working with image data. Can you please share a bit about the image Transformer? Thanks. Is the implementation also the same?
From Wawan Cenggoro to Everyone: (4:20 PM)
Are you using a Transformer for speech too? Yes, sure. Can you explain a little bit about the low-rank Transformer?
From Lya Hulliyyatus Suadaa to Everyone: (4:25 PM)
Between GPT and BERT, what do you think better for text generation?
From yusril maulidan to Everyone: (4:29 PM)
For sentiment analysis (speech and facial recognition), which method do you recommend based on your experience?
From Me to Everyone: (4:31 PM)
which technique did you use to reduce the dimension?
From Me to Everyone: (4:33 PM)
i see. thank you very much :)
From Hariyanti Binti Mohd Saleh to Everyone: (4:33 PM)
do you share coding of that paper in GitHub?
From yusril maulidan to Everyone: (4:35 PM)
thank you
From Genta Winata to Everyone: (4:36 PM)
https://github.com/gentaiscool/end2end-asr-pytorch
https://github.com/audioku/meta-transfer-learning
https://github.com/audioku/cross-accent-maml-asr
From Hariyanti Binti Mohd Saleh to Everyone: (4:37 PM)
cool 👍
From ade romadhony to Everyone: (4:37 PM)
Genta is very busy right now; he is currently doing an internship. Thank you for sharing your knowledge in MLSS-Indo :)
From Renny P. Kusumawardani to Everyone: (4:38 PM)
Thank you, Genta and Pak Anditya! Not easy to cover the breadth of material that you did :)
From Me to Everyone: (4:38 PM)
thank you
From Georgios to Everyone: (4:38 PM)
Thank you