
From Wawan Cenggoro to Everyone: (3:06 PM)

@rian: I missed some parts of the documentation. Apparently, it accepts variable-length sequences via torch.nn.utils.rnn.pack_sequence().

From Anditya Arifianto to Everyone: (3:12 PM)

Feel free to type your questions in the Zoom chat or Rocket.Chat

From Robby Hardi to Everyone: (3:12 PM)

Does J(theta) mean the Jacobian?

From Novri Suhermi to Everyone: (3:14 PM)

it's the cost function I think

From Robby Hardi to Everyone: (3:17 PM)

@Novri: Ok. Thanks

From Renny P. Kusumawardani to Everyone: (3:17 PM)

Pak @Anditya, could you please share the link to the Colab Notebook?

From Anditya Arifianto to Everyone: (3:17 PM)

it's still the same colab as before

From Renny P. Kusumawardani to Everyone: (3:18 PM)

I see, thanks! Sorry didn’t notice that :)

From Renny P. Kusumawardani to Everyone: (3:25 PM)

What is tied_weights for?
I mean, why would you want to share the same parameters between the input and output embedding weights?

From UNTARI NOVIA WISESTY to Everyone: (3:27 PM)

From your point of view, between LSTM and GRU, which one is better in terms of accuracy and time complexity? Thank you

From Renny P. Kusumawardani to Everyone: (3:29 PM)

Great, thanks Genta! Would love to read the paper you refer to, if you don’t mind :)

From Wawan Cenggoro to Everyone: (3:32 PM)

I might have missed your explanation: why did you reinitialize the encoder and decoder weights in init_weights()?
Isn't a learning rate of 20 too large? It is usually below 1, isn't it?

From Renny P. Kusumawardani to Everyone: (3:34 PM)

Is there any particular paper that you refer to for this implementation?

From Wawan Cenggoro to Everyone: (3:36 PM)

It is already initialized by default with a random uniform distribution, I believe

From Wawan Cenggoro to Everyone: (3:37 PM)

I see, thanks

From Me to Everyone: (3:38 PM)

Will you show an example of how to load the saved model?

From Renny P. Kusumawardani to Everyone: (3:40 PM)

I see, thanks Genta! :)

From Dedy Rahman Wijaya to Everyone: (3:42 PM)

thank you for your answer Genta

From Anditya Arifianto to Everyone: (3:42 PM)

@Teeradaj, about loading the saved model: there is an example in the Evaluation cell (Practical 6),
subsection 'Train from scratch->Evaluation'

From Me to Everyone: (3:42 PM)

@Anditya thank you

From Renny P. Kusumawardani to Everyone: (3:52 PM)

I have always wondered why they are called Key, Value, and Query. Could you comment on what you think is the intuition behind the naming?

From Wawan Cenggoro to Everyone: (3:58 PM)

Yes it can

From Hariyanti Binti Mohd Saleh to Everyone: (3:58 PM)

why not

From Me to Everyone: (3:58 PM)

sure !

From Wawan Cenggoro to Everyone: (4:01 PM)

Have you read "Hopfield Networks is All You Need"? It is an interesting paper where they show that the Transformer is actually a kind of Hopfield network.

From Me to Everyone: (4:02 PM)

Can you explain more about ‘cross-attention’ in the Transformer?

From Renny P. Kusumawardani to Everyone: (4:02 PM)

Yes, the intuition
Haha, thanks Genta! It’s just something that irks me a bit :)

From Hariyanti Binti Mohd Saleh to Everyone: (4:18 PM)

I'm working with image data. Could you please share a bit about the image Transformer? Thanks.
Is it implemented the same way?

From Wawan Cenggoro to Everyone: (4:20 PM)

Are you also using the Transformer for speech?
yes sure
Can you explain a little bit about the low-rank Transformer?

From Lya Hulliyyatus Suadaa to Everyone: (4:25 PM)

Between GPT and BERT, which do you think is better for text generation?

From yusril maulidan to Everyone: (4:29 PM)

For sentiment analysis (speech and facial recognition), which method do you recommend based on your experience?

From Me to Everyone: (4:31 PM)

which technique did you use to reduce the dimension?

From Me to Everyone: (4:33 PM)

i see. thank you very much :)

From Hariyanti Binti Mohd Saleh to Everyone: (4:33 PM)

Do you share the code for that paper on GitHub?

From yusril maulidan to Everyone: (4:35 PM)

thank you

From ade romadhony to Everyone: (4:37 PM)

Genta is very busy right now; he is currently doing an internship. Thank you for sharing your knowledge at MLSS-Indo :)

From Renny P. Kusumawardani to Everyone: (4:38 PM)

Thank you, Genta and Pak Anditya! Not easy to cover the breadth of material that you did :)

From Me to Everyone: (4:38 PM)

thank you

From Georgios to Everyone: (4:38 PM)

Thank you