Topic: Deep Learning for NLP - Part 6 (read 76 times)

mitsumi

  • Global Moderator
  • Posts: 119150
  • Karma: +0/-0
Deep Learning for NLP - Part 6
« on: 13 August 2021, 14:54 »
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 1.38 GB | Duration: 2h 39m

What you'll learn
Deep Learning for Natural Language Processing
Popular Transformer encoder and decoder models
Multi-modal Transformer models
Large scale Transformer models
DL for NLP

Requirements
Basics of machine learning
Basic understanding of Transformer based models and word embeddings
Transformer models like BERT and GPT

Description
This course is part of the "Deep Learning for NLP" series. In this course, I will talk about various popular Transformer models beyond the ones I have already covered in the previous sessions in this series. These Transformer models include encoder as well as decoder based models, and they differ in aspects such as form of input, pretraining objectives, pretraining data, and architecture variations.

These Transformer models were all proposed in 2019 or later, and some of them are from early 2021. Thus, as of Aug 2021, these models are very recent and state of the art across multiple NLP tasks.

The course consists of three main sections as follows.

In the first section, I will talk about a few Transformer encoder and decoder models which extend the original Transformer framework. Specifically, I will cover SpanBERT, Electra, DeBERTa and DialoGPT. SpanBERT, Electra and DeBERTa are Transformer encoders, while DialoGPT is a Transformer decoder model. For each model, we will talk about how its architecture or pretraining differs from the standard Transformer. We will also talk about important results on various NLP tasks.

In the second section, I will talk about multi-modal Transformer models. Multimodal learning has gained a lot of momentum in recent years. Thus, there was a need to come up with Transformer models which could handle text and image data together. In this part, I will cover VisualBERT and ViLBERT, both of which process multi-modal input very effectively. The two models have many similarities. We will discuss their similarities and differences in detail.

Lastly, in the third section, I will talk about large scale Transformer models. I will introduce the mixture of experts (MoE) architecture. Then I will talk about how GShard adapts the MoE architecture and shows great results on massive multilingual machine translation. Finally, I will discuss Switch Transformers, which simplify the MoE routing algorithm and also make several engineering optimizations to reduce network communication and computation costs and to mitigate training instabilities.
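
To give a rough feel for what "simplified MoE routing" means, here is a minimal sketch of top-1 routing, where each token is sent to only its single highest-scoring expert. This is an illustration under stated assumptions, not the course's or the paper's code: the function names (`softmax`, `switch_route`), the fixed `capacity` value and the toy logits are all hypothetical, and details such as the load-balancing loss are left out.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def switch_route(router_logits, capacity):
    """Top-1 routing sketch (hypothetical helper, not a library API).

    router_logits: one list of logits per token, one logit per expert.
    capacity: maximum number of tokens each expert accepts.
    Returns (expert_index or None, gate_probability) for every token.
    None marks an overflow token that no expert processes.
    """
    num_experts = len(router_logits[0])
    load = [0] * num_experts
    assignments = []
    for logits in router_logits:
        probs = softmax(logits)
        # Pick only the single best expert for this token.
        expert = max(range(num_experts), key=lambda i: probs[i])
        if load[expert] < capacity:
            load[expert] += 1
            assignments.append((expert, probs[expert]))
        else:
            # The chosen expert is full, so the token is dropped here.
            assignments.append((None, probs[expert]))
    return assignments

# Toy example: 4 tokens, 2 experts, each expert accepts at most 2 tokens.
logits = [[2.0, 0.5], [1.5, 0.1], [0.2, 3.0], [2.2, 0.3]]
routes = switch_route(logits, capacity=2)
print([expert for expert, _ in routes])  # prints [0, 0, 1, None]
```

In this toy run the fourth token also prefers expert 0, which is already at capacity, so it is not processed by any expert. In a real layer the gate probability would scale the chosen expert's output, and an overflow token would simply be carried forward by the residual connection.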

In general, each of these papers is quite long, so understanding them is difficult and time consuming. In these sessions, I have tried to summarize them, bringing out the intuitions and tying the important concepts across the papers into a coherent story. I hope you will find it useful for your work and understanding.

Who this course is for:
Beginners in deep learning
Python developers interested in data science concepts
Masters or PhD students who wish to learn deep learning concepts quickly

Download link:
Only visible to registered and with a reply to the topic.

Links are Interchangeable - No Password - Single Extraction