Multi-Modal Machine Learning. An Introduction to BERT Pre-Trained Visio-Linguistic Models

Seminar Paper, 2021, 22 Pages, Grade: 1.3

Author: Johanna Garthe

Computer Science - Computational Linguistics

Summary

In the field of multi-modal machine learning, where the fusion of various sensory inputs shapes learning paradigms, this paper provides an introduction to BERT-based pre-trained visio-linguistic models by summarizing and analyzing two approaches, ViLBERT and VL-BERT, with the aim of highlighting and discussing their distinctive characteristics. The paper is structured into five chapters. Chapter 2 lays out the fundamental principles by introducing the characteristics of the Transformer encoder and BERT. Chapter 3 presents the selected visio-linguistic models, ViLBERT and VL-BERT. Chapter 4 summarizes and discusses both models. The paper concludes with an outlook in chapter 5.

Transfer learning is a powerful technique in the field of deep learning. First, a model is pre-trained on a general task. Then fine-tuning is performed by taking the trained network as the basis of a new purpose-specific model and applying it to a separate task. In this way, transfer learning reduces the need to develop new models from scratch for every new task and hence saves time for training and verification. Nowadays, such pre-trained models exist in computer vision, natural language processing (NLP) and, more recently, for visio-linguistic tasks. The pre-trained models presented later in this paper both build on BERT. BERT, short for Bidirectional Encoder Representations from Transformers, is a popular pre-training technique for NLP built on the Transformer encoder architecture.
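To make the pre-train/fine-tune workflow concrete, here is a minimal sketch that loads pre-trained BERT weights and fine-tunes them on a toy two-example classification batch. It uses the Hugging Face transformers library; the model name and the toy data are illustrative choices, and none of this code comes from the paper itself.

```python
# Minimal pre-train/fine-tune sketch with Hugging Face transformers
# (illustration of the transfer-learning idea, not code from the paper).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load weights that were pre-trained on large unlabeled corpora ...
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# ... and fine-tune them on a small task-specific dataset (toy batch here).
batch = tokenizer(["a great movie", "a dull movie"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy on the classification head
loss.backward()
optimizer.step()
```

Because only the small classification head starts from scratch, a few epochs on modest labeled data are typically enough, which is exactly the time saving the paragraph above describes.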

Excerpt


Table of Contents

  • Introduction
  • Fundamental Principles
    • Transformer
    • BERT
  • Visio-Linguistic Models
    • ViLBERT
    • VL-BERT
  • Discussion
  • Outlook

Objectives and Key Themes

This seminar paper aims to introduce BERT pre-trained visio-linguistic models by summarizing two recently published approaches, ViLBERT and VL-BERT, and discussing their characteristics.

  • Multi-modal machine learning
  • BERT pre-trained visio-linguistic models
  • Comparison of ViLBERT and VL-BERT
  • Characteristics and functionalities of the models
  • Applications and future directions

Chapter Summaries

  • Introduction: This chapter introduces the concept of multi-modal machine learning and its relevance in artificial intelligence, highlighting the importance of integrating various sensory modalities for learning and processing information. It also introduces the specific focus of the paper, BERT pre-trained visio-linguistic models, and names the two models that will be discussed.
  • Fundamental Principles: This chapter establishes the foundations for understanding the selected models by explaining the core principles of the Transformer encoder and BERT, two essential components of the chosen models. It provides insights into their architectures and functionalities, preparing the reader for the detailed analysis of the specific models (a minimal self-attention sketch follows this list).
  • Visio-Linguistic Models: This chapter dives into the heart of the paper, presenting the two selected visio-linguistic models, ViLBERT and VL-BERT. It highlights the key features, architecture, and pre-training procedures of each model, providing a detailed exploration of their functionalities and capabilities.
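As context for the Fundamental Principles chapter, the sketch below shows scaled dot-product attention, the core operation of the Transformer encoder. It is a self-contained illustration added for this summary, not code from the paper.

```python
# Scaled dot-product attention, the building block of the Transformer encoder
# (illustrative sketch, not code from the paper).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                 # attention-weighted values

# Toy example: 4 tokens with 8-dimensional query/key/value vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)            # shape (4, 8)
```

BERT stacks many such attention layers (with multiple heads and feed-forward sublayers) and pre-trains them bidirectionally, which is what chapter 2 of the paper walks through.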

Keywords

The primary focus of this seminar paper lies on multi-modal machine learning, especially in the context of visio-linguistic models. This includes the analysis of BERT pre-trained models such as ViLBERT and VL-BERT. The paper examines their architectures, functionalities, pre-training tasks, and potential applications within this field. Key terms encompass Transformer encoder, BERT, visio-linguistic models, co-attention (sketched below), and pre-training techniques.
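For readers unfamiliar with the co-attention keyword, the following sketch shows a co-attentional block in the style of ViLBERT, where each modality's queries attend to the other modality's keys and values. The class name, dimensions, and layer layout are illustrative assumptions for this summary, not the authors' implementation.

```python
# Hypothetical co-attention block in the style of ViLBERT's co-attentional
# transformer layer (assumption-laden sketch, not the authors' code).
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        # One multi-head attention module per direction of cross-modal exchange.
        self.txt_attends_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, txt, img):
        # txt: (batch, n_tokens, dim); img: (batch, n_regions, dim)
        txt_out, _ = self.txt_attends_img(query=txt, key=img, value=img)
        img_out, _ = self.img_attends_txt(query=img, key=txt, value=txt)
        return txt_out, img_out

# Example: 20 word tokens and 36 image-region features, both projected to 768 dims.
txt, img = torch.randn(1, 20, 768), torch.randn(1, 36, 768)
txt_out, img_out = CoAttention()(txt, img)
```

Swapping queries across modalities is what lets word representations be conditioned on image regions and vice versa, which distinguishes ViLBERT's two-stream design from a single-stream model like VL-BERT that feeds both modalities into one Transformer.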

End of excerpt from 22 pages

Details

Title
Multi-Modal Machine Learning. An Introduction to BERT Pre-Trained Visio-Linguistic Models
University
University of Trier (Computerlinguistik und Digital Humanities)
Course
Mathematische Modellierung
Grade
1.3
Author
Johanna Garthe (Author)
Year of publication
2021
Pages
22
Catalog number
V1431361
ISBN (PDF)
9783346983749
ISBN (Book)
9783346983756
Language
English
Keywords
Multi-Modal Machine Learning, Machine Learning, NLP, Natural Language Processing, BERT, Transformer
Product safety
GRIN Publishing GmbH
Quote paper
Johanna Garthe (Author), 2021, Multi-Modal Machine Learning. An Introduction to BERT Pre-Trained Visio-Linguistic Models, Munich, GRIN Verlag, https://www.grin.com/document/1431361