What Fuels Transformers in Computer Vision? Unraveling ViT's Advantages
eBook - PDF

  1. 45 pages
  2. English
  3. PDF
  4. Available on iOS & Android

About this book

Master's thesis from the year 2022 in the subject Computer Sciences - Artificial Intelligence, grade: 7.50, Universidad de Alcalá, course: Artificial Intelligence and Deep Learning, language: English.

Abstract: Vision Transformers (ViT) are neural model architectures that compete with, and exceed, classical convolutional neural networks (CNNs) in computer vision tasks. ViT's versatility and performance are best understood through a backward analysis. In this study, we aim to identify, analyse and extract the key elements of ViT by backtracking to the origin of Transformer neural architectures (TNA). We highlight the benefits and constraints of the Transformer architecture, as well as the foundational role of self-attention and multi-head attention mechanisms, which helps explain why self-attention might be all we need. Our interest in the TNA has led us to consider self-attention as a computational primitive. This generic computation framework provides flexibility in the tasks that the Transformer can perform. Having gained a good grasp of Transformers, we go on to analyse their vision-applied counterpart, ViT, which is roughly a transposition of the original Transformer architecture to an image-recognition and image-processing context.

When it comes to computer vision, convolutional neural networks are considered the go-to paradigm. Because of their proclivity for vision, we naturally seek to understand how ViT compares to CNNs. It seems that their inner workings are rather different.

CNNs are built with a strong inductive bias, an engineering feature that enables them to perform well in vision tasks. ViT has less inductive bias and must learn this (convolutional filters) by ingesting enough data. This makes Transformer-based architectures rather data-hungry but also more adaptable.

Finally, we describe potential enhancements to the Transformer, with a focus on possible architectural extensions. We discuss some exciting learning approaches in machine learning. The final part of our analysis leads us to ponder the flexibility of Transformer-based neural architectures. We argue that this feature might possibly be linked to their Turing-completeness.

Information

Publisher: GRIN Verlag
Year: 2024
Edition: 0
eBook ISBN: 9783346993304

What Fuels Transformers in Computer Vision? Unraveling ViT's Advantages by Tolga Topal is available in PDF and/or ePUB format and is catalogued under Computer Science & Artificial Intelligence (AI) & Semantics.