
May 16, 2025 · 1 min read

      Vision Language Models (VLMs)

      Introduction

Vision Language Models (VLMs) are AI systems that understand and process both visual and textual information. By bridging computer vision and natural language processing, they make it possible to comprehend and describe visual content in natural language.

      Topics Covered

      1. Algorithms

      • Contrastive Learning (see the loss sketch after this list)
      • Masking-based VLMs
      • Generative-based VLMs
      • Pretrained Backbone-based VLMs
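
Contrastive learning is the training signal behind CLIP-style models: matching image-text pairs are pulled together in a shared embedding space while mismatched pairs are pushed apart. The sketch below is a minimal, assumed PyTorch implementation of that symmetric loss; `image_emb` and `text_emb` stand in for the outputs of any pair of encoders.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors from two encoders
    (hypothetical outputs; any encoders producing same-size embeddings work).
    """
    # L2-normalize so the dot product is cosine similarity
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix, shape (batch, batch)
    logits = image_emb @ text_emb.t() / temperature

    # Matching image/text pairs sit on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```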

      2. Architectures

      • Vision Transformer (ViT)
      • Dual Encoder (see the sketch after this list)
      • Fusion Encoder-Decoder
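
The dual-encoder architecture keeps the two modalities in separate towers (typically a ViT for images and a Transformer for text) and only compares them in a shared embedding space. A minimal sketch, assuming PyTorch and placeholder backbones:

```python
import torch.nn as nn

class DualEncoder(nn.Module):
    """Two independent towers project images and text into one embedding space.

    The backbones are hypothetical stand-ins: any modules returning a pooled
    feature vector (e.g. a ViT image tower, a Transformer text tower) will do.
    """

    def __init__(self, vision_backbone, text_backbone,
                 vision_dim, text_dim, embed_dim=512):
        super().__init__()
        self.vision_backbone = vision_backbone
        self.text_backbone = text_backbone
        # Linear projections map each modality into the shared space
        self.vision_proj = nn.Linear(vision_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)

    def forward(self, images, token_ids):
        image_emb = self.vision_proj(self.vision_backbone(images))
        text_emb = self.text_proj(self.text_backbone(token_ids))
        # These embeddings feed directly into the contrastive loss above
        return image_emb, text_emb
```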

      3. Training Techniques

      • Data Collection and Preprocessing
      • Data Pruning
      • Contrastive Learning
      • Masked Language-Image Modeling
      • Transfer Learning (see the linear-probe sketch after this list)
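
Transfer learning with a pretrained backbone is often as simple as freezing the encoder and training a small head on its features (a linear probe). The sketch below assumes PyTorch and a hypothetical pretrained backbone that returns pooled features:

```python
import torch
import torch.nn as nn

def build_linear_probe(pretrained_backbone, feature_dim, num_classes, lr=1e-3):
    """Freeze a pretrained encoder and train only a linear head on top.

    `pretrained_backbone` is a hypothetical nn.Module that maps a batch of
    inputs to (batch, feature_dim) features; swap in any pretrained tower.
    """
    # Freeze every pretrained parameter so only the head receives gradients
    for param in pretrained_backbone.parameters():
        param.requires_grad = False
    pretrained_backbone.eval()

    head = nn.Linear(feature_dim, num_classes)
    model = nn.Sequential(pretrained_backbone, head)

    # Optimize only the head's parameters
    optimizer = torch.optim.AdamW(head.parameters(), lr=lr)
    return model, optimizer
```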

      4. Ethical Considerations

      • Bias Mitigation
      • Responsible AI Development
      • Fairness Evaluation

      5. Tools and Libraries

      • Hugging Face Transformers (see the example after this list)
      • NVIDIA NeMo
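
As a quick taste of the Hugging Face Transformers API, the snippet below runs zero-shot image classification with the public openai/clip-vit-base-patch32 checkpoint; the image URL and candidate labels are only examples.

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a public CLIP checkpoint (example model; other CLIP checkpoints work too)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example image and candidate labels, chosen purely for illustration
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax -> probabilities
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```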

      Learning Resources

      Documentation and Guides

      • Vision Language Model Prompt Engineering Guide
      • Hugging Face Transformers Documentation
      • NVIDIA NeMo Documentation

      Articles and Tutorials

      • Vision Language Models (VLMs) Explained
      • A Deep Dive into VLMs: Vision-Language Models
      • What is a Vision-Language Model (VLM)?
      • Guide to Vision-Language Models (VLMs)

