Multimodal

Created on October 26, 2024

2024 · multimodal · tutorial

Resources: Multimodal RAG: Chat with Videos
- Cross-Modal Encoder: Bridge Tower
- LVLM
Resources: LMM prompting with Gemini

Resources: Multimodal RAG: Chat with Videos

You can check out the code for Multimodal Embeddings, Multimodal Preprocessing, Multimodal Retrieval from Vector Stores, and Large Vision-Language Models (LVLMs).
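To make the retrieval step concrete, here is a minimal sketch of nearest-neighbour lookup over multimodal embeddings. It is an illustration under stated assumptions, not the course's code: the three-dimensional vectors are hand-written placeholders standing in for the output of a cross-modal encoder such as BridgeTower, the `store` list stands in for a real vector store, and the names `cosine_similarity`, `retrieve`, and the frame file names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=1):
    """Return the k stored items most similar to the query embedding."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy store: each entry pairs a video frame and its caption
# with a placeholder embedding. A real pipeline would compute these
# vectors with a cross-modal encoder instead of writing them by hand.
store = [
    {"frame": "frame_001.jpg", "caption": "a person opening a laptop",
     "embedding": [0.9, 0.1, 0.0]},
    {"frame": "frame_002.jpg", "caption": "a dog running on a beach",
     "embedding": [0.0, 0.2, 0.9]},
    {"frame": "frame_003.jpg", "caption": "a close-up of a keyboard",
     "embedding": [0.7, 0.6, 0.1]},
]

# Placeholder embedding for a text query such as "someone using a computer".
query_vec = [1.0, 0.2, 0.0]

top = retrieve(query_vec, store, k=2)
for item in top:
    print(item["frame"], "-", item["caption"])
```

In a full multimodal RAG pipeline, the retrieved frame and caption would then be passed to an LVLM together with the user's question.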

LVLM

Resources: LMM prompting with Gemini

The Gemini family consists of:

Ultra: the largest model, for highly complex tasks
Pro: the best model for general performance across a wide range of tasks
Flash: a lightweight model, optimized for speed and efficiency
Nano: the most efficient model, for on-device tasks (trained via model distillation)
