Multimodal


Resources: Multimodal RAG: Chat with Videos

Cross-Modal Encoder: BridgeTower

LVLM (Large Vision-Language Model)

Resources: LMM prompting with Gemini

The Gemini family consists of:

  • Ultra: largest model, for highly complex tasks
  • Pro: best model for general performance across a wide range of tasks
  • Flash: lightweight model, optimized for speed and efficiency
  • Nano: most efficient model, for on-device tasks (model distillation)


