Supported Models in Marqo

Marqo supports a variety of models, which you can find below. For more detail on each model, please see List of Supported Models.

Text

Text models are supported by default and are primarily based on the excellent SBERT and Hugging Face libraries and models; see List of Supported Models for the full list.
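
For example, a text model is chosen when an index is created. The snippet below is a minimal sketch, assuming the Python client (marqo) and a Marqo instance running at http://localhost:8882; the index name is a placeholder, "hf/e5-base-v2" is just one entry from the supported text models, and parameter names such as tensor_fields can vary between Marqo versions.

    import marqo

    mq = marqo.Client(url="http://localhost:8882")

    # Create an index backed by a text embedding model; any model from the
    # supported list can be passed via the "model" parameter.
    mq.create_index("my-text-index", model="hf/e5-base-v2")

    # Add a document and run a tensor search against it.
    mq.index("my-text-index").add_documents(
        [{"title": "A short example document"}],
        tensor_fields=["title"],
    )
    results = mq.index("my-text-index").search("example")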

Images

The models used for tensorizing images come from CLIP. Marqo supports two implementations: the original from OpenAI and the open-source OpenCLIP implementation. The following models are supported (a brief usage sketch follows the lists):

OpenAI

  • RN50
  • RN101
  • RN50x4
  • RN50x16
  • RN50x64
  • ViT-B/32
  • ViT-B/16
  • ViT-L/14
  • ViT-L/14@336px

Open CLIP

  • open_clip/RN101-quickgelu/openai
  • open_clip/RN101-quickgelu/yfcc15m
  • open_clip/RN101/openai
  • open_clip/RN101/yfcc15m
  • open_clip/RN50-quickgelu/cc12m
  • open_clip/RN50-quickgelu/openai
  • open_clip/RN50-quickgelu/yfcc15m
  • open_clip/RN50/cc12m
  • open_clip/RN50/openai
  • open_clip/RN50/yfcc15m
  • open_clip/RN50x16/openai
  • open_clip/RN50x4/openai
  • open_clip/RN50x64/openai

  • open_clip/ViT-B-16-plus-240/laion400m_e31
  • open_clip/ViT-B-16-plus-240/laion400m_e32
  • open_clip/ViT-B-16/laion2b_s34b_b88k
  • open_clip/ViT-B-16/laion400m_e31
  • open_clip/ViT-B-16/laion400m_e32
  • open_clip/ViT-B-16/openai
  • open_clip/ViT-B-16-SigLIP/webli
  • open_clip/ViT-B-16-SigLIP-256/webli
  • open_clip/ViT-B-16-SigLIP-384/webli
  • open_clip/ViT-B-16-SigLIP-512/webli
  • open_clip/ViT-B-16-quickgelu/metaclip_fullcc
  • open_clip/ViT-B-32-quickgelu/laion400m_e31
  • open_clip/ViT-B-32-quickgelu/laion400m_e32
  • open_clip/ViT-B-32-quickgelu/openai
  • open_clip/ViT-B-32/laion2b_e16
  • open_clip/ViT-B-32/laion2b_s34b_b79k
  • open_clip/ViT-B-32/laion400m_e31
  • open_clip/ViT-B-32/laion400m_e32
  • open_clip/ViT-B-32/openai
  • open_clip/ViT-B-32-256/datacomp_s34b_b86k

  • open_clip/ViT-H-14/laion2b_s32b_b79k
  • open_clip/ViT-H-14-quickgelu/dfn5b
  • open_clip/ViT-H-14-378-quickgelu/dfn5b

  • open_clip/ViT-L-14-336/openai
  • open_clip/ViT-L-14/laion2b_s32b_b82k
  • open_clip/ViT-L-14/laion400m_e31
  • open_clip/ViT-L-14/laion400m_e32
  • open_clip/ViT-L-14/openai
  • open_clip/ViT-L-14-quickgelu/dfn2b
  • open_clip/ViT-L-14-CLIPA-336/datacomp1b
  • open_clip/ViT-L-16-SigLIP-256/webli
  • open_clip/ViT-L-16-SigLIP-384/webli

  • open_clip/ViT-bigG-14/laion2b_s39b_b160k
  • open_clip/ViT-g-14/laion2b_s12b_b42k
  • open_clip/ViT-g-14/laion2b_s34b_b88k
  • open_clip/ViT-SO400M-14-SigLIP-384/webli

  • open_clip/coca_ViT-B-32/laion2b_s13b_b90k
  • open_clip/coca_ViT-B-32/mscoco_finetuned_laion2b_s13b_b90k
  • open_clip/coca_ViT-L-14/laion2b_s13b_b90k
  • open_clip/coca_ViT-L-14/mscoco_finetuned_laion2b_s13b_b90k

  • open_clip/convnext_base/laion400m_s13b_b51k
  • open_clip/convnext_base_w/laion2b_s13b_b82k
  • open_clip/convnext_base_w/laion2b_s13b_b82k_augreg
  • open_clip/convnext_base_w/laion_aesthetic_s13b_b82k
  • open_clip/convnext_base_w_320/laion_aesthetic_s13b_b82k
  • open_clip/convnext_base_w_320/laion_aesthetic_s13b_b82k_augreg
  • open_clip/convnext_large_d/laion2b_s26b_b102k_augreg
  • open_clip/convnext_large_d_320/laion2b_s29b_b131k_ft
  • open_clip/convnext_large_d_320/laion2b_s29b_b131k_ft_soup
  • open_clip/convnext_xxlarge/laion2b_s34b_b82k_augreg
  • open_clip/convnext_xxlarge/laion2b_s34b_b82k_augreg_rewind
  • open_clip/convnext_xxlarge/laion2b_s34b_b82k_augreg_soup

  • open_clip/roberta-ViT-B-32/laion2b_s12b_b32k
  • open_clip/xlm-roberta-base-ViT-B-32/laion5b_s13b_b90k
  • open_clip/xlm-roberta-large-ViT-H-14/frozen_laion5b_s13b_b90k

  • open_clip/EVA02-L-14-336/merged2b_s6b_b61k
  • open_clip/EVA02-L-14/merged2b_s4b_b131k
  • open_clip/EVA02-B-16/merged2b_s8b_b131k
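
Any of these CLIP or OpenCLIP models can be selected when creating an index. The sketch below is a minimal example, assuming the Python client (marqo) and a Marqo instance at http://localhost:8882; the index name, model choice, and image URL are placeholders, and parameter names such as treat_urls_and_pointers_as_images and tensor_fields can vary between Marqo versions.

    import marqo

    mq = marqo.Client(url="http://localhost:8882")

    # Create an index that embeds images with a CLIP model and treats
    # URL fields in documents as images to be downloaded and tensorized.
    mq.create_index(
        "my-image-index",
        model="open_clip/ViT-B-32/laion2b_s34b_b79k",
        treat_urls_and_pointers_as_images=True,
    )

    # Add a document whose "image" field points at an image URL (placeholder).
    mq.index("my-image-index").add_documents(
        [{"caption": "an example image", "image": "https://example.com/image.jpg"}],
        tensor_fields=["caption", "image"],
    )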

Multilingual CLIP

Marqo supports multilingual CLIP models trained on more than 100 languages, provided by the Multilingual-CLIP project. You can use the following models to achieve multimodal search in your preferred language (see the sketch after this list):

  • multilingual-clip/XLM-Roberta-Large-Vit-L-14
  • multilingual-clip/XLM-R Large Vit-B/16+
  • multilingual-clip/XLM-Roberta-Large-Vit-B-32
  • multilingual-clip/LABSE-Vit-L-14
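
An index backed by one of these models accepts queries in many languages. The snippet below is a minimal sketch, assuming the Python client (marqo) and a local Marqo instance; the index name and the Spanish query are illustrative.

    import marqo

    mq = marqo.Client(url="http://localhost:8882")

    # Create an index backed by a multilingual CLIP model.
    mq.create_index(
        "my-multilingual-index",
        model="multilingual-clip/XLM-Roberta-Large-Vit-L-14",
        treat_urls_and_pointers_as_images=True,
    )

    # Queries can then be issued in your preferred language,
    # e.g. Spanish for "a cat sleeping in the sun".
    results = mq.index("my-multilingual-index").search("un gato durmiendo al sol")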

Bring Your Own Model

Most customers can get great performance from publicly available CLIP models. However, some use cases benefit even more from a model fine-tuned for their domain-specific task. In that case, you can use your own model with fine-tuned weights and parameters. Incorporating your own model into Marqo is straightforward as long as it belongs to one of the supported frameworks, as sketched below.
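
As a rough illustration, a custom checkpoint can be registered through model_properties when creating an index. This is a minimal sketch, assuming the Python client (marqo) and an OpenCLIP-compatible checkpoint; the checkpoint URL, dimensions, architecture name, and index name are placeholders, and the exact fields expected in model_properties are described in Bring Your Own Model.

    import marqo

    mq = marqo.Client(url="http://localhost:8882")

    # Describes the custom checkpoint. The architecture name, embedding
    # dimensions, and checkpoint URL below are placeholders for your own
    # fine-tuned weights.
    model_properties = {
        "name": "ViT-B-32",        # base architecture the weights were trained on
        "dimensions": 512,         # embedding size produced by the model
        "type": "open_clip",       # framework the checkpoint belongs to
        "url": "https://example.com/my-finetuned-clip.pt",  # placeholder URL
    }

    mq.create_index(
        "my-custom-index",
        model="my-finetuned-clip",   # any custom identifier for this model
        model_properties=model_properties,
    )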

For more information on bringing your own model to Marqo, visit Bring Your Own Model.