Supported Models in Marqo
Marqo supports a variety of models, which you can find below. For more detail on each model, see List of Supported Models.
Text
The following models are supported by default and are primarily based on the excellent SBERT (sentence-transformers) and Hugging Face libraries and models. A usage example follows the list.
- sentence-transformers/all-MiniLM-L6-v1
- sentence-transformers/all-MiniLM-L6-v2
- sentence-transformers/all-MiniLM-L12-v2
- sentence-transformers/all-mpnet-base-v1
- sentence-transformers/all-mpnet-base-v2
- sentence-transformers/stsb-xlm-r-multilingual
- flax-sentence-embeddings/all_datasets_v3_MiniLM-L12
- flax-sentence-embeddings/all_datasets_v3_MiniLM-L6
- flax-sentence-embeddings/all_datasets_v4_MiniLM-L12
- flax-sentence-embeddings/all_datasets_v4_MiniLM-L6
- flax-sentence-embeddings/all_datasets_v3_mpnet-base
- flax-sentence-embeddings/all_datasets_v4_mpnet-base
- hf/e5-small
- hf/e5-base
- hf/e5-large
- hf/e5-small-unsupervised
- hf/e5-base-unsupervised
- hf/e5-large-unsupervised
- hf/e5-small-v2
- hf/e5-base-v2
- hf/e5-large-v2
- hf/bge-small-en-v1.5
- hf/bge-base-en-v1.5
- hf/bge-large-en-v1.5
- hf/bge-small-zh-v1.5
- hf/bge-base-zh-v1.5
- hf/bge-large-zh-v1.5
- hf/multilingual-e5-small
- hf/multilingual-e5-base
- hf/multilingual-e5-large
- hf/multilingual-e5-large-instruct
- hf/GIST-large-Embedding-v0
- hf/snowflake-arctic-embed-m
- hf/snowflake-arctic-embed-m-v1.5
- hf/snowflake-arctic-embed-l
- hf/ember-v1
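As a minimal sketch of how one of these text models is selected, the snippet below creates an index with the Marqo Python client and runs a tensor search. The index name, document fields, and local endpoint URL are placeholders; adjust them for your own deployment.

```python
import marqo

# Connect to a locally running Marqo instance (adjust the URL for your deployment)
mq = marqo.Client(url="http://localhost:8882")

# Create an index that embeds text with one of the default models listed above
mq.create_index("my-text-index", model="hf/e5-base-v2")

# Add a document and run a tensor search against it
mq.index("my-text-index").add_documents(
    [{"Title": "Supported models", "Description": "Marqo ships with many text embedding models."}],
    tensor_fields=["Description"],
)
results = mq.index("my-text-index").search("which embedding models does Marqo support?")
```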
Images
The models used for embedding images come from CLIP. We support two implementations: the original from OpenAI and the open-source implementation, OpenCLIP. The following models are supported (a usage example follows the lists):
OpenAI
- RN50
- RN101
- RN50x4
- RN50x16
- RN50x64
- ViT-B/32
- ViT-B/16
- ViT-L/14
- ViT-L/14@336px
OpenCLIP
- open_clip/RN101-quickgelu/openai
- open_clip/RN101-quickgelu/yfcc15m
- open_clip/RN101/openai
- open_clip/RN101/yfcc15m
- open_clip/RN50-quickgelu/cc12m
- open_clip/RN50-quickgelu/openai
- open_clip/RN50-quickgelu/yfcc15m
- open_clip/RN50/cc12m
- open_clip/RN50/openai
- open_clip/RN50/yfcc15m
- open_clip/RN50x16/openai
- open_clip/RN50x4/openai
- open_clip/RN50x64/openai
- open_clip/ViT-B-16-plus-240/laion400m_e31
- open_clip/ViT-B-16-plus-240/laion400m_e32
- open_clip/ViT-B-16/laion2b_s34b_b88k
- open_clip/ViT-B-16/laion400m_e31
- open_clip/ViT-B-16/laion400m_e32
- open_clip/ViT-B-16/openai
- open_clip/ViT-B-16-SigLIP/webli
- open_clip/ViT-B-16-SigLIP-256/webli
- open_clip/ViT-B-16-SigLIP-384/webli
- open_clip/ViT-B-16-SigLIP-512/webli
- open_clip/ViT-B-16-quickgelu/metaclip_fullcc
- open_clip/ViT-B-32-quickgelu/laion400m_e31
- open_clip/ViT-B-32-quickgelu/laion400m_e32
- open_clip/ViT-B-32-quickgelu/openai
- open_clip/ViT-B-32/laion2b_e16
- open_clip/ViT-B-32/laion2b_s34b_b79k
- open_clip/ViT-B-32/laion400m_e31
- open_clip/ViT-B-32/laion400m_e32
- open_clip/ViT-B-32/openai
- open_clip/ViT-B-32-256/datacomp_s34b_b86k
- open_clip/ViT-H-14/laion2b_s32b_b79k
- open_clip/ViT-H-14-quickgelu/dfn5b
- open_clip/ViT-H-14-378-quickgelu/dfn5b
- open_clip/ViT-L-14-336/openai
- open_clip/ViT-L-14/laion2b_s32b_b82k
- open_clip/ViT-L-14/laion400m_e31
- open_clip/ViT-L-14/laion400m_e32
- open_clip/ViT-L-14/openai
- open_clip/ViT-L-14-quickgelu/dfn2b
- open_clip/ViT-L-14-CLIPA-336/datacomp1b
- open_clip/ViT-L-16-SigLIP-256/webli
- open_clip/ViT-L-16-SigLIP-384/webli
- open_clip/ViT-bigG-14/laion2b_s39b_b160k
- open_clip/ViT-g-14/laion2b_s12b_b42k
- open_clip/ViT-g-14/laion2b_s34b_b88k
- open_clip/ViT-SO400M-14-SigLIP-384/webli
- open_clip/coca_ViT-B-32/laion2b_s13b_b90k
- open_clip/coca_ViT-B-32/mscoco_finetuned_laion2b_s13b_b90k
- open_clip/coca_ViT-L-14/laion2b_s13b_b90k
- open_clip/coca_ViT-L-14/mscoco_finetuned_laion2b_s13b_b90k
- open_clip/convnext_base/laion400m_s13b_b51k
- open_clip/convnext_base_w/laion2b_s13b_b82k
- open_clip/convnext_base_w/laion2b_s13b_b82k_augreg
- open_clip/convnext_base_w/laion_aesthetic_s13b_b82k
- open_clip/convnext_base_w_320/laion_aesthetic_s13b_b82k
- open_clip/convnext_base_w_320/laion_aesthetic_s13b_b82k_augreg
- open_clip/convnext_large_d/laion2b_s26b_b102k_augreg
- open_clip/convnext_large_d_320/laion2b_s29b_b131k_ft
- open_clip/convnext_large_d_320/laion2b_s29b_b131k_ft_soup
- open_clip/convnext_xxlarge/laion2b_s34b_b82k_augreg
- open_clip/convnext_xxlarge/laion2b_s34b_b82k_augreg_rewind
- open_clip/convnext_xxlarge/laion2b_s34b_b82k_augreg_soup
- open_clip/roberta-ViT-B-32/laion2b_s12b_b32k
- open_clip/xlm-roberta-base-ViT-B-32/laion5b_s13b_b90k
- open_clip/xlm-roberta-large-ViT-H-14/frozen_laion5b_s13b_b90k
- open_clip/EVA02-L-14-336/merged2b_s6b_b61k
- open_clip/EVA02-L-14/merged2b_s4b_b131k
- open_clip/EVA02-B-16/merged2b_s8b_b131k
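To use one of the CLIP or OpenCLIP models above for image search, the index needs to be configured to treat URLs and pointers as images. Below is a minimal sketch with the Python client; the `treat_urls_and_pointers_as_images` keyword and the image URL are assumptions based on the Marqo client, so check the client version you are running.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Create an image-capable index backed by an OpenCLIP model from the list above.
# treat_urls_and_pointers_as_images tells Marqo to download and embed image URLs.
mq.create_index(
    "my-image-index",
    model="open_clip/ViT-B-32/laion2b_s34b_b79k",
    treat_urls_and_pointers_as_images=True,
)

# Documents can then contain image URLs (placeholder URL shown here)
mq.index("my-image-index").add_documents(
    [{"Title": "A photo", "Image": "https://example.com/image.jpg"}],
    tensor_fields=["Image"],
)
results = mq.index("my-image-index").search("a photo of a dog")
```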
Multilingual CLIP
Marqo supports multilingual CLIP models trained on more than 100 languages, provided by the Multilingual-CLIP project. You can use the following models to achieve multimodal search in your preferred language (a usage sketch follows the list):
- multilingual-clip/XLM-Roberta-Large-Vit-L-14
- multilingual-clip/XLM-R Large Vit-B/16+
- multilingual-clip/XLM-Roberta-Large-Vit-B-32
- multilingual-clip/LABSE-Vit-L-14
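As a sketch of how a multilingual model might be used, the snippet below creates an index with one of the multilingual CLIP models and searches it with a non-English query. The index name, document, and image URL are illustrative placeholders, and the `treat_urls_and_pointers_as_images` keyword is assumed from the Marqo client.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# A multilingual CLIP model lets text queries in many languages match image content
mq.create_index(
    "my-multilingual-index",
    model="multilingual-clip/XLM-Roberta-Large-Vit-L-14",
    treat_urls_and_pointers_as_images=True,
)

mq.index("my-multilingual-index").add_documents(
    [{"Image": "https://example.com/hippo.jpg"}],  # placeholder image URL
    tensor_fields=["Image"],
)

# Query in German ("a hippopotamus in the water")
results = mq.index("my-multilingual-index").search("ein Flusspferd im Wasser")
```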
Bring Your Own Model
Most customers can get great performance from publicly available CLIP models. However, some use cases benefit even more from a model fine-tuned for their domain-specific task.
In these cases, you can use your own model with fine-tuned weights and parameters. Incorporating your own model into Marqo is straightforward if it belongs to one of the frameworks behind the models listed above.
For more information on bringing your own model to Marqo, visit Bring Your Own Model.
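As a rough illustration of the idea (see Bring Your Own Model for the authoritative format), a custom checkpoint is typically registered by giving the index a model name of your choosing together with `model_properties` describing the architecture and where to fetch the weights. Every value below (the model name, dimensions, and checkpoint URL) is a placeholder, and the assumption is that your client version accepts `model_properties` directly on `create_index`; otherwise it can be supplied through the index settings.

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Register a fine-tuned OpenCLIP checkpoint under a custom name.
# "my-finetuned-clip", the dimensions, and the URL are placeholders for your own model.
mq.create_index(
    "my-custom-index",
    model="my-finetuned-clip",
    model_properties={
        "name": "ViT-B-32",          # base architecture of the checkpoint
        "dimensions": 512,           # embedding size produced by the model
        "url": "https://example.com/my-finetuned-clip.pt",  # where Marqo can fetch the weights
        "type": "open_clip",
    },
    treat_urls_and_pointers_as_images=True,
)
```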