Query filtering in Marqo

You can use Marqo's query DSL to refine search results. When you run a neural search query (the default query type in Marqo), filters are executed efficiently as a pre-filter across the HNSW graph.


Filters have several use cases, for example, restricting the results a specific user has access to or creating faceted search interfaces.

The filter parser splits the provided filter string on operators such as AND or NOT. Each resulting clause is then analyzed independently before matching documents are returned.

Example

In the following example, Marqo's filter query analyzer splits the query into two components, "country:(United States)" and "state:NY".

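Python:

import marqo

mq = marqo.Client(url="http://localhost:8882")
results = mq.index("my-first-index").search(
    q="New York", filter_string="country:(United States) OR state:NY"
)

cURL:

curl -XPOST 'http://localhost:8882/indexes/my-first-index/search' -H 'Content-type:application/json' -d '
{
  "q": "New York",
  "filter": "country:(United States) OR state:NY"
}'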

Marqo's query DSL is based on the Lucene query syntax, with some differences:

  • all term queries must be connected to a field, e.g., city:(New York)

  • efficient range queries are supported for numeric types without manually specifying the type

  • fuzzy/approximate searches are not supported

Let's dig into some of the specifics:

Fields

Marqo's search terms must be fielded. You can search any field by typing the field name followed by a colon ":" and then the term you are looking for.

As an example, let's assume a Marqo index contains two fields, title and text and text is the default field. If you want to find the document entitled "The Right Way" which contains the text "go", you can enter:

title:(The Right Way) AND text:go

Since text is the default field, the field indicator is not required.

Note: The field is only valid for the term that it directly precedes, so the query

title:Do it right

will only find "Do" in the title field. It will look for "it" and "right" in the default field (in this case, the text field).
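Because a field applies only to the term directly after it, multi-word values need to be grouped in parentheses, as in title:(The Right Way). The following is a minimal sketch of a helper that applies this rule when building filter strings; `field_term` is a hypothetical name, not part of the Marqo client, and it does not escape special characters.

```python
def field_term(field: str, value: str) -> str:
    """Return `field:value`, wrapping multi-word values in parentheses."""
    if " " in value:
        return f"{field}:({value})"
    return f"{field}:{value}"


# Rebuild the example filter from this section.
filter_string = " AND ".join([
    field_term("title", "The Right Way"),
    field_term("text", "go"),
])
print(filter_string)
```

The resulting string can be passed as the filter_string argument of the search call shown earlier.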

Range Queries

Marqo supports efficient execution of range queries on numeric types.

Numbers from 0..100:

some_numeric:[0 TO 100]

Greater than or equal to 0:

some_numeric:[0 TO *]
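When the bounds come from application code, the range syntax can be generated rather than written by hand. This is a sketch under assumptions: `range_filter` is a hypothetical helper, and using * for an open lower bound is assumed by symmetry with the open upper bound shown above.

```python
from typing import Optional, Union

Number = Union[int, float]


def range_filter(field: str,
                 low: Optional[Number] = None,
                 high: Optional[Number] = None) -> str:
    """Build `field:[low TO high]`; a missing bound becomes `*`."""
    low_text = "*" if low is None else str(low)
    high_text = "*" if high is None else str(high)  # open lower bound via `*` is an assumption
    return f"{field}:[{low_text} TO {high_text}]"


print(range_filter("some_numeric", 0, 100))
print(range_filter("some_numeric", 0))
```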

IN Queries

Marqo supports the IN operator for restricting a field to a list of values. The value list must be enclosed in parentheses and comma-separated. Values with spaces must be enclosed in parentheses.

Currently, IN is supported only for structured indexes, and only for the following field types: text, int, long, array<text>, array<int>, array<long>, and custom_vector. The _id field can also be filtered on with IN.

Text field example:

text_field IN (apple, banana, (wild cherry))

Integer field example:

int_field IN (1, 2, 3)
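The formatting rules for IN (comma-separated list, parentheses around values containing spaces) are easy to get wrong when the values come from a Python list. A minimal sketch, assuming a hypothetical helper named `in_filter` that is not part of the Marqo client:

```python
from typing import Iterable, Union


def in_filter(field: str, values: Iterable[Union[str, int]]) -> str:
    """Build `field IN (v1, v2, ...)`, wrapping values with spaces in parentheses."""
    parts = []
    for value in values:
        text = str(value)
        parts.append(f"({text})" if " " in text else text)
    return f"{field} IN ({', '.join(parts)})"


print(in_filter("text_field", ["apple", "banana", "wild cherry"]))
print(in_filter("int_field", [1, 2, 3]))
```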

Boolean Queries

Marqo supports boolean queries. If you reference true or false within the filter, the value is treated as a boolean.

some_bool:true
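When the flag comes from Python code, note that str(True) yields "True" rather than the lowercase form used above. The sketch below uses a hypothetical helper, `bool_filter`, that always emits lowercase literals; whether Marqo accepts other casings is not covered here, so lowercase is the conservative choice.

```python
def bool_filter(field: str, value: bool) -> str:
    """Build `field:true` or `field:false` using lowercase literals."""
    return f"{field}:{'true' if value else 'false'}"


print(bool_filter("some_bool", True))
```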

Boolean Operators

Boolean operators allow terms to be combined through logic operators. Marqo supports AND, "+", OR, NOT and "-" as Boolean operators. (Note: Boolean operators must be ALL CAPS.)

The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and matches a document if either of the terms exists in it. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

food:(ice cream) OR type:confectionary

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.

To search for documents whose food is "ice cream" and whose type is "confectionary", use the query:

food:(ice cream) AND type:confectionary

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.

To search for documents whose type is "confectionary" but whose food is not "ice cream", use the query:

type:confectionary AND NOT food:(ice cream)
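The set analogy for OR, AND and NOT can be checked with plain Python sets. This is an illustration only: the three documents are made up, and the matching is done in memory rather than by Marqo.

```python
# Made-up documents, keyed by ID, used only to illustrate the set semantics.
docs = {
    "doc1": {"food": "ice cream", "type": "confectionary"},
    "doc2": {"food": "ice cream", "type": "dessert"},
    "doc3": {"food": "toffee", "type": "confectionary"},
}


def matching_ids(field: str, value: str) -> set:
    """Return the IDs of documents whose field equals the value."""
    return {doc_id for doc_id, doc in docs.items() if doc.get(field) == value}


ice_cream = matching_ids("food", "ice cream")
confectionary = matching_ids("type", "confectionary")

or_result = ice_cream | confectionary       # food:(ice cream) OR type:confectionary
and_result = ice_cream & confectionary      # food:(ice cream) AND type:confectionary
and_not_result = confectionary - ice_cream  # type:confectionary AND NOT food:(ice cream)

print(sorted(or_result), sorted(and_result), sorted(and_not_result))
```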

Grouping

Marqo supports using parentheses to group clauses to form sub queries. This can be very useful if you want to control the boolean logic for a query.

To search for documents where either type is "confectionary" or food is "ice cream", and where sweetness is 10, use the query:

(type:confectionary OR food:(ice cream)) AND sweetness:10
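When filters are assembled from several optional conditions, wrapping each sub query in parentheses keeps the intended logic explicit. A minimal sketch, assuming a hypothetical `group` helper that is not part of the Marqo client:

```python
def group(operator: str, *clauses: str) -> str:
    """Join clauses with a Boolean operator and wrap the result in parentheses."""
    return "(" + f" {operator} ".join(clauses) + ")"


# Rebuild the grouped example from this section.
either = group("OR", "type:confectionary", "food:(ice cream)")
filter_string = f"{either} AND sweetness:10"
print(filter_string)
```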

Escaping Special Characters

Marqo supports escaping special characters that are part of the query syntax. The current list of special characters is:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

To escape these characters, put a backslash (\) before the character. For example, to filter for (1+1):2 in the field myField, use the query:

myField:\(1\+1\)\:2

Note that in languages like Python, backslashes inside ordinary string literals must themselves be escaped with a backslash. The filter string above therefore looks like the following in the Python client:

my_index.search(q="what colour are plants?", filter_string="myField:\\(1\\+1\\)\\:2")

Special characters and spaces in field names must be escaped as well:

my\ field:hello
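Escaping by hand is error-prone, so it can help to centralize it. The sketch below uses a hypothetical `escape` helper (not part of the Marqo client) that puts a backslash before each single-character special and before spaces. Two assumptions: the two-character operators && and || are not handled, and spaces in values are escaped the same way as in field names, although grouping values with parentheses is the form shown elsewhere on this page. It also shows that a Python raw string avoids doubling the backslashes.

```python
# Single-character specials from the list above, plus the space character.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\ ')


def escape(text: str) -> str:
    """Prefix every special character (and space) with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


filter_string = f"{escape('myField')}:{escape('(1+1):2')}"
print(filter_string)

# A raw string literal spells the same filter without doubled backslashes.
raw_filter = r"myField:\(1\+1\)\:2"
print(filter_string == raw_filter)
```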

Filtering with array fields

Marqo supports filtering over array fields. This can be useful for use cases such as filtering over document tags. See here for more details.

Filtering on multimodal objects

Marqo supports filtering on multimodal objects. Dot notation is used to reference fields within a multimodal combination object. For example, take an object with the following structure:

{
   "_id": "article_1",
   "my_combination_field": {
      "my_interior_field": "hello there",
      "img": "https://my-img-store.jpg"
   }
}

You can filter for this document with the following filter string:

my_combination_field.my_interior_field:(hello there)

Filtering on custom vector fields

Marqo supports filtering on custom vector fields. The filter applies only to the field's content; you cannot filter on the vector of a custom_vector field.

{
    "_id": "custom_audio_doc_1",
    "my_custom_vector": {
            "vector": [0.1, 0.2, 0.3...],
            "content": "Singing audio file"
    }
}

You can filter for this document with the following filter string:

my_custom_vector:(Singing audio file)