Image Search with Localization
This step-by-step guide shows how to implement localization features for image search using Marqo, a vector search engine. Follow along to add them to your own project.
Getting Started
Before diving into the code, let's set up your environment.
- Clone the Repository
  Get the necessary example files by cloning the examples repository.
- Run Marqo
  Use Docker to pull and run the Marqo image:
  docker rm -f marqo
  docker pull marqoai/marqo:2.0.0
  docker run --name marqo -it -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:2.0.0
  For more detailed instructions, see the getting started guide.
- Explore Further
  The original code and article provide additional context.
Walkthrough
Follow these steps to integrate image search with localization into your project.
Step 1: Import Libraries and Define Helpers
Start by importing the necessary libraries. The download_data helper is imported from the utils module, which is expected to sit alongside the example files.
from marqo import Client
import os
import pandas as pd
from utils import download_data
Step 2: Prepare Your Environment
Ensure Marqo is started and ready to use. Follow the instructions in the repository if you haven't already.
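The container can take a little while to start accepting requests, so it may help to wait for it before indexing. The following is a minimal sketch using only the Python standard library. The function name wait_until_reachable and its parameters are hypothetical, and the URL in the comment assumes the -p 8882:8882 port mapping from the docker run command.

```python
import time
import urllib.error
import urllib.request


def wait_until_reachable(url, attempts=30, delay=2.0, opener=urllib.request.urlopen):
    """Return True once `url` answers an HTTP request, False if every attempt fails."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a success status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the service is probably still starting.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


# Example (assumes Marqo is published on port 8882 of the local machine):
# wait_until_reachable("http://localhost:8882")
```

The opener parameter exists only so the function can be exercised without a network connection.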
Step 3: Data Acquisition
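Decide where the images are read from at indexing time. Setting use_remote to True pulls them directly from the S3 bucket; otherwise they are downloaded for local indexing. The in_docker flag is for when Marqo runs inside Docker, as in the setup above.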
use_remote = False
in_docker = True
Step 4: Download and Locate Data
Fetch and prepare your image data, setting up the paths accordingly.
data = pd.read_csv('files.csv', index_col=0)
docker_path = 'http://host.docker.internal:8222/'
local_dir = os.getcwd() + '/images/'
locators = download_data(data=data, download_dir=local_dir, use_remote=use_remote, in_docker=in_docker, docker_path=docker_path)
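The docker_path value suggests that, for local indexing, the Marqo container reads images over HTTP from port 8222 on the host. If nothing is serving the images directory on that port yet, one option is a small static file server from the Python standard library. This is a sketch under that assumption, not part of the example code; serve_directory is a hypothetical helper.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8222, host="0.0.0.0"):
    """Serve `directory` over HTTP in a background thread and return the server."""
    # Bind the handler to the chosen directory instead of the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # A daemon thread does not keep the interpreter alive after the script ends.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Example: server = serve_directory(local_dir)
# When indexing has finished: server.shutdown(); server.server_close()
```

Keep the server running until add_documents has finished, since the images are fetched at indexing time.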
Step 5: Document Preparation
Organize your images in a format suitable for indexing with Marqo.
documents = [{"image_location": s3_uri, '_id': os.path.basename(s3_uri)} for s3_uri in locators]
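Because each _id is taken from the file name, two images with the same name in different folders would end up with the same ID, and documents sharing an ID would be expected to overwrite one another. The sketch below builds the same document structure but fails early on such collisions. build_documents is a hypothetical helper, and the locators in the usage comment are made up for illustration.

```python
import os
from collections import Counter


def build_documents(locators):
    """Build one document per image locator, refusing duplicate IDs."""
    ids = [os.path.basename(locator) for locator in locators]
    duplicates = sorted(name for name, count in Counter(ids).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate _id values: {duplicates}")
    return [
        {"image_location": locator, "_id": doc_id}
        for locator, doc_id in zip(locators, ids)
    ]


# Example with made-up locators:
# build_documents(["http://host.docker.internal:8222/cat.jpg",
#                  "http://host.docker.internal:8222/dog.jpg"])
```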
Step 6: Index Creation
Initialize the client and configure your indexing settings.
client = Client()
Define index names and settings for image preprocessing.
index_name_prefix = "visual-search"
patch_methods = ["dino/v1", None, "simple"]
model_name = "ViT-B/32"
n_processes = 3
batch_size = 50
delete_index = True
Create one index per patch method, applying the matching settings each time, then add the documents to it.
settings = {
    "treatUrlsAndPointersAsImages": True,
    "imagePreprocessing": {
        "patchMethod": None
    },
    "model": None,
    "normalizeEmbeddings": True,
}
for patch_method in patch_methods:
    # "dino/v1" becomes the suffix "-dino-v1"; no patch method means no suffix
    suffix = '' if patch_method is None else f"-{patch_method.replace('/', '-')}"
    index_name = index_name_prefix + suffix
    # update the settings for this patch method
    settings['model'] = model_name
    settings['imagePreprocessing']['patchMethod'] = patch_method
    # optionally delete the index if it exists
    if delete_index:
        try:
            client.index(index_name).delete()
        except Exception:
            print("index does not exist, cannot delete")
    # create the index, then add the documents to it
    response = client.create_index(index_name, settings_dict=settings)
    response = client.index(index_name).add_documents(
        documents,
        client_batch_size=batch_size,
        tensor_fields=['image_location']
    )
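To make the naming and settings logic of the loop easier to follow, here is a small sketch that isolates both steps and needs no running Marqo instance. The helper names index_name_for and settings_for are hypothetical. Unlike the loop, which updates one shared dictionary in place, this version returns a fresh copy for every index.

```python
import copy

BASE_SETTINGS = {
    "treatUrlsAndPointersAsImages": True,
    "imagePreprocessing": {"patchMethod": None},
    "model": None,
    "normalizeEmbeddings": True,
}


def index_name_for(prefix, patch_method):
    """Derive the index name: no suffix for None, otherwise '/' becomes '-'."""
    if patch_method is None:
        return prefix
    return f"{prefix}-{patch_method.replace('/', '-')}"


def settings_for(model_name, patch_method, base=BASE_SETTINGS):
    """Return a copy of the base settings filled in for one index."""
    settings = copy.deepcopy(base)  # deep copy so the nested dict is not shared
    settings["model"] = model_name
    settings["imagePreprocessing"]["patchMethod"] = patch_method
    return settings


for method in ["dino/v1", None, "simple"]:
    print(index_name_for("visual-search", method))
```

With the patch methods used in this walkthrough, the loop prints visual-search-dino-v1, visual-search, and visual-search-simple.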
Full Code
indexing_all_data.py
#####################################################
### STEP 0. Import and define any helper functions
#####################################################
from marqo import Client
import os
import pandas as pd
from utils import download_data
#####################################################
### STEP 1. start Marqo
#####################################################
# Follow the instructions here https://github.com/marqo-ai/marqo
#####################################################
### STEP 2. Get the data for indexing
#####################################################
# this will pull directly from the s3 bucket if True, otherwise it will pull for local indexing
use_remote = False
in_docker = True
data = pd.read_csv("files.csv", index_col=0)
docker_path = "http://host.docker.internal:8222/"
local_dir = os.getcwd() + "/images/"
locators = download_data(
    data=data,
    download_dir=local_dir,
    use_remote=use_remote,
    in_docker=in_docker,
    docker_path=docker_path,
)
documents = [
    {"image_location": s3_uri, "_id": os.path.basename(s3_uri)} for s3_uri in locators
]
# if you have the images locally, see the instructions
# here https://marqo.pages.dev/Advanced-Usage/images/ for the best ways to index
#####################################################
### STEP 3. Create the index(es)
#####################################################
client = Client()
# set up the settings so we can compare the different methods
index_name_prefix = "visual-search"
patch_methods = [
    "dino/v1",
    None,
    "simple",
]  # ["dino/v1", "dino/v2", "frcnn", None, "simple"]
model_name = "ViT-B/32"
n_processes = 3
batch_size = 50
# set this to false if you do not want to delete the previous index of the same name
delete_index = True
settings = {
    "treatUrlsAndPointersAsImages": True,
    "imagePreprocessing": {"patchMethod": None},
    "model": None,
    "normalizeEmbeddings": True,
}
for patch_method in patch_methods:
    suffix = "" if patch_method is None else f"-{patch_method.replace('/', '-')}"
    index_name = index_name_prefix + suffix
    # update the settings we want to use
    settings["model"] = model_name
    settings["imagePreprocessing"]["patchMethod"] = patch_method
    # optionally delete the index if it exists
    if delete_index:
        try:
            client.index(index_name).delete()
        except Exception:
            print("index does not exist, cannot delete")
    # create the index with our settings
    response = client.create_index(index_name, settings_dict=settings)
    response = client.index(index_name).add_documents(
        documents, client_batch_size=batch_size, tensor_fields=["image_location"]
    )