Create Dataset
Create a new dataset in Marqtune by posting to /datasets. This returns a presigned URL to which the dataset file should be uploaded. When using the py-marqtune client, create_dataset() handles both dataset creation and file upload automatically. For REST API usage, upload the file to the presigned URL yourself; the upload triggers the dataset validation and preparation tasks.
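For REST usage, the two steps described above can be sketched with the Python standard library. This is a minimal illustration, not the py-marqtune implementation: the helper names are hypothetical, the use of HTTP PUT for the presigned URL is an assumption (check what your presigned URL expects), and the response is read from either a nested "body" object or the top level because this page only shows the nested form.

```python
import json
import urllib.request

BASE_URL = "https://marqtune.marqo.ai"  # base URL used in the examples on this page


def build_create_request(api_key, body):
    """Build (but do not send) the POST /datasets request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def build_upload_request(upload_url, file_bytes):
    """Build the file upload request.

    ASSUMPTION: the presigned URL accepts an HTTP PUT with the raw file bytes.
    """
    return urllib.request.Request(url=upload_url, data=file_bytes, method="PUT")


def create_and_upload(api_key, body, file_path):
    """Create the dataset, upload the CSV file, and return the datasetId."""
    with urllib.request.urlopen(build_create_request(api_key, body)) as resp:
        payload = json.load(resp)
    # ASSUMPTION: uploadUrl and datasetId may be nested under "body" or top-level.
    info = payload.get("body", payload)
    with open(file_path, "rb") as f:
        file_bytes = f.read()
    # The upload is what triggers validation and preparation on the server side.
    with urllib.request.urlopen(build_upload_request(info["uploadUrl"], file_bytes)):
        pass
    return info["datasetId"]
```

The request builders are kept separate from the network calls so they can be inspected or logged before anything is sent.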
There are two primary types of datasets in Marqtune: training and evaluation.
- Training Dataset: Used for training machine learning models. It includes input data and corresponding attributes.
- Evaluation Dataset: Used for evaluating the performance of trained models. It includes query data and expected results to test the model's accuracy and effectiveness.
POST /datasets
Body Parameters
Name | Type | Default value | Description |
---|---|---|---|
datasetType | String | training | Required - Valid dataset types are [evaluation, training]. |
dataSchema | Dictionary | "" | Required - Mapping of columns to data types. The dataSchema defined must match the input file schema. |
queryColumn | String | "" | Required - if datasetType is evaluation. |
resultColumns | String | "" | Required - if datasetType is evaluation. |
imageDownloadHeaders | String | "" | Optional - Headers for the image download. Can be used to authenticate the images for download. |
waitForCompletion | Boolean | True | Optional (py-marqtune client only) - Instructs the client to continuously wait and poll until the operation is completed. |
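The conditional requirements in the table can be checked locally before a request is sent. The sketch below uses a hypothetical helper, build_dataset_body, which is not part of py-marqtune. It assembles the JSON body and rejects an evaluation dataset that lacks queryColumn or resultColumns. The dataSchema value is passed through unchanged, so supply it in whatever shape your request uses.

```python
VALID_TYPES = ("training", "evaluation")


def build_dataset_body(dataset_type, data_schema, query_column=None, result_columns=None):
    """Hypothetical helper: assemble a POST /datasets body per the parameter table."""
    if dataset_type not in VALID_TYPES:
        raise ValueError(f"datasetType must be one of {VALID_TYPES}")
    if not data_schema:
        raise ValueError("dataSchema is required")

    body = {"datasetType": dataset_type, "dataSchema": data_schema}

    # queryColumn and resultColumns are only required for evaluation datasets;
    # for training datasets this sketch simply leaves them out.
    if dataset_type == "evaluation":
        if not query_column or not result_columns:
            raise ValueError("evaluation datasets require queryColumn and resultColumns")
        body["queryColumn"] = query_column
        body["resultColumns"] = result_columns
    return body
```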
The dataset file should be in CSV format and must follow the structure specified in the dataSchema.
Given the following dataSchema:
data_schema = {"my_image": "image_pointer", "my_text": "text", "my_scores": "score"}
A matching CSV file looks like this:
my_image,my_text,my_scores
path/to/image1.jpg,"This is a sample text",0.9
path/to/image2.jpg,"Another sample text",0.8
path/to/image3.jpg,"More text",0.95
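Because the file must follow the dataSchema, it can be useful to compare the CSV header with the schema keys before uploading. The following is a local pre-check only, written as a sketch: check_header is a hypothetical helper, and the validation rules Marqtune applies on the server (for example, whether extra columns are tolerated) are not described on this page.

```python
import csv
import io


def check_header(csv_text, data_schema):
    """Compare the CSV header row with the schema keys.

    Returns (missing, extra): schema columns absent from the file, and
    file columns that the schema does not declare.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows at all
    missing = [col for col in data_schema if col not in header]
    extra = [col for col in header if col not in data_schema]
    return missing, extra


sample = (
    "my_image,my_text,my_scores\n"
    'path/to/image1.jpg,"This is a sample text",0.9\n'
)
schema = {"my_image": "image_pointer", "my_text": "text", "my_scores": "score"}
missing, extra = check_header(sample, schema)
```

For the sample file and schema above, both lists come back empty, meaning the header and the schema declare the same columns.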
Example: Training Dataset
from marqtune.client import Client
from marqtune.enums import DatasetType

url = "https://marqtune.marqo.ai"
api_key = "{api_key}"
marqtune_client = Client(url=url, api_key=api_key)

# Map each CSV column to its data type.
data_schema = {
    "my_image": "image_pointer",
    "my_text": "text",
    "my_scores": "score",
}

# Creates the dataset and uploads the file, then polls until the operation completes.
marqtune_client.create_dataset(
    file_path="path_to_file",
    dataset_name="dataset_name",
    dataset_type=DatasetType.TRAINING,
    data_schema=data_schema,
    wait_for_completion=True,
)
# Create a training dataset.
curl -X POST 'https://marqtune.marqo.ai/datasets' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: {api_key}' \
  -d '{
    "datasetType": "training",
    "dataSchema": [
      {
        "my_image": "image_pointer",
        "my_text": "text",
        "my_scores": "score"
      }
    ]
  }'
Example: Evaluation Dataset
from marqtune.client import Client
from marqtune.enums import DatasetType

url = "https://marqtune.marqo.ai"
api_key = "{api_key}"
marqtune_client = Client(url=url, api_key=api_key)

data_schema = {
    "my_image": "image_pointer",
    "my_text": "text",
    "my_query": "text",
    "my_scores": "score",  # Optional if datasetType is evaluation
}

# Both are required when the dataset type is evaluation.
query_column = "my_query"
result_columns = [
    "my_image_2",
    "my_text_2",
]

marqtune_client.create_dataset(
    file_path="path_to_file",
    dataset_name="dataset_name",
    dataset_type=DatasetType.EVALUATION,
    data_schema=data_schema,
    query_column=query_column,
    result_columns=result_columns,
    wait_for_completion=True,
)
# Create an evaluation dataset. The "my_scores" column is optional for evaluation datasets.
curl -X POST 'https://marqtune.marqo.ai/datasets' \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: {api_key}' \
  -d '{
    "datasetType": "evaluation",
    "dataSchema": [
      {
        "my_image": "image_pointer",
        "my_text": "text",
        "my_query": "text",
        "my_scores": "score"
      }
    ],
    "queryColumn": "my_query",
    "resultColumns": [
      "my_image_2",
      "my_text_2"
    ]
  }'
Response: 202 (Accepted)
The dataset creation task has been created and is now waiting for the file to be uploaded.
{
  "statusCode": 202,
  "body": {
    "uploadUrl": "upload_url",
    "datasetId": "datasetId"
  }
}
Response: 400 (Bad Request)
Required parameters are missing or the request body is incorrect.
{
  "statusCode": 400,
  "body": {
    "message": "Invalid arguments in request body or query parameters"
  }
}
Response: 400 (Invalid Request)
Request path or method is invalid.
{
  "statusCode": 400,
  "body": {
    "message": "Invalid request method"
  }
}
Response: 401 (Unauthorised)
Unauthorised. Check your API key and try again.
{
  "message": "Unauthorized."
}
Response: 500 (Internal server error)
Internal server error. Check your API key and try again.
{
  "message": "Internal server error."
}
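The example responses above place the message in two different positions: nested under "body" for the 400 responses, and at the top level for 401 and 500. A client that reports errors may want to handle both. The helper below is a hypothetical sketch based only on the shapes shown on this page.

```python
def extract_message(payload):
    """Return the message from either response shape shown above, or None."""
    body = payload.get("body")
    # 400-style shape: {"statusCode": ..., "body": {"message": ...}}
    if isinstance(body, dict) and "message" in body:
        return body["message"]
    # 401/500-style shape: {"message": ...}
    return payload.get("message")
```

A successful 202 payload carries no message in either position, so the helper returns None for it.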