Collections

Zatabase Collections provide a schema-flexible NoSQL document model layered on top of the relational engine. You can ingest raw JSON documents, define declarative projections to transform and type fields, and then query the projected data using standard SQL.

Create a collection with an optional projection that maps JSON fields to typed columns:

curl -s -X POST https://your-project.zatabase.io/v1/collections \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "events",
"projection": {
"event_type": "lowercase",
"user_id": "text",
"timestamp": "int",
"amount": "float",
"metadata": "text"
}
}' | jq

Projections define how raw JSON field values are transformed and stored in the underlying table. Each field maps to a transform type:

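| Transform | Description | JSON Input | Stored Value |
|-----------|-------------|------------|--------------|
| text | Store as-is (string) | "Hello" | "Hello" |
| lowercase | Lowercase the string | "HELLO" | "hello" |
| int | Parse as integer | "42" or 42 | 42 |
| float | Parse as float | "3.14" or 3.14 | 3.14 |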

Fields present in the JSON but not in the projection are ignored. Fields in the projection but missing from a document are stored as NULL.
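The projection rules above can be modeled locally to check what a document would look like after ingest. The following is a minimal sketch, not Zatabase's implementation: `apply_projection` is a hypothetical helper, the Python behaviour attached to each transform is an assumption based on the table, and treating JSON null like a missing field is also an assumption.

```python
from typing import Any, Callable, Dict, Optional

# Transform names come from the table above. The Python behaviour attached to
# each name is an assumption for illustration, not the server's actual code.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "text": lambda v: v,                    # store as-is
    "lowercase": lambda v: str(v).lower(),  # "HELLO" -> "hello"
    "int": int,                             # accepts "42" or 42
    "float": float,                         # accepts "3.14" or 3.14
}


def apply_projection(doc: Dict[str, Any],
                     projection: Dict[str, str]) -> Dict[str, Optional[Any]]:
    """Return the row a document would produce under a projection (hypothetical helper)."""
    row: Dict[str, Optional[Any]] = {}
    for field, transform in projection.items():
        if doc.get(field) is None:
            row[field] = None  # absent field (or JSON null, an assumption) -> NULL
        else:
            row[field] = TRANSFORMS[transform](doc[field])
    return row  # fields outside the projection are never copied


if __name__ == "__main__":
    projection = {"event_type": "lowercase", "user_id": "text",
                  "timestamp": "int", "amount": "float"}
    doc = {"event_type": "CLICK", "user_id": "u_001",
           "timestamp": "1709000000", "debug": True}
    print(apply_projection(doc, projection))
    # {'event_type': 'click', 'user_id': 'u_001', 'timestamp': 1709000000, 'amount': None}
```

In the sample document, `debug` is dropped because it is not in the projection, and `amount` becomes `None` (NULL) because the document does not supply it.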

Get a collection’s current projection:

curl -s https://your-project.zatabase.io/v1/collections/events/projection \
-H "Authorization: Bearer $ZATABASE_TOKEN" | jq

Update a projection (existing data is not retroactively transformed):

curl -s -X PUT https://your-project.zatabase.io/v1/collections/events/projection \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"event_type": "lowercase",
"user_id": "text",
"timestamp": "int",
"amount": "float",
"source": "lowercase"
}' | jq

List all collections:

curl -s https://your-project.zatabase.io/v1/collections \
-H "Authorization: Bearer $ZATABASE_TOKEN" | jq

The ingest endpoint supports three formats, auto-detected from the Content-Type header. All formats support optional gzip compression (add a Content-Encoding: gzip header).

NDJSON (newline-delimited JSON) is best for streaming large datasets. Each line is an independent JSON document:

curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/x-ndjson" \
--data-binary @- <<'EOF'
{"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0}
{"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99}
{"event_type": "SIGNUP", "user_id": "u_003", "timestamp": 1709000120, "amount": 0.0}
EOF

NDJSON ingestion uses simd-json for high-performance parsing and processes each line as it arrives, keeping memory usage constant regardless of file size.

Alternatively, send a single JSON array of objects with Content-Type: application/json:

curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/json" \
-d '[
{"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0},
{"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99}
]'

Both NDJSON and JSON array formats support gzip compression for reduced transfer size:

# Compress and ingest
gzip -c events.ndjson | curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/x-ndjson" \
-H "Content-Encoding: gzip" \
--data-binary @-
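
For clients that build payloads in code rather than piping through `gzip`, the same request can be sketched in Python with only the standard library. The helper names `to_ndjson` and `build_gzip_ingest_request` are hypothetical; the URL path and headers mirror the curl example above. The request is only constructed here, not sent.

```python
import gzip
import json
import os
import urllib.request
from typing import Any, Dict, List


def to_ndjson(records: List[Dict[str, Any]]) -> bytes:
    """Serialize records as newline-delimited JSON, one document per line."""
    return "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")


def build_gzip_ingest_request(base_url: str, collection: str,
                              records: List[Dict[str, Any]],
                              token: str) -> urllib.request.Request:
    """Build (but do not send) a gzip-compressed NDJSON ingest request."""
    body = gzip.compress(to_ndjson(records))
    return urllib.request.Request(
        f"{base_url}/v1/collections/{collection}/ingest",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-ndjson",
            "Content-Encoding": "gzip",  # tells the server the body is compressed
        },
    )


if __name__ == "__main__":
    events = [
        {"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0},
        {"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99},
    ]
    req = build_gzip_ingest_request("https://your-project.zatabase.io", "events",
                                    events, os.environ.get("ZATABASE_TOKEN", ""))
    print(req.get_method(), req.full_url, len(req.data), "bytes")
    # To send it: urllib.request.urlopen(req)
```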

The response reports the number of successfully ingested records and any errors:

{
"inserted": 3,
"errors": 0,
"error_details": []
}

If individual records fail (e.g., type conversion errors), the rest of the batch continues. Error details include the line number and error message for each failed record.
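
Because a batch can succeed partially, a client usually wants to map each reported error back to the record that caused it. The sketch below shows one way to do that. The shape of each `error_details` entry is not specified above, so the key names `line` and `message`, and the assumption that line numbers are 1-based, are hypothetical; adjust them to match the actual response.

```python
from typing import Any, Dict, List, Tuple


def failed_records(records: List[Dict[str, Any]],
                   response: Dict[str, Any]) -> List[Tuple[Dict[str, Any], str]]:
    """Pair each failed input record with its error message.

    Assumes entries look like {"line": <1-based int>, "message": <str>};
    both key names are hypothetical.
    """
    failures = []
    for detail in response.get("error_details", []):
        index = detail["line"] - 1  # assumed 1-based line numbers
        failures.append((records[index], detail["message"]))
    return failures


if __name__ == "__main__":
    sent = [
        {"event_type": "CLICK", "timestamp": 1709000000},
        {"event_type": "PURCHASE", "timestamp": "not-a-number"},
        {"event_type": "SIGNUP", "timestamp": 1709000120},
    ]
    # Hand-written sample for illustration, not real server output.
    response = {
        "inserted": 2,
        "errors": 1,
        "error_details": [{"line": 2, "message": "example error text"}],
    }
    for record, message in failed_records(sent, response):
        print(f"retry or fix {record!r}: {message}")
```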

Once data is ingested, the projected columns are available via standard SQL:

# Find click events with a positive amount
curl -s -X POST https://your-project.zatabase.io/v1/sql \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM events WHERE event_type = '\''click'\'' AND amount > 0"}'

Since projections apply transforms at ingest time, the stored data reflects the transform. For example, with a lowercase projection on event_type, querying WHERE event_type = 'click' matches documents originally ingested as "CLICK".
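
The `'\''` sequences in the curl command exist only to embed single quotes inside a single-quoted shell string. When the request body is built in code, a JSON serializer handles the quoting instead. A minimal sketch, assuming the same `/v1/sql` endpoint and `query` field shown above; `build_sql_request` is a hypothetical helper and the request is not sent:

```python
import json
import os
import urllib.request


def build_sql_request(base_url: str, query: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the SQL endpoint with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")  # serializer handles escaping
    return urllib.request.Request(
        f"{base_url}/v1/sql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_sql_request(
        "https://your-project.zatabase.io",
        "SELECT * FROM events WHERE event_type = 'click' AND amount > 0",
        os.environ.get("ZATABASE_TOKEN", ""),
    )
    print(req.data.decode("utf-8"))
    # {"query": "SELECT * FROM events WHERE event_type = 'click' AND amount > 0"}
```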

# Find recent high-value events
curl -s -X POST https://your-project.zatabase.io/v1/sql \
-H "Authorization: Bearer $ZATABASE_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "SELECT * FROM events WHERE amount > 10.0 ORDER BY timestamp DESC LIMIT 20"}'

Common use cases:

  • Event ingestion: Stream application events as NDJSON, project relevant fields, query with SQL
  • Log aggregation: Ingest structured logs, project severity/service/timestamp, analyze with WHERE clauses
  • Data lake queries: Ingest raw JSON exports from external systems, define projections for the fields you need
  • ETL pipelines: Use collections as a staging area; ingest raw data, project to typed columns, then query or export

Performance notes:

  • NDJSON is preferred for large datasets because it processes line-by-line with constant memory
  • Batch sizes of 1,000-10,000 records per HTTP request provide optimal throughput
  • Gzip compression reduces network transfer by 5-10x for typical JSON payloads
  • Projections are applied at write time, so reads are fast regardless of projection complexity
  • Zatabase uses simd-json for NDJSON parsing, achieving multi-GB/s parse rates on modern CPUs
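
To stay within the recommended 1,000-10,000 records per request, a client can split a record stream into fixed-size batches before ingesting. The sketch below is one way to do that; `batches` is a hypothetical helper, and the default of 5,000 is simply a value inside the recommended range, not a documented setting.

```python
from typing import Any, Dict, Iterable, Iterator, List


def batches(records: Iterable[Dict[str, Any]],
            size: int = 5000) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` records, holding only one batch in memory."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


if __name__ == "__main__":
    stream = ({"user_id": f"u_{i:05d}", "amount": 0.0} for i in range(12000))
    for n, batch in enumerate(batches(stream), start=1):
        # Each batch would become the body of one ingest request.
        print(f"batch {n}: {len(batch)} records")
    # batch 1: 5000 records
    # batch 2: 5000 records
    # batch 3: 2000 records
```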