Collections

Zatabase Collections provide a schema-flexible NoSQL document model layered on top of the relational engine. You can ingest raw JSON documents, define declarative projections to transform and type fields, and then query the projected data using standard SQL.

Creating a Collection

Create a collection with an optional projection that maps JSON fields to typed columns:

curl -s -X POST https://your-project.zatabase.io/v1/collections \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "events",
    "projection": {
      "event_type": "lowercase",
      "user_id": "text",
      "timestamp": "int",
      "amount": "float",
      "metadata": "text"
    }
  }' | jq

Projection Transforms

Projections define how raw JSON field values are transformed and stored in the underlying table. Each field maps to a transform type:

Transform	Description	JSON Input	Stored Value
`text`	Store as-is (string)	`"Hello"`	`"Hello"`
`lowercase`	Lowercase the string	`"HELLO"`	`"hello"`
`int`	Parse as integer	`"42"` or `42`	`42`
`float`	Parse as float	`"3.14"` or `3.14`	`3.14`

Fields present in the JSON but not in the projection are ignored. Fields in the projection but missing from a document are stored as NULL.
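The rules in the table and the two sentences above can be modeled locally. The following is a minimal Python sketch, not Zatabase's implementation: the names `apply_projection` and `TRANSFORMS` are hypothetical, and the handling of non-string values under `text` and of explicit JSON nulls is an assumption made for illustration.

```python
import json
from typing import Any, Dict, Optional


def _as_text(value: Any) -> str:
    # Strings are kept as-is; serializing other JSON values is an assumption.
    return value if isinstance(value, str) else json.dumps(value)


# One callable per transform name from the table above.
TRANSFORMS = {
    "text": _as_text,
    "lowercase": lambda value: _as_text(value).lower(),
    "int": int,
    "float": float,
}


def apply_projection(doc: Dict[str, Any], projection: Dict[str, str]) -> Dict[str, Optional[Any]]:
    """Return the row a document would produce under a projection."""
    row = {}
    for field, transform in projection.items():
        value = doc.get(field)
        # Missing fields become None (SQL NULL). Treating an explicit JSON
        # null the same way is an assumption of this sketch.
        row[field] = None if value is None else TRANSFORMS[transform](value)
    return row
```

For example, `apply_projection({"event_type": "CLICK", "amount": "29.99", "debug": True}, {"event_type": "lowercase", "amount": "float", "user_id": "text"})` returns `{"event_type": "click", "amount": 29.99, "user_id": None}`: the unprojected `debug` field is dropped and the missing `user_id` becomes NULL.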

Managing Projections

Get a collection’s current projection:

curl -s https://your-project.zatabase.io/v1/collections/events/projection \
  -H "Authorization: Bearer $ZATABASE_TOKEN" | jq

Update a projection. The new projection applies only to documents ingested after the update; existing data is not retroactively transformed:

curl -s -X PUT https://your-project.zatabase.io/v1/collections/events/projection \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "lowercase",
    "user_id": "text",
    "timestamp": "int",
    "amount": "float",
    "source": "lowercase"
  }' | jq
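Because an update is not retroactive, it can help to check which fields differ between the old and new projection before sending the PUT. A small Python sketch follows; the helper `diff_projections` is hypothetical, runs entirely on the client, and is not part of the Zatabase API.

```python
from typing import Dict, List, NamedTuple


class ProjectionDiff(NamedTuple):
    added: List[str]    # fields only in the new projection
    removed: List[str]  # fields only in the old projection
    changed: List[str]  # fields whose transform type differs


def diff_projections(old: Dict[str, str], new: Dict[str, str]) -> ProjectionDiff:
    """Compare two projection mappings field by field."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(f for f in set(old) & set(new) if old[f] != new[f])
    return ProjectionDiff(added, removed, changed)


# The projection from "Creating a Collection" and the updated one above.
old = {"event_type": "lowercase", "user_id": "text", "timestamp": "int",
       "amount": "float", "metadata": "text"}
new = {"event_type": "lowercase", "user_id": "text", "timestamp": "int",
       "amount": "float", "source": "lowercase"}
diff = diff_projections(old, new)
```

For the two projections used in this section, the diff reports `source` as added and `metadata` as removed, with no changed transforms. Documents ingested before the update keep the values produced by the old projection.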

Listing Collections

curl -s https://your-project.zatabase.io/v1/collections \
  -H "Authorization: Bearer $ZATABASE_TOKEN" | jq

Ingesting Data

The ingest endpoint detects the payload format from the Content-Type header. This section covers NDJSON and JSON arrays; both can optionally be gzip-compressed (add a Content-Encoding: gzip header).

NDJSON (Newline-Delimited JSON)

Best for streaming large datasets. Each line is an independent JSON document:

curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @- <<'EOF'
{"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0}
{"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99}
{"event_type": "SIGNUP", "user_id": "u_003", "timestamp": 1709000120, "amount": 0.0}
EOF

NDJSON ingestion uses simd-json for high-performance parsing and processes each line as it arrives, keeping memory usage constant regardless of file size.
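When the records originate in application code rather than in a file, the NDJSON body can be produced with the standard library. This is a sketch; the function name `to_ndjson` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def to_ndjson(records: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize records as NDJSON: one compact JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on
    # exactly one line of the output.
    lines = (json.dumps(record, separators=(",", ":")) for record in records)
    return "".join(line + "\n" for line in lines).encode("utf-8")
```

The resulting bytes can be sent as the request body with Content-Type: application/x-ndjson, in place of the heredoc used in the curl example.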

JSON Array

A single JSON array of objects:

curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    {"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0},
    {"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99}
  ]'

Gzip Compressed

Both NDJSON and JSON array formats support gzip compression for reduced transfer size:

# Compress and ingest
gzip -c events.ndjson | curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  -H "Content-Encoding: gzip" \
  --data-binary @-
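The same compression step can be done in client code before the request is sent. A minimal Python sketch, assuming the payload is already NDJSON bytes; `gzip_ndjson` is a hypothetical helper name, while the two header values are the ones used in the curl example above.

```python
import gzip
from typing import Dict, Tuple


def gzip_ndjson(payload: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Compress an NDJSON payload and return it with the matching headers."""
    compressed = gzip.compress(payload)
    headers = {
        "Content-Type": "application/x-ndjson",  # format of the uncompressed body
        "Content-Encoding": "gzip",              # tells the server to decompress
    }
    return compressed, headers
```

Comparing `len(payload)` with `len(compressed)` gives the actual transfer saving for a particular dataset.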

Ingest Response

The response reports the number of successfully ingested records and any errors:

{
  "inserted": 3,
  "errors": 0,
  "error_details": []
}

If individual records fail (e.g., type conversion errors), the rest of the batch continues. Error details include the line number and error message for each failed record.
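A client will usually want to surface partial failures rather than discard them. The sketch below parses the response shown above. The top-level keys `inserted`, `errors`, and `error_details` come from that example, but the keys inside each error entry (`line` and `message` here) are assumptions, since this section does not show a populated `error_details` array; adjust them to the real payload.

```python
import json
from typing import List, Tuple


def summarize_ingest(response_text: str) -> Tuple[int, List[str]]:
    """Return the inserted count and one readable message per failed record."""
    body = json.loads(response_text)
    failures = []
    for detail in body.get("error_details", []):
        # "line" and "message" are assumed key names, not confirmed by the docs.
        line = detail.get("line", "?")
        message = detail.get("message", "unknown error")
        failures.append(f"line {line}: {message}")
    return body["inserted"], failures
```

Calling it on the response above returns `(3, [])`, so an empty list signals that the whole batch was stored.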

Querying Collection Data

Once data is ingested, the projected columns are available via standard SQL:

# Select click events with a positive amount
curl -s -X POST https://your-project.zatabase.io/v1/sql \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM events WHERE event_type = '\''click'\'' AND amount > 0"}'

Since projections apply transforms at ingest time, the stored data reflects the transform. For example, with a lowercase projection on event_type, querying WHERE event_type = 'click' matches documents originally ingested as "CLICK".

# Find recent high-value events
curl -s -X POST https://your-project.zatabase.io/v1/sql \
  -H "Authorization: Bearer $ZATABASE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM events WHERE amount > 10.0 ORDER BY timestamp DESC LIMIT 20"}'
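SQL string literals need `'\''` escaping when embedded in a single-quoted curl argument, as in the first query above. Building the JSON body in code avoids that. The following Python sketch constructs the same request with the standard library; the function name and its parameters are hypothetical, while the `/v1/sql` path, the headers, and the `query` key are taken from the curl examples.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the SQL endpoint."""
    # json.dumps handles all quoting, so the SQL can contain single quotes.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/sql",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the query; the sketch stops short of that so it can be inspected or tested without network access.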

Use Cases

Event ingestion: Stream application events as NDJSON, project relevant fields, query with SQL
Log aggregation: Ingest structured logs, project severity/service/timestamp, analyze with WHERE clauses
Data lake queries: Ingest raw JSON exports from external systems, define projections for the fields you need
ETL pipelines: Use collections as a staging area; ingest raw data, project to typed columns, then query or export

Performance Considerations

NDJSON is preferred for large datasets because it processes line-by-line with constant memory
Batch sizes of 1,000-10,000 records per HTTP request provide optimal throughput
Gzip compression reduces network transfer by 5-10x for typical JSON payloads
Projections are applied at write time, so queries read already-transformed columns and pay no transform cost, however many fields the projection defines
Zatabase uses simd-json for NDJSON parsing, achieving multi-GB/s parse rates on modern CPUs
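The batch-size guidance above can be applied by splitting a record stream into fixed-size chunks and sending one ingest request per chunk. A minimal Python sketch; the name `batched` and the default of 5,000 (a value inside the recommended 1,000-10,000 range) are choices made for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int = 5000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # islice pulls only batch_size items at a time, so the full input
        # never has to be held in memory.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be serialized and posted as one HTTP request, which keeps client memory bounded by the batch size rather than by the dataset size.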