JSONiq (The SQL of NoSQL!)

A query language for JSON data.

Implementations:

Zorba (http://www.zorba.io/home)

RumbleDB (https://rumbledb.org/)

We will use RumbleDB for all the examples in these notes.

Part I. JSON and JSONiq Data Model

2. JSON Syntax (http://www.json.org)

Strings

Strings are double-quoted.

To put it simply, they are sequences of Unicode characters with absolutely no restriction:

"foo",
"What NoSQL solutions are out there?"
"What \"NoSQL\" solutions are out there?"

The most frequent control characters even have their own shortcuts:

\n (new line)
\t (tab)
\r (carriage return)
\b (backspace)

Numbers

0
9
42
-96
123456789012345678901234567890123456789012345

0.3
9.6
42.2346902834
-96.01345023400

0.3e0
9.6E+24
42.2346902834e-2
-96.01345023400E-02345

Booleans

true
false

Null

null

Objects

e.g. A Stack Overflow question:

{
  "_id" : "511C7C5C9A277C22D138802D",
  "question_id" : 4419499,
  "last_edit_date" : "2012-12-17T00:02:31",
  "creation_date" : "2010-12-11T23:15:19",
  "last_activity_date" : "2012-12-17T00:02:31",
  "score" : 15,
  "accepted_answer_id" : 4421601,
  "title" : "MySQL and NoSQL: Help me to choose the right one",
  "tags" : [ "php", "mysql", "nosql", "cassandra" ],
  "view_count" : 3972,
  "owner" : {
    "user_id" : 279538,
    "display_name" : "cedivad",
    "reputation" : 430,
    "user_type" : "registered",
    "profile_image" : "http://www.gravatar.com/avatar/b77f...",
    "link" : "http://stackoverflow.com/users/279538/cedivad",
    "accept_rate" : 74
  },
  "link" : "http://stackoverflow.com/questions/4419499/mys...",
  "is_answered" : true
}

3. JSONiq Data Model

Items and Sequences

Any JSON value is a JSONiq item

A JSONiq sequence is a list of items separated by commas!

{ "foo" : "bar" }
"foo", 2, true, { "foo", "bar" }, null, [ 1, 2, 3 ]

Sequences are flat and cannot be nested. This makes streaming possible, which is very powerful.

( ("foo", 2), ( (true, 4, null), 6 ) )

is same as

"foo", 2, true, 4, null, 6

The empty sequence is

()

JSONiq classifies the items mentioned above in three categories:

Example of date value:

date("2013-06-21") + dayTimeDuration("P365D")
=
"2014-06-21"

4. JSONiq Type System

let $x as integer := 16
return $x * $x
let $x := 16
return $x * $x

Whenever you do not specify the type of a variable or the type signature of a function, the most general type for any sequence of items, item*, is assumed.

instance of operator

16 instance of integer

should evaluate to true

Item Types

Atomic Types are organized into a hierarchy

16 instance of integer,
16 instance of decimal,
16.6 instance of decimal,
16.6e10 instance of double,
"foo" instance of string,
true instance of boolean,
null instance of null,
"foo" instance of atomic

All of above evaluate to true

Some more atomic types:

date("2013-06-18") instance of date,
dateTime("2013-06-21T05:00:00Z") instance of dateTime,
time("05:00:00") instance of time,
long("1234567890123") instance of long

All of above evaluate to true

Object Types and Array Types

{ "foo" : "bar" } instance of object,
{ "foo" : "bar" } instance of json-item,
{} instance of object,
[ 1, 2, 3, 4 ] instance of array,
[ 1, 2, 3, 4 ] instance of json-item

All of above evaluate to true

The Most General Type: item

{ "foo" : "bar" } instance of item,
[ 1, 2, 3, 4 ] instance of item,
"foo" instance of item,
42 instance of item,
false instance of item,
null instance of item

All of above evaluate to true

Sequence Type

All sequence types match the type item*

{ "foo" : "bar" } instance of item*,
() instance of item*,
([ 1, 2, 3 ], 2, { "foo" : "bar" }, 4) instance of item*

( { "foo" : "bar" } , {} ) instance of object*,
() instance of object*,
( [ 1, 2, 3 ] , {} ) instance of json-item+,
[ 1, 2, 3 ] instance of array?,
() instance of array?,
"foo" instance of string

All of above evaluate to true

Empty Sequence Type: ()

() instance of ()

Part II. Construction of Items and JSON Navigation

5. Construction of Items

Any JSON building block is also a well-formed JSONiq query which just returns itself

Atomic Literals

// strings
"foo",
"This is a line\nand this is a new line",
"\u0001",
"This is a nested \"quote\""

// numbers
42,
3.14,
-6.022E23

// leading zeros!
042,
.1415926535,
42.,
+6.022E23

// booleans and null
true,
false,
null

Object Constructors

{},
{ "foo" : "bar" },
{ "foo" : [ 1, 2, 3, 4, 5, 6 ] },
{ "foo" : true, "bar" : false },
{ "this is a key" : { "value" : "a value" } }

Object constructors with unquoted keys

{ foo : "bar" },
{ foo : [ 1, 2, 3, 4, 5, 6 ] },
{ foo : "bar", bar : "foo" },
{ "but you need the quotes here" : null }

Array Constructors

[],
[ 1, 2, 3, 4, 5, 6 ],
[ "foo", [ 3.14, "Go" ], { "foo" : "bar" }, true ]

Composing Constructors

Because JSONiq expressions are fully composable, in objects and arrays constructors, you can put way more than just atomic literals, object constructors and array constructors: you can put any JSONiq expression.

An expression is the JSONiq building block. You already know some (literals, constructors, comma, cast, instance of) and plenty more will be introduced in the next part (arithmetics, logic, comparison, if-then-else, try-catch, FLWORS that allow you to join, select, group, filter, project, stream in windows, …)

[ 1 to 10 ],
[ "foo" || "bar", 1 to 3, 2 + 2 ],
{ "foo" || "bar" : true },
{ "foo" : 1 + 1 },
{ "foo" : (), "bar" : (1, 2) },
{| { "foo" : "bar" }, { "bar" : "foo" } |}

produces the following answer:

[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
[ "foobar", 1, 2, 3, 4 ]
{ "foobar" : true }
{ "foo" : 2 }
{ "foo" : null, "bar" : [ 1, 2 ] }
{ "foo" : "bar", "bar" : "foo" }

6. Collections

The following 3 collections will be used in some of the examples later in these notes.

Consider the following 3 files:

collection1.json

{
  "question": "What NoSQL technology should I use?" 
}

collection-faq.json

{"faqs": [
  { "_id" : "511C7C5C9A277C22D138802D",
    "question_id" : 4419499,
    "last_edit_date" : "2012-12-17T00:02:31",
    "creation_date" : "2010-12-11T23:15:19",
    "last_activity_date" : "2012-12-17T00:02:31",
    "score" : 15,
    "accepted_answer_id" : 4421601,
    "title" : "MySQL and NoSQL: Help me to choose the right one",
    "tags" : [ "php", "mysql", "nosql", "cassandra" ],
    "view_count" : 3972,
    "owner" : {
      "user_id" : 279538,
      "display_name" : "cedivad",
      "reputation" : 430,
      "user_type" : "registered",
      "profile_image" : "http://www.gravatar.com/avatar/b77fadd2ba791",
      "link" : "http://stackoverflow.com/users/279538/cedivad",
      "accept_rate" : 74
    },
    "link" : "http://stackoverflow.com/questions/4419499/mysql-and-nosql-help-me-to-choose-the-right-one",
    "is_answered" : true
  },
  {
    "_id" : "511C7C5C9A277C22D138802F",
    "question_id" : 282783,
    "last_edit_date" : "2012-04-30T22:43:02",
    "creation_date" : "2008-11-12T02:02:42",
    "last_activity_date" : "2012-04-30T22:43:02",
    "score" : 42,
    "accepted_answer_id" : 282813,
    "title" : "The Next-gen Databases",
    "tags" : [ "sql", "database", "nosql", "non-relational-database" ],
    "view_count" : 5266,
    "owner" : {
      "user_id" : 3932,
      "display_name" : "Randin",
      "reputation" : 585,
      "user_type" : "registered",
      "profile_image" : "http://www.gravatar.com/avatar/d9d7ba9c17d671d",
      "link" : "http://stackoverflow.com/users/3932/randin",
      "accept_rate" : 100
    },
    "link" : "http://stackoverflow.com/questions/282783/the-next-gen-databases",
    "is_answered" : true
  }
]
}

collection-answers.json

{ "answers": [
  { "_id" : "511C7C5D9A277C22D13880C3",
    "question_id" : 37823,
    "answer_id" : 37841,
    "creation_date" : "2008-09-01T12:14:38",
    "last_activity_date" : "2008-09-01T12:14:38",
    "score" : 7,
    "is_accepted" : false,
    "owner" : {
      "user_id" : 2562,
      "display_name" : "Ubiguchi",
      "reputation" : 1871,
      "user_type" : "registered",
      "profile_image" : "http://www.gravatar.com/avatar/00b87a917ec763c0c...",
      "link" : "http://stackoverflow.com/users/2562/ubiguchi"
    }
  },
  { "_id" : "511C7C5D9A277C22D13880C4",
    "question_id" : 37823,
    "answer_id" : 37844,
    "creation_date" : "2008-09-01T12:16:40",
    "last_activity_date" : "2008-09-01T12:16:40",
    "score" : 4,
    "is_accepted" : false,
    "owner" : {
      "user_id" : 2974,
      "display_name" : "Rob Wells",
      "reputation" : 17543,
      "user_type" : "registered",
      "profile_image" : "http://www.gravatar.com/avatar/8769281d99f8fe9c2...",
      "link" : "http://stackoverflow.com/users/2974/rob-wells",
      "accept_rate" : 94
    }
  },
  { "_id" : "511C7C5F9A277C22D1388211",
    "question_id" : 4419499,
    "answer_id" : 4419542,
    "creation_date" : "2010-12-11T23:24:21",
    "last_edit_date" : 1292112046,
    "last_activity_date" : "2010-12-12T00:00:46",
    "score" : 17,
    "is_accepted" : false,
    "owner" : {
      "user_id" : 236047,
      "display_name" : "Victor Nicollet",
      "reputation" : 14632,
      "user_type" : "registered",
      "profile_image" : "http://www.gravatar.com/avatar/e083220ac33b47364d3...",
      "link" : "http://stackoverflow.com/users/236047/victor-nicollet",
      "accept_rate" : 95
    }
  },
  { "_id" : "511C7C5F9A277C22D1388212",
    "question_id" : 4419499,
    "answer_id" : 4419578,
    "creation_date" : "2010-12-11T23:30:42",
    "last_activity_date" : "2010-12-11T23:30:42",
    "score" : 1,
    "is_accepted" : false,
    "owner" : {
      "user_id" : 510782,
      "display_name" : "descent89",
      "reputation" : 33,
      "user_type" : "registered",
      "profile_image" : "http://www.gravatar.com/avatar/d15c0949f7e051e9d86...",
      "link" : "http://stackoverflow.com/users/510782/descent89"
    }
  }
]
}

rumbleDB has a function, called json-doc, that can read one JSON value to a variable.

rumbleDB also has a function, called json-file, that can read a sequence of JSON objects in to a sequence variable. This works only if each line in the file contains an object or a value.

7. JSON Navigation

Object Navigation

Very much like Javascript; dot notation.

Object lookup

{
"question" : "What NoSQL technology should I use?"
}.question,
{
"question" : "What NoSQL technology should I use?"
}.answer

produces the following:

What NoSQL technology should I use?

Lookup on a single-object collection

let $oneobject := json-doc("collection1.json")
return $oneobject.question

produces

What NoSQL technology should I use?

Object lookup with an iteration on several objects

({ "foo" : "bar" }, { "foo" : "bar2" } ).foo,
for $x in json-doc("collection-faq.json") return {"ids": $x.faqs[].question_id}

produces

bar
bar2
{ "ids" : [ 4419499, 282783 ] }

Object lookup on non-objects

"foo".foo,
({"question" : "What NoSQL technology should I use?"},
 [ "question", "answer" ],
 { "question" : "answer" },
 "question").question

produces

What NoSQL technology should I use?
answer

Quotes and parentheses for object lookup

{
"my question" : "What NoSQL technology should I use?"
}."my question",
{
"my question" : "What NoSQL technology should I use?"
}.("my " || "question")

produces

What NoSQL technology should I use?
What NoSQL technology should I use?

Object lookup with a variable

let $field := "my " || "question"
return {"my question" : "What technology should I use?"}.$field

produces

What technology should I use?

Array Unboxing

The items in an array (which is an item) can be extracted as a sequence of items with the [] postfix operator.

[ "What NoSQL technology should I use?", "What is the bottleneck in MapReduce?" ][]

produces

What NoSQL technology should I use?
What is the bottleneck in MapReduce?

The following query:

for $x in json-doc("collection-faq.json") return $x.faqs[].tags

produces

[ "php", "mysql", "nosql", "cassandra" ]
[ "sql", "database", "nosql", "non-relational-database" ]

and the following query:

for $x in json-doc("collection-faq.json") return $x.faqs[].tags[]

produces

php
mysql
nosql
cassandra
sql
database
nosql
non-relational-database

Sequence Filtering

Predicate expression for picking an item at a given position

(1 to 10)[5],
("What NoSQL technology should I use?", "What is the bottleneck in MapReduce?")[2]

produces

5
What is the bottleneck in MapReduce?

Predicate expression for filtering

("What NoSQL technology should I use?", "What is the bottleneck in MapReduce?")[contains($$, "NoSQL")],
(1 to 10)[$$ mod 2 eq 0]

produces

What NoSQL technology should I use?
2
4
6
8
10

Notice the use of $$, referred to as the context-item (i.e. the item being processed)

Array Navigation

Use of the combination of unboxing and filtering:

[ "question", "answer" ][][2],
{ questions: [ "What NoSQL technology should I use?", { "faq" : "What is the bottleneck in MapReduce?" }]
}.questions[][2].faq

produces

answer
What is the bottleneck in MapReduce?

Part III. JSONiq Expressions

8. Basic Expressions

Comma Operator (sequence generation)

1, 2, 3, 4, 5,
{ "foo" : "bar" }, [ 1 ],
1 + 1, 2 + 2,
(1, 2, (3, 4), 5)

produces

1
2
3
4
5
{ "foo" : "bar" }
[ 1 ]
2
4
1
2
3
4
5

Range Operator

1 to 10,
10 to 1

produces

1
2
3
4
5
6
7
8
9
10

Parenthesized and Arithmetic Expressions; Date Arithmetic

( 2 + 3 ) * 5,
(),
1 * ( 2 + 3 ) + 7 idiv 2 - (-8) mod 2,
date("2013-05-01") - date("2013-04-02"),
2.3e4 + 5

produces

25
8
P29D
23005

String Operations

"Captain" || " " || "Kirk",
"Captain" || () || "Kirk"

produces

Captain Kirk
CaptainKirk

Comparisons

1 + 1 eq 2,
1 lt 2,
1 eq null,
"foo" ne null,
null eq null,
null lt 1

produces

true
true
false
true
true
true

and

() to 10,
1 to (),
1 + (),
() eq 1,
() ge 10

produces no output!

Logic

{
  "empty-sequence" : boolean(()),
  "null" : boolean(null),
  "non-empty-string" : boolean("foo"),
  "empty-string" : boolean(""),
  "zero" : boolean(0),
  "not-zero" : boolean(1e42)
},
null and "foo"

produces

{ 
  "empty-sequence" : false, 
  "null" : false, 
  "non-empty-string" : true, 
  "empty-string" : false, 
  "zero" : false, 
  "not-zero" : true 
}
false

Propositional Logic

and, or, not

true and ( true or not true ),
1 + 1 eq 2 or not 1 + 1 eq 3

produces

true
true

First Order Logic (Quantified Variables)

every $i in 1 to 10 satisfies $i gt 0,
some $i in -5 to 5, $j in 1 to 10 satisfies $i eq $j
some $i as integer in -5 to 5, $j as integer in 1 to 10 satisfies $i eq $j

produces

true
true
true

Builtin Functions

keys({ "foo" : "bar", "bar" : "foo" }),
concat("foo", "bar"),
sum(1 to 100),
avg(1 to 100),
count( (1 to 100)[ $$ mod 5 eq 0 ] ),
count((1, 2, 3, 4))

produces

foo
bar
foobar
5050
50.5
20
4

9. Control Flow Expressions

Conditional Expressions

if (1 + 1 eq 2)
then { "foo" : "yes" }
else { "foo" : "false" }

produces

{ "foo" : "yes" }

The following sequence containing conditional expressions

if (null) then { "foo" : "yes" } else { "foo" : "no" },
if (1) then { "foo" : "yes" } else { "foo" : "no" },
if (0) then { "foo" : "yes" } else { "foo" : "no" },
if ("foo") then { "foo" : "yes" } else { "foo" : "no" },
if ("") then { "foo" : "yes" } else { "foo" : "no" },
if (()) then { "foo" : "yes" } else { "foo" : "no" },
if (exists(json-doc("collection-faq.json").faqs)) then { "foo" : "yes" } else { "foo" : "no" }

produces

{ "foo" : "no" }
{ "foo" : "yes" }
{ "foo" : "no" }
{ "foo" : "yes" }
{ "foo" : "no" }
{ "foo" : "no" }
{ "foo" : "yes" }

Note: else clause is mandatory! So,

if (1+1 eq 2) then { "foo" : "yes" } else ()

produces { "foo" : "yes" }

Switch Expressions

switch ("foo")
  case "bar" return "foo"
  case "foo" return "bar"
  default    return "none"

produces

bar

and

switch ("no-match")
  case "bar" return "foo"
  case "foo" return "bar"
  default    return "none"

produces

none

and

switch (2)
  case 1 + 1 return "foo"
  case 2 + 2 return "bar"
  default    return "none",
switch (true)
  case 1 + 1 eq 2 return "1 + 1 is 2"
  case 2 + 2 eq 5 return "2 + 2 is 5"
  default         return "none of the above is true"

produces

foo
1 + 1 is 2

Try-Catch Expressions

try { 1 div 0 } catch * { "Caught!" }

produces Caught!

error outside of try-catch

let $x := 1 div 0
return try { $x } catch * { "Caught!" }

produces Division by zero!

try-catch with syntax error

try { x } catch * { "Caught!" }

produces Parser failed.

try-catch with type error

try { "foo" + "bar" } catch * { "Caught!" }

produces Caught!