A query language for JSON data.
Implementations:
Zorba (http://www.zorba.io/home)
RumbleDB (https://rumbledb.org/)
We will use RumbleDB for all the examples in these notes.
Strings are double-quoted.
To put it simply, they are sequences of Unicode characters with absolutely no restriction:
"foo",
"What NoSQL solutions are out there?"
"What \"NoSQL\" solutions are out there?"
The most frequent control characters even have their own shortcuts:
\n (new line)
\t (tab)
\r (carriage return)
\b (backspace)
0
9
42
-96
123456789012345678901234567890123456789012345
0.3
9.6
42.2346902834
-96.01345023400
0.3e0
9.6E+24
42.2346902834e-2
-96.01345023400E-02345
true
false
null
e.g. A Stack Overflow question:
{
"_id" : "511C7C5C9A277C22D138802D",
"question_id" : 4419499,
"last_edit_date" : "2012-12-17T00:02:31",
"creation_date" : "2010-12-11T23:15:19",
"last_activity_date" : "2012-12-17T00:02:31",
"score" : 15,
"accepted_answer_id" : 4421601,
"title" : "MySQL and NoSQL: Help me to choose the right one",
"tags" : [ "php", "mysql", "nosql", "cassandra" ],
"view_count" : 3972,
"owner" : {
"user_id" : 279538,
"display_name" : "cedivad",
"reputation" : 430,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/b77f...",
"link" : "http://stackoverflow.com/users/279538/cedivad",
"accept_rate" : 74
},
"link" : "http://stackoverflow.com/questions/4419499/mys...",
"is_answered" : true
}
Any JSON value is a JSONiq item
A JSONiq sequence is a list of items separated by commas!
{ "foo" : "bar" }
"foo", 2, true, { "foo", "bar" }, null, [ 1, 2, 3 ]
Sequences are flat and cannot be nested. This makes streaming possible, which is very powerful.
( ("foo", 2), ( (true, 4, null), 6 ) )
is same as
"foo", 2, true, 4, null, 6
The empty sequence is
()
JSONiq classifies the items mentioned above in three categories:
Example of date
value:
date("2013-06-21") + dayTimeDuration("P365D")
=
"2014-06-21"
let $x as integer := 16
return $x * $x
let $x := 16
return $x * $x
integerthat was shown above in a query matches singleton sequences of one atomic item of type integer.
Whenever you do not specify the type of a variable or the type signature of a function, the most general type for any sequence of items, item*, is assumed.
16 instance of integer
should evaluate to true
Atomic Types are organized into a hierarchy
16 instance of integer,
16 instance of decimal,
16.6 instance of decimal,
16.6e10 instance of double,
"foo" instance of string,
true instance of boolean,
null instance of null,
"foo" instance of atomic
All of above evaluate to true
Some more atomic types:
date("2013-06-18") instance of date,
dateTime("2013-06-21T05:00:00Z") instance of dateTime,
time("05:00:00") instance of time,
long("1234567890123") instance of long
All of above evaluate to true
{ "foo" : "bar" } instance of object,
{ "foo" : "bar" } instance of json-item,
{} instance of object,
[ 1, 2, 3, 4 ] instance of array,
[ 1, 2, 3, 4 ] instance of json-item
All of above evaluate to true
{ "foo" : "bar" } instance of item,
[ 1, 2, 3, 4 ] instance of item,
"foo" instance of item,
42 instance of item,
false instance of item,
null instance of item
All of above evaluate to true
All sequence types match the type item*
{ "foo" : "bar" } instance of item*,
() instance of item*,
([ 1, 2, 3 ], 2, { "foo" : "bar" }, 4) instance of item*
( { "foo" : "bar" } , {} ) instance of object*,
() instance of object*,
( [ 1, 2, 3 ] , {} ) instance of json-item+,
[ 1, 2, 3 ] instance of array?,
() instance of array?,
"foo" instance of string
All of above evaluate to true
() instance of ()
Any JSON building block is also a well-formed JSONiq query which just returns itself
// strings
"foo",
"This is a line\nand this is a new line",
"\u0001",
"This is a nested \"quote\""
// numbers
42,
3.14,
-6.022E23
// leading zeros!
042,
.1415926535,
42.,
+6.022E23
// booleans and null
true,
false,
null
{},
{ "foo" : "bar" },
{ "foo" : [ 1, 2, 3, 4, 5, 6 ] },
{ "foo" : true, "bar" : false },
{ "this is a key" : { "value" : "a value" } }
{ foo : "bar" },
{ foo : [ 1, 2, 3, 4, 5, 6 ] },
{ foo : "bar", bar : "foo" },
{ "but you need the quotes here" : null }
[],
[ 1, 2, 3, 4, 5, 6 ],
[ "foo", [ 3.14, "Go" ], { "foo" : "bar" }, true ]
Because JSONiq expressions are fully composable, in objects and arrays constructors, you can put way more than just atomic literals, object constructors and array constructors: you can put any JSONiq expression.
An expression is the JSONiq building block. You already know some (literals, constructors, comma, cast, instance of) and plenty more will be introduced in the next part (arithmetics, logic, comparison, if-then-else, try-catch, FLWORS that allow you to join, select, group, filter, project, stream in windows, …)
[ 1 to 10 ],
[ "foo" || "bar", 1 to 3, 2 + 2 ],
{ "foo" || "bar" : true },
{ "foo" : 1 + 1 },
{ "foo" : (), "bar" : (1, 2) },
{| { "foo" : "bar" }, { "bar" : "foo" } |}
produces the following answer:
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
[ "foobar", 1, 2, 3, 4 ]
{ "foobar" : true }
{ "foo" : 2 }
{ "foo" : null, "bar" : [ 1, 2 ] }
{ "foo" : "bar", "bar" : "foo" }
The following 3 collections will be used in some of the examples later in these notes.
Consider the following 3 files:
collection1.json
{
"question": "What NoSQL technology should I use?"
}
collection-faq.json
{"faqs": [
{ "_id" : "511C7C5C9A277C22D138802D",
"question_id" : 4419499,
"last_edit_date" : "2012-12-17T00:02:31",
"creation_date" : "2010-12-11T23:15:19",
"last_activity_date" : "2012-12-17T00:02:31",
"score" : 15,
"accepted_answer_id" : 4421601,
"title" : "MySQL and NoSQL: Help me to choose the right one",
"tags" : [ "php", "mysql", "nosql", "cassandra" ],
"view_count" : 3972,
"owner" : {
"user_id" : 279538,
"display_name" : "cedivad",
"reputation" : 430,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/b77fadd2ba791",
"link" : "http://stackoverflow.com/users/279538/cedivad",
"accept_rate" : 74
},
"link" : "http://stackoverflow.com/questions/4419499/mysql-and-nosql-help-me-to-choose-the-right-one",
"is_answered" : true
},
{
"_id" : "511C7C5C9A277C22D138802F",
"question_id" : 282783,
"last_edit_date" : "2012-04-30T22:43:02",
"creation_date" : "2008-11-12T02:02:42",
"last_activity_date" : "2012-04-30T22:43:02",
"score" : 42,
"accepted_answer_id" : 282813,
"title" : "The Next-gen Databases",
"tags" : [ "sql", "database", "nosql", "non-relational-database" ],
"view_count" : 5266,
"owner" : {
"user_id" : 3932,
"display_name" : "Randin",
"reputation" : 585,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/d9d7ba9c17d671d",
"link" : "http://stackoverflow.com/users/3932/randin",
"accept_rate" : 100
},
"link" : "http://stackoverflow.com/questions/282783/the-next-gen-databases",
"is_answered" : true
}
]
}
collection-answers.json
{ "answers": [
{ "_id" : "511C7C5D9A277C22D13880C3",
"question_id" : 37823,
"answer_id" : 37841,
"creation_date" : "2008-09-01T12:14:38",
"last_activity_date" : "2008-09-01T12:14:38",
"score" : 7,
"is_accepted" : false,
"owner" : {
"user_id" : 2562,
"display_name" : "Ubiguchi",
"reputation" : 1871,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/00b87a917ec763c0c...",
"link" : "http://stackoverflow.com/users/2562/ubiguchi"
}
},
{ "_id" : "511C7C5D9A277C22D13880C4",
"question_id" : 37823,
"answer_id" : 37844,
"creation_date" : "2008-09-01T12:16:40",
"last_activity_date" : "2008-09-01T12:16:40",
"score" : 4,
"is_accepted" : false,
"owner" : {
"user_id" : 2974,
"display_name" : "Rob Wells",
"reputation" : 17543,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/8769281d99f8fe9c2...",
"link" : "http://stackoverflow.com/users/2974/rob-wells",
"accept_rate" : 94
}
},
{ "_id" : "511C7C5F9A277C22D1388211",
"question_id" : 4419499,
"answer_id" : 4419542,
"creation_date" : "2010-12-11T23:24:21",
"last_edit_date" : 1292112046,
"last_activity_date" : "2010-12-12T00:00:46",
"score" : 17,
"is_accepted" : false,
"owner" : {
"user_id" : 236047,
"display_name" : "Victor Nicollet",
"reputation" : 14632,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/e083220ac33b47364d3...",
"link" : "http://stackoverflow.com/users/236047/victor-nicollet",
"accept_rate" : 95
}
},
{ "_id" : "511C7C5F9A277C22D1388212",
"question_id" : 4419499,
"answer_id" : 4419578,
"creation_date" : "2010-12-11T23:30:42",
"last_activity_date" : "2010-12-11T23:30:42",
"score" : 1,
"is_accepted" : false,
"owner" : {
"user_id" : 510782,
"display_name" : "descent89",
"reputation" : 33,
"user_type" : "registered",
"profile_image" : "http://www.gravatar.com/avatar/d15c0949f7e051e9d86...",
"link" : "http://stackoverflow.com/users/510782/descent89"
}
}
]
}
rumbleDB has a function, called json-doc
, that can read one JSON value to a variable.
rumbleDB also has a function, called json-file
, that can read a sequence of JSON objects in to a sequence variable. This works only if each line in the file contains an object or a value.
Very much like Javascript; dot
notation.
Object lookup
{
"question" : "What NoSQL technology should I use?"
}.question,
{
"question" : "What NoSQL technology should I use?"
}.answer
produces the following:
What NoSQL technology should I use?
Lookup on a single-object collection
let $oneobject := json-doc("collection1.json")
return $oneobject.question
produces
What NoSQL technology should I use?
Object lookup with an iteration on several objects
({ "foo" : "bar" }, { "foo" : "bar2" } ).foo,
for $x in json-doc("collection-faq.json") return {"ids": $x.faqs[].question_id}
produces
bar
bar2
{ "ids" : [ 4419499, 282783 ] }
Object lookup on non-objects
"foo".foo,
({"question" : "What NoSQL technology should I use?"},
[ "question", "answer" ],
{ "question" : "answer" },
"question").question
produces
What NoSQL technology should I use?
answer
Quotes and parentheses for object lookup
{
"my question" : "What NoSQL technology should I use?"
}."my question",
{
"my question" : "What NoSQL technology should I use?"
}.("my " || "question")
produces
What NoSQL technology should I use?
What NoSQL technology should I use?
Object lookup with a variable
let $field := "my " || "question"
return {"my question" : "What technology should I use?"}.$field
produces
What technology should I use?
The items in an array (which is an item) can be extracted as a sequence of items with the [] postfix operator.
[ "What NoSQL technology should I use?", "What is the bottleneck in MapReduce?" ][]
produces
What NoSQL technology should I use?
What is the bottleneck in MapReduce?
The following query:
for $x in json-doc("collection-faq.json") return $x.faqs[].tags
produces
[ "php", "mysql", "nosql", "cassandra" ]
[ "sql", "database", "nosql", "non-relational-database" ]
and the following query:
for $x in json-doc("collection-faq.json") return $x.faqs[].tags[]
produces
php
mysql
nosql
cassandra
sql
database
nosql
non-relational-database
Predicate expression for picking an item at a given position
(1 to 10)[5],
("What NoSQL technology should I use?", "What is the bottleneck in MapReduce?")[2]
produces
5
What is the bottleneck in MapReduce?
Predicate expression for filtering
("What NoSQL technology should I use?", "What is the bottleneck in MapReduce?")[contains($$, "NoSQL")],
(1 to 10)[$$ mod 2 eq 0]
produces
What NoSQL technology should I use?
2
4
6
8
10
Notice the use of $$
, referred to as the context-item (i.e. the item being processed)
Use of the combination of unboxing and filtering:
[ "question", "answer" ][][2],
{ questions: [ "What NoSQL technology should I use?", { "faq" : "What is the bottleneck in MapReduce?" }]
}.questions[][2].faq
produces
answer
What is the bottleneck in MapReduce?
1, 2, 3, 4, 5,
{ "foo" : "bar" }, [ 1 ],
1 + 1, 2 + 2,
(1, 2, (3, 4), 5)
produces
1
2
3
4
5
{ "foo" : "bar" }
[ 1 ]
2
4
1
2
3
4
5
1 to 10,
10 to 1
produces
1
2
3
4
5
6
7
8
9
10
( 2 + 3 ) * 5,
(),
1 * ( 2 + 3 ) + 7 idiv 2 - (-8) mod 2,
date("2013-05-01") - date("2013-04-02"),
2.3e4 + 5
produces
25
8
P29D
23005
"Captain" || " " || "Kirk",
"Captain" || () || "Kirk"
produces
Captain Kirk
CaptainKirk
1 + 1 eq 2,
1 lt 2,
1 eq null,
"foo" ne null,
null eq null,
null lt 1
produces
true
true
false
true
true
true
and
() to 10,
1 to (),
1 + (),
() eq 1,
() ge 10
produces no output!
natural:
{
"empty-sequence" : boolean(()),
"null" : boolean(null),
"non-empty-string" : boolean("foo"),
"empty-string" : boolean(""),
"zero" : boolean(0),
"not-zero" : boolean(1e42)
},
null and "foo"
produces
{
"empty-sequence" : false,
"null" : false,
"non-empty-string" : true,
"empty-string" : false,
"zero" : false,
"not-zero" : true
}
false
and, or, not
true and ( true or not true ),
1 + 1 eq 2 or not 1 + 1 eq 3
produces
true
true
every $i in 1 to 10 satisfies $i gt 0,
some $i in -5 to 5, $j in 1 to 10 satisfies $i eq $j
some $i as integer in -5 to 5, $j as integer in 1 to 10 satisfies $i eq $j
produces
true
true
true
keys({ "foo" : "bar", "bar" : "foo" }),
concat("foo", "bar"),
sum(1 to 100),
avg(1 to 100),
count( (1 to 100)[ $$ mod 5 eq 0 ] ),
count((1, 2, 3, 4))
produces
foo
bar
foobar
5050
50.5
20
4
if (1 + 1 eq 2)
then { "foo" : "yes" }
else { "foo" : "false" }
produces
{ "foo" : "yes" }
The following sequence containing conditional expressions
if (null) then { "foo" : "yes" } else { "foo" : "no" },
if (1) then { "foo" : "yes" } else { "foo" : "no" },
if (0) then { "foo" : "yes" } else { "foo" : "no" },
if ("foo") then { "foo" : "yes" } else { "foo" : "no" },
if ("") then { "foo" : "yes" } else { "foo" : "no" },
if (()) then { "foo" : "yes" } else { "foo" : "no" },
if (exists(json-doc("collection-faq.json").faqs)) then { "foo" : "yes" } else { "foo" : "no" }
produces
{ "foo" : "no" }
{ "foo" : "yes" }
{ "foo" : "no" }
{ "foo" : "yes" }
{ "foo" : "no" }
{ "foo" : "no" }
{ "foo" : "yes" }
Note: else
clause is mandatory! So,
if (1+1 eq 2) then { "foo" : "yes" } else ()
produces { "foo" : "yes" }
switch ("foo")
case "bar" return "foo"
case "foo" return "bar"
default return "none"
produces
bar
and
switch ("no-match")
case "bar" return "foo"
case "foo" return "bar"
default return "none"
produces
none
and
switch (2)
case 1 + 1 return "foo"
case 2 + 2 return "bar"
default return "none",
switch (true)
case 1 + 1 eq 2 return "1 + 1 is 2"
case 2 + 2 eq 5 return "2 + 2 is 5"
default return "none of the above is true"
produces
foo
1 + 1 is 2
try { 1 div 0 } catch * { "Caught!" }
produces Caught!
error outside of try-catch
let $x := 1 div 0
return try { $x } catch * { "Caught!" }
produces Division by zero!
try-catch with syntax error
try { x } catch * { "Caught!" }
produces Parser failed.
try-catch with type error
try { "foo" + "bar" } catch * { "Caught!" }
produces Caught!