Use case
- Write-intensive: keep the data highly normalized to avoid update anomalies!
- Read-intensive: keep the data highly denormalized to avoid joins!
Denormalizing road
- relational database: homogeneous collection of flat items
- document store (semi-structured): heterogeneous collection of arborescent (tree-shaped) items.
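As a rough illustration of the road from flat items to arborescent items, the sketch below nests the rows of one table inside the rows of another. The table names, field names, and the denormalize helper are made up for this example and are not part of the notes.

```python
# Flat, homogeneous rows, as two relational tables would hold them (illustrative data).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Linus"},
]
orders = [
    {"order_id": 100, "customer_id": 1, "item": "keyboard"},
    {"order_id": 101, "customer_id": 1, "item": "mouse"},
]


def denormalize(customers, orders):
    """Nest each customer's orders inside the customer item (a tree, not a flat row)."""
    orders_by_customer = {}
    for order in orders:
        # The foreign key is no longer needed once the order lives inside its customer.
        nested = {k: v for k, v in order.items() if k != "customer_id"}
        orders_by_customer.setdefault(order["customer_id"], []).append(nested)
    return [
        {**c, "orders": orders_by_customer.get(c["customer_id"], [])}
        for c in customers
    ]


documents = denormalize(customers, orders)
print(documents[0])
```

Reading one customer together with all of its orders now needs no join, at the price of having to rewrite the whole nested item when an order changes.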
Syntax
A syntax defines a language. If a sentence is well-formed according to that syntax, it belongs to the language. Otherwise, it doesn’t.
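One way to picture "well-formed means it belongs to the language" is to hand a string to a parser and see whether it is accepted. The sketch below uses XML as the language and Python's standard xml.etree.ElementTree parser; the is_well_formed helper is a name chosen for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a><b/></a>"))       # accepted: tags are properly nested
print(is_well_formed("<a><b></a>"))        # rejected: <b> is never closed
print(is_well_formed('<a x="1" x="2"/>'))  # rejected: duplicate attribute name
```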
XML
Entities
- element, defined by <element-name>...</element-name> or <element-name/>
- attribute:
  - only within the brackets of a tag, defined like <a attr="value"/>.
  - only attributes can appear inside an opening element tag (after the element name).
  - two attributes cannot have the same name within a single element.
- text: only between the element tags, e.g. <a>this is text</a>. It cannot appear outside of elements.
- comment: <!-- This is a comment -->
- processing instruction: defined by <?target ...?>. The XML declaration, e.g. <?xml version="1.0"?>, has the same shape.
- CDATA sections: appear where text can appear; the format is <![CDATA[ ... ]]>, and the content is taken literally.
- Document Type: <!DOCTYPE document [(internal subset)]>
- (Internal) entity declarations: <!ENTITY name "value">, using &name; to get the value.
- External parsed entities: defined with <!ENTITY name SYSTEM "path">. The well-formedness conditions are relaxed for the referenced content: text at top level and multiple top-level elements are allowed.
- External unparsed entities: defined with <!ENTITY name SYSTEM "path" NDATA notation>. They are not referenced with &name;; instead, the entity name appears as the value of an attribute declared with type ENTITY (or ENTITIES).
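To see several of these constructs in one place, the sketch below parses a small document with Python's standard xml.etree.ElementTree. The document content (note, greeting, who, and so on) is invented for illustration. It assumes the default parser settings, under which comments are dropped and internal entities declared in the internal subset are expanded.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY who "world">
]>
<note id="n1">
  <!-- This is a comment -->
  <greeting>Hello, &who;!</greeting>
  <formula><![CDATA[1 < 2 && 3 > 2]]></formula>
  <empty/>
</note>"""

root = ET.fromstring(doc)

element_names = [child.tag for child in root]  # the comment is not in the tree
attribute_value = root.get("id")               # attribute of the opening tag
greeting = root.find("greeting").text          # &who; replaced by its value
formula = root.find("formula").text            # CDATA content, taken literally
empty_text = root.find("empty").text           # <empty/> has no text at all

print(element_names, attribute_value, greeting, formula, empty_text)
```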
Entity References
Five predefined entity references: &lt; for <, &gt; for >, &apos; for ', &quot; for ", and &amp; for &.
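A small sketch of how these five references round-trip, using escape from Python's standard xml.sax.saxutils module. By default escape only handles &, < and >, so the two quote characters are passed in explicitly; the QUOTES mapping and the sample string are assumptions of this example.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() covers &, < and > on its own; the quotes are supplied as extra entities.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = '<"Tom" & \'Jerry\'>'
escaped = escape(raw, QUOTES)
print(escaped)

# Parsing the escaped text back yields the original characters.
parsed = ET.fromstring("<a>" + escaped + "</a>").text
print(parsed == raw)
```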
Character References
Two formats: hexadecimal, beginning with &#x (e.g. &#x41;), and decimal, beginning with &# (e.g. &#65;). Both end with a semicolon.
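The two formats name the same code point, which can be checked with a short sketch (the element name a and the sample characters are arbitrary choices):

```python
import xml.etree.ElementTree as ET

# &#x41; (hex) and &#65; (decimal) are both "A"; &#xE9; and &#233; are both "é".
element = ET.fromstring("<a>&#x41;&#65; &#xE9;&#233;</a>")
print(element.text)

# Building the two forms of a reference from a character.
hex_form = "&#x{:X};".format(ord("A"))
dec_form = "&#{};".format(ord("A"))
print(hex_form, dec_form)
```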
Norms
XML valid name
- a-z, A-Z, :, _ are allowed anywhere in a name.
- 0-9, -, . are allowed, but not at the start.
- other ASCII characters are not allowed in XML names.
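The ASCII rules above translate almost directly into a regular expression. This is only a sketch of the ASCII subset described here: real XML names also admit many non-ASCII characters, which this check deliberately ignores. The names ASCII_XML_NAME and is_ascii_xml_name are made up for the example.

```python
import re

# First character: letter, underscore or colon.
# Following characters: the same, plus digits, hyphen and dot.
ASCII_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_ascii_xml_name(name):
    """Check a name against the ASCII-only rules (non-ASCII names are not handled)."""
    return ASCII_XML_NAME.fullmatch(name) is not None


for candidate in ["item", "_private", "xs:element", "my-name.v2", "2nd", "-dash", "a b"]:
    print(candidate, is_ascii_xml_name(candidate))
```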
Whitespaces
Whitespace matters!
space: &#x20;, tab: &#x9;, carriage return: &#xD;, newline: &#xA;
A carriage return (alone or followed by a newline) is automatically replaced with a single newline before parsing.
If an element declares xml:space="preserve", e.g. <item xml:space="preserve">, the whitespace before its sub-elements must not be ignored. By default, applications may ignore such whitespace.
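A short sketch of how this looks from Python's standard xml.etree.ElementTree. Note that this parser hands all whitespace to the application, so "ignoring" it is a decision the application makes afterwards, and xml:space is simply visible as an attribute. The sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Whitespace before and after a sub-element is kept in .text and .tail.
item = ET.fromstring("<item>\n  <a/>\n</item>")
print(repr(item.text), repr(item[0].tail))

# Line endings are normalized: CR LF and a lone CR both arrive as a newline.
lines = ET.fromstring("<t>one\r\ntwo\rthree</t>")
print(repr(lines.text))

# xml:space shows up as an attribute in the predefined xml namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
preserved = ET.fromstring('<item xml:space="preserve"> <a/></item>')
print(preserved.get(XML_SPACE))
```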
Namespace
- prefix: anything, can be omitted.
- local name
- namespace (a URI); scope: the element that declares the binding and everything within it
Use xmlns:prefix-name="namespace" to define a prefix, and prefix:local-name to write a QName.
If the prefix is omitted, it will define a default namespace for this element (and its sub-elements).
If no namespace is defined, all the elements are in no namespace. That is, there is no default namespace unless one is declared.
Unprefixed attributes are in no namespace even if there is a default namespace in scope.
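These rules can be observed with Python's standard xml.etree.ElementTree, which writes a name in a namespace as {namespace}local-name. The URIs and element names below are placeholders chosen for this sketch.

```python
import xml.etree.ElementTree as ET

doc = (
    '<root xmlns="http://example.com/default" xmlns:p="http://example.com/p" '
    'a="1" p:b="2">'
    "<p:child/><child/>"
    "</root>"
)
root = ET.fromstring(doc)

# The unprefixed element picks up the default namespace.
print(root.tag)

# Prefixed and unprefixed children resolve to different namespaces.
child_tags = [child.tag for child in root]
print(child_tags)

# The unprefixed attribute "a" stays in no namespace; "p:b" is in the p namespace.
print(root.attrib)
```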
Best practice
- put all namespace bindings in root element
- Try to keep the same bindings across all documents whenever possible!
- use parsimoniously
JSON
Elements
- string
- number
- null: must be written in lowercase; Null is invalid.
- boolean: true, false
- array
- object
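As a sketch, the six kinds of JSON values can be parsed with Python's standard json module to see what each becomes, and to check that Null is rejected. The sample document and the is_well_formed_json helper are illustrative names, not part of the notes.

```python
import json

text = '{"s": "hi", "n": 1.5, "nothing": null, "flag": true, "list": [1, 2], "obj": {"k": "v"}}'
value = json.loads(text)

# Python type chosen by the json module for each kind of JSON value.
types = {key: type(item).__name__ for key, item in value.items()}
print(types)


def is_well_formed_json(candidate):
    """Return True if the json module accepts the text."""
    try:
        json.loads(candidate)
        return True
    except json.JSONDecodeError:
        return False


print(is_well_formed_json("null"))       # lowercase null is valid
print(is_well_formed_json("Null"))       # capitalized Null is not
print(is_well_formed_json('{"a": 1,}'))  # trailing comma is not allowed
```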
Well-formedness
Others
XHTML, YAML (the “Python of JSON”), CSV (comma-separated values)
Further Reading
XML in a Nutshell Chapter 1, 2, 4, 9
Thanks Alessandro Stolfo for reading notes