BD14 Graph Database


Graph databases don’t like shards: splitting a graph across machines turns cheap local traversals into network hops.

Issues

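  • With an RDBMS, relationships are resolved through joins, which are expensive and a poor fit for relationship-heavy data. Graph workloads traverse relationships constantly, both forward and in reverse.
  • To solve that, graph databases use index-free adjacency: each node stores direct references to its neighbors, so following an edge needs no index lookup.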
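
The contrast between join-based lookup and index-free adjacency can be sketched as follows. This is a minimal illustration, not how any particular database implements it; the `follows_table` data, the `Node` class, and the helper names are all hypothetical.

```python
# Relational style: relationships live in a join table that must be searched.
follows_table = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]


def friends_via_join(person):
    # Without an index this scans every row, so the cost grows with the
    # total number of relationships in the database.
    return [dst for src, dst in follows_table if src == person]


# Index-free adjacency: every node holds direct references to its neighbors.
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []  # nodes this node points to
        self.incoming = []  # nodes pointing at this node (reverse traversal)


def connect(src, dst):
    src.outgoing.append(dst)
    dst.incoming.append(src)


nodes = {name: Node(name) for name in ("alice", "bob", "carol")}
for src_name, dst_name in follows_table:
    connect(nodes[src_name], nodes[dst_name])


def friends_via_adjacency(node):
    # The cost depends only on this node's degree, not on the graph size.
    return [n.name for n in node.outgoing]


def followers_via_adjacency(node):
    # Reverse traversal is just as cheap because incoming links are stored too.
    return [n.name for n in node.incoming]
```

Both approaches return the same neighbors; the difference is that the adjacency version never touches relationships that belong to other nodes.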

Ingredients

nodes, edges, properties, labels

Graph databases

  • property graph: Neo4j
  • triple stores (RDF): every statement is a triple of subject, property (also called predicate), and object.
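
To make the two data models concrete, here is a small sketch of the same kind of information held once as a property graph and once as triples. All identifiers (`ex:Zurich`, `LOCATED_IN`, and so on) are made-up placeholders, and `match` is a hypothetical helper, not part of any RDF library.

```python
# Property graph: nodes and edges carry labels and key-value properties.
property_graph = {
    "nodes": {
        1: {"labels": ["City"], "properties": {"name": "Zurich"}},
        2: {"labels": ["Country"], "properties": {"name": "Switzerland"}},
    },
    "edges": [
        {"source": 1, "target": 2, "label": "LOCATED_IN", "properties": {}},
    ],
}

# Triple store: everything is a (subject, property, object) statement.
triples = {
    ("ex:Zurich", "rdf:type", "ex:City"),
    ("ex:Zurich", "ex:locatedIn", "ex:Switzerland"),
    ("ex:Bern", "ex:locatedIn", "ex:Switzerland"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )
```

For example, `match(o="ex:Switzerland")` returns every statement whose object is `ex:Switzerland`, which is the basic operation a triple-pattern query builds on.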

Abbreviation

IRI: Internationalized Resource Identifier

Syntax

RDF/XML

  • subject: <rdf:Description rdf:about="subject-IRI"> ... </rdf:Description>
  • property: nested inside the subject element: <rdf:Description ...> <geo:property1> ... </geo:property1> </rdf:Description>
  • object: inside the property element, either as an IRI, <geo:property1 rdf:resource="object-IRI"/>, or as a literal, <geo:property2>object-value</geo:property2>

JSON-LD

{"@id": "subject-IRI", "@type": "subject-type-IRI", "property1": "object1", "property2": "object2"}

Turtle

@prefix sub: <IRI> . @prefix object: <IRI> . @prefix prop: <IRI> .
sub:self prop:sub-prop object:sub-obj . ...
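
All three syntaxes encode the same subject, property, object statements. The sketch below reads a flat JSON-LD-style document and lists the triples it stands for. It is a deliberate simplification: it ignores `@context` expansion and nested nodes, `to_triples` is a hypothetical helper, and the IRIs are placeholders.

```python
import json

# A flat JSON-LD-style node with placeholder identifiers.
# "@type" is the JSON-LD keyword that corresponds to rdf:type.
doc = json.loads("""
{
  "@id": "http://example.org/subject",
  "@type": "http://example.org/SubjectType",
  "property1": "object1",
  "property2": "object2"
}
""")


def to_triples(node):
    """List (subject, property, object) triples for one flat node.

    Simplification: no @context processing, no nested objects, no lists.
    """
    subject = node["@id"]
    result = []
    for key, value in node.items():
        if key == "@id":
            continue  # the subject itself, not a statement about it
        prop = "rdf:type" if key == "@type" else key
        result.append((subject, prop, value))
    return result
```

Each key other than `@id` becomes one triple with the node's `@id` as subject.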

Querying

Cypher

Format: (node1)-[:edge-label1]->(node2)<-[:edge-label2]-(node3)

  • anchoring a label: (node1:label1)
  • filtering a property: (node1 {prop-key: 'prop-value'})
  • combining: (node1:label1 {prop-key: 'prop-value'})
  • variable repetition: (node1)-[:edge-label1]->(node2)-[:edge-label2]->(node1)
  • variable length path: (alpha)-[*1..4]->(beta)
  • MATCH clause: MATCH (pattern) [WHERE conditions] RETURN gamma
  • CREATE clause: use , to separate patterns: CREATE (), (), ()
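
What a variable-length pattern such as (alpha)-[*1..4]->(beta) asks for can be sketched in plain Python: find every node reachable from a start node over between 1 and 4 edges. This is only an illustration of the semantics, not how a query engine evaluates it. It assumes Cypher's rule that a relationship is not reused within a single matched path, and the `edges` data and `reachable` function are hypothetical.

```python
# Directed graph as adjacency lists (no parallel edges, so a (source, target)
# pair is enough to identify an edge in this sketch).
edges = {
    "a": ["b"],
    "b": ["c"],
    "c": ["d", "a"],
    "d": ["e"],
    "e": ["f"],
    "f": [],
}


def reachable(start, min_hops, max_hops):
    """Nodes reachable from start over min_hops..max_hops edges.

    An edge is never used twice within the same path.
    """
    found = set()

    def walk(node, hops, used):
        if hops >= min_hops:
            found.add(node)
        if hops == max_hops:
            return  # do not extend the path beyond the upper bound
        for nxt in edges.get(node, []):
            edge = (node, nxt)
            if edge not in used:
                walk(nxt, hops + 1, used | {edge})

    walk(start, 0, frozenset())
    return found
```

With these edges, `reachable("a", 1, 4)` includes `"a"` itself (through the cycle a, b, c, a) but not `"f"`, which is five hops away.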

SPARQL

Architecture

No shards.
Master-slave: the master holds the entire graph, and each slave holds a full copy of it. Replication protects against data loss (the copies are kept synchronized) and improves read scalability (clients can connect to the master or to any slave).

Write

How is consistency guaranteed?

  • write to the master
  • or write to a slave: the write blocks until the slave has made sure that the master is up to date.
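
The two write paths can be sketched as a tiny in-memory simulation. This is an assumption-laden toy, not a real replication protocol: the `Master` and `Slave` classes, the log-based catch-up, and the method names are all hypothetical, and a synchronous method call stands in for "blocking".

```python
class Master:
    def __init__(self):
        self.log = []  # ordered list of committed writes

    def commit(self, write):
        self.log.append(write)
        return len(self.log)  # transaction id = position in the log


class Slave:
    def __init__(self, master):
        self.master = master
        self.applied = 0  # how many log entries this slave has applied
        self.data = []

    def pull_updates(self):
        # Catch up with everything the master committed since the last pull.
        for write in self.master.log[self.applied:]:
            self.data.append(write)
        self.applied = len(self.master.log)

    def write(self, write):
        # The write is first committed on the master; the caller only gets
        # control back once the master is up to date and this slave has
        # caught up with it.
        tx_id = self.master.commit(write)
        self.pull_updates()
        return tx_id


master = Master()
slave_1 = Slave(master)
slave_2 = Slave(master)

master.commit("w1")            # path 1: write directly to the master
tx = slave_1.write("w2")       # path 2: write through a slave
```

After these two writes, `slave_1` holds both entries, while `slave_2` stays behind until it calls `pull_updates()`, which illustrates why reads from a slave may briefly lag.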

Hardware

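Fixed-size records: nodes, edges, labels, and properties are each serialized into records of a fixed size, so a record's position in its store file can be computed directly from its ID.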

  • properties storage: save key-value pairs
  • relationship storage:
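    • doubly linked lists, so a node's relationships can be iterated in either direction
    • from an edge's point of view: a pointer to the source (target) node, a pointer to the previous edge of the source (target) node, and a pointer to the next edge of the source (target) node.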
  • typical size:
    • node: 9 bytes
    • relationship: 33 bytes
    • relationship name: 5 bytes
    • property: 33 bytes
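
Why fixed-size records help can be shown with a short sketch: a record's byte offset is simply its ID times the record size, so lookup needs no search. The 9-byte size comes from the list above; the field layout used here (a 1-byte in-use flag, a 4-byte first-relationship ID, a 4-byte first-property ID) is an assumption chosen to add up to 9 bytes, and the function names are hypothetical.

```python
import struct

# Assumed layout: in-use flag (1 byte), first relationship id (4 bytes),
# first property id (4 bytes). Big-endian with no padding gives 9 bytes.
NODE_RECORD = struct.Struct(">BII")


def write_node(store, node_id, in_use, first_rel, first_prop):
    offset = node_id * NODE_RECORD.size  # position follows from the id alone
    needed = offset + NODE_RECORD.size
    if len(store) < needed:
        store.extend(bytes(needed - len(store)))  # grow with zeroed records
    NODE_RECORD.pack_into(store, offset, in_use, first_rel, first_prop)


def read_node(store, node_id):
    # O(1) lookup: jump straight to the record, no index or scan needed.
    return NODE_RECORD.unpack_from(store, node_id * NODE_RECORD.size)


store = bytearray()  # stands in for the node store file
write_node(store, 0, 1, 7, 3)
write_node(store, 2, 1, 12, 0)
```

Reading node 2 jumps straight to byte 18; the never-written node 1 reads back as an all-zero record with its in-use flag cleared.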

Further Reading

Graph Databases, 2nd Edition, Chapters 1, 2, 3, 4, 6


Author: Fululu
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If reproduced, please credit the source: Fululu.