I took the Big Data course in the autumn semester of 2018, taught by Dr. Ghislain Fourny of the Department of Computer Science at ETH Zurich.
Content
From the course introduction
This course gives an overview of database technologies and of the most important database design principles that lay the foundations of the Big Data universe. The material is organized along three axes: data in the large, data in the small, data in the very small. A broad range of aspects is covered with a focus on how they fit all together in the big picture of the Big Data ecosystem.
- physical storage: distributed file systems (HDFS), object storage (S3), key-value stores
- logical storage: document stores (MongoDB), column stores (HBase), graph databases (neo4j), data warehouses (ROLAP)
- data formats and syntaxes (XML, JSON, RDF, Turtle, CSV, XBRL, YAML, protocol buffers, Avro)
- data shapes and models: tables, trees, graphs, cubes
- type systems and schemas: atomic types, structured types (arrays, maps), set-based type systems (?, *, +)
- an overview of functional, declarative programming languages across data shapes (SQL, XQuery, JSONiq, Cypher, MDX)
- the most important query paradigms: selection, projection, joining, grouping, ordering, windowing
- paradigms for parallel processing: two-stage (MapReduce) and DAG-based (Spark)
- resource management (YARN)
- what a data center is made of and why it matters: racks, nodes, …
- underlying architectures: internal machinery of HDFS, HBase, Spark, neo4j
- optimization techniques: functional and declarative paradigms, query plans, rewrites, indexing
- applications.
I happened to find that all the course videos are available on YouTube. Watch them for free!
Experience
In my experience, this course is very systems-oriented. I didn’t have a strong background in this area and had to put in a lot of effort to keep up. The lecturer recommends readings for almost every lesson. Reading them helped me a lot, though reading all of them was impossible 😂
I'm going to share my course notes here, along with my notes on the extra readings. Enjoy!