Big Data Systems (ELL886)

Credit

3.00 (L-T-P: 3-0-0)

Department / Center / School / Unit

Information Technology

Course Contents

Introduction; Hadoop, MapReduce, GFS/HDFS, Bigtable/HBase; Extensions of MapReduce: iMapReduce (iterative), incremental MapReduce. SQL and data-parallel programming, DryadLINQ. Data-flow parallelism vs. message passing. Data locality. Memory hierarchies. Sequential versus random access to secondary storage. NoSQL systems. NewSQL systems. Finding similar items and LSH; Search technology: link analysis and the PageRank algorithm; Large-scale graph processing; Mining streaming data and real-time analytics: window semantics and window joins. Sampling and approximating aggregates (no joins). Querying histograms. Maintaining histograms of streams. Use of Haar wavelets. Incremental and online query processing: online aggregation.