This class examines recent topics in the principles and systems of Big Data management and data science. Topics include: the MapReduce programming model and systems such as Hadoop and HBase, using Hive/Pig; the HDFS distributed file system; the Spark and TensorFlow platforms; message-passing and stream-processing systems (e.g., Kafka and Samza); key-value stores; similar-object detection (similarity search, locality-sensitive hashing); large-scale link analysis techniques (PageRank, Hubs & Authorities); clustering; recommender systems; and computational advertising. The class is structured around presentations of recent research in these areas and practical implementation of several of the topics. Students will gain hands-on experience with real Big Data systems, services, and applications through a set of exercises and labs.
INSTRUCTOR
COURSE DESCRIPTION: