M111: Big Data Management

Summary

This class examines recent topics in the principles and systems of Big Data Management and Data Science. Topics include: the MapReduce programming model and systems such as Hadoop and HBase, using Hive/Pig; the HDFS file system; the Spark and TensorFlow platforms; message-passing and stream processing systems (e.g., Kafka and Samza); key-value stores; similar object detection (similarity search, locality-sensitive hashing); large-scale link analysis techniques (PageRank, Hubs & Authorities); clustering; recommender systems; and computational advertising. The class is structured around presentations of recent research in these areas, together with practical implementation of several of the topics.

Students will gain hands-on experience with real Big Data systems, services, and applications through a set of exercises and labs.

Course Information

  • Fall Semester 2023
  • Class: Friday 11:00-15:00
  • Instructor: Alexandros Ntoulas,
    antoulas -*at*- di -*dot*- uoa +*dot*+ gr
  • Office hours: Friday 10:00-11:00

Announcements

  • 06/10: Please join the course on eclass to see announcements.



References

No textbook is required for the class. We will study material from a number of different sources.

Syllabus & Schedule

Date         Topic                                      Assignments
Fri, Oct 6   Course description & logistics
             Introduction to Big Data & Data Science