This course focuses on big data platforms and on key algorithmic ideas and methods used to implement them.
Contents
Advanced topics in cloud computing, with an emphasis on scalable big data platforms. Key cloud computing technologies and their algorithmic background.
The main topics are:
- distributed computing,
- warehouse-scale computers,
- fault tolerance in distributed systems,
- distributed file systems,
- distributed batch processing with the MapReduce and Apache Spark computing frameworks,
- stream processing,
- distributed cloud-based databases, and
- data warehouse and lakehouse architectures.
After completing this course, you will be able to
- list many of the key technologies used in big data processing,
- select suitable methods for solving challenging big data processing tasks using cloud computing technologies, and
- compare the scalability and fault-tolerance implications of the selected methods.