Students taking this module will acquire the programming skills needed to analyse and manipulate big data. In this context, big data refers to datasets too large for commonly used data analysis software to process within a tolerable elapsed time. Working with such datasets involves distributed, clustered computing environments, which are used by companies around the world. The context and challenges of processing large datasets form a core part of this module, so that students will be able to select appropriate approaches, tools and methods for big data problems, and to implement and evaluate solutions using a variety of programming tools and techniques. Students will have the opportunity to explore these technologies in their own computing environment as well as in cloud computing environments.
Students are not expected to have advanced programming skills to take the module, but will need fundamental knowledge and skills in computer programming and data science, and some knowledge of machine learning.
Introduction to programming for big data
What is big data? How is programming for big data different? Examples of different architectures
Advanced Database Optimisations
Advanced Optimisation techniques in Enterprise Databases; Scaling Enterprise Databases; Big Data/NoSQL Storage methods in Enterprise Databases; Working with 100+ million records
Distributed programming paradigms
Distributed computing frameworks for Big Data; Components of a framework; Scaling for Big Data; Distributed programming frameworks, tools and data analysis (e.g. Hadoop, Spark, Flink, etc.); Practical application of these various technologies for given problems and case studies
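As an illustration of the map/reduce style of programming that frameworks such as Hadoop and Spark build on, the following is a minimal single-machine sketch, not part of the formal module content. It assumes a word-count task; the function names (`map_partition`, `reduce_counts`, `word_count`) are hypothetical, and a thread pool stands in for the worker nodes of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Map step: count the words in one partition of the input."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one result."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def word_count(lines, partitions=2):
    # Split the input into partitions. This is an assumed stand-in for the
    # blocks a distributed file system would hand to each worker node.
    chunks = [lines[i::partitions] for i in range(partitions)]
    # Threads play the role of cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(map_partition, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["big data needs big tools", "data tools scale out"]
    print(word_count(sample))
```

Each partition is processed independently and only the small per-partition summaries are merged. That independence is what lets real frameworks spread the map step across many machines.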
Advanced Big Data Analytics and Machine Learning
Examine various distributed analytics & machine learning languages (e.g. Mahout, Flink, Spark, etc.); Language architecture; Data structures; Working with different data sources; Analysing and processing data
Managing Real Time Data & Streaming
Managing Data Pipelines; Streaming Data, Producers, Consumers, Connectors, etc.; Scaling Streaming Architectures
This module will be delivered over one semester, using a mixture of lectures and lab exercises.
Classes will be a mixture of lectures, live demonstrations, practical problems and lab exercises.
| Module Content & Assessment | |
|---|---|
| Assessment Breakdown | % |
| Other Assessment(s) | 100 |