Module Overview

Programming for Big Data

Students taking this module will acquire the computer programming skills necessary to analyse and manipulate big data. Big data in this context refers to datasets that are too large for commonly used data analysis software to process within a tolerable elapsed time. Working with such data involves distributed, clustered computing environments, which are used by companies around the world. The context and challenges of processing large datasets form a core part of this course, so that students will be able to select appropriate approaches, tools and methods for big data problems, and to implement and evaluate solutions using a variety of programming tools and techniques. Students will have the opportunity to explore these technologies in their own computing environment as well as in cloud computing environments.

Students are not expected to have advanced programming skills to take the module, but they will need fundamental knowledge and skills in computer programming and data science, and some knowledge of machine learning.

Module Code

DATA 5004

ECTS Credits

10

*Curricular information is subject to change

Introduction to programming for big data

What is big data?
How is programming for big data different?
Examples of different architectures

Advanced Database Optimisations

Advanced Optimisation techniques in Enterprise Databases
Scaling Enterprise Databases
Big Data/NoSQL Storage methods in Enterprise Databases
Working with 100+M records

Distributed programming paradigms

Distributed computing frameworks for Big Data
Components of framework
Scaling for Big Data
Distributed programming frameworks, tools and data analysis (e.g. Hadoop, Spark, Flink, etc.)
Practical application of these various technologies for given problems and case studies

Advanced Big Data Analytics and Machine Learning

Examine various distributed analytics & machine learning languages (e.g. Mahout, Flink, Spark, etc.)
Language architecture
Data structures
Working with different data sources
Analysing and processing data

Managing Real Time Data & Streaming

Managing Data Pipelines
Streaming Data, Producers, Consumers, Connectors, etc.
Scaling Streaming Architectures

This module will be delivered over one semester, using a mixture of lectures and lab exercises.

Classes will be a mixture of lectures, live demonstrations, practical problems and lab exercises.

Module Content & Assessment
Assessment Breakdown (%)
Other Assessment(s): 100