
Module Overview

Data Mining

The term data mining is typically used to describe the entire life-cycle of deploying advanced analytical solutions in various domains. This module takes students through a typical life-cycle (for example, CRISP-DM), exploring each stage and the tasks and technologies needed for particular domains and case studies across various disciplines. All data mining projects start with a business objective or question and then use data discovery techniques and algorithms to find patterns in complex data, in domains as diverse as biology, chemistry and health sciences. Solutions built on these patterns can be deployed in various parts of the organisation, for example in drug discovery, pollution monitoring, genomics and proteomics, evolutionary biology, epidemiology, medical diagnosis and personalised medicine. As with most life-cycles, the data mining process is iterative; the module explores the challenges this raises and how various solutions can be applied in the aforementioned domains.

Module Code

DATA 5001

ECTS Credits

10

*Curricular information is subject to change

Data Mining Overview

Introduction to data mining and its applications. Data, Information and Knowledge. Framing a case study for various disciplines. How Data Mining fits within organisations.

Data Mining Life Cycle

Stages of a Data Mining (DM) project. Explore various data mining life-cycles. Evolving nature of roles and responsibilities of people involved in data mining projects.

Preprocessing in Data Mining

Data transformations, Data sampling, Data aggregation and Feature engineering

Supervised Data Mining Techniques

Classification and Regression

Unsupervised Data Mining Techniques

Clustering and Association Rule Mining

Model Evaluation and Selection

Understanding and evaluating model outputs and determining which model to use.

Unstructured Data

Unstructured data, text representations, text analysis (e.g. topic modelling, named entity recognition), generative text models.

Explainability Approaches in Data Mining

Understanding and explaining the outputs from Data Mining models for better decision making.

Other data mining techniques within various domain contexts

Exploring data mining techniques in different contexts, such as biology (sequence analysis, gene expression analysis, protein structure prediction, etc.) and health (human activity recognition).

Deploying Data Mining Solutions

Issues around deployment of data mining solutions. Combining multiple algorithms and models. Creating pipelines for deployment. Various deployment architectures, including API, Docker and Function as a Service. Model management and when to retrain models and solutions.

Topics on the Management of the Data Mining Process and Life-Cycle

Ethical Issues, Biases in data, Using and managing different technologies

Lectures, tutorials and computer laboratory sessions

Classes run for 4 hours each week. There is no fixed division between lecture and lab time; a typical week comprises 2 hours of lectures with case study demonstrations and 2 hours of self-directed labs. The allocation of time across lectures, labs and exercises varies from week to week, depending on the topics being covered.

Module Content & Assessment
Assessment Breakdown (%)
Other Assessment(s): 100