The term data mining is typically used to describe the entire life cycle of deploying advanced analytical solutions in various domains. This module takes students through a typical life cycle (for example CRISP-DM), exploring each stage and the tasks and technologies needed for particular domains and case studies across various disciplines. All data mining projects start with a business objective or question and then use various data discovery techniques and algorithms to find patterns in complex data in domains as diverse as biology, chemistry and health sciences. Building upon these patterns, solutions can be built and deployed in various parts of the organisation. Such solutions can be deployed for drug discovery, pollution monitoring, genomics and proteomics, evolutionary biology, epidemiology, medical diagnosis and personalised medicine. As with most life cycles, the data mining process is iterative; the module explores the challenges this raises and how various solutions can be applied in the aforementioned domains.
Data Mining Overview
Introduction to data mining and its applications. Data, information and knowledge. Framing a case study for various disciplines. How data mining fits within an organisation.
Data Mining Life Cycle
Stages of a Data Mining (DM) project. Explore various data mining life-cycles. Evolving nature of roles and responsibilities of people involved in data mining projects.
Preprocessing in Data Mining
Data transformations, data sampling, data aggregation and feature engineering.
Supervised Data Mining Techniques
Classification and Regression
Unsupervised Data Mining Techniques
Clustering and Association Rule Mining
Model Evaluation and Selection
Understanding and evaluating model outputs and determining which model to use.
Unstructured Data
Unstructured data, text representations, text analysis (e.g. topic modelling, named entity recognition), generative text models.
Explainability Approaches in Data Mining
Understanding and explaining the outputs from Data Mining models for better decision making.
Other Data Mining Techniques within Various Domain Contexts
Exploring various data mining techniques in different contexts, such as biology (e.g. sequence analysis, gene expression analysis, protein structure prediction) and health (e.g. human activity recognition).
Deploying Data Mining Solutions
Issues around deployment of data mining solutions, combining multiple algorithms and models, creating pipelines for deployment, various deployment architectures (including API, Docker and Function as a Service), model management, and when to retrain models and solutions.
Topics on the Management of the Data Mining Process and Life-Cycle
Ethical issues, biases in data, using and managing different technologies.
Lectures, tutorials and computer laboratory sessions
Classes run for 4 hours each week, used for a combination of lectures, labs and exercises. There is no fixed division between lecture and lab time; a typical week consists of 2 hours of lectures with case study demonstrations and 2 hours of self-directed labs. The allocation of time varies from week to week depending on the topics being covered.
| Module Content & Assessment | |
|---|---|
| Assessment Breakdown | % |
| Other Assessment(s) | 100 |