
Module Overview

Data Exploration

The Data Exploration module is designed for students with limited prior knowledge of data processing and exploratory data analysis. It covers the initial steps a data analyst takes to understand, build, clean and transform a dataset. Through this module, students will develop the knowledge and skills to manipulate data in dataframes, visualise it, assess its quality, and apply advanced exploratory methods such as clustering, correlation and association rules to better understand the data and prepare it for analysis and processing.

Module Code

DATA 5002

ECTS Credits

10

*Curricular information is subject to change

Introduction to Data Exploration

  • Purpose of data exploration
  • Data types and sources
  • Introduction to tools and software

The Exploratory Data Analysis (EDA) Process

  • Structure of the EDA process
  • EDA best practices
  • Review

Data Collection

  • Survey design
  • Basic cleaning: missing values

Data Manipulation

  • Creating data frames
  • Subsetting data
  • Applying functions to data structures
  • Method chaining
  • Reshaping data
  • Combining datasets
  • Grouping data
  • Window operations
  • Normalisation
  • Binning
  • Sampling
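
As an illustration only, two of the operations listed above, grouping and normalisation, can be sketched in plain Python. The module itself may well use a dataframe library instead. The helper names (group_mean, min_max_normalise) and the region/sales records are hypothetical examples, not course material.

```python
from collections import defaultdict

def group_mean(records, key, value):
    """Group dict records by `key` and return the mean of `value` for each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        sums[rec[key]] += rec[value]
        counts[rec[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}

def min_max_normalise(values):
    """Rescale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical example data
sales = [
    {"region": "north", "sales": 120.0},
    {"region": "south", "sales": 80.0},
    {"region": "north", "sales": 100.0},
    {"region": "south", "sales": 60.0},
]
print(group_mean(sales, "region", "sales"))  # {'north': 110.0, 'south': 70.0}
print([round(v, 3) for v in min_max_normalise([s["sales"] for s in sales])])  # [1.0, 0.333, 0.667, 0.0]
```

Min-max scaling is only one normalisation scheme. Z-score standardisation is a common alternative.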

Data Cleaning

  • Identifying and removing duplicated data
  • Handling outliers: categorical, ordinal, interval data
  • Handling missing values: imputation
  • Dealing with imbalanced data
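
The sketch below illustrates two of the cleaning steps above: removing exact duplicates and imputing missing values. It assumes, for illustration only, that rows are hashable tuples and that missing values are represented as None. Mean imputation is used here only because it is the simplest strategy. It is not necessarily the approach taught in the module, and the function names are hypothetical.

```python
from statistics import mean

def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def impute_mean(values):
    """Replace None entries with the mean of the observed values.

    Raises statistics.StatisticsError if every value is missing.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

rows = [("alice", 34), ("bob", 29), ("alice", 34)]
print(drop_duplicates(rows))                # [('alice', 34), ('bob', 29)]
print(impute_mean([10.0, None, 20.0, 30.0]))  # [10.0, 20.0, 20.0, 30.0]
```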

Data Quality Reports

  • Measuring data quality
  • Components of an effective data quality report
  • Creating a data quality report
  • Interpreting a data quality report
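
Here is a rough sketch of how a data quality report might begin. The hypothetical quality_report() function below computes two common per-column measures: the percentage of missing values and the number of distinct values. It assumes each record is a dictionary with hashable values, and that a missing key or an explicit None counts as missing. The example records are invented for illustration.

```python
def quality_report(rows, columns):
    """Per-column missing percentage and distinct-value count for a list of dict records."""
    n = len(rows)
    report = {}
    for col in columns:
        # A missing key and an explicit None are both treated as missing
        values = [row.get(col) for row in rows]
        missing = sum(1 for v in values if v is None)
        distinct = len({v for v in values if v is not None})
        report[col] = {
            "missing_pct": 100.0 * missing / n if n else 0.0,
            "distinct": distinct,
        }
    return report

# Hypothetical records
records = [
    {"age": 34, "city": "Dublin"},
    {"age": None, "city": "Cork"},
    {"age": 29, "city": "Dublin"},
    {"age": 41},
]
for column, stats in quality_report(records, ["age", "city"]).items():
    print(column, stats)
# age {'missing_pct': 25.0, 'distinct': 3}
# city {'missing_pct': 25.0, 'distinct': 2}
```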

Advanced Data Exploration Techniques

  • Clustering
  • Correlation
  • Association rules
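
Two of these techniques reduce to short formulas, shown below as a minimal stdlib sketch. Pearson's correlation coefficient is the covariance of two variables divided by the product of their spreads. For an association rule A -> B, confidence is support(A and B together) / support(A). The function names and example baskets are hypothetical, and clustering is not shown.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))                  # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 3))  # 0.667
```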

Time-Series Data

  • Components of time-series data
  • Identifying trends, seasonality and noise
  • Decomposing time-series data
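
Classical additive decomposition models a series as trend + seasonal component + residual (noise). The sketch below is a simplified version that relies on three assumptions. The trend is a centred moving average over one seasonal period. The period is odd. The seasonal effect is the average detrended value at each position in the cycle. centred_moving_average and additive_decompose are hypothetical helpers. Library implementations (for example seasonal_decompose in statsmodels) also handle even periods and multiplicative models.

```python
def centred_moving_average(series, window):
    """Centred moving average; positions without a full window are None."""
    if window % 2 == 0:
        raise ValueError("this sketch only supports odd windows")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend

def additive_decompose(series, period):
    """Simplified additive decomposition: y = trend + seasonal + residual (odd period only)."""
    if len(series) < 2 * period - 1:
        raise ValueError("series too short for the chosen period")
    trend = centred_moving_average(series, period)
    # Average the detrended values at each position in the seasonal cycle
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    cycle = [sum(b) / len(b) for b in buckets]
    # Centre the cycle so the seasonal effects sum to zero
    offset = sum(cycle) / period
    cycle = [c - offset for c in cycle]
    seasonal = [cycle[i % period] for i in range(len(series))]
    residual = [None if t is None else y - t - s
                for y, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Hypothetical series: linear trend plus a repeating pattern of period 3
data = [i + [1, -1, 0][i % 3] for i in range(9)]
trend, seasonal, residual = additive_decompose(data, 3)
print(trend)         # [None, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, None]
print(seasonal[:3])  # [1.0, -1.0, 0.0]
```

On this noise-free example the residuals are zero wherever the trend is defined. On real data, the residual is where the noise shows up.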

Matrix Data

  • Array manipulation (reshaping, flattening, concatenating, splitting arrays)
  • Broadcasting
  • Working with multi-dimensional arrays
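
NumPy's broadcasting rule compares array shapes from the trailing dimension backwards. Two dimensions are compatible when they are equal or one of them is 1, and a missing leading dimension is treated as 1. The stdlib-only sketch below applies that rule to shape tuples. broadcast_shape is a hypothetical helper written for illustration; NumPy itself provides numpy.broadcast_shapes.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Result shape of broadcasting two array shapes, or ValueError if incompatible."""
    result = []
    # Walk both shapes from the rightmost dimension; missing dimensions count as 1
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"incompatible dimensions {a} and {b}")
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (4,)))              # (3, 4)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))   # (8, 7, 6, 5)
```

A size-1 dimension stretches to match the other operand, but a size-0 dimension does not stretch. That is why the branches test for 1 explicitly rather than taking the larger dimension.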

This module will be delivered through lectures, practical laboratory exercises and assignments, with a strong emphasis on practical work.

The co-requisite module, Programming for Data Analytics, provides the programming knowledge students will need to develop the report for this module.

Module Content & Assessment

Assessment Breakdown

  • Other Assessment(s): 100%