The Data Exploration module is designed for students with limited prior knowledge of data processing and exploratory data analysis. It addresses the initial steps a data analyst takes in understanding, building, cleaning, and transforming a dataset. Through this module, students will develop the knowledge and skills to manipulate data in dataframes, visualise data, and assess data quality. They will also use advanced exploratory methods such as clustering, correlation, and association rules to better understand the data and prepare it for analysis and processing.
Introduction to Data Exploration
- Purpose of data exploration
- Data types and sources
- Intro to tools and software
The Exploratory Data Analysis (EDA) process
- Structure of the EDA process
- EDA best practices
- Review
Data Collection
- Survey design
- Basic cleaning: missing values
Data manipulation
- Creating data frames
- Subsetting data
- Applying functions to data structures
- Method chaining
- Reshaping data
- Combining datasets
- Grouping data
- Window functions
- Normalisation
- Binning
- Sampling
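As an illustration of two of the manipulation topics above, normalisation and binning, the following is a minimal sketch in plain Python. The module does not specify its tools here, so the choice of language, the function names `min_max_normalise` and `equal_width_bin`, and the sample values are all assumptions made for illustration only.

```python
# Hypothetical sample values; not taken from the module materials.
ages = [18, 22, 35, 47, 60]


def min_max_normalise(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bin(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum would otherwise fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


normalised = min_max_normalise(ages)
bins = equal_width_bin(ages, 3)
```

Equal-width binning is only one option; equal-frequency (quantile) bins are a common alternative when the values are heavily skewed.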
Data cleaning
- Identifying and removing duplicated data
- Handling outliers: categorical, ordinal, interval data
- Handling missing values: imputation
- Dealing with imbalanced data
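The first and third cleaning topics above, removing duplicates and imputing missing values, can be sketched as follows. This is an illustrative example in plain Python under stated assumptions: the records, the field names `id` and `score`, the helper names, and the use of the median as the imputed value are all hypothetical choices, not the module's prescribed method.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "score": 70.0},
    {"id": 2, "score": None},
    {"id": 1, "score": 70.0},  # exact duplicate of the first record
    {"id": 3, "score": 90.0},
]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Return copies of rows with missing `field` values replaced by the
    median of the observed values (assumes at least one value is observed)."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


cleaned = impute_median(drop_duplicates(records), "score")
```

Duplicates are removed before imputation so that repeated records do not distort the median used to fill the gaps.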
Data Quality Reports
- Measuring data quality
- Components of an effective data quality report
- Creating a data quality report
- Interpreting a data quality report
Advanced data exploration techniques
- Clustering
- Correlation
- Association rules
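Of the advanced techniques above, correlation is the simplest to show in a few lines. The sketch below computes the Pearson correlation coefficient in plain Python. The function name `pearson` and the sample data are assumptions for illustration, and Pearson is only one of several correlation measures the module might cover.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: hours studied and a perfectly linear score.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
r = pearson(hours, scores)
```

A coefficient near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it does not by itself indicate causation.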
Time-Series Data
- Components of time-series data
- Identifying trends, seasonality and noise
- Decomposing time-series data
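The decomposition topic above can be illustrated with a small additive model, in which the series is treated as trend plus seasonal component plus noise. This is a minimal sketch in plain Python, assuming an odd seasonal period and a centred moving average for the trend. The function names and the sample series are hypothetical, and statistical libraries offer more robust methods.

```python
def moving_average_trend(series, window):
    """Centred moving average; `window` must be odd.
    Positions without a full window on both sides are left as None."""
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend


def additive_decompose(series, period):
    """Split a series into trend, seasonal and residual parts,
    assuming series = trend + seasonal + noise and an odd `period`."""
    trend = moving_average_trend(series, period)
    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    pattern = [sum(b) / len(b) for b in buckets]
    seasonal = [pattern[i % period] for i in range(len(series))]
    residual = [
        None if t is None else y - t - s
        for y, t, s in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


# Hypothetical series: a linear trend plus a repeating 3-step pattern.
series = [11, 11, 11, 14, 14, 14, 17, 17, 17]
trend, seasonal, residual = additive_decompose(series, period=3)
```

Because the sample series contains no noise, the residuals at the interior points are zero; with real data, the residual is what remains after the trend and seasonal pattern are removed.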
Matrix Data
- Array manipulation (reshaping, flattening, concatenating, splitting arrays)
- Broadcasting
- Working with multi-dimensional arrays
This module will be delivered through lectures, practical laboratory exercises, and assignments, with a strong practical emphasis.
The co-requisite module, Programming for Data Analytics, will provide students with the programming knowledge required to develop the report for this module.
| Module Content & Assessment | |
|---|---|
| Assessment Breakdown | % |
| Other Assessment(s) | 100 |