Data Wrangling and Preparation with Programming

Part of the ModularMaster in Data Science (Healthcare) programme

The Data Wrangling and Preparation course provides a comprehensive understanding of the critical stages involved in preparing data for analysis, from sourcing and collection to transformation and cleaning. By mastering the techniques and tools of data wrangling, professionals can address data inconsistencies, missing values, and outliers, ensuring data quality and integrity. This process transforms raw, unstructured data into a clean, usable format that is ready for exploration and analysis. Effective data wrangling and preparation not only save time and effort during the analysis phase but also lay the groundwork for accurate and reliable insights. They allow data professionals to unlock the full potential of datasets, extract meaningful information, and make informed decisions that drive business success.

This five-day course is designed to equip participants with comprehensive knowledge of, and hands-on experience in, manipulating and preparing healthcare data for analysis. Over the first four days, participants will learn techniques to transform healthcare data into a suitable format and to address data quality issues through effective data cleaning. They will gain expertise in data manipulation, including handling missing values, detecting outliers, and ensuring data consistency within the healthcare domain, and will learn practical methods to cleanse and preprocess healthcare data so that it is well prepared for further analysis. Participants will also work on a healthcare-related project throughout the module. The final day, split into two half-days in separate weeks, is dedicated to project consultation and project presentation. By the end of the course, participants will have a solid understanding of data wrangling principles and techniques, enabling them to tackle complex data challenges and prepare datasets for analysis within the healthcare field.
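
To make the three cleaning tasks named above concrete (handling missing values, detecting outliers, and ensuring consistency), here is a minimal sketch in plain Python using only the standard library. The patient records, field names, and the coding map are hypothetical, and the choices of a 1.5 times interquartile range (Tukey) rule for outliers and median imputation are common conventions assumed for illustration, not necessarily the methods or tools (for example pandas) used in the course.

```python
import statistics

# Hypothetical patient records (illustrative only); None marks a missing reading.
records = [
    {"patient_id": "P01", "sex": "M", "systolic_bp": 128},
    {"patient_id": "P02", "sex": "female", "systolic_bp": None},
    {"patient_id": "P03", "sex": "F", "systolic_bp": 135},
    {"patient_id": "P04", "sex": "Male", "systolic_bp": 122},
    {"patient_id": "P05", "sex": "f", "systolic_bp": 410},  # implausible: likely a data-entry error
    {"patient_id": "P06", "sex": "M", "systolic_bp": 118},
]

# Consistency: map free-text variants onto one coding scheme (hypothetical mapping).
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def standardise_sex(value):
    """Return 'M' or 'F' for known variants, or None for anything unrecognised."""
    return SEX_CODES.get(str(value).strip().lower())


def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean(rows):
    """Standardise codes, flag missing or outlying readings, and impute them with the median."""
    observed = [r["systolic_bp"] for r in rows if r["systolic_bp"] is not None]
    low, high = iqr_bounds(observed)
    # The imputation value is computed from in-range readings only.
    fill = statistics.median([v for v in observed if low <= v <= high])
    cleaned = []
    for r in rows:
        bp = r["systolic_bp"]
        needs_fill = bp is None or not (low <= bp <= high)
        cleaned.append({
            "patient_id": r["patient_id"],
            "sex": standardise_sex(r["sex"]),
            "systolic_bp": fill if needs_fill else bp,
            "bp_imputed": needs_fill,  # keep an audit flag rather than silently overwriting
        })
    return cleaned


if __name__ == "__main__":
    for row in clean(records):
        print(row)
```

Keeping a `bp_imputed` flag alongside the filled value is a deliberate choice in this sketch: downstream analysis can then check whether results are sensitive to the imputed readings.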


Plan your learning path

This course can be taken as a module on its own or as part of the Graduate Certificate in Data Analytics (Healthcare) or ModularMaster in Data Science (Healthcare).


Course Details

Course Dates:
No available course dates


Who Should Attend



Designed for healthcare professionals and for individuals aspiring to join the healthcare industry, this course develops essential skills in data preparation and wrangling. It is highly recommended for clinicians, administrators, and managers who need practical, hands-on data wrangling skills in their day-to-day work, as well as for aspiring data analysts and data scientists who want proficiency in data preparation within the healthcare sector.

Prerequisites

  • Participants should preferably have passed mathematics at ‘O’ Level or equivalent.

  • Participants should be conversant with basic IT skills such as software installation, file management and web navigation.

  • Participants are encouraged to complete the Foundation of Data Science course before enrolling.

  • Participants are required to pass a pre-course assessment to ensure they have the requisite knowledge of Python programming. This assessment can be waived for participants who have completed both Fundamentals in Python (Basic) and Fundamentals in Python (Intermediate).

  • Participants are required to bring their laptops.

Programme Outline

Learning Objectives and Structure
  • Perform the basic ML modelling tasks expected of a junior data scientist
  • Understand the types of data and databases in the business context
  • Appreciate the use of data dictionaries and harness the potential of metadata for data science
  • Acquire organisational datasets from data lakes and other democratised data sources for data enrichment
  • Structure data into an appropriate form for data analysis
  • Manipulate data structures to support the data wrangling phase
  • Perform data wrangling on the acquired dataset
  • Address data quality issues with appropriate data cleansing techniques
  • Iterate on the data mining process progressively using data wrangling and exploratory analysis tools
  • Understand healthcare case studies shared by SingHealth faculty members to gain insights into real-world scenarios
  • Utilise curated public healthcare datasets in hands-on activities and assignments to build practical experience and understanding of the subject matter
Day 1
  • Overview of Data Science Pipeline
  • What is Data Wrangling and Data Preparation?
  • Data Acquisition
  • Understand how data scientists prepare datasets for data modelling
  • What is data discovery?
  • Types of data
  • Types of databases
  • Data Dictionary and Metadata
  • Data Models
Day 2
  • Data Mining and CRISP-DM
  • Common Computing Infrastructure
  • Interactive Data Exploratory Analysis (IDEA)
  • Basics of Descriptive Statistics
Day 3
  • Breakdown of Data Preparation Phases
  • Dataset Structuring: Data Frame Handling
  • Data Cleaning
Day 4
  • Data Enrichment and Alternative Sources
  • Data Enrichment: Data Aggregation
  • Data Enrichment: Data Standardisation
Day 5 - Consultation / Project presentation

Project Consultation

Each group of participants will present the progress of their project and have the opportunity to ask questions and clarify any doubts about it.

Project Presentation

Each group of participants will showcase their work and respond to questions during a Q&A session.
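
The Day 4 topics of data aggregation and data standardisation can be illustrated with a short standard-library sketch. The admission records and ward names are hypothetical, and "standardisation" is interpreted here as z-score scaling (mean 0, standard deviation 1); in the course the term may also cover making units, formats, and codes consistent, which this sketch does not show.

```python
from collections import defaultdict
import statistics

# Hypothetical admission records: (ward, length of stay in days).
admissions = [
    ("Cardiology", 4), ("Cardiology", 6),
    ("Orthopaedics", 3), ("Orthopaedics", 5), ("Orthopaedics", 7),
    ("Geriatrics", 10),
]


def mean_stay_by_ward(rows):
    """Aggregation: group records by ward and average the length of stay."""
    groups = defaultdict(list)
    for ward, days in rows:
        groups[ward].append(days)
    return {ward: statistics.mean(days) for ward, days in groups.items()}


def z_scores(values):
    """Standardisation: rescale values to mean 0 and population standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise values with zero spread")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    print(mean_stay_by_ward(admissions))
    stays = [days for _, days in admissions]
    print([round(z, 2) for z in z_scores(stays)])
```

Aggregation reduces record-level data to one summary per group, while z-scores keep every record but put variables measured on different scales onto a comparable footing.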

Course Fees and Funding

Full course fee inclusive of prevailing GST

You pay
S$4,905.00

SkillsFuture Course Fee subsidy (70%)

  • For Singapore Citizens < 40 years old 
  • For Permanent Residents

You pay
S$1,471.50

Mid-Career Enhanced Subsidy (90%)

  • For Singapore Citizens ≥ 40 years old

You pay
S$571.50

Enhanced Training Support for SMEs (90%)

  • For SME - Sponsored employees

You pay
S$571.50

The module fees payable above are inclusive of 9% GST.
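
The figures above are consistent with a pre-GST course fee of S$4,500 (S$4,905.00 / 1.09): the 70% subsidy leaves S$1,350, and 9% GST on that gives S$1,471.50; the 90% schemes then deduct a further 20% of S$4,500 (S$900), giving S$571.50. The sketch below reproduces that arithmetic. It is a reconstruction from the published numbers, not SSG's or SUTD Academy's official formula; in particular, the assumption that GST applies to the fee after the baseline 70% subsidy is inferred, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_RATE = Decimal("0.09")
FULL_FEE_INCL_GST = Decimal("4905.00")
BASELINE_SUBSIDY = Decimal("0.70")

# Pre-GST course fee implied by the published full fee.
base_fee = FULL_FEE_INCL_GST / (1 + GST_RATE)


def to_cents(amount):
    """Round a Decimal amount to two decimal places."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def payable(extra_subsidy=Decimal("0")):
    """Fee payable, ASSUMING GST is charged on the fee after the baseline 70% subsidy
    and enhanced schemes deduct a further share of the pre-GST fee."""
    after_baseline = base_fee * (1 - BASELINE_SUBSIDY)
    with_gst = after_baseline * (1 + GST_RATE)
    return to_cents(with_gst - base_fee * extra_subsidy)


if __name__ == "__main__":
    print("Pre-GST fee:", to_cents(base_fee))
    print("70% subsidy:", payable())
    print("90% subsidy:", payable(Decimal("0.20")))
```

`Decimal` is used instead of floating point so that currency amounts round exactly to the cent.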

Register your interest

Subscribe to our mailing list:

*By subscribing to this mailing list, I agree that SUTD may collect, retain and utilise my personal information, as furnished herein, for SUTD Academy’s communications, including programme information, invitations to events, news updates and other related purposes, in accordance with the Personal Data Protection Act 2012.

Instructor

Thia Wei Soon
Instructor, SUTD Academy

Wei Soon has more than ten years of experience in the manufacturing and IT sectors. As a data scientist, he used data analytics and machine learning to deliver actionable insights and drive strategic marketing initiatives. More recently, as a technology consultant, he has helped clients streamline enterprise operations and achieve cost savings through the adoption of robotic process automation.

Wei Soon has a Master of IT in Business Artificial Intelligence from Singapore Management University and a B.Eng in Mechanical Engineering from Nanyang Technological University. He is proficient with tools such as Tableau, Jupyter, RStudio, MS Visual Studio, Automation Anywhere, and UiPath, and with languages such as Python, R, C#, HTML5, and JavaScript.

Narayan Venkataraman
Assistant Director, Data Management & Informatics, Changi General Hospital

Narayan (Nari) is a data science and biomedical professional with more than 22 years of experience in healthcare and a diverse portfolio spanning data science, health informatics, data governance, medical technology, clinical quality and operational analytics, patient safety, and risk management.
 
He is currently the Assistant Director, Data Management & Informatics at Changi General Hospital, Singapore. A recipient of the Singapore Commendation Medal 2022 for COVID-19, he is a member of the CGH COVID-19 Taskforce and of many strategic committees at CGH and SingHealth (SHS). He has completed many medical projects across the Asia-Pacific region representing the Singapore MOH and MFA. He is also an honorary biomedical consultant for Smiles Asia and has volunteered on many surgical missions in Asia and Oceania. His current academic interests cover robotic process automation, AI/machine learning, data visualisation, risk analytics, and enterprise data literacy.

Gao Yan
Senior Principal Analyst - Machine Learning, Health Services Research, Changi General Hospital

Gao Yan is currently a Senior Principal Analyst at Changi General Hospital (CGH) and previously worked in the research, education and finance sectors.
 
Trained in machine learning and computer vision, Gao Yan’s research interests include medical image processing, natural language processing, predictive modelling and data visualization. She obtained both her B.Eng and PhD in Information Systems from Nanyang Technological University (NTU).

Policies and Financing Options

SSG Funding Terms and Conditions

Use of Personal Details

In consideration of the subsidy provided by SkillsFuture Singapore Agency (“SSG”) through the SUTD Academy for the Course,

I consent to:

The collection, use and disclosure to relevant third parties of my personal data by the SUTD Academy including but not limited to personal particulars, attendance records, assessment/performance records, for the following purposes:

  1. Reporting of national statistics and conducting of holistic continuing education training research and analysis;

  2. Facilitate the conduct of the relevant surveys and audits in relation to the Course;

  3. General administration of the Course including but not limited to processing of the subsidy provided by SSG;

  4. Publicity and marketing of the Course or other Courses to be provided by SSG or SUTD Academy; and

  5. SSG or its Appointed Auditors or Nominated Representatives to directly contact Course Participant to obtain information deemed necessary for the purposes of conducting effectiveness survey or audits in relation to the Course.

I agree to:

  1. Attend and complete all lectures, class exercises, workshops and assessments;

  2. Complete the Course feedback at the end of the Course;

  3. Complete the post Course survey sent about 3 to 6 months after class attendance; and

  4. Sign up for a personal email account.

SUTD Privacy Statement

For more information on SUTD's privacy statement, please visit https://sutd.edu.sg/Privacy-Statement.

SUTD Terms and Conditions

Methods of Payment

Learn more about the available payment modes.

Cancellation & Refund Policy

  1. If written notification is sent to sutd_academy@sutd.edu.sg within 24 hours after the course registration deadline, there will be no cancellation charge and a full refund will be made.

  2. No refund will be provided if written notification is sent more than 24 hours after the course registration deadline. SUTD Academy reserves the right to collect the full fee from the participant.

Replacement Policy

Companies may replace participants who have signed up for the course by giving three working days' notice to sutd_academy@sutd.edu.sg before the course commencement date. Terms and conditions apply.

Registration Policy

  1. Course may be cancelled due to insufficient participants. SUTD Academy will not be responsible or liable in any way for any claims, damages, losses, expenses, costs or liabilities whatsoever (including, without limitation, any direct or indirect damages for loss of profits, business interruption or loss of information) resulting or arising directly or indirectly from any course cancellation.

  2. Course enrolment is on a first-come, first-served basis.

  3. SUTD Academy reserves the right to change or cancel any course or instructor due to unforeseen circumstances. 

Types of Funding

Funding under Mid-Career Enhanced Subsidy ("MCES")

  1. MCES is an enhanced subsidy that encourages mid-career individuals to upskill and reskill, helping them remain competitive and resilient in the job market. Under MCES, all Singaporeans aged 40 and above receive higher subsidies of up to 90% of the course fee for SSG-funded certifiable courses.

  2. Individuals and employers do not need to apply for MCES. Eligible participants in SSG-funded programmes will be charged the appropriate subsidised fees by SUTD Academy and will only need to pay the nett fee (the full course fee after SSG's grant).

    For more info, please visit SkillsFuture website at https://www.skillsfuture.gov.sg/enhancedsubsidy

Funding under Enhanced Training Support for SMEs ("ETSS")

  1. ETSS provides enhanced funding to enable SMEs to send their employees for training.

  2. SMEs will enjoy subsidies of up to 90% of the course fees when they sponsor their employees for SSG-funded certifiable courses.

  3. In addition to higher course fee funding, SMEs can also claim absentee payroll funding of 80% of basic hourly salary at a higher cap of $7.50 per hour. SMEs may apply for the absentee payroll via the SkillsConnect system.

  4. To qualify, SMEs must meet all of the following criteria:
    - Organisation must be registered or incorporated in Singapore
    - Employment size of not more than 200 or with annual sales turnover of not more than $100 million
    - Trainees must be hired in accordance with the Employment Act and fully sponsored by their employers for the course
    - Trainees must be Singapore Citizens or Singapore Permanent Residents

    For more info, please visit SSG website at https://www.ssg.gov.sg/programmes-and-initiatives/funding/enhanced-training-support-for-smes1.html


Funding under Union Training Assistance Programme ("UTAP")

UTAP is a training benefit that helps NTUC members defray their cost of training and encourages more union members to pursue skills upgrading.

NTUC members can receive support of 50% of their unfunded course fees, capped at $250 each year, when they sign up for courses supported under UTAP (Union Training Assistance Programme).

For more info, please visit https://e2i.com.sg/individuals/ntuc-education-and-training-fund/.