Friday 10 a.m.–10:30 a.m.

Data Cleaning on text to prepare for analysis and machine learning

Ian Ozsvald

Audience level:
Intermediate

Description

Dirty data makes analysis and machine learning harder (or impossible!) and more prone to failure. I'll talk about the techniques we use at ModelInsight to fix badly encoded, inconsistent and hard-to-parse text data, which enable us to prepare real-world industrial data for research.

Abstract

Topics will include text cleaning through normalisation and similarity measures, date parsing, data joining and visualisation. This talk is aimed at helping you make rapid progress on new projects.
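
As a rough illustration of what text normalisation and a similarity measure can look like, here is a minimal sketch using only the Python standard library. It is not taken from the talk: the function names `normalise` and `similarity` are hypothetical, and the choice of NFKD folding plus `difflib.SequenceMatcher` is an assumption made for the example, not necessarily the approach used at ModelInsight.

```python
import difflib
import unicodedata


def normalise(text: str) -> str:
    """Strip accents, fold case and collapse whitespace (hypothetical helper)."""
    # NFKD splits an accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so that only the base letters remain.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger lower(); split/join collapses runs of whitespace.
    return " ".join(without_marks.casefold().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0 and 1 for the normalised strings (hypothetical helper)."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


if __name__ == "__main__":
    print(normalise("  Café   LTD "))
    print(similarity("Café Ltd", "cafe  ltd"))
    print(similarity("Acme Ltd", "Acme Limited"))
```

Normalising before comparing means that differences in accents, case and spacing do not lower the score, so the ratio reflects only the remaining differences in spelling.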
