Surprisingly, data cleaning is quite time-intensive and often cannot be automated. In my experience, the skill set that makes someone good or bad at data cleaning is not exactly the same as the skill set that makes someone a good researcher or a good analyst. There is overlap, of course – creativity helps both in overcoming data cleaning challenges and in developing analytical models. But being a good analyst often requires staying alert to the big picture and making simplifying assumptions when necessary, whereas data cleaning is all about the odd cases. In fact, I think I have spent the vast majority of my time dealing with the 0.1% of observations that do not quite look like the other 99.9%.