What is Data Wrangling?

Data Wrangling is a subtle variant of Data Preparation. The term draws on the imagery of the American West and its cowboys, and is mainly used by Trifacta.

Even after the difference between Data Preparation and Data Exploration has been clarified, a related notion remains hazy in the new vocabulary of BI and data: Data Wrangling, a term used by the promising startup Trifacta.

“Data wrangling and data preparation are very similar,” admits Trifacta bluntly.

That said, the two are not exactly identical. “The distinction between the two comes from the term's geographical and cultural origins,” says the vendor.

The data “cattle” and the user “cowboy”

To understand this cultural context, you should know that the word Wrangling is almost part of everyday vocabulary on the West Coast. “In California, you could tell your child, ‘Wrangle your room!’, which means ‘tidy up this mess’,” says Bertrand Cariou, Senior Director Partners & Solutions at Trifacta, in an exchange with LeMagIT.

Originally, the term is actually related to cowboys and cattle – and thus to the imagery of American open spaces and pioneers.

“A Wrangler is a cowboy who has to gather his livestock scattered across the plains of the Great West, a laborious, messy, exhausting but absolutely necessary activity,” says Bertrand Cariou. “The work of data preparation is similar, hence the analogy with the term Data Wrangling.”

Historically, the concept of Wrangling was first raised in a BI context in 1997, in a publication on data preparation co-written by a Berkeley professor and one of his students. Fourteen years later, in 2011, one of their followers put these concepts into practice in an open source tool (the Stanford Wrangler) that quickly became successful.

The three academics (Professor Joe Hellerstein, student Vijayshankar Raman and disciple Sean Kandel) then founded Trifacta.

Difference from Data Preparation and Data Exploration

The difference between Data Wrangling and the Data Prep / Data Exploration duo is ultimately rather tenuous. Trifacta does not deny it.

To make the distinction, the vendor points to the use of artificial intelligence to guide users through the preparation process “and thus make their task easier because the tool does part of the work for them”.

Second distinction: Data Wrangling is, again according to Trifacta, associated with data in complex formats and/or very large volumes. Several clients, including LinkedIn, use it on Hadoop clusters, for example.

Founded in 2012, Trifacta established itself in Europe in December 2015 by opening two branches, in London and Berlin. The startup has since opened a Paris office, from which it manages Southern Europe and North Africa.
