Data Hygiene: The Next Big Problem of Big Data

3 September 2019

Read Dr. Cobanoglu’s article at Hospitality Technology Magazine [PDF]

 

Everything is about data, right? Yes, it is. Data is one of the most important strategic tools for any company, and the hospitality industry is no exception. It is now widely accepted in the literature that the rise of big data and its applications in many industries can be a source of competitiveness. This competitiveness is driven by the processes of big data analytics, which essentially comprise the collection, analysis and application of data according to the specific needs of each company and industry (Krsak & Kysela, 2016). Big data analytics is the application of advanced analytic techniques to big data sets (Russom, 2011).

Data mining is a big data analytics process carried out in four steps: data collection, data cleansing, data analysis and data interpretation (Krsak & Kysela, 2016). For a given company, the quality of the data, which ultimately affects the reliability of the analytics, depends on how well the data fits the specific needs it was collected for. Under these conditions, data cleansing becomes a central and decisive step in big data analytics.

Data cleansing consists of checking for errors to make sure that the data collected is consistent and properly recorded (Krsak & Kysela, 2016). “Dirty data”, the term used for errors in datasets, usually comprises outdated data, incomplete data, or duplicate records. Another aspect of data cleansing is that data, which does not always come from the same source system, needs to be integrated (e.g. in the case of mergers or acquisitions) and also transformed in order to be harmonized (Watson, 2002). The cleansing of data is therefore performed according to specific rules related to the company or the industry. These processes, aimed at making the data “clean”, can be understood as part of “data hygiene”. Although the concept is not much discussed in the literature, data hygiene simply means keeping the data clean so that no duplicate, incomplete, outdated or corrupt data exists in the datasets (Kulshrestha, 2015).
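The three kinds of dirty data named above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the guest records, the field names, the helper `find_issues`, and the five-year threshold for "outdated" are hypothetical and do not come from the cited sources. Real cleansing rules would depend on the company or industry, as noted above.

```python
from datetime import date

# Hypothetical guest records; names, fields and values are illustrative only.
records = [
    {"name": "Ana Ruiz", "email": "Ana.Ruiz@Example.com ", "last_stay": date(2019, 5, 2)},
    {"name": "Ana Ruiz", "email": "ana.ruiz@example.com", "last_stay": date(2019, 5, 2)},
    {"name": "Ben Ode", "email": "", "last_stay": date(2018, 11, 20)},
    {"name": "Cy Lind", "email": "cy@example.com", "last_stay": date(2012, 1, 15)},
]


def normalize_email(email):
    """Harmonize emails so the same address from different sources compares equal."""
    return email.strip().lower()


def find_issues(records, today, max_age_days=5 * 365):
    """Return (index, issue) pairs for incomplete, outdated or duplicate records.

    The max_age_days threshold is an assumed rule, not an industry standard.
    """
    issues = []
    seen_emails = set()
    for i, rec in enumerate(records):
        email = normalize_email(rec["email"])
        # Incomplete: a required field is missing, so no further checks apply.
        if not rec["name"] or not email:
            issues.append((i, "incomplete"))
            continue
        # Outdated: the guest has not stayed within the assumed time window.
        if (today - rec["last_stay"]).days > max_age_days:
            issues.append((i, "outdated"))
        # Duplicate: the harmonized email was already seen in an earlier record.
        if email in seen_emails:
            issues.append((i, "duplicate"))
        seen_emails.add(email)
    return issues


issues = find_issues(records, today=date(2019, 9, 3))
print(issues)
```

Note that the second record is only recognized as a duplicate because the email is harmonized first (trimmed and lower-cased), which is a small instance of the transformation step described above.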

The hospitality industry is considered the second largest industry in the world (WTTC, 2019), with 1.4 billion international tourist arrivals in 2018 (UNWTO, 2019). This high number of tourists represents a myriad of opportunities to generate and collect data, and to use big data analytics to improve business decisions in the industry. Customer data in the hospitality industry comes from a variety of sources, including reservations, contact information and demographics, but also social media (e.g. reviews on sites such as TripAdvisor or Holiday Check) and housekeeping. As in any industry, however, difficulties may arise in the data mining process, because the collected data will not be error free. For big data analytics to deliver the most desirable outcome, the data cleansing step that follows data collection is therefore critical to the whole analytics process.

Data hygiene can also help hotels maintain customer loyalty, improve operational efficiency and keep a well-mapped customer database (Kulshrestha, 2015). Attention to better data cleansing techniques has recently increased because of the misleading interpretations that result from dirty data. The issues are also related to the sheer size of the data, which suggests that automating data cleansing would make the processes and outcomes of big data analytics more efficient. For instance, dailypoint™, a big data platform, has developed data cleansing software designed specifically for hotels that automatically cleans, corrects and de-duplicates customer profiles (Leitsch, 2019). Such software makes the data cleansing process smoother and likely also allows the data to be harmonized automatically.
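To give a rough idea of what automated consolidation of customer profiles could look like, here is a minimal Python sketch. It is not the method used by dailypoint™ or any other product; the function `merge_profiles`, the choice of email as the matching key, the "first non-empty value wins" rule and the sample profiles are all assumptions made for illustration. It also assumes every profile has a non-empty email.

```python
def merge_profiles(profiles, key="email"):
    """Group profiles by a normalized key and merge each group into one profile.

    For every field, the first non-empty value encountered is kept
    (an assumed rule; a real system might prefer the most recent value).
    """
    merged = {}
    for profile in profiles:
        normalized_key = profile[key].strip().lower()
        target = merged.setdefault(normalized_key, {})
        for field, value in profile.items():
            if field == key:
                value = normalized_key  # store the harmonized key
            # Fill the field only if it is still missing or empty.
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Hypothetical profiles, e.g. collected by two different source systems.
profiles = [
    {"email": "Ana.Ruiz@example.com", "name": "Ana Ruiz", "phone": ""},
    {"email": "ana.ruiz@example.com ", "name": "", "phone": "555-0100"},
    {"email": "cy@example.com", "name": "Cy Lind", "phone": ""},
]

clean = merge_profiles(profiles)
print(clean)
```

Merging rather than simply deleting the duplicate keeps information that appears in only one copy of the profile, here the phone number from the second record.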

In the hospitality industry, data can be used to target specific offers to certain customer segments (Krsak & Kysela, 2016), in which case we speak of marketing data, or to reduce costs (revenue data). The strong influence such data can have on hotel revenues requires the data to be clean so that optimal marketing and revenue management decisions can be made. Data cleansing software is now available and can help actors in the hospitality industry obtain the best insights from big data analytics.

 

References