The costs of poor data quality

Data are used in almost all company activities and form the basis for decisions at both the operational and strategic levels. Poor-quality data can therefore have a significantly negative impact on an organization's efficiency, while high-quality data are often crucial to a company’s success (Madnick et al., 2004; Haug et al., 2009; Batini et al., 2009; Even & Shankaranarayanan, 2009). However, several industry expert surveys indicate that data quality is an area to which many companies do not give sufficient attention, or which they do not know how to manage efficiently (Marsh, 2005; Piprani & Ernst, 2008; Jing-hua et al., 2009).

Vayghan et al. (2007) classify the data that most enterprises deal with into three categories: master data, transactional data, and historical data. Master data are defined as the basic characteristics of business entities, e.g. customers, products, employees, and suppliers. Thus, master data are typically created once, used many times, and changed infrequently (Knolmayer & Röthlin, 2006). Transactional data describe the relevant events in a company, e.g. orders, invoices, payments, deliveries, and storage records. Since transactions are based on master data, erroneous master data can incur significant costs; for example, an incorrectly priced item may mean that money is lost. In this context, Knolmayer and Röthlin (2006) argue that capturing and processing master data are error-prone activities in which inappropriate information system architectures, insufficient coordination with business processes, inadequate software implementations, or inattentive user behaviour may lead to disparate master data.
In spite of the importance of having correct and adequate data in a company, there seems to be general agreement in the literature that poor-quality data are a problem in many companies. In fact, much academic literature claims that poor-quality business data constitute a significant cost factor for many companies, a claim supported by findings from several industry expert surveys (Marsh, 2005). On the other hand, Eppler and Helfert (2004) argue that although much literature claims that the costs of poor data quality are significant in many companies, very few studies demonstrate how to identify, categorize and measure such costs (i.e. how to establish the causal links between poor data quality and monetary effects). This is supported by Kim and Choi (2003), who state: “There have been limited efforts to systematically understand the effects of low quality data. The efforts have been directed to investigating the effects of data errors on computer-based models such as neural networks, linear regression models, rule-based systems, etc.” and “In practice, low quality data can bring monetary damages to an organization in a variety of ways”. According to Kim (2002), the types of damage that low-quality data can cause depend on the nature of the data, the nature of their use, the types of responses (by customers or citizens) to the damage, etc.

With regard to master data quality, companies thus typically incur costs from two sides. Firstly, they incur costs when cleaning master data and ensuring that its quality stays high. Secondly, they incur costs from data that are not cleaned, as poor master data quality might lead to faulty managerial decision-making. The purpose of this paper is to provide a better understanding of the relationship between these two types of cost.
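The relationship between the two sides of cost can be illustrated with a small numerical sketch. It is only a toy illustration and not the model developed later in the paper: the function names, the linear and hyperbolic cost curves, and all parameter values are hypothetical assumptions chosen to make the trade-off visible.

```python
# Toy illustration only. The cost curves and every parameter value here are
# hypothetical assumptions, not the model or data of this paper.

def maintenance_cost(effort: float, unit_cost: float = 100.0) -> float:
    """Cost of cleaning and assuring master data.

    Assumed to grow linearly with the maintenance effort.
    """
    return unit_cost * effort


def poor_quality_cost(effort: float, base_cost: float = 1000.0) -> float:
    """Cost inflicted by data that are left uncleaned.

    Assumed to fall as the maintenance effort grows.
    """
    return base_cost / (1.0 + effort)


def total_cost(effort: float) -> float:
    """Sum of both cost types for a given maintenance effort."""
    return maintenance_cost(effort) + poor_quality_cost(effort)


def optimal_effort(candidates):
    """Return the candidate effort level with the lowest total cost."""
    return min(candidates, key=total_cost)


# Candidate effort levels from 0.0 to 10.0 in steps of 0.1 (arbitrary units).
efforts = [i / 10 for i in range(0, 101)]
best = optimal_effort(efforts)
```

Under these assumed curves, the lowest total cost lies strictly between no maintenance and maximum maintenance: too little effort leaves the costs of poor data high, while too much effort costs more than the damage it prevents.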
To help determine the optimal data quality maintenance effort, the paper provides: (1) a definition of the optimal data maintenance effort; and (2) a classification of the costs inflicted by poor-quality data. In this context, the paper argues that there is a clear trade-off between the cost of data maintenance and the costs inflicted by poor-quality data, and that the task facing companies is to balance this trade-off.

The remainder of the paper is organized as follows. First, the literature on data quality is discussed in Section 2. Next, Section 3 proposes a model for determining the optimal data maintenance effort and a classification of the different types of costs inflicted by poor-quality data. Section 4 presents a case study to illustrate the application of the proposed model and classification. The paper ends with a conclusion in Section 5.