Lesson 8: Data Mining, Data Warehousing, and Data Marts
Over the years, many large organizations have accumulated massive amounts of data about their customers, suppliers, products, and services. Even many new Web-based companies have amassed large databases about people and products as they have grown. The WWW is itself a large distributed data repository with untold potential. With the growing realization that these vast data resources can be tapped for significant commercial gain, interest in data mining, data warehousing, and data marts has virtually exploded.
After reading this lesson, you should be able to:
- Compare data mining, data warehousing, and data marts.
- Describe the purpose and value of data mining.
- Describe the purpose and value of data warehousing.
- Describe the purpose and value of data marts.
Data Mining (DM)
Data mining, also known as "knowledge discovery," refers to computer-assisted tools and techniques for sifting through and analyzing these vast data stores in order to find trends, patterns, and correlations that can guide decision making and increase understanding. Data mining covers a wide variety of uses, from analyzing customer purchases to discovering galaxies. In essence, data mining is the equivalent of finding gold nuggets in a mountain of data. The monumental task of finding hidden gold depends heavily upon the power of computers.
Applications of Data Mining
Data mining includes a variety of interesting applications. A few examples are listed below:
- By recording the activity of shoppers in an online store, such as Amazon.com, over time, retailers can identify browsing and purchasing patterns and use them to improve the placement of items on a Web page or mail-order catalog page.
- Telephone companies mine customer billing data to identify customers who spend considerably more than average on their monthly phone bill. The company can then target these customers to sell additional services.
- Marketers can effectively target the wants and needs of specific consumer groups by analyzing data about customer preferences and buying patterns.
- Hospitals use data mining to identify groups of people whose healthcare costs are likely to increase in the near future so that preventative steps can be taken.
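The telephone company example above can be sketched in a few lines of code. This is a minimal illustration, not how any particular company does it: the customer IDs, the bill amounts, and the cutoff of 1.5 standard deviations above the average are all assumptions made up for the example.

```python
from statistics import mean, pstdev

# Hypothetical monthly bills (customer ID -> amount in dollars); not real data.
monthly_bills = {
    "C001": 42.0,
    "C002": 38.5,
    "C003": 120.0,
    "C004": 45.0,
    "C005": 40.5,
    "C006": 44.0,
}


def high_spenders(bills, threshold=1.5):
    """Return the IDs of customers whose bill is more than `threshold`
    population standard deviations above the average bill.
    The threshold is an assumed cutoff for "considerably more than average"."""
    average = mean(bills.values())
    spread = pstdev(bills.values())
    if spread == 0:
        # Every customer pays the same amount, so nobody stands out.
        return []
    return sorted(
        customer_id
        for customer_id, amount in bills.items()
        if (amount - average) / spread > threshold
    )


targets = high_spenders(monthly_bills)
print(targets)
```

With this sample data, only customer C003 stands out. A real data mining tool would work with far more data and more sophisticated techniques, but the idea of searching for records that break from the usual pattern is the same.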
Data Mining Summarized
In summary, the purpose of DM is to analyze and understand past trends and predict future trends. By predicting future trends, business organizations can better position their products and services for financial gain. Nonprofit organizations have also benefited significantly from data mining, for example in scientific research.
The concept of data mining is simple yet powerful. The simplicity of the concept is deceptive, however. Traditional methods of analyzing data, involving query-and-report approaches, cannot handle tasks of such magnitude and complexity.
The Need for Data Warehousing and Data Marts
The majority of databases are designed to hold the current data needed by an organization to perform its business activities. In a business organization, current data might include information concerning bills due, inventory levels, and product orders, and would most likely be contained in a billing/inventory/order database. In most cases, as soon as data becomes outdated, it is deleted from the database. For example, once a bill is paid, data about the bill is removed. Many organizations, however, have realized the value of analyzing historical data in order to discover patterns of behavior and predict future trends. For example, analyzing historical data can tell a retailer what items were ordered, in what quantities, and by which customers.
One of the keys to understanding the value of databases is to understand how one database, whether it is current or historical, can be related to another. If you think about it, it makes good business sense to relate customer data to inventory data (because customers place orders that affect inventory), and inventory data to supplier data (because suppliers provide inventory items). We could name many more examples like this. The problem with most databases is that they are not designed to be accessed simultaneously in this fashion.
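To make these relationships concrete, here is a minimal sketch that relates customer, inventory, and supplier data. It assumes the data has already been brought together in one place (here, a temporary in-memory SQLite database), which is exactly what separate operational databases usually do not allow. All table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# A temporary in-memory database standing in for data gathered in one place.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, item TEXT, supplier_id INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     item_id INTEGER, quantity INTEGER);

-- Hypothetical sample rows, for illustration only.
INSERT INTO customers VALUES (1, 'Avery'), (2, 'Blake');
INSERT INTO suppliers VALUES (10, 'Northside Supply'), (11, 'Harbor Goods');
INSERT INTO inventory VALUES (100, 'lamp', 10), (101, 'desk', 11);
INSERT INTO orders VALUES (1000, 1, 100, 2), (1001, 2, 101, 1), (1002, 1, 101, 3);
""")

# Customers place orders for inventory items, and suppliers provide those items.
rows = conn.execute("""
SELECT c.name, i.item, o.quantity, s.name
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
JOIN inventory AS i ON i.item_id = o.item_id
JOIN suppliers AS s ON s.supplier_id = i.supplier_id
ORDER BY o.order_id
""").fetchall()

for customer, item, quantity, supplier in rows:
    print(f"{customer} ordered {quantity} x {item} (supplied by {supplier})")

conn.close()
```

Each output line links one customer to an inventory item and, through that item, to a supplier. Answering the same question across three separate databases would require extra work to bring the data together first.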
Data Warehousing and Data Marts
Many organizations now use data warehouses to bring multiple databases together and make them available for data mining and other forms of analysis. A data warehouse is a collection of data, usually current and historical, from multiple databases that the organization can use for analysis and decision making. The purpose, of course, is to bring key sets of data about or used by the organization into one place.
Bringing together so much data into a data warehouse can make analysis very difficult. To address this problem, organizations use what are called data marts. Data marts are related sets of data that are grouped together and separated out from the main body of data in the data warehouse. Data marts are designed to be made available to specific sets of users. For example, data about manufacturing can be put into a data mart and be made available to the production department. Human resources data can be put into another data mart and be provided to the human resources employees. This approach makes it easier for each group or constituency in the organization to access the data it needs.
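The idea of separating related data out of a warehouse can be sketched as follows. This is a simplified illustration under stated assumptions: the warehouse is represented as a plain list of records, each record carries a hypothetical "subject" label, and the sample records are invented. Real data warehouses and data marts are built with dedicated database software.

```python
from collections import defaultdict

# Hypothetical warehouse contents: records from several source databases.
warehouse = [
    {"subject": "manufacturing", "record": "Line 3 output, week 12"},
    {"subject": "human_resources", "record": "New hire onboarding, March"},
    {"subject": "manufacturing", "record": "Machine downtime, week 12"},
    {"subject": "human_resources", "record": "Benefits enrollment, Q1"},
]


def build_data_marts(records):
    """Group warehouse records by subject so that each group can be
    made available to the users who need it."""
    marts = defaultdict(list)
    for entry in records:
        marts[entry["subject"]].append(entry["record"])
    return dict(marts)


data_marts = build_data_marts(warehouse)

# The production department would see only the manufacturing data mart.
print(data_marts["manufacturing"])
```

The warehouse itself is unchanged. Each data mart is simply a smaller, related set of data drawn from it, which is easier for one group of users to work with.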
Knowledge Check
Which of the following best describes data marts?
- Also known as "knowledge discovery," refers to the process of analyzing trends, patterns, and correlations that can guide decision-making and increase understanding.
- A collection of data, usually current and historical, from multiple databases that the organization can use for analysis and decision-making.
- Related sets of data that are grouped together and separated out from the main body of data.
- Designed to assist individuals and organizations in managing and extracting meaning from enormous amounts of data.
Lesson Wrap-Up
Data mining, data warehousing, and data marts are designed to assist individuals and organizations in managing and extracting meaning from enormous amounts of data. Data mining is used to analyze data sets and predict future trends. Data warehouses and data marts are used to store and analyze historical data in order to make better decisions and predictions about the future. The purpose of many of these activities and approaches is to relate data sets to each other, group related data together, and ensure the ability of users to access the data they need. Data is a resource that, in many cases, can be tapped for greater understanding and insight.
Now that you have completed this lesson, you should be able to:
- Compare data mining, data warehousing, and data marts.
- Describe the purpose and value of data mining.
- Describe the purpose and value of data warehousing.
- Describe the purpose and value of data marts.
