Following story was narrated to me by a friend of mine who runs a mid-size manufacturing company. The story, I believe, does a nice job of explaining the difference between transactional databases & reporting databases. Needless to say that the narration was not as detailed as I have put it here. In order for every-one to understand, I have taken the liberty add some explanation.
So here is how the story goes…
The other day, based on what the CEO asked, I requested and was provided information on days to first shipment for 3 of our plants for most selling product. Since we have an ERP, data for this matrix is available with us.
Those of you not familiar with this matrix, here is a little explanation.
One needs the customer, all the orders for the customer and the first shipment date for the order. So one tries to get a feel for the lead time before any order can be shipped.
For Original Equipment Manufacturers (OEMs) following Just In Time (JIT) model, this is a critical matrix to follow. They want the shipments to start as soon as possible.
This was 2 in the afternoon. I requested the person who submitted the report to send me the same report for another product as well. I was told that the report would be available the next day morning. I was told that the report will be generated when users are not using the system.
Why? I asked. I was told that generation of report may slow down the system for users.
Have you heard of such a things? Report slowing down transactions in the system?
I had to get to the bottom of this. This was interesting enough for me to get involved in this personally as I want more and more such reports to be available.
Our IT staff is quite efficient and if they give a reason, generally it tends to be valid.
The IT person came and started explaining to me that our implementation does not have reporting tables set up properly. That once reporting tables are set up, the reports generation will not interfere with transactions. He also used words like current transactional data is highly normalized.
I think I dozed off somewhere during the explanation.
Another IT person, who has this knack of making complicated things simple, walked in. He knew that making me understand the IT jargon was daunting to day the least.
He went up to the whiteboard and drew the following:
The explanation was as follows:
A customer gives us order and order is what has the information for a product. Then the order is linked to the shipment record. In order to get days to first shipment, one has to link customer and shipment data directly. That happens by linking 3 different types of information: Customer, Order & Shipment.
In a transactional database, all 3 pieces of information are kept in different tables. And linking these records takes vital system resources. These vital resources are needed to carry out system transaction as well as like entering an order.
I was beginning to get it. And I was awake too… which was probably helping somewhat.
In reporting databases or tables set up, these pieces of information are organized in one record. This causes lot of data duplication but ensures faster reporting as well as reduces query complexities. Data duplication is ok since the space available to store the data is not expensive.
So in reporting databases, the same data would look something like below:
So if there were 10 shipments, the above information would get duplicated 10 times. In some instances, one can do even more. If one knows that “Days To First Shipment” is going to be used repeatedly, then that information can be pre-calculated and stored with the record.
Hm.. These IT guys are smart. Not as smart as operations people though.
And that was the difference between OLAP – Online Analytical Processing & OLTP – Online Transaction Processing.
Once I understood the difference and the reason why complex reports can not be generated during processing time, I decided to champion the cause of getting reporting tables & BI tool project implemented for our ERP. It just made lot of business sense.