Components of Data Processing Systems

Data processing systems are usually complex, and most of them share a broadly similar design. They differ mainly in which type of workload they are built to process most efficiently. For instance, transaction processing (OLTP) systems and analytical processing systems need different levels of query optimization. An OLTP system typically deals with one record, or a few records at most, per request, so spending system resources on optimizing queries brings little benefit. An analytical data engine, by contrast, may handle thousands to millions of records for a single request, and failing to optimize a query can result in huge execution delays.
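The trade-off can be made concrete with a little arithmetic. The sketch below is only an illustration: the cost model (a fixed planning cost plus a per-row cost) and every number in it are assumptions chosen for the example, not measurements from any real engine.

```python
def total_cost_us(rows, plan_cost_us, per_row_cost_us):
    """Total request cost in microseconds: fixed planning cost plus per-row work."""
    return plan_cost_us + rows * per_row_cost_us

# Hypothetical figures: optimizing costs 5000 us up front but makes each row 10x cheaper.
OPTIMIZED = {"plan_cost_us": 5000, "per_row_cost_us": 1}
UNOPTIMIZED = {"plan_cost_us": 0, "per_row_cost_us": 10}

for rows in (1, 1_000_000):
    with_opt = total_cost_us(rows, **OPTIMIZED)
    without_opt = total_cost_us(rows, **UNOPTIMIZED)
    print(f"{rows:>9} rows: optimized={with_opt} us, unoptimized={without_opt} us")
```

With these made-up numbers, a one-record OLTP request is far cheaper without optimization (10 us against 5001 us), while a million-record analytical request is about ten times cheaper with it. The break-even point here is roughly 556 rows.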

Another important part is data storage. Data should always be stored with the number of data blocks accessed per query in mind. If you spend some extra time organizing the data into an accessible format when it arrives, it pays you back in performance when you retrieve or analyze the data. If you do not pre-process, you may have to spend more CPU when retrieving the information. You have to decide where you can bear the delay and where you need better performance.
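Here is a minimal sketch of that trade-off, assuming a toy storage layout with a fixed number of records per block. The block size, the sample keys, and the function names are all hypothetical. It compares how many blocks a lookup touches when data is stored as it arrives against data that was sorted and indexed at load time.

```python
import bisect

BLOCK_SIZE = 4  # records per block (hypothetical, kept tiny for illustration)

def blocks_scanned_unsorted(records, key):
    """Linear scan over unorganized data: count blocks touched until key is found."""
    for position, value in enumerate(records):
        if value == key:
            return position // BLOCK_SIZE + 1
    # Key absent: every block had to be read.
    return (len(records) + BLOCK_SIZE - 1) // BLOCK_SIZE

def build_block_index(sorted_records):
    """Extra work at load time: remember the first key of each block."""
    return [sorted_records[i] for i in range(0, len(sorted_records), BLOCK_SIZE)]

def find_block(block_index, key):
    """Binary search the small index to pick the single block that may hold key."""
    return max(bisect.bisect_right(block_index, key) - 1, 0)

arrivals = [42, 7, 19, 88, 3, 56, 71, 25, 64, 10, 93, 31]
organized = sorted(arrivals)           # pre-processing cost paid once, on arrival
index = build_block_index(organized)   # [3, 25, 64]

block = find_block(index, 93)
print("unsorted lookup touches", blocks_scanned_unsorted(arrivals, 93), "blocks")
print("indexed lookup reads block", block, "only")
```

In this example the unorganized lookup touches 3 blocks, while the indexed lookup reads a single block. The price is the sort and the index build at load time, which is exactly the delay you have to decide whether you can bear.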


After thoroughly studying the commercial data processing engines, I came up with the following list of basic components for data processing systems. Popular databases and applications each implement a subset of these components.

1. Connecting API (JDBC, ODBC, CLI).

2. Session controller.

3. Tokenizer and syntax analyzer.

4. Symbol resolver.

5. Semantic analyzer.

6. Rule-based optimizer.

7. Cost-based optimizer.

8. Query access planner.

9. Step dispatcher and scheduler.

10. Step executor.

11. File system.

12. Operating system interface.

13. Memory manager.

14. Performance Booster Designs.

We will discuss these one by one in detail. At the end, we should have a clear idea of which components to design for our own database system.
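To give a feel for how the early components in the list might hand work to one another, here is a rough sketch covering only the tokenizer, the syntax analyzer, and the symbol resolver. It assumes the components run in roughly the listed order. The function names, the tiny SELECT-only grammar, and the in-memory catalog are all hypothetical stand-ins, not how any particular engine does it.

```python
def tokenize(query):
    """Tokenizer: split the query text into tokens, keeping commas as tokens."""
    return query.replace(",", " , ").split()

def check_syntax(tokens):
    """Syntax analyzer (toy): accept only 'SELECT <columns> FROM <table>'."""
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise SyntaxError("expected SELECT <columns> FROM <table>")
    if upper.index("FROM") + 1 >= len(tokens):
        raise SyntaxError("missing table name after FROM")
    return tokens

def resolve_symbols(tokens, catalog):
    """Symbol resolver: map table and column names to known catalog entries."""
    upper = [t.upper() for t in tokens]
    from_pos = upper.index("FROM")
    table = tokens[from_pos + 1]
    if table not in catalog:
        raise LookupError(f"unknown table: {table}")
    columns = [t for t in tokens[1:from_pos] if t != ","]
    unknown = [c for c in columns if c not in catalog[table]]
    if unknown:
        raise LookupError(f"unknown columns: {unknown}")
    return {"table": table, "columns": columns}

# Hypothetical catalog: table name -> set of column names.
catalog = {"users": {"id", "name", "email"}}

tokens = check_syntax(tokenize("SELECT id, name FROM users"))
resolved = resolve_symbols(tokens, catalog)
print(resolved)
```

Each stage takes the output of the previous one and either passes a richer structure along or rejects the request early, so later and more expensive components never see an invalid query.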