Data Processing Systems with highest revenue share

Hundreds of flavors of data processors are used in today’s data world.  However they can be classified by the type of data they intend to process. We must understand which class of data processors are preferred by companies. The commercial data processing systems with highest revenue share can be roughly divided into 7 categories.

  1. Relational DBMS : OLTP
  2. Relational DBMS : OLAP/DSS
  3. Relational DBMS : In-Memory
  4. No-SQL
    • Key Value Store
    • Columnar databases
    • Document Oriented databases
    • Graph databases
  5. Massively Parallel Processing Frameworks : Hadoop
  6. Archives

800px-data_processing_system_english-svg

#1. Late in the 80’s, relational databases started to evolve. The information generated back then seemed fit for transaction oriented databases. Vendors like Oracle, IBM, Tandem became the leaders of this market.

#2. When the amount of data started to grow in scale of terabyte, the market focused on efficient retrieval of large amount of data. Also limited analytical capabilities was preferred. Relational model still seemed relevant. The OLAP systems came into the market and served this need for corporations. Oracle, DB2, Teradata, MySQL became leaders.

#3. With the cheaper main memory storage, there came a scope of making the transactions faster. The database management technologies were revised to better exploit the faster access from memory. Some notable vendors are MemSQL, Couchbase, AeroSpike, etc.

#4. Relational databases were not designed to store or analyze the unstructured data. With the increase in web content, people realized that unstructured text carries tonnes of information. Difficult to mine this data using relational databases. Different designs were explored to manage these huge datasets. Roughly anything that is nor relational database is known as NoSQL database. Cassandra, MongoDB, Sensage, etc.

#5. After Google published the paper on its distributed file system, people saw the need for a generic massively parallel data processing engine. Processing both structured and unstructured data became priority. Hadoop framework was born. Today Hadoop is popular in the opensource communities and can serve most of the data processing needs of corporations. Hadoop still has lots of scope of improvements.

#6. All the major companies are generating PetaByte scale of useful data every month. However all of the data is not being accessed frequently. So the software industry felt the need for compressing the old stale data and storing it in organized format. Data archive utilities will become a must have for the large corporations in the next few years.