A prominent Communications Service Provider (CSP) needed a near real-time big data solution for processing and analyzing huge volumes of data: roughly 30 million files per month and approximately 3 TB of data per day.
The challenge was to process and analyze, in near real time, massive amounts of data arriving in different formats (structured and semi-structured) and from different sources (files, CDRs, XML, CSV, etc.).
IBM Streams collects data from different sources (XML, CSV, and ASN.1 files; databases; sockets; and REST web services). After collection, the data is filtered, processed, enriched, and grouped (sum, avg, max, min, etc.) through flows implemented in SPL (Streams Processing Language). The results are sent to different sinks (databases, HDFS, the filesystem, and REST web services).
Apache Flink collects data from different sources (Kafka, files, and REST web services). After collection, the data is filtered, processed, enriched, and grouped (sum, avg, max, min, etc.) through flows implemented in Java. The results are sent to different sinks (SQL and NoSQL databases, and the filesystem).
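The filter-and-group stage that both engines perform can be sketched in plain Java. This is a minimal illustration using the JDK streams API, not the actual SPL or Flink job code; the `Cdr` record and its fields are hypothetical stand-ins for the client's real data model:

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class CdrAggregation {
    // Illustrative record; real CDRs carry many more fields.
    public record Cdr(String subscriber, double durationSeconds) {}

    // Filter out zero-length calls, then group per subscriber and
    // compute sum/avg/max/min in one pass, mirroring the grouping
    // step described for the SPL and Flink flows.
    public static Map<String, DoubleSummaryStatistics> aggregate(List<Cdr> cdrs) {
        return cdrs.stream()
                .filter(c -> c.durationSeconds() > 0)
                .collect(Collectors.groupingBy(
                        Cdr::subscriber,
                        Collectors.summarizingDouble(Cdr::durationSeconds)));
    }
}
```

In the production flows the same logic runs continuously over unbounded streams (for example, keyed windows in Flink) rather than over an in-memory list.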
Both solutions run in clustered deployments.
The solution proved an effective way to reduce costs and accelerate the client's decision-making: decision times fell from 48-72 hours to 2-3 minutes. In addition, it dramatically improved the client's ability to manage massive volumes of data.