Big Data

Project

A prominent Communications Service Provider (CSP) needed a near real-time, big data solution to process and analyze huge amounts of data (30 million files processed per month and approximately 3 TB of data per day).

Challenge

To process and analyze, in near real-time, massive amounts of data provided in different formats (structured and semi-structured) and from different sources (files, CDRs, XML, CSV, etc.).

Solution

IBM Streams collects data from different sources (XML, CSV, and ASN.1 files; databases; sockets; and REST web services). After collection, the data is filtered, processed, decorated, and grouped (sum, avg, max, min, etc.) through flows implemented in SPL (Streams Processing Language). The results are sent to different sinks (databases, HDFS, the filesystem, REST web services).

Apache Flink collects data from different sources (Kafka, files, and REST web services). After collection, the data is filtered, processed, decorated, and grouped (sum, avg, max, min, etc.) through flows implemented in Java. The results are sent to different sinks (SQL and NoSQL databases, and the filesystem).
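The filter/decorate/group steps described above can be sketched in plain Java. This is a minimal illustration of the aggregation logic only, with a hypothetical record type (CallRecord) and field names; the actual Flink flows would express the same steps with DataStream operators (filter, keyBy, aggregate) rather than the Collections API shown here.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of a filter -> group -> aggregate step over usage records.
// CallRecord and its fields are illustrative names, not from the project.
public class FlowSketch {

    // Hypothetical input record: one traffic measurement per event.
    public record CallRecord(String cellId, double trafficMb) {}

    // Group records by cell and compute sum/avg/max/min per group.
    public static Map<String, DoubleSummaryStatistics> aggregateByCell(List<CallRecord> records) {
        return records.stream()
                .filter(r -> r.trafficMb() > 0)  // filter: drop empty events
                .collect(Collectors.groupingBy(
                        CallRecord::cellId,      // group: key by cell
                        // DoubleSummaryStatistics carries sum, avg, max, min, count
                        Collectors.summarizingDouble(CallRecord::trafficMb)));
    }
}
```

The per-group DoubleSummaryStatistics object exposes exactly the aggregates named above (getSum, getAverage, getMax, getMin), which is why it is a convenient stand-in for the grouped metrics the flows compute.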

Both solutions run as clustered deployments.

Result

The solution proved an effective way to reduce costs and accelerate decision-making for the client: decision-making times fell from 48-72 hours to 2-3 minutes. In addition, the solution dramatically improved the client's ability to manage massive volumes of data.

Use Case Examples: 

  • Detect when a customer's account balance fell below a minimum threshold, triggering an SMS to inform the customer of the situation.
  • Analyze, in real time, a huge number of files containing information about the actual performance of sites, antennas, and cells, and use this information to calculate a network quality indicator (NQI).
  • Analyze a huge number of files containing site configuration parameters.
  • Summarize traffic data and the amount of money consumed by each customer, based on business rules.
  • Identify which and how many customers were requesting a "credit loan" from the provider when their account reached a zero balance. This information enabled the client to offer higher revolving credit amounts to specific customers with a good credit and payment history, thus increasing customer loyalty and reducing churn.
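The low-balance alert in the first use case above reduces to a threshold-crossing check per account event. A minimal sketch follows; the class, record, and field names are hypothetical, and the real flow would emit to an SMS gateway sink rather than return a boolean.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the low-balance use case: notify a customer exactly once
// when the balance crosses below the minimum, not on every event below it.
// All names and the threshold are illustrative, not from the project.
public class LowBalanceAlert {

    public record BalanceEvent(String customerId, double balance) {}

    private final double minBalance;
    private final Map<String, Double> lastBalance = new HashMap<>();

    public LowBalanceAlert(double minBalance) {
        this.minBalance = minBalance;
    }

    // True only when this event crosses from at-or-above the minimum
    // to below it, so the customer receives one SMS per drop.
    public boolean shouldNotify(BalanceEvent e) {
        Double prev = lastBalance.put(e.customerId(), e.balance());
        boolean wasAbove = (prev == null) || prev >= minBalance;
        return wasAbove && e.balance() < minBalance;
    }
}
```

Keeping the last seen balance per customer is what prevents repeated alerts while the account stays below the threshold; in a streaming engine this state would live in keyed operator state rather than an in-memory map.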

Stack

  • IBM Streams (flows in SPL)
  • Apache Flink (flows in Java)
  • Apache Kafka
  • HDFS
  • SQL and NoSQL databases
  • REST web services