Over the past decade, the sources and quantities of data being generated have increased tremendously. New software tools have had to be developed to capture, curate, manage, and process this data within a tolerable amount of time. Today's hot buzzword "big data" refers to datasets that are so large, so unstructured, and growing so rapidly that managing them with ordinary databases and statistical tools has become infeasible.
Examples of unstructured data sources include:
- 2 billion internet users in the world today
- 4.6 billion mobile phones in 2011
- 7TB of data are processed by Twitter every day
- 10TB of data are processed by Facebook every day
Big Data Use Cases for Verticals
- Credit Risk Score and Analysis: With volumes of data growing every second, it’s important to be able to score every record as soon as it is available.
- Fraud Detection and Security Analysis: High performance analytics, coupled with the ability to score every record and feed it to the system electronically, can identify fraud faster and more accurately.
- Abnormal Trading Pattern Analysis: Existing market surveillance systems face the crucial challenge of detecting diversified, dynamic, and distributed cyber-based misuse, mis-disclosure, and misleading information across markets.
- Ad-targeting, Analysis, Forecasting and Optimization: High performance analytics, coupled with the ability to forecast, can identify the target segments and areas for optimization more accurately.
- Social Graph Analysis and Profile Segmentation: Mapping various metrics to people or groups can help in building better social intelligence.
- Large-scale Click Stream Analytics: The analysis of billions of clicks every day can give better insight into trends, leading to better segmentation of target markets.
- Drug Discovery and Development Analysis: Data about drug research can be analyzed in real-time against existing data to derive results that are relevant for drug development.
- Patient Care Quality and Program Analysis: An analysis of different sets of parameters associated with patient care and hospital programs can help to improve the quality of services and offerings.
- Medical Device and Pharmaceuticals Supply Chain Management: Location and order information can be used to run analytics that help fulfill demand accurately, optimize the supply chain, and build optimized intelligence.
- Call Detail Record (CDR) Analysis: Monitoring telecommunication metrics like time, frequency, and location reveals key insights into individuals and organized groups.
- Network Performance and Optimization: Mapping various metrics to different network traffic types and then performing real-time analysis can lead to technological improvements and optimizations.
- Mobile User Location and Analysis: Location intelligence can be used to run predictive analysis as well as to help build operational intelligence.
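To make one of these use cases concrete, here is a minimal click-stream aggregation sketch in Python. The log format, field names, and sample records are all hypothetical; a production system would run this kind of aggregation over billions of records with a distributed framework, not in a single process.

```python
from collections import Counter

# Hypothetical click-stream records: (timestamp, user_id, url)
clicks = [
    ("2012-05-01T10:00:00", "u1", "/home"),
    ("2012-05-01T10:00:05", "u1", "/products"),
    ("2012-05-01T10:01:00", "u2", "/home"),
    ("2012-05-01T10:02:30", "u2", "/checkout"),
    ("2012-05-01T10:03:00", "u3", "/home"),
]

def page_popularity(records):
    """Count total visits per URL across all users."""
    return Counter(url for _, _, url in records)

def users_per_page(records):
    """Count distinct visitors per URL -- a first step toward segmentation."""
    seen = {}
    for _, user, url in records:
        seen.setdefault(url, set()).add(user)
    return {url: len(users) for url, users in seen.items()}

print(page_popularity(clicks).most_common(1))  # the most-visited page
print(users_per_page(clicks))
```

The same two aggregations (visit counts and distinct-visitor counts) are the building blocks behind trend detection and market segmentation at scale.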
To process big data in a short amount of time, exceptional technology solutions must be implemented. In general, big data follows the 3V's pattern (i.e. Variety, Velocity, and Volume). Variety spans structured, semi-structured, and unstructured data. Velocity ranges from regular batch processing at known speeds to real-time processing at unpredictable speeds. Volume refers to the large amounts of data coming from different sources, too big to be handled by conventional database systems. Two further dimensions, value (deriving business value from analytics) and variability (whether data arrives as a real-time stream or sits in a database), also have an impact. An increase in all of these is expected in the near future.
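The velocity distinction above can be sketched in a few lines of Python. This is an illustrative toy, not any particular product's API: a batch computation needs the full dataset in hand, while a streaming consumer must update its answer one record at a time in constant memory.

```python
def batch_mean(values):
    """Batch style: requires the complete dataset before answering."""
    return sum(values) / len(values)

class StreamingMean:
    """Streaming style: O(1) memory, usable on an unbounded feed."""
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update: fold each new record into the running mean.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

data = [4, 8, 15, 16, 23, 42]
stream = StreamingMean()
for x in data:
    stream.update(x)

# Both approaches agree on the same data; the difference is when
# (and with how much memory) the answer becomes available.
print(batch_mean(data), stream.mean)
```

Real-time systems generalize this idea: instead of re-running a batch job over an ever-growing dataset, they maintain incremental summaries as records arrive.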
The basic solution for handling big data consists of five stages: identifying the sources, gathering the data from those sources, processing the data, storing the data, and delivering the data to the end business user. Data sources include structured master reference files and transactions, semi-structured machine-generated log files, and unstructured data such as images, videos, audio, and social media. Through data connectors or adapters provided by client software, or by some other means, data is gathered for local storage on systems such as a DBMS, an OLTP system, or HDFS (Hadoop Distributed File System). The data is then processed using an ETL/ELT mechanism, custom applications, or the various frameworks provided by Hadoop, including Hive, HBase, and MapReduce. After the transformation, the data is sent on for reporting, dashboard analysis, text analysis and research, data mining, sentiment analysis, and predictive analysis.
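The processing stage above is the one MapReduce is built for. Here is a toy word-count job in the MapReduce style, written as a plain single-process Python sketch. Real Hadoop jobs are written against the MapReduce API (or expressed in Hive) and run distributed over HDFS; this only mirrors the map, shuffle/sort, reduce flow.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(record):
    """Map: emit an intermediate (key, value) pair per word."""
    for word in record.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    """Reduce: combine all values seen for one key."""
    return (key, sum(values))

def run_job(records):
    # Map each input record to intermediate pairs.
    pairs = [kv for r in records for kv in map_phase(r)]
    # Shuffle/sort: group intermediate pairs by key, as the
    # framework would before handing them to reducers.
    pairs.sort(key=itemgetter(0))
    # Reduce each key group to a final (key, count) result.
    return dict(
        reduce_phase(k, (v for _, v in group))
        for k, group in groupby(pairs, key=itemgetter(0))
    )

docs = ["big data needs big tools", "data everywhere"]
print(run_job(docs))  # e.g. {'big': 2, 'data': 2, ...}
```

The appeal of the model is that the map and reduce functions contain no distribution logic at all; the framework handles partitioning the input, shuffling intermediate pairs, and re-running failed tasks.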
What about you, are you working with big data? What tools are you using to process your data? What sort of big data applications are you involved with?