Many Amazon Web Services (AWS) customers use batch data reports to gain strategic insight into long-term business trends, and a growing number of customers also require streaming data to obtain actionable insights from their data in real time.
Streaming Data is data that is generated continuously by thousands of data sources, which typically send in the data records simultaneously, and in small sizes (order of Kilobytes). Streaming data includes a wide variety of data such as log files generated by customers using your mobile or web applications, ecommerce purchases, in-game player activity, information from social networks, financial trading floors, or geospatial services, and telemetry from connected devices or instrumentation in data centers.
This data needs to be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics including correlations, aggregations, filtering, and sampling. Information derived from such analysis gives companies visibility into many aspects of their business and customer activity such as –service usage (for metering/billing), server activity, website clicks, and geo-location of devices, people, and physical goods –and enables them to respond promptly to emerging situations. For example, businesses can track changes in public sentiment on their brands and products by continuously analyzing social media streams, and respond in a timely fashion as the necessity arises.
Challenges in working with Streaming Data
Streaming data processing requires two layers: a storage layer and a processing layer. The storage layer needs to support record ordering and strong consistency to enable fast, inexpensive, and replayable reads and writes of large streams of data. The processing layer is responsible for consuming data from the storage layer, running computations on that data, and then notifying the storage layer to delete data that is no longer needed. You also have to plan for scalability, data durability, and fault tolerance in both the storage and processing layers. As a result, many platforms have emerged that provide the infrastructure needed to build data applications including Amazon Kinesis Streams, Amazon Kinesis Firehose, Apache Kafka, Apache Flume, Apache Spark Streaming and Apache Storm.
Working with Streaming Data on AWS
Amazon Web Services (AWS) provides a number options to work with streaming data. You can take advantage of the managed streaming data services offered by Amazon Kinesis, or deploy and manage your own streaming data solution in the cloud on Amazon EC2.
Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data, and also enables you to build custom streaming data applications for specialized needs. It offers two services: Amazon Kinesis Firehose, and Amazon Kinesis Streams.
In addition, you can run other streaming data platforms such as –Apache Kafka, Apache Flume, Apache Spark Streaming, and Apache Storm –on Amazon EC2 and Amazon EMR.
Amazon Kinesis makes it easy to collect, process, and analyze so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as application logs, website clickstreams, IoT telemetry data, and more into your databases, data lakes and data warehouses, or build your own real-time applications using this data. Amazon Kinesis enables you to process and analyze data as it arrives and respond in real-time instead of having to wait until all your data is collected before the processing can begin.
Amazon Kinesis Firehose
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture and automatically load streaming data into Amazon S3 and Amazon Redshift, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It enables you to quickly implement an ELT approach, and gain benefits from streaming data quickly.
Amazon Kinesis Analytics
Amazon Kinesis Analytics is the easiest way to process streaming data in real time with standard SQL without having to learn new programming languages or processing frameworks. Amazon Kinesis Analytics enables you to query streaming data or build entire streaming applications using SQL, so that you can gain actionable insights and respond to your business and customer needs promptly.
Amazon Kinesis Streams
Amazon Kinesis Streams enables you to build your own custom applications that process or analyze streaming data for specialized needs. It can continuously capture and store terabytes of data per hour from hundreds of thousands of sources. You can then build applications that consume the data from Amazon Kinesis Streams to power real-time dashboards, generate alerts, implement dynamic pricing and advertising, and more. Amazon Kinesis Streams supports your choice of stream processing framework including Kinesis Client Library (KCL), Apache Storm, and Apache Spark Streaming.
What Can you build with Amazon Kinesis?
You can use Amazon Kinesis for real-time applications such as application monitoring, fraud detection, and live leader-boards. You can ingest streaming data using Kinesis Streams, process it using Kinesis Analytics, and emit the results to any data store or application using Kinesis Streams with millisecond end-to-end latency. This can help you learn about what your customers, applications, and products are doing right now and react promptly.