Amazon Kinesis Data Streams
Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources, such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events. The collected data is available within milliseconds, enabling real-time analytics use cases such as dashboards, anomaly detection, and dynamic pricing.
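To make the scaling model concrete: a stream is divided into shards, and each record's partition key decides which shard stores it. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The Python sketch below models this locally. It assumes the shards split the hash key space evenly (resharding changes the ranges) and that the digest is read as an unsigned big-endian integer. The helper names are hypothetical and not part of any AWS SDK.

```python
import bisect
import hashlib

HASH_KEY_SPACE = 2 ** 128  # partition keys hash into [0, 2**128 - 1]


def even_shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the 128-bit hash key space into contiguous, near-equal ranges (assumption)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    starts = [i * HASH_KEY_SPACE // shard_count for i in range(shard_count)]
    ends = [s - 1 for s in starts[1:]] + [HASH_KEY_SPACE - 1]
    return list(zip(starts, ends))


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as an unsigned big-endian integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose hash key range contains the key's hash."""
    starts = [start for start, _ in ranges]
    return bisect.bisect_right(starts, hash_key(partition_key)) - 1


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for key in ["user-1", "user-2", "user-3"]:
        print(key, "-> shard", shard_for(key, ranges))
```

Because routing depends only on the key, records that share a partition key land on the same shard and keep their relative order; a few very hot keys can overload one shard no matter how many shards the stream has.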
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a service for reliably loading streaming data into data lakes, data stores, and analytics services. It can capture, transform, and deliver streaming data to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service (now Amazon OpenSearch Service), generic HTTP endpoints, and third-party service providers such as Datadog, New Relic, MongoDB, and Splunk. It is fully managed, scales automatically to match the throughput of your data, and requires no ongoing administration. It can also batch, compress, transform, and encrypt your data before loading it, which reduces the storage used at the destination and improves security.
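Firehose's batching is driven by buffering hints: it delivers a batch when either the configured buffer size or the buffer interval is reached, whichever comes first. The class below is a hypothetical, pure-Python model of that size-or-time rule for illustration only. It is not the Firehose API, and the caller passes the current time explicitly so the behavior is deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeliveryBuffer:
    """Hypothetical model of size-or-interval batching; not the Firehose API."""

    max_bytes: int          # stands in for the buffer size hint
    max_interval_s: float   # stands in for the buffer interval hint
    records: list[bytes] = field(default_factory=list)
    size: int = 0
    opened_at: float | None = None  # arrival time of the first record in the current batch
    batches: list[list[bytes]] = field(default_factory=list)  # batches "delivered" so far

    def add(self, record: bytes, now: float) -> None:
        """Buffer one record, then deliver if a threshold has been reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        """Deliver the current batch if it is large enough or old enough."""
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_interval_s
        if too_big or too_old:
            self.batches.append(self.records)
            self.records, self.size, self.opened_at = [], 0, None


if __name__ == "__main__":
    buf = DeliveryBuffer(max_bytes=10, max_interval_s=60)
    buf.add(b"abcd", now=0)
    buf.add(b"efghij", now=5)   # 10 bytes buffered: size threshold reached
    buf.add(b"k", now=10)
    buf.maybe_flush(now=70)     # 60 s since b"k" arrived: interval threshold reached
    print(buf.batches)          # [[b'abcd', b'efghij'], [b'k']]
```

A real delivery stream also runs its own timer; the explicit `maybe_flush` call stands in for that tick so the sketch stays testable.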
Kinesis Usage Patterns
Feature | Data Streams | Firehose |
---|---|---|
Purpose | Low-latency data streaming service | Load streaming data into S3, Redshift, Elasticsearch, Splunk, and other supported destinations |
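Provisioning | Managed service; shards are provisioned manually in provisioned mode (on-demand mode manages capacity automatically) | Fully managed, with automatic scaling |
Latency | ~200 ms with shared-throughput (classic) consumers; ~70 ms with enhanced fan-out | Near real time; data is buffered, with a minimum buffer interval of 60 seconds |
Scaling | Manual (split or merge shards) in provisioned mode; automatic in on-demand mode | Automatic |
Data Storage | 24 hours by default; extendable up to 7 days, or up to 365 days with long-term retention | None (data is only buffered until delivery) |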
Replay | Yes | No |
Producers | Kinesis Producer Library (KPL), Kinesis Agent, CloudWatch, AWS IoT | KPL, Kinesis Agent, CloudWatch, AWS IoT, plus Kinesis Data Streams as a source |
Consumers | Open-ended: multiple consumers via the AWS SDK, Kinesis Client Library (KCL), and Spark | Limited to the supported destinations listed above |
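With manual shard provisioning, the shard count has to be chosen up front. A common starting point is to size for the most demanding of the per-shard limits: about 1 MB/s or 1,000 records/s of writes, and about 2 MB/s of reads shared by all standard (non-enhanced fan-out) consumers. The function below is a back-of-the-envelope sketch under those assumptions; it ignores headroom for traffic spikes and uneven partition-key distribution, and the function name is hypothetical.

```python
import math

# Per-shard limits for a provisioned stream, expressed in KiB/s and records/s.
WRITE_KIB_PER_SHARD = 1024       # about 1 MB/s of writes per shard
WRITE_RECORDS_PER_SHARD = 1000   # 1,000 records/s of writes per shard
READ_KIB_PER_SHARD = 2048        # about 2 MB/s of reads per shard, shared by standard consumers


def estimate_shards(avg_record_kib: float, records_per_sec: float, consumers: int) -> int:
    """Rough shard count for a provisioned stream (hypothetical helper).

    Takes the largest of the write-bandwidth, write-record-rate, and shared-read
    requirements, assuming every consumer reads the full stream.
    """
    if consumers < 1:
        raise ValueError("need at least one consumer")
    incoming_kib = avg_record_kib * records_per_sec
    outgoing_kib = incoming_kib * consumers  # each consumer re-reads all incoming data
    needed = max(
        incoming_kib / WRITE_KIB_PER_SHARD,
        records_per_sec / WRITE_RECORDS_PER_SHARD,
        outgoing_kib / READ_KIB_PER_SHARD,
    )
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # 2 KiB records at 3,000 records/s with two consumers reading everything.
    print(estimate_shards(avg_record_kib=2, records_per_sec=3000, consumers=2))  # 6
```

In that example, 6,000 KiB/s of writes needs 6 shards, which also covers the 12,000 KiB/s of shared reads. With enhanced fan-out, each registered consumer gets its own read throughput per shard, so the read term would no longer grow with the consumer count.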
When should I use AWS Glue Streaming and when should I use Amazon Kinesis Data Analytics?
Both AWS Glue and Amazon Kinesis Data Analytics can be used to process streaming data. AWS Glue is recommended when your use cases are primarily ETL and when you want to run jobs on a serverless Apache Spark-based platform. Amazon Kinesis Data Analytics is recommended when your use cases are primarily analytics and when you want to run jobs on a serverless Apache Flink-based platform.
Streaming ETL in AWS Glue enables advanced ETL on streaming data using the same serverless, pay-as-you-go platform that you currently use for your batch jobs. AWS Glue generates customizable ETL code to prepare your data while in flight and has built-in functionality to process streaming data that is semi-structured or has an evolving schema. Use Glue to apply both its built-in and Spark-native transforms to data streams and load them into your data lake or data warehouse.
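The "evolving schema" point can be illustrated without Glue itself. The sketch below, plain Python rather than the Glue or Spark API, takes one micro-batch of JSON records whose fields vary between records, merges their top-level field names into a single column list, and pads missing values with None. That is roughly the kind of reconciliation a streaming ETL job has to perform before writing to a table. The function name and record shape are hypothetical, and nested fields are not flattened.

```python
import json
from typing import Any


def normalize_batch(raw_records: list[str]) -> tuple[list[str], list[dict[str, Any]]]:
    """Parse a micro-batch of JSON records whose top-level fields may differ.

    Returns the merged column list (in first-seen order) and one row per record,
    padded with None for any column that record does not carry.
    """
    parsed = [json.loads(r) for r in raw_records]
    columns: list[str] = []
    for record in parsed:
        for key in record:
            if key not in columns:
                columns.append(key)  # a new field appeared: widen the schema
    rows = [{col: record.get(col) for col in columns} for record in parsed]
    return columns, rows


if __name__ == "__main__":
    batch = ['{"id": 1, "page": "/"}', '{"id": 2, "page": "/cart", "ref": "ad"}']
    cols, rows = normalize_batch(batch)
    print(cols)
    print(rows)
```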