AWS Data Pipeline Overview
AWS Data Pipeline is a web-based ETL service for processing and moving data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. It is used with AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.
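A minimal sketch of creating a pipeline programmatically with boto3, assuming boto3 is installed and AWS credentials/region are already configured; the pipeline name, unique ID, and description are placeholders. A pipeline only does work after a definition is attached and it is activated, as shown in the Features section sketch below.

```python
# Sketch: register an AWS Data Pipeline shell with boto3.
# Assumes AWS credentials/region are configured; names are placeholders.
import boto3

client = boto3.client("datapipeline")

# uniqueId acts as an idempotency token so repeated calls do not
# create duplicate pipelines.
created = client.create_pipeline(
    name="daily-s3-copy",          # placeholder pipeline name
    uniqueId="daily-s3-copy-001",  # placeholder idempotency token
    description="Example pipeline shell; definition is attached separately",
)
pipeline_id = created["pipelineId"]
print("Created pipeline:", pipeline_id)

# The pipeline runs only after a definition is attached
# (put_pipeline_definition) and the pipeline is activated
# (activate_pipeline); see the Features sketch below.
```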
AWS Data Pipeline Benefits
- reliable
- easy to use
- flexible
- scalable
- transparent
- low cost
AWS Data Pipeline Features
- distributed, highly available infrastructure designed for fault-tolerant execution
- automatic retry capability
- configured through a visual interface
- library of templates
- scheduling (see the definition sketch after this list)
- dependency tracking
- error handling
- work can be dispatched to one machine or many in parallel
- full execution logs are automatically delivered to Amazon S3
- full control over the compute resources
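A sketch of a pipeline definition that exercises several of these features, attached and activated with boto3: scheduling via a Schedule object, automatic retries via maximumRetries, execution logs delivered to S3 via pipelineLogUri, and explicit control over the compute resource via an Ec2Resource object. The pipeline ID, bucket names, IAM roles, and instance settings are placeholders, not values from the original notes.

```python
# Sketch: attach and activate a minimal definition for an existing pipeline.
# pipeline_id, bucket names, roles, and instance settings are placeholders.
import boto3

client = boto3.client("datapipeline")
pipeline_id = "df-EXAMPLE1234567"  # placeholder; returned by create_pipeline

pipeline_objects = [
    {   # Default object: execution logs are delivered to S3 automatically.
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "type", "stringValue": "Default"},
            {"key": "scheduleType", "stringValue": "cron"},
            {"key": "schedule", "refValue": "DailySchedule"},
            {"key": "pipelineLogUri", "stringValue": "s3://example-bucket/logs/"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
        ],
    },
    {   # Schedule object: run once per day (a low frequency schedule).
        "id": "DailySchedule",
        "name": "DailySchedule",
        "fields": [
            {"key": "type", "stringValue": "Schedule"},
            {"key": "period", "stringValue": "1 day"},
            {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
        ],
    },
    {   # Compute resource the work is dispatched to.
        "id": "WorkerInstance",
        "name": "WorkerInstance",
        "fields": [
            {"key": "type", "stringValue": "Ec2Resource"},
            {"key": "instanceType", "stringValue": "t1.micro"},
            {"key": "terminateAfter", "stringValue": "1 hour"},
        ],
    },
    {   # Activity with automatic retry on failure.
        "id": "CopyJob",
        "name": "CopyJob",
        "fields": [
            {"key": "type", "stringValue": "ShellCommandActivity"},
            {"key": "command", "stringValue": "aws s3 cp s3://example-bucket/in/ s3://example-bucket/out/ --recursive"},
            {"key": "runsOn", "refValue": "WorkerInstance"},
            {"key": "maximumRetries", "stringValue": "3"},
        ],
    },
]

client.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
client.activate_pipeline(pipelineId=pipeline_id)
```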
AWS Data Pipeline Costs
- low monthly rates: low frequency jobs cost $0.60 per month
- high frequency jobs cost $1.00 per month
- high frequency means the job runs more than once per day