Storage options for sensor data
Select a storage service based on your requirements. Common options include:
Amazon S3: Use S3 to store raw sensor data or data that doesn't require a database.
Amazon DynamoDB: Use DynamoDB if you need a NoSQL database for real-time data storage.
Amazon RDS: Use RDS if you need a relational database for structured sensor data.
Amazon Redshift: Use Redshift for data warehousing and analytical workloads.
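To make the difference between the first two options concrete, here is a minimal sketch of how one reading might be shaped for S3 (one JSON line in a raw file) versus DynamoDB (an item in the low-level attribute-value format). The SensorReading class and its field names are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SensorReading:
    # Hypothetical fields; a real device schema will differ.
    device_id: str
    timestamp: int       # Unix epoch seconds
    temperature: float


def to_s3_json_line(reading: SensorReading) -> str:
    """Serialize a reading as one JSON line, suitable for a raw file in S3."""
    return json.dumps(asdict(reading), sort_keys=True)


def to_dynamodb_item(reading: SensorReading) -> dict:
    """Build an item in DynamoDB's attribute-value format (S = string, N = number)."""
    return {
        "device_id": {"S": reading.device_id},
        # DynamoDB transmits numbers as strings inside the "N" wrapper.
        "timestamp": {"N": str(reading.timestamp)},
        "temperature": {"N": str(reading.temperature)},
    }
```

In this sketch device_id and timestamp would be natural candidates for a DynamoDB partition key and sort key, but that is a design assumption, not a requirement.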
Data Engineering options
For batch loading, use AWS Lambda or Glue. Lambda is better suited to ad hoc scripts. Glue is based on PySpark and includes workflow orchestration. EMR is a platform for hosting Spark, HBase, Hive, etc.
For streaming, use Kinesis Firehose or Kinesis Data Streams. The latter is for custom data streaming, where you write your own consumers.
For connecting IoT devices, use AWS IoT Services, which connects to IoT devices and sends data to other AWS services.
Note that both Kinesis and AWS IoT Services can trigger a Lambda function to process new stream events.
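As a rough sketch of the Lambda side of this, the handler below decodes a batch of Kinesis records. Kinesis passes each payload base64-encoded under record["kinesis"]["data"]; the assumption that every payload is a JSON document, and the shape of the return value, are illustrative only.

```python
import base64
import json


def lambda_handler(event, context):
    """Decode each Kinesis record in the batch and collect the JSON payloads."""
    readings = []
    for record in event.get("Records", []):
        # The record payload arrives base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers send one JSON document per record.
        readings.append(json.loads(payload))
    # Placeholder processing: report what was decoded.
    return {"processed": len(readings), "readings": readings}
```

A real handler would replace the return statement with its own processing, for example writing the readings to one of the storage services above.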
Data Lake
Use Amazon S3 for storage and create buckets for the different zones, e.g. raw and processed, or bronze, silver and gold. This is similar to Azure Blob Storage.
Files in the data lake normally need a catalog (metadata) to be consumed. The Glue Data Catalog stores the metadata, and a Glue crawler helps detect it from the files.
Amazon Athena is designed for running SQL queries on data files in the data lake, using the catalog information.
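One possible naming scheme for the zones, as a sketch: one bucket per zone, with date-partitioned object keys inside it. The bucket prefix, the zone names and the key=value folder layout are all assumptions for illustration; pick whatever convention fits your project.

```python
from datetime import datetime, timezone

# Hypothetical zone names; ("bronze", "silver", "gold") would work the same way.
ZONES = ("raw", "processed")


def bucket_for_zone(prefix: str, zone: str) -> str:
    """Return a per-zone bucket name such as 'acme-sensors-raw'."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone}")
    return f"{prefix}-{zone}"


def object_key(device_id: str, ts: datetime, filename: str) -> str:
    """Build a date-partitioned key using key=value folder names."""
    return (
        f"device_id={device_id}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{filename}"
    )
```

For example, object_key("sensor-1", datetime(2024, 3, 5, tzinfo=timezone.utc), "readings.json") gives "device_id=sensor-1/year=2024/month=03/day=05/readings.json".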
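A minimal sketch of submitting an Athena query from Python with boto3's start_query_execution. The database name, SQL text, results location and default region are placeholders, and boto3 with valid AWS credentials is assumed to be available when start_query is actually called.

```python
def build_athena_request(sql: str, database: str, output_location: str) -> dict:
    """Assemble the keyword arguments for Athena's start_query_execution."""
    return {
        "QueryString": sql,
        # The database is the one registered in the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_query(request: dict, region_name: str = "us-east-1") -> str:
    """Submit the query and return its execution id (needs boto3 and credentials)."""
    import boto3  # imported lazily so building a request does not need the SDK

    client = boto3.client("athena", region_name=region_name)
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]
```

The query runs asynchronously, so the returned id would then be used to poll for completion and fetch results.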