Tech Stack Requirements
Languages & Processing:
Python
PySpark
Apache Spark (on EMR)
Apache Flink (on EMR)
AWS Services:
Amazon S3
AWS Glue
Amazon EMR
AWS IAM
AWS Lake Formation
Data Integration:
REST APIs
Kafka or Kinesis
CDC tools (e.g., AWS DMS or Debezium)
Formats & Architecture:
JSON, Parquet
Familiarity with layered data lake design (e.g., raw, processed, curated)
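As a rough illustration of the layered design mentioned above, the sketch below builds S3 object keys for the raw, processed, and curated layers. The bucket name, dataset name, date-partition scheme, and the choice of JSON for raw data and Parquet for the downstream layers are all assumptions made for the example, not requirements stated in this posting.

```python
from datetime import date

# Hypothetical mapping of lake layer -> file format (an assumption for
# illustration): raw data lands as JSON, refined layers are stored as Parquet.
LAYER_FORMATS = {
    "raw": "json",
    "processed": "parquet",
    "curated": "parquet",
}


def layer_path(bucket: str, layer: str, dataset: str, day: date) -> str:
    """Build an S3 URI for one daily partition of a dataset in a given layer."""
    if layer not in LAYER_FORMATS:
        raise ValueError(f"unknown layer: {layer!r}")
    fmt = LAYER_FORMATS[layer]
    # Hive-style date partition (dt=YYYY-MM-DD) is one common convention.
    return f"s3://{bucket}/{layer}/{dataset}/dt={day.isoformat()}/part-0000.{fmt}"


if __name__ == "__main__":
    # "example-lake" and "orders" are placeholder names.
    for layer in LAYER_FORMATS:
        print(layer_path("example-lake", layer, "orders", date(2024, 1, 15)))
```

Keeping each layer under its own prefix is one way to let access rules (for example, through IAM or Lake Formation) be scoped per layer, though the actual layout would depend on the team's conventions.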