A financial company wants to implement a data mesh. The data mesh must support centralized data governance, data analysis, and data access
control. The company has decided to use AWS Glue for data catalogs and extract, transform, and load (ETL) operations.
Which combination of AWS services will implement a data mesh? (Choose two.)
A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data analysis.
B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.
C. Use AWS Glue DataBrew for centralized data governance and access control.
D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.
E. Use AWS Lake Formation for centralized data governance and access control.
Answer: BE
✅ Explanation
Amazon S3 is the preferred data lake storage solution for a data mesh. It is scalable, cost-effective, and supports diverse data formats.
Amazon Athena allows for serverless, interactive querying of data stored in S3, enabling efficient data analysis across datasets.
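As a rough illustration of querying S3 data through Athena, the sketch below builds the arguments for the `start_query_execution` call in boto3. The database name, table, SQL, and results bucket are hypothetical placeholders, not values from the question. The helper `build_athena_query_request` is also an illustrative name, not part of any AWS SDK.

```python
def build_athena_query_request(sql, database, output_location, workgroup="primary"):
    """Build keyword arguments for Athena's StartQueryExecution API call."""
    return {
        "QueryString": sql,
        # The database is resolved through the AWS Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


def run_query(request):
    """Submit the query. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, used for illustration only.
request = build_athena_query_request(
    sql="SELECT account_id, SUM(amount) AS total FROM transactions GROUP BY account_id",
    database="finance_domain",
    output_location="s3://example-athena-results/",
)
```

Because Athena is serverless, no cluster has to be provisioned before `run_query(request)` is called; the query runs directly against the files in S3.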
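AWS Lake Formation is built to provide centralized data governance, fine-grained access control, and security policies on top of data stored in S3. Its permissions are applied to AWS Glue Data Catalog resources such as databases, tables, and columns, so it fits the company's existing choice of AWS Glue and covers the governance needs of a data mesh.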
Together, these services provide decentralized data ownership with centralized governance, which is the core principle of a data mesh.
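The governance model described above can be sketched as a Lake Formation grant: a producer domain keeps ownership of its table while a consumer principal receives read access, optionally limited to specific columns. This is a minimal sketch that assumes boto3's `grant_permissions` call. The role ARN, database, table, and column names are hypothetical, and `build_table_grant` is an illustrative helper, not an AWS API.

```python
def build_table_grant(consumer_principal, database, table, columns=None):
    """Build keyword arguments for Lake Formation's GrantPermissions API call."""
    if columns:
        # Column-level grant: the consumer can read only the listed columns.
        resource = {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        }
    else:
        # Table-level grant: the consumer can read every column.
        resource = {"Table": {"DatabaseName": database, "Name": table}}

    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_principal},
        "Resource": resource,
        "Permissions": ["SELECT"],
    }


def apply_grant(grant):
    """Apply the grant. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    boto3.client("lakeformation").grant_permissions(**grant)


# Hypothetical consumer role and catalog names, used for illustration only.
grant = build_table_grant(
    consumer_principal="arn:aws:iam::111122223333:role/AnalystRole",
    database="finance_domain",
    table="transactions",
    columns=["account_id", "amount"],
)
```

Once such a grant is in place, the consumer role could query the permitted columns with Athena, while the underlying S3 data stays with the producing domain.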
Why other options are less suitable:
A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data analysis.
Aurora is a relational database built for transactional workloads, not data lake storage that many domains can publish to and share.
A provisioned Redshift cluster is a data warehouse, which tends to centralize data in one place instead of leaving ownership distributed across domains as a data mesh does.
C. Use AWS Glue DataBrew for centralized data governance and access control.
DataBrew is a visual data preparation tool; it does not provide centralized governance or access control capabilities.
D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.
RDS is a relational database service, not designed as scalable data lake storage.
EMR is a big data processing platform, but pairing it with RDS fits a mesh architecture less well than S3 with Athena.
Final answers:
B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.
E. Use AWS Lake Formation for centralized data governance and access control.