Bring your workforce identity to Amazon EMR Studio and Athena


Customers today may struggle to implement proper access controls and auditing at the user level when multiple applications are involved in data access workflows. The key challenge is to implement proper least-privilege access controls based on user identity when one application accesses data on behalf of the user in another application. It forces you to either give all users broad access through the application with no auditing, or try to implement complex bespoke solutions to map roles to users.

Using AWS IAM Identity Center, you can now propagate user identity to a set of AWS services and minimize the need to build and maintain complex custom systems to vend roles between applications. IAM Identity Center also provides a consolidated view of users and groups in one place that the interconnected applications can use for authorization and auditing.

IAM Identity Center enables centralized management of user access to AWS accounts and applications using identity providers (IdPs) like Okta. This allows users to log in once with their existing corporate credentials and seamlessly access downstream AWS services that support identity propagation. With IAM Identity Center, Okta user identities and groups can be automatically synced using SCIM 2.0 for accurate user information in AWS.

Amazon EMR Studio is a unified data analysis environment where you can develop data engineering and data science applications. You can now develop and run interactive queries on Amazon Athena from EMR Studio (for more details, refer to Amazon EMR Studio adds interactive query editor powered by Amazon Athena). Athena users can access EMR Studio without logging in to the AWS Management Console by enabling federated access from your IdP via IAM Identity Center. This removes the complexity of maintaining different identities and mapping user roles across your IdP, EMR Studio, and Athena.

You can govern Athena workgroups based on user attributes from Okta to control query access and costs. AWS Lake Formation can also use Okta identities to enforce fine-grained access controls through granting and revoking permissions.

IAM Identity Center and Okta single sign-on (SSO) integration streamlines access to EMR Studio and Athena with centralized authentication. Users get a familiar sign-in experience with their workforce credentials to securely run queries in Athena. Access policies on Athena workgroups and Lake Formation permissions provide governance based on Okta user profiles.

This blog post explains how to enable single sign-on to EMR Studio using IAM Identity Center integration with Okta. It shows how to propagate Okta identities to Athena and Lake Formation to provide granular access controls on queries and data. The solution streamlines access to analytics tools with centralized authentication using workforce credentials. It uses AWS IAM Identity Center, Amazon EMR Studio, Amazon Athena, and AWS Lake Formation.

Solution overview

IAM Identity Center allows users to connect to EMR Studio without needing administrators to manually configure AWS Identity and Access Management (IAM) roles and permissions. It enables mapping of IAM Identity Center groups to existing corporate identity roles and groups. Admins can then assign privileges to roles and groups and assign users to them, enabling granular control over user access. IAM Identity Center provides a central repository of all users in AWS. You can create users and groups directly in IAM Identity Center or connect existing users and groups from providers like Okta, Ping Identity, or Azure AD. It handles authentication through your chosen identity source and maintains a user and group directory for EMR Studio access. Known user identities and logged data access facilitate compliance through auditing user access in AWS CloudTrail.

The following diagram illustrates the solution architecture.

Solution Overview

The EMR Studio workflow consists of the following high-level steps:

  1. The end-user launches EMR Studio using the AWS access portal URL. This URL is provided by an IAM Identity Center administrator via the IAM Identity Center dashboard.
  2. The URL redirects the end-user to the workforce IdP Okta, where the user enters workforce identity credentials.
  3. After successful authentication, the user is logged in to the AWS console as a federated user.
  4. The user opens EMR Studio and navigates to the Athena query editor using the link available on EMR Studio.
  5. The user selects the correct workgroup as per the user role to run Athena queries.
  6. The query results are stored in separate Amazon Simple Storage Service (Amazon S3) locations with a prefix that’s based on user identity.

To implement the solution, we complete the following steps:

  1. Integrate Okta with IAM Identity Center to sync users and groups.
  2. Integrate IAM Identity Center with EMR Studio.
  3. Assign users or groups from IAM Identity Center to EMR Studio.
  4. Set up Lake Formation with IAM Identity Center.
  5. Configure granular role-based entitlements using Lake Formation on propagated corporate identities.
  6. Set up workgroups in Athena for governing access.
  7. Set up Amazon S3 access grants for fine-grained access to Amazon S3 resources like buckets, prefixes, or objects.
  8. Access EMR Studio through the AWS access portal using IAM Identity Center.
  9. Run queries on the Athena SQL editor in EMR Studio.
  10. Review the end-to-end audit trail of workforce identity.

Prerequisites

To follow along with this post, you should have the following:

  • An AWS account – If you don’t have one, you can sign up for one.
  • An Okta account that has an active subscription – You need an administrator role to set up the application on Okta. If you’re new to Okta, you can sign up for a free trial or a developer account.

For instructions to configure Okta with IAM Identity Center, refer to Configure SAML and SCIM with Okta and IAM Identity Center.

Integrate Okta with IAM Identity Center to sync users and groups

After you have successfully synced users or groups from Okta to IAM Identity Center, you can see them on the IAM Identity Center console, as shown in the following screenshot. For this post, we created and synced two user groups:

  • Data Engineer
  • Data Scientists

Workforce Identity groups in IAM Identity Center

Next, create a trusted token issuer in IAM Identity Center:

  1. On the IAM Identity Center console, choose Settings in the navigation pane.
  2. Choose Create trusted token issuer.
  3. For Issuer URL, enter the URL of the trusted token issuer.
  4. For Trusted token issuer name, enter Okta.
  5. For Map attributes, map the IdP attribute Email to the IAM Identity Center attribute Email.
  6. Choose Create trusted token issuer.
    Create a Trusted Token Issuer in IAM Identity Center

The following screenshot shows your new trusted token issuer on the IAM Identity Center console.

Okta Trusted Token Issuer in Identity Center

Integrate IAM Identity Center with EMR Studio

We start by creating a trusted identity propagation-enabled EMR Studio.

An EMR Studio administrator must perform the steps to configure EMR Studio as an IAM Identity Center-enabled application. This enables EMR Studio to discover and connect to IAM Identity Center automatically to receive sign-in and user directory services.

The point of enabling EMR Studio as an IAM Identity Center-managed application is so you can control user and group permissions from within IAM Identity Center or from a source third-party IdP that’s integrated with it (Okta in this case). When your users sign in to EMR Studio, for example data-engineer or data-scientist, it checks their groups in IAM Identity Center, and these are mapped to roles and entitlements in Lake Formation. In this manner, a group can map to a Lake Formation database role that allows read access to a set of tables or columns.

The following steps show how to create EMR Studio as an AWS-managed application with IAM Identity Center. We then see how downstream applications like Lake Formation and Athena propagate these roles and entitlements using existing corporate credentials.

  1. On the Amazon EMR console, navigate to EMR Studio.
  2. Choose Create a Studio.
  3. For Setup options, select Custom.
  4. For Studio name, enter a name.
  5. For S3 location for Workspace storage, select Select existing location and enter the Amazon S3 location.

Create EMR Studio with Custom Set up option

  6. Configure permission details for the EMR Studio.

Note that when you choose View permission details under Service role, a new pop-up window opens. You need to create an IAM role with the same policies as shown in the pop-up window. You can use the same role for your service role and IAM role.

Permission details for EMR studio

  7. On the Create a Studio page, for Authentication, select AWS IAM Identity Center.
  8. For User role, choose your user role.
  9. Under Trusted identity propagation, select Enable trusted identity propagation.
  10. Under Application access, select Only assigned users and groups.
  11. For VPC, enter your VPC.
  12. For Subnets, enter your subnet.
  13. For Security and access, select Default security group.
  14. Choose Create Studio.
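For readers who prefer to script these console choices, the following is a minimal sketch of how they might map to keyword arguments for the Amazon EMR CreateStudio API. It only builds the request dictionary and makes no AWS call. The function name, argument names, and every ARN, ID, and bucket used with it are hypothetical placeholders, and the mapping of "Only assigned users and groups" to IdcUserAssignment="REQUIRED" is our reading of the API, so verify it against the current API reference before use.

```python
from typing import Any, Dict, List


def build_create_studio_request(
    name: str,
    idc_instance_arn: str,
    service_role_arn: str,
    user_role_arn: str,
    vpc_id: str,
    subnet_ids: List[str],
    workspace_security_group_id: str,
    engine_security_group_id: str,
    workspace_s3_location: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an EMR CreateStudio request.

    All values are supplied by the caller; nothing here is taken from the post.
    """
    if not workspace_s3_location.startswith("s3://"):
        raise ValueError("workspace_s3_location must be an s3:// URI")
    return {
        "Name": name,
        # "SSO" selects IAM Identity Center authentication.
        "AuthMode": "SSO",
        "IdcInstanceArn": idc_instance_arn,
        # Console option: Enable trusted identity propagation.
        "TrustedIdentityPropagationEnabled": True,
        # Assumed equivalent of "Only assigned users and groups".
        "IdcUserAssignment": "REQUIRED",
        "ServiceRole": service_role_arn,
        "UserRole": user_role_arn,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "WorkspaceSecurityGroupId": workspace_security_group_id,
        "EngineSecurityGroupId": engine_security_group_id,
        "DefaultS3Location": workspace_s3_location,
    }
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("emr").create_studio(**request)`.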

Enable Identity Center and Trusted Identity Propagation

You should now see an IAM Identity Center-enabled EMR Studio on the Amazon EMR console.

IAM Identity Center enabled EMR Studio

After the EMR Studio administrator finishes creating the trusted identity propagation-enabled EMR Studio and saves the configuration, the instance of the EMR Studio appears as an IAM Identity Center-enabled application on the IAM Identity Center console.

EMR Studio appears under AWS Managed app in IAM Identity Centre

Assign users or groups from IAM Identity Center to EMR Studio

You can assign users and groups from your IAM Identity Center directory to the EMR Studio application after syncing with IAM Identity Center. The EMR Studio administrator decides which IAM Identity Center users or groups to include in the app. For example, if you have 10 total groups in IAM Identity Center but don’t want all of them accessing this instance of EMR Studio, you can select which groups to include in the EMR Studio-enabled IAM Identity Center app.

The following steps assign groups to the EMR Studio-enabled IAM Identity Center application:

  1. On the EMR Studio console, navigate to the new EMR Studio instance.
  2. On the Assigned groups tab, choose Assign groups.
  3. Choose which IAM Identity Center groups you want to include in the application. For example, you may choose the Data-Scientist and Data-Engineer groups.
  4. Choose Done.

This allows the EMR Studio administrator to choose specific IAM Identity Center groups to be assigned access to this specific instance integrated with IAM Identity Center. Only the selected groups are synced and given access, not all groups from the IAM Identity Center directory.

Assign Trusted Identity Propagation enabled EMR studio to your user groups by selecting groups from Studio settings

Set up Lake Formation with IAM Identity Center

To set up Lake Formation with IAM Identity Center, make sure that you have configured Okta as the IdP for IAM Identity Center, and confirm that the users and groups from Okta are now available in IAM Identity Center. Then complete the following steps:

  1. On the Lake Formation console, choose IAM Identity Center Integration under Administration in the navigation pane.

You will see the message “IAM Identity Center enabled” along with the ARN for the IAM Identity Center application.

  2. Choose Create.

In a few minutes, you will see a message indicating that Lake Formation has been successfully integrated with your centralized identities from Okta through IAM Identity Center. Specifically, the message states “Successfully created identity center integration with application ARN,” signifying that the integration is now in place between Lake Formation and the identities managed in Okta.

IAM Identity Center enabled AWS Lake Formation

Configure granular role-based entitlements using Lake Formation on propagated corporate identities

We now set up granular entitlements for our data access in Lake Formation. For this post, we summarize the steps needed to use the existing corporate identities on the Lake Formation console to provide relevant controls and governance on the data, which we later query through the Athena query editor. To learn about setting up databases and tables in Lake Formation, refer to Getting started with AWS Lake Formation.

This post does not go into the full details of Lake Formation. Instead, we focus on a new capability that has been introduced in Lake Formation: the ability to set up permissions based on your existing corporate identities that are synchronized with IAM Identity Center.

This integration allows Lake Formation to use your organization’s IdP and access management policies to control permissions to data lakes. Rather than defining permissions from scratch specifically for Lake Formation, you can now rely on your existing users, groups, and access controls to determine who can access data catalogs and underlying data sources. Overall, this integration with IAM Identity Center makes it straightforward to manage permissions for your data lake workloads using your corporate identities, and it reduces the administrative overhead of keeping permissions aligned across separate systems.

In this post, we created a database called zipcode-db-tip and granted full access to the user group Data-Engineer to query the underlying table in the database. Complete the following steps:

  1. On the Lake Formation console, choose Grant data lake permissions.
  2. For Principals, select IAM Identity Center.
  3. For Users and groups, select Data-Engineer.
  4. For LF-Tags or catalog resources, select Named Data Catalog resources.
  5. For Databases, choose zipcode-db-tip.
  6. For Tables, choose tip-zipcode.
    Grant Data Lake permissions to users in IAM Identity Center

Similarly, we need to provide the relevant access on the underlying tables to the users and groups so that they can query the data.

  1. Repeat the preceding steps to provide access to the Data-Engineer group so that it can query the data.
  2. For Table permissions, select Select, Describe, and Super.
  3. For Data permissions, select All data access.

You can grant selective access on rows and columns as per your specific requirements.
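The table grant above can also be expressed as a request for the Lake Formation GrantPermissions API. The following is a minimal sketch that only builds the request dictionary. It assumes that an IAM Identity Center group is addressed by an ARN of the form arn:aws:identitystore:::group/<group-id>, and that the console's Super permission corresponds to ALL in the API. The helper name and the group ID used with it are hypothetical; the database and table names are the ones from this post.

```python
from typing import Any, Dict, Sequence


def build_table_grant(
    group_id: str,
    database: str,
    table: str,
    permissions: Sequence[str] = ("SELECT", "DESCRIBE"),
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Lake Formation GrantPermissions request.

    group_id is the IAM Identity Center (identity store) group ID, which is
    assumed here to be embedded in an identitystore group ARN.
    """
    if not group_id:
        raise ValueError("group_id must not be empty")
    return {
        "Principal": {
            "DataLakePrincipalIdentifier": f"arn:aws:identitystore:::group/{group_id}"
        },
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# Example for the table used in this post; the group ID is a placeholder.
grant = build_table_grant(
    group_id="example-group-id",
    database="zipcode-db-tip",
    table="tip-zipcode",
    permissions=("SELECT", "DESCRIBE", "ALL"),  # ALL is assumed to match "Super"
)
```

With boto3 available, `boto3.client("lakeformation").grant_permissions(**grant)` would submit the request. Keeping the builder separate from the call makes it easy to review a grant before applying it.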

Grant Table permissions in AWS Data Lake

Set up workgroups in Athena

Athena workgroups are an AWS feature that allows you to isolate data and queries within an AWS account. They provide a way to segregate data and control access so that each group can only access the data that’s relevant to it. Workgroups are useful for organizations that want to restrict access to sensitive datasets or help prevent queries from impacting each other. When you create a workgroup, you can assign users and roles to it. Queries launched within a workgroup run with the access controls and settings configured for that workgroup, which enables governance, security, and resource controls at a granular level.

In this post, we create a workgroup specifically for members of our Data Engineering team. Later, when logged in under Data Engineer user profiles, we run queries from within this workgroup to demonstrate how access to Athena workgroups can be restricted based on the user profile. This allows governance policies to be enforced, making sure users can only access permitted datasets and queries based on their role.

  1. On the Athena console, choose Workgroups under Administration in the navigation pane.
  2. Choose Create workgroup.
  3. For Authentication, select IAM Identity Center.
  4. For Service role to authorize Athena, select Create and use a new service role.
  5. For Service role name, enter a name for your role.
    Select IAM Identity Centre for Athena Authentication option
  6. For Location of query result, enter an Amazon S3 location for saving your Athena query results.

This is a mandatory field when you specify IAM Identity Center for authentication.
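As a rough illustration of the same workgroup settings outside the console, the following sketch builds keyword arguments for the Athena CreateWorkGroup API. It makes no AWS call. The field names reflect our understanding of the WorkGroupConfiguration structure (IdentityCenterConfiguration, ExecutionRole, and QueryResultsS3AccessGrantsConfiguration for the user identity-based S3 prefix), so check them against the current API reference. The helper name and all ARNs and bucket names are hypothetical placeholders.

```python
from typing import Any, Dict


def build_workgroup_request(
    name: str,
    idc_instance_arn: str,
    execution_role_arn: str,
    query_result_location: str,
    user_level_prefix: bool = True,
) -> Dict[str, Any]:
    """Assemble keyword arguments for an Athena CreateWorkGroup request
    that uses IAM Identity Center authentication."""
    # The query result location is mandatory with IAM Identity Center.
    if not query_result_location.startswith("s3://"):
        raise ValueError("query_result_location must be an s3:// URI")
    return {
        "Name": name,
        "Configuration": {
            "ResultConfiguration": {"OutputLocation": query_result_location},
            # Service role that Athena assumes on behalf of the workforce user.
            "ExecutionRole": execution_role_arn,
            "IdentityCenterConfiguration": {
                "EnableIdentityCenter": True,
                "IdentityCenterInstanceArn": idc_instance_arn,
            },
            # Assumed settings for separating results per workforce identity.
            "QueryResultsS3AccessGrantsConfiguration": {
                "EnableS3AccessGrantsPrefix": True,
                "CreateUserLevelPrefix": user_level_prefix,
                "AuthenticationType": "DIRECTORY_IDENTITY",
            },
        },
    }
```

The request could then be submitted with `boto3.client("athena").create_work_group(**request)`.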

Configure location for query result and enable user identity based S3 prefix

After you create the workgroup, you need to assign users and groups to it. For this post, we create a workgroup named data-engineer and assign the group Data-Engineer (propagated through trusted identity propagation from IAM Identity Center).

  1. On the Groups tab on the data-engineer details page, select the user group to assign and choose Assign groups.
    Assign groups option is available in the Groups tab of Workgroup settings

Set up Amazon S3 access grants to separate the query results for each workforce identity

Next, we set up Amazon S3 access grants.

You can watch the following video to set up the grants or refer to Use Amazon EMR with S3 Access Grants to scale Spark access to Amazon S3 for instructions.

Initiate login through AWS federated access using the IAM Identity Center access portal

Now we’re ready to connect to EMR Studio with a federated login using IAM Identity Center authentication:

  1. On the IAM Identity Center console, navigate to the dashboard and choose the AWS access portal URL.
  2. A browser pop-up directs you to the Okta login page, where you enter your Okta credentials.
  3. After successful authentication, you are logged in to the AWS console as a federated user.
  4. Choose the EMR Studio application.
  5. After you federate to EMR Studio, choose Query Editor in the navigation pane to open a new tab with the Athena query editor.

The following video shows a federated user using the AWS access portal URL to access EMR Studio using IAM Identity Center authentication.

Run queries with granular access on the editor

On EMR Studio, the user can open the Athena query editor and then specify the correct workgroup in the query editor to run the queries.

Athena Query result in data-engineer workgroup

The data engineer can query only the tables to which the user has access. The query results appear under the S3 prefix, which is separate for each workforce identity.

Review the end-to-end audit trail of workforce identity

The IAM Identity Center administrator can look into the downstream apps that are trusted for identity propagation, as shown in the following screenshot of the IAM Identity Center console.

AWS IAM Identity Center view of the trusted applications

On the CloudTrail console, the event history displays the event name and resource accessed by the specific workforce identity.

Auditors can see the workforce identity who executed the query on AWS Data Lake

When you choose an event in CloudTrail, the auditors can see the unique user ID that accessed the underlying AWS analytics services.
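An auditor who exports these events could pull out the workforce user ID programmatically. The following is a minimal sketch, assuming that events produced under trusted identity propagation carry the user ID in the userIdentity.onBehalfOf.userId field of the CloudTrail record. The helper name is hypothetical, and the record shown in the usage note is a synthetic example, not a real log entry.

```python
import json
from typing import Optional


def workforce_user_id(cloudtrail_event_json: str) -> Optional[str]:
    """Return the IAM Identity Center user ID from a CloudTrail record.

    Expects the raw JSON string of one event. Returns None when the event
    was not made on behalf of a workforce identity.
    """
    event = json.loads(cloudtrail_event_json)
    user_identity = event.get("userIdentity") or {}
    on_behalf_of = user_identity.get("onBehalfOf") or {}
    return on_behalf_of.get("userId")


# Synthetic record for illustration only; real events carry many more fields.
sample_event = json.dumps(
    {
        "eventName": "GetDataAccess",
        "userIdentity": {
            "type": "AssumedRole",
            "onBehalfOf": {"userId": "example-user-id"},
        },
    }
)
user_id = workforce_user_id(sample_event)
```

The returned ID can then be matched against the users listed in IAM Identity Center to attribute each data access to a named workforce identity.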

Clean up

Complete the following steps to clean up your resources:

  1. Delete the Okta applications that you created to integrate with IAM Identity Center.
  2. Delete the IAM Identity Center configuration.
  3. Delete the EMR Studio that you created for testing.
  4. Delete the IAM role that you created for the IAM Identity Center and EMR Studio integration.

Conclusion

In this post, we showed you a detailed walkthrough to bring your workforce identity to EMR Studio and propagate the identity to connected AWS applications like Athena and Lake Formation. This solution provides your workforce with a familiar sign-in experience, without the need to remember additional credentials or maintain complex role mapping across different analytics systems. In addition, it provides auditors with end-to-end visibility into workforce identities and their access to analytics services.

To learn more about trusted identity propagation and EMR Studio, refer to Integrate Amazon EMR with AWS IAM Identity Center.


About the authors

Manjit Chakraborty is a Senior Solutions Architect at AWS. He is a seasoned, results-driven professional with extensive experience in the financial domain, having worked with customers on advising, designing, leading, and implementing core-business enterprise solutions across the globe. In his spare time, Manjit enjoys fishing, practicing martial arts, and playing with his daughter.

Neeraj Roy is a Principal Solutions Architect at AWS based out of London. He works with Global Financial Services customers to accelerate their AWS journey. In his spare time, he enjoys reading and spending time with his family.
