
Integrating Trino With DataHub (or other backend) For Authorization

Izhar Firdaus · Oct 22, 2024 · 2 mins read
This work is licensed under a Creative Commons Attribution 4.0 International License.

Introduction

Trino is a great distributed SQL query engine for querying and integrating data across multiple data sources, and DataHub is a great metadata management platform. However, apart from DataHub being able to index and catalog metadata from Trino, there is no further integration between the two platforms.

DataHub on its own provides tags and categorization metadata, along with access policies for datasets tagged with this metadata. However, access policies set in DataHub only control access to dataset information within DataHub itself.

On the other hand, if you are using Hadoop, you will know that you can set up tag-based access control policies through Apache Ranger and Apache Atlas, where access policies on Hive datasets are configured based on tags assigned in Atlas. Trino supports Ranger and Atlas integration, which can deliver similar capabilities. However, if your preferred metadata management platform is DataHub, you do not have any option for tag-based authorization using DataHub tags.

Integrating Trino with DataHub

So what if you want to use DataHub for your access control policies without having to deal with the complexity of an Apache Ranger and Apache Atlas setup? Trino has a hook you can use for this purpose: its Open Policy Agent (OPA) plugin, which lets you point Trino at a URL that decides whether a user is allowed to access a particular resource.

The plugin is meant to be used with the Open Policy Agent tool, but because the URL simply has to return the following JSON, which Trino uses to allow or reject access to the resource, it can be implemented by any custom API service.

{
    "result": true
}
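To make the idea concrete, here is a minimal sketch of such a custom endpoint using only the Python standard library. It assumes Trino sends an OPA-style POST body of the form `{"input": {...}}` with the user under `context.identity.user`; verify that shape against the documentation for your Trino version. The `is_allowed` rule, the `make_server` helper and the port are placeholders for illustration only.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def is_allowed(opa_input: dict) -> bool:
    """Placeholder rule: only the user 'admin' is allowed.

    The context/identity/user keys are an assumption about the request
    body sent by Trino; check them against your Trino version.
    """
    user = opa_input.get("context", {}).get("identity", {}).get("user")
    return user == "admin"


class AuthzHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length) or b"{}")
            allowed = is_allowed(body.get("input", {}))
        except (ValueError, AttributeError):
            allowed = False  # fail closed on malformed requests

        # Respond with the {"result": true|false} document Trino expects.
        payload = json.dumps({"result": allowed}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the sketch quiet; use real logging in practice


def make_server(host: str = "127.0.0.1", port: int = 8181) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), AuthzHandler)
```

Running `make_server().serve_forever()` would start the service locally. A real deployment would also need TLS, logging and a proper decision rule in place of `is_allowed`.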

With this in mind, you can build an intermediary service for authorizing access:

sequenceDiagram
    participant trino as Trino
    participant opa as Authorization Server 
    participant datahub as DataHub 

    trino->>opa: Check access
    activate opa 
    opa->>+datahub: Fetch policy
    datahub-->>-opa: Policy
    opa->>opa: Validate policy
    opa-->>trino: True/False
    deactivate opa
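The "Validate policy" step in the diagram could look roughly like the sketch below. Everything specific here is an assumption for illustration: `fetch_tags` is a hypothetical callable standing in for whatever DataHub lookup you implement, `POLICY` is a made-up mapping from DataHub tags to Trino groups, and the request keys (`action.resource.table`, `context.identity.groups`) reflect my understanding of the payload the Trino OPA plugin sends, which you should confirm for your version.

```python
from typing import Callable, Iterable, Mapping, Optional, Set

# Hypothetical policy: DataHub tag -> Trino groups allowed to touch tagged tables.
POLICY: Mapping[str, Set[str]] = {
    "pii": {"privacy-team"},
    "finance": {"finance-team", "privacy-team"},
}


def requested_table(opa_input: dict) -> Optional[str]:
    """Return 'catalog.schema.table' if the action targets a table, else None.

    A table entry with missing name fields raises KeyError; the caller
    should treat any exception as a denial.
    """
    table = opa_input.get("action", {}).get("resource", {}).get("table")
    if not table:
        return None
    return ".".join(
        [table["catalogName"], table["schemaName"], table["tableName"]]
    )


def authorize(
    opa_input: dict,
    fetch_tags: Callable[[str], Iterable[str]],
    policy: Mapping[str, Set[str]] = POLICY,
) -> bool:
    """Allow unless the table carries a restricted tag the user's groups lack."""
    table = requested_table(opa_input)
    if table is None:
        # This sketch only guards table-level actions; tighten as needed.
        return True
    identity = opa_input.get("context", {}).get("identity", {})
    groups = set(identity.get("groups", []))
    for tag in fetch_tags(table):
        allowed_groups = policy.get(tag)
        if allowed_groups is not None and not (groups & allowed_groups):
            return False
    return True
```

Passing the DataHub lookup in as a function keeps the decision logic testable without a running DataHub instance, and makes it easy to add caching later so that not every Trino access check turns into a metadata request.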

Once your Authorization Server is running, you can enable Trino's OPA plugin and point it at the endpoint that validates authorization.
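As a rough illustration, the Trino side is a small access control configuration file. The `access-control.name` and `opa.policy.uri` properties come from Trino's OPA access control documentation; the host, port and path below are placeholders for wherever your own service listens.

```properties
# etc/access-control.properties
access-control.name=opa
# Placeholder URL: replace with the endpoint of your authorization service
opa.policy.uri=http://authz.example.internal:8181/v1/data/trino/allow
```

Trino reads this file at startup, so the coordinator needs a restart after changing it.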

Other possibilities

As you have probably noticed by now, this strategy is quite generic. You can also use this method to build or integrate whatever custom authorization engine you want for Trino.

If you are looking to integrate your Trino authorization with DataHub and do not want to go through the hassle of building the integration yourself, do reach out to me.

Written by Izhar Firdaus
I'm a system architect, data engineer and developer advocate with a passion for Free / Open Source software, entrepreneurship, community building, education and martial arts. I enjoy bridging and bringing together different FOSS technologies to help businesses and organizations use IT infrastructure to support and optimize their business and organizational processes.
