Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics


Healthcare providers have an opportunity to improve the patient experience by collecting and analyzing broader and more diverse datasets. This includes patient medical history, allergies, immunizations, family disease history, and individuals’ lifestyle data such as exercise habits. Having access to these datasets and forming a 360-degree view of patients allows healthcare providers such as claim analysts to see a broader context about each patient and personalize the care they provide for every individual. This is underpinned by building a complete patient profile that enables claim analysts to identify patterns, trends, potential gaps in care, and adherence to care plans. They can then use the result of their analysis to understand a patient’s health status, treatment history, and past or upcoming doctor consultations to make more informed decisions, streamline the claim management process, and improve operational outcomes. Achieving this can also improve general public health through better and more timely interventions, identify health risks through predictive analytics, and accelerate the research and development process.

AWS has invested in a zero-ETL (extract, transform, and load) future so that developers can focus more on creating value from data, instead of having to spend time preparing data for analysis. The solution proposed in this post follows a zero-ETL approach to data integration to facilitate near real-time analytics and deliver a more personalized patient experience. The solution uses AWS services such as AWS HealthLake, Amazon Redshift, Amazon Kinesis Data Streams, and AWS Lake Formation to build a 360 view of patients. These services enable you to collect and analyze data in near real time and put a comprehensive data governance framework in place that uses granular access control to secure sensitive data from unauthorized users.

Zero-ETL refers to a set of features on the AWS Cloud that enable integrating different data sources with Amazon Redshift, including zero-ETL integration between Amazon Aurora and Amazon Redshift, streaming ingestion from Amazon Kinesis Data Streams and Amazon MSK, and automatic mounting of the AWS Glue Data Catalog for querying data lake tables such as those managed by AWS HealthLake.

Answer overview

Organizations in the healthcare industry are currently spending a significant amount of time and money on building complex ETL pipelines for data movement and integration. This means data is replicated across multiple data stores via bespoke and in some cases hand-written ETL jobs, resulting in data inconsistency, latency, and potential security and privacy breaches.

With support for querying cross-account Apache Iceberg tables via Amazon Redshift, you can now build a more comprehensive patient-360 analysis by querying all patient data from one place. This means you can seamlessly combine information such as clinical data stored in HealthLake with data stored in operational databases such as a patient relationship management system, together with data produced from wearable devices in near real time. Having access to all this data enables healthcare organizations to form a holistic view of patients, improve care coordination across multiple organizations, and provide highly personalized care for each individual.

The following diagram depicts the high-level solution we build to achieve these outcomes.

Deploy the solution

You can use the provided AWS CloudFormation template to deploy the solution components.

This stack creates the resources and necessary permissions to integrate the services used in this solution, including an Aurora MySQL cluster, a Redshift Serverless data warehouse, a Kinesis data stream named wearables_stream, and an IAM role (Patient360BlogRole).

Solution setup

AWS HealthLake

AWS HealthLake enables organizations in the health industry to securely store, transform, transact, and analyze health data. It stores data in HL7 FHIR format, which is an interoperability standard designed for quick and efficient exchange of health data. When you create a HealthLake data store, a Fast Healthcare Interoperability Resources (FHIR) data repository is made available via a RESTful API endpoint. Simultaneously, and as part of the AWS HealthLake managed service, the nested JSON FHIR data undergoes an ETL process and is stored in Apache Iceberg open table format in Amazon S3.

To create an AWS HealthLake data store, refer to Getting started with AWS HealthLake. Make sure to select the option Preload sample data when creating your data store.

In real-world scenarios, and when you use AWS HealthLake in production environments, you don’t need to load sample data into your AWS HealthLake data store. Instead, you can use FHIR REST API operations to manage and search resources in your AWS HealthLake data store.

We use two tables from the sample data stored in HealthLake: patient and allergyintolerance.

Query AWS HealthLake tables with Redshift Serverless

Amazon Redshift is the data warehousing service available on the AWS Cloud that provides up to six times better price-performance than other cloud data warehouses in the market, with a fully managed, AI-powered, massively parallel processing (MPP) data warehouse built for performance, scale, and availability. With continuous innovations added to Amazon Redshift, it is now more than just a data warehouse. It enables organizations of different sizes and in different industries to access all the data they have in their AWS environments and analyze it from one single location with a set of features under the zero-ETL umbrella. Amazon Redshift integrates with AWS HealthLake and data lakes through Redshift Spectrum and Amazon S3 auto-copy features, enabling you to query data directly from files on Amazon S3.

Query AWS HealthLake data with Amazon Redshift

Amazon Redshift makes it straightforward to query the data stored in S3-based data lakes with automatic mounting of an AWS Glue Data Catalog in the Redshift query editor v2. This means you no longer have to create an external schema in Amazon Redshift to use the data lake tables cataloged in the Data Catalog. To get started with this feature, see Querying the AWS Glue Data Catalog. After it is set up and you’re connected to the Redshift query editor v2, complete the following steps:

  1. Validate that your tables are visible in the query editor v2. The Data Catalog objects are listed under the awsdatacatalog database.

FHIR data stored in AWS HealthLake is highly nested. To learn how to un-nest semi-structured data with Amazon Redshift, see Tutorial: Querying nested data with Amazon Redshift Spectrum.
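To build intuition for what un-nesting does, the following Python sketch flattens a hypothetical, heavily simplified AllergyIntolerance record (illustration only, not actual HealthLake output) into one row per nested coding. This is conceptually what the `LEFT JOIN ... ON TRUE` pattern in the SQL below achieves:

```python
# Hypothetical, simplified AllergyIntolerance resource (illustration only).
allergy = {
    "resourceType": "AllergyIntolerance",
    "patient": {"reference": "Patient/3f4a9d"},
    "recordedDate": "2019-03-11",
    "category": ["food"],
    "reaction": [
        {"manifestation": [
            {"coding": [{"code": "271807003", "display": "Eruption of skin"}]}
        ]}
    ],
}

def flatten_allergy(resource):
    """Emit one flat row per (category, coding) pair, like the SQL un-nesting."""
    rows = []
    # Strip the "Patient/" prefix, as SUBSTRING(..., 9, ...) does in the SQL.
    patient_id = resource["patient"]["reference"].split("/", 1)[1]
    for category in resource.get("category", []):
        for reaction in resource.get("reaction", []):
            for manifestation in reaction.get("manifestation", []):
                for coding in manifestation.get("coding", []):
                    rows.append({
                        "patient_id": patient_id,
                        "allergy_category": category,
                        "allergy_code": coding.get("code", "NA"),
                        "allergy_description": coding.get("display", "NA"),
                    })
    return rows

print(flatten_allergy(allergy))
```

Each level of nesting becomes one more loop; in Redshift, each `LEFT JOIN ... ON TRUE` plays the same role.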

  2. Use the following query to un-nest the allergyintolerance and patient tables, join them together, and get patient details and their allergies:
    WITH patient_allergy AS 
    (
        SELECT
            resourcetype, 
            c AS allery_category,
            a."patient"."reference",
            SUBSTRING(a."patient"."reference", 9, LEN(a."patient"."reference")) AS patient_id,
            a.recordeddate AS allergy_record_date,
            NVL(cd."code", 'NA') AS allergy_code,
            NVL(cd.display, 'NA') AS allergy_description
        FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."allergyintolerance" a
                LEFT JOIN a.category c ON TRUE
                LEFT JOIN a.reaction r ON TRUE
                LEFT JOIN r.manifestation m ON TRUE
                LEFT JOIN m.coding cd ON TRUE
    ), patient_info AS
    (
        SELECT id,
                gender,
                g AS given_name,
                n.family AS family_name,
                pr AS prefix
        FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."patient" p
                LEFT JOIN p.name n ON TRUE
                LEFT JOIN n.given g ON TRUE
                LEFT JOIN n.prefix pr ON TRUE
    )
    SELECT DISTINCT p.id, 
            p.gender, 
            p.prefix,
            p.given_name,
            p.family_name,
            pa.allery_category,
            pa.allergy_code,
            pa.allergy_description
    FROM patient_allergy pa
        JOIN patient_info p
            ON pa.patient_id = p.id
    ORDER BY p.id, pa.allergy_code
    ;
    

To eliminate the need for Amazon Redshift to un-nest data every time a query is run, you can create a materialized view to hold un-nested and flattened data. Materialized views are an effective mechanism to deal with complex and repeating queries. They contain a precomputed result set, based on a SQL query over one or more base tables. You can issue SELECT statements to query a materialized view, in the same way that you can query other tables or views in the database.

  3. Use the following SQL to create a materialized view. You use it later to build a complete view of patients:
    CREATE MATERIALIZED VIEW patient_allergy_info AUTO REFRESH YES AS
    WITH patient_allergy AS 
    (
        SELECT
            resourcetype, 
            c AS allery_category,
            a."patient"."reference",
            SUBSTRING(a."patient"."reference", 9, LEN(a."patient"."reference")) AS patient_id,
            a.recordeddate AS allergy_record_date,
            NVL(cd."code", 'NA') AS allergy_code,
            NVL(cd.display, 'NA') AS allergy_description
        FROM
            "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."allergyintolerance" a
                LEFT JOIN a.category c ON TRUE
                LEFT JOIN a.reaction r ON TRUE
                LEFT JOIN r.manifestation m ON TRUE
                LEFT JOIN m.coding cd ON TRUE
    ), patient_info AS
    (
        SELECT id,
                gender,
                g AS given_name,
                n.family AS family_name,
                pr AS prefix
        FROM "awsdatacatalog"."datastore_01_179674d36391d68926a8d74c12599306_healthlake_view"."patient" p
                LEFT JOIN p.name n ON TRUE
                LEFT JOIN n.given g ON TRUE
                LEFT JOIN n.prefix pr ON TRUE
    )
    SELECT DISTINCT p.id, 
            p.gender, 
            p.prefix,
            p.given_name,
            p.family_name,
            pa.allery_category,
            pa.allergy_code,
            pa.allergy_description
    FROM patient_allergy pa
        JOIN patient_info p
            ON pa.patient_id = p.id
    ORDER BY p.id, pa.allergy_code
    ;
    

You have confirmed that you can query data in AWS HealthLake via Amazon Redshift. Next, you set up zero-ETL integration between Amazon Redshift and Amazon Aurora MySQL.

Set up zero-ETL integration between Amazon Aurora MySQL and Redshift Serverless

Applications such as front-desk software, which are used to schedule appointments and register new patients, store data in OLTP databases such as Aurora. To get data out of OLTP databases and have it ready for analytics use cases, data teams might have to spend a considerable amount of time building, testing, and deploying ETL jobs that are complex to maintain and scale.

With the Amazon Redshift zero-ETL integration with Amazon Aurora MySQL, you can run analytics on the data stored in OLTP databases and combine it with the rest of the data in Amazon Redshift and AWS HealthLake in near real time. In the next steps in this section, we connect to a MySQL database and set up zero-ETL integration with Amazon Redshift.

Connect to an Aurora MySQL database and set up data

Connect to your Aurora MySQL database using your editor of choice, using the AdminUsername and AdminPassword that you entered when running the CloudFormation stack. (For simplicity, they are the same for Amazon Redshift and Aurora.)

When you’re connected to your database, complete the following steps:

  1. Create a new database by running the following command:
    CREATE DATABASE front_desk_app_db;

  2. Create a new table. This table simulates storing patient information as they visit clinics and other healthcare centers. For simplicity and to demonstrate specific capabilities, we assume that patient IDs are the same in AWS HealthLake and the front-of-office application. In real-world scenarios, this can be a hashed version of a national health care number:
    CREATE TABLE patient_appointment ( 
          patient_id varchar(250), 
          gender varchar(1), 
          date_of_birth date, 
          appointment_datetime datetime, 
          phone_number varchar(15), 
          PRIMARY KEY (patient_id, appointment_datetime) 
    );

Having a primary key in the table is mandatory for zero-ETL integration to work.

  3. Insert new records into the source table in the Aurora MySQL database. To demonstrate the required functionalities, make sure the patient_id values of the sample records inserted into the MySQL database match the ones in AWS HealthLake. Replace [PATIENT_ID_1] and [PATIENT_ID_2] in the following query with the ones from the Redshift query you ran previously (the query that joined allergyintolerance and patient):
    INSERT INTO front_desk_app_db.patient_appointment (patient_id, gender, date_of_birth, appointment_datetime, phone_number)
    
    VALUES ('[PATIENT_ID_1]', 'F', '1988-07-04', '2023-12-19 10:15:00', '0401401401'),
    ('[PATIENT_ID_1]', 'F', '1988-07-04', '2023-09-19 11:00:00', '0401401401'),
    ('[PATIENT_ID_1]', 'F', '1988-07-04', '2023-06-06 14:30:00', '0401401401'),
    ('[PATIENT_ID_2]', 'F', '1972-11-14', '2023-12-19 08:15:00', '0401401402'),
    ('[PATIENT_ID_2]', 'F', '1972-11-14', '2023-01-09 12:15:00', '0401401402');

Now that your source table is populated with sample data, you can set up zero-ETL and have data ingested into Amazon Redshift.

Set up zero-ETL integration between Amazon Aurora MySQL and Amazon Redshift

Complete the following steps to create your zero-ETL integration:

  1. On the Amazon RDS console, choose Databases in the navigation pane.
  2. Choose the DB identifier of your cluster (not the instance).
  3. On the Zero-ETL Integration tab, choose Create zero-ETL integration.
  4. Follow the steps to create your integration.

Create a Redshift database from the integration

Next, you create a target database from the integration. You can do this by running a couple of simple SQL commands on Amazon Redshift. Log in to the query editor v2 and run the following commands:

  1. Get the integration ID of the zero-ETL integration you set up between your source database and Amazon Redshift:
    SELECT * FROM svv_integration;

  2. Create a database using the integration ID:
    CREATE DATABASE ztl_demo FROM INTEGRATION '[INTEGRATION_ID]';

  3. Query the database and validate that a new table is created and populated with data from your source MySQL database:
    SELECT * FROM ztl_demo.front_desk_app_db.patient_appointment;

It might take a few seconds for the first set of records to appear in Amazon Redshift.

This shows that the integration is working as expected. To validate it further, you can insert a new record in your Aurora MySQL database, and it will be available in Amazon Redshift for querying in near real time within a few seconds.

Set up streaming ingestion for Amazon Redshift

Another aspect of zero-ETL on AWS, for real-time and streaming data, is realized through Amazon Redshift Streaming Ingestion. It provides low-latency, high-speed ingestion of streaming data from Kinesis Data Streams and Amazon MSK. It lowers the effort required to have data ready for analytics workloads, lowers the cost of running such workloads on the cloud, and decreases the operational burden of maintaining the solution.

In the context of healthcare, understanding an individual’s exercise and movement patterns can help with overall health assessment and better treatment planning. In this section, you send simulated data from wearable devices to Kinesis Data Streams and integrate it with the rest of the data you already have access to from your Redshift Serverless data warehouse.

For step-by-step instructions, refer to Real-time analytics with Amazon Redshift streaming ingestion. Note the following steps when you set up streaming ingestion for Amazon Redshift:

  1. Select wearables_stream and use the following template when sending data to Amazon Kinesis Data Streams via Kinesis Data Generator, to simulate data generated by wearable devices. Replace [PATIENT_ID_1] and [PATIENT_ID_2] with the patient IDs you used earlier when inserting new records into your Aurora MySQL table:
    {
       "patient_id": "{{random.arrayElement(["[PATIENT_ID_1]","[PATIENT_ID_2]"])}}",
       "steps_increment": "{{random.arrayElement(
          [0,1]
       )}}",
       "heart_rate": {{random.number(
          {
             "min":45,
             "max":120}
       )}}
    }

  2. Create an external schema called from_kds by running the following query and replacing [IAM_ROLE_ARN] with the ARN of the role created by the CloudFormation stack (Patient360BlogRole):
    CREATE EXTERNAL SCHEMA from_kds
    FROM KINESIS
    IAM_ROLE '[IAM_ROLE_ARN]';

  3. Use the following SQL when creating a materialized view to consume data from the stream:
    CREATE MATERIALIZED VIEW patient_wearable_data AUTO REFRESH YES AS 
    SELECT approximate_arrival_timestamp, 
          JSON_PARSE(kinesis_data) AS data FROM from_kds."wearables_stream" 
    WHERE CAN_JSON_PARSE(kinesis_data);

  4. To validate that streaming ingestion works as expected, refresh the materialized view to get the data you already sent to the data stream, and query the table to make sure data has landed in Amazon Redshift:
    REFRESH MATERIALIZED VIEW patient_wearable_data;
    
    SELECT *
    FROM patient_wearable_data
    ORDER BY approximate_arrival_timestamp DESC;

Query and analyze patient wearable data

The results in the data column of the preceding query are in JSON format. Amazon Redshift makes it straightforward to work with semi-structured data in JSON format. It uses the PartiQL language to provide SQL-compatible access to relational, semi-structured, and nested data. Use the following query to flatten the data:

SELECT data."patient_id"::varchar AS patient_id,       
      data."steps_increment"::integer AS steps_increment,       
      data."heart_rate"::integer AS heart_rate, 
      approximate_arrival_timestamp 
FROM patient_wearable_data 
ORDER BY approximate_arrival_timestamp DESC;

The result looks like the following screenshot.

Now that you know how to flatten JSON data, you can analyze it further. Use the following query to get the number of minutes a patient has been physically active per day, based on their heart rate (greater than 80):

WITH patient_wearable_flattened AS
(
   SELECT data."patient_id"::varchar AS patient_id,
      data."steps_increment"::integer AS steps_increment,
      data."heart_rate"::integer AS heart_rate,
      approximate_arrival_timestamp,
      DATE(approximate_arrival_timestamp) AS date_received,
      EXTRACT(hour FROM approximate_arrival_timestamp) AS hour_received,
      EXTRACT(minute FROM approximate_arrival_timestamp) AS minute_received
   FROM patient_wearable_data
), patient_active_minutes AS
(
   SELECT patient_id,
      date_received,
      hour_received,
      minute_received,
      AVG(heart_rate) AS heart_rate
   FROM patient_wearable_flattened
   GROUP BY patient_id,
      date_received,
      hour_received,
      minute_received
   HAVING AVG(heart_rate) > 80
)
SELECT patient_id,
      date_received,
      COUNT(heart_rate) AS active_minutes_count
FROM patient_active_minutes
GROUP BY patient_id,
      date_received
ORDER BY patient_id,
      date_received;
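To sanity-check the active-minutes logic against a handful of readings, the same computation can be sketched in plain Python (the readings below are made-up illustration data): group heart-rate samples by patient and minute, and count the minutes whose average exceeds the threshold:

```python
from collections import defaultdict
from datetime import datetime

# (patient_id, timestamp, heart_rate) samples -- made-up illustration data.
readings = [
    ("p1", datetime(2023, 12, 19, 10, 15, 2), 95),
    ("p1", datetime(2023, 12, 19, 10, 15, 40), 91),  # same minute as above
    ("p1", datetime(2023, 12, 19, 10, 16, 5), 70),   # below threshold
    ("p1", datetime(2023, 12, 19, 10, 17, 0), 88),
]

def active_minutes_per_day(readings, threshold=80):
    """Count minutes per (patient, day) whose average heart rate exceeds threshold."""
    per_minute = defaultdict(list)
    for patient_id, ts, hr in readings:
        per_minute[(patient_id, ts.date(), ts.hour, ts.minute)].append(hr)
    counts = defaultdict(int)
    for (patient_id, day, _hour, _minute), rates in per_minute.items():
        if sum(rates) / len(rates) > threshold:
            counts[(patient_id, day)] += 1
    return dict(counts)

print(active_minutes_per_day(readings))
# {('p1', datetime.date(2023, 12, 19)): 2}
```

The two grouping passes correspond to the two aggregations in the SQL: the per-minute `AVG ... HAVING`, then the per-day `COUNT`.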

Create a complete patient 360

Now that you are able to query all patient data with Redshift Serverless, you can combine the three datasets you used in this post and form a comprehensive patient 360 view with the following query:

WITH patient_appointment_info AS
(
      SELECT "patient_id",
         "gender",
         "date_of_birth",
         "appointment_datetime",
         "phone_number"
      FROM ztl_demo.front_desk_app_db.patient_appointment
),
patient_wearable_flattened AS
(
      SELECT data."patient_id"::varchar AS patient_id,
         data."steps_increment"::integer AS steps_increment,
         data."heart_rate"::integer AS heart_rate,
         approximate_arrival_timestamp,
         DATE(approximate_arrival_timestamp) AS date_received,
         EXTRACT(hour FROM approximate_arrival_timestamp) AS hour_received,
         EXTRACT(minute FROM approximate_arrival_timestamp) AS minute_received
      FROM patient_wearable_data
), patient_active_minutes AS
(
      SELECT patient_id,
         date_received,
         hour_received,
         minute_received,
         AVG(heart_rate) AS heart_rate
      FROM patient_wearable_flattened
      GROUP BY patient_id,
         date_received,
         hour_received,
         minute_received
      HAVING AVG(heart_rate) > 80
), patient_active_minutes_count AS
(
      SELECT patient_id,
         date_received,
         COUNT(heart_rate) AS active_minutes_count
      FROM patient_active_minutes
      GROUP BY patient_id,
         date_received
)
SELECT pai.id AS patient_id,
      pai.gender,
      pai.prefix,
      pai.given_name,
      pai.family_name,
      pai.allery_category,
      pai.allergy_code,
      pai.allergy_description,
      ppi.date_of_birth,
      ppi.appointment_datetime,
      ppi.phone_number,
      pamc.date_received,
      pamc.active_minutes_count
FROM patient_allergy_info pai
      LEFT JOIN patient_active_minutes_count pamc
            ON pai.id = pamc.patient_id
      LEFT JOIN patient_appointment_info ppi
            ON pai.id = ppi.patient_id
GROUP BY pai.id,
      pai.gender,
      pai.prefix,
      pai.given_name,
      pai.family_name,
      pai.allery_category,
      pai.allergy_code,
      pai.allergy_description,
      ppi.date_of_birth,
      ppi.appointment_datetime,
      ppi.phone_number,
      pamc.date_received,
      pamc.active_minutes_count
ORDER BY pai.id,
      pai.gender,
      pai.prefix,
      pai.given_name,
      pai.family_name,
      pai.allery_category,
      pai.allergy_code,
      pai.allergy_description,
      ppi.date_of_birth DESC,
      ppi.appointment_datetime DESC,
      ppi.phone_number DESC,
      pamc.date_received,
      pamc.active_minutes_count;

You can use the solution and queries shown here to expand the datasets used in your analysis. For example, you can include other tables from AWS HealthLake as needed.

Clean up

To clean up the resources you created, complete the following steps:

  1. Delete the zero-ETL integration between Amazon RDS and Amazon Redshift.
  2. Delete the CloudFormation stack.
  3. Delete the AWS HealthLake data store.

Conclusion

Forming a comprehensive 360 view of patients by integrating data from various different sources offers numerous benefits for organizations operating in the healthcare industry. It enables healthcare providers to gain a holistic understanding of a patient’s medical journey, enhances clinical decision-making, and allows for more accurate diagnosis and tailored treatment plans. With zero-ETL features for data integration on AWS, it is straightforward to build a view of patients securely, cost-effectively, and with minimal effort.

You can then use visualization tools such as Amazon QuickSight to build dashboards, or use Amazon Redshift ML to enable data analysts and database developers to train machine learning (ML) models with the data integrated through Amazon Redshift zero-ETL. The result is a set of ML models that are trained with a broader view into patients, their medical history, and their lifestyle, and that therefore enable you to make more accurate predictions about their upcoming health needs.


About the Authors

Saeed Barghi is a Sr. Analytics Specialist Solutions Architect specializing in architecting enterprise data platforms. He has extensive experience in the fields of data warehousing, data engineering, data lakes, and AI/ML. Based in Melbourne, Australia, Saeed works with public sector customers in Australia and New Zealand.

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 17 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.
