adobe/acp-data-services-etl-reference

Name: acp-data-services-etl-reference

Owner: Adobe Systems Incorporated

Description: Examples for ETL Integrations with Adobe Cloud Platform - Data Services

Created: 2018-04-05 22:42:21

Updated: 2018-04-25 15:22:46

Pushed: 2018-04-25 15:21:56

Homepage: https://www.adobe.io/apis/cloudplatform/dataservices/services/allservices.html#!api-specification/markdown/narrative/integration_guides/etl_integration_guide/etl_integration_guide.md

Size: 147

Language: Java


README

ETL Ecosystem Integration Reference Code

This repository contains example code for integrating with Adobe Cloud Platform (ACP) via its exposed HTTP APIs. The example code mainly covers the Catalog, data read, and data write services demonstrated below.

Reference documentation for integrating ETL tools with Adobe Cloud Platform - Data Services can be found at the homepage URL listed above.

Content
examples

This folder contains Java code for various operations on Adobe Cloud Platform, including the major implementation examples.

parquetio

This folder contains Java code for Parquet operations, built on Hadoop's Parquet library.

The examples use Snappy compression for the Parquet files.
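The repository's own Parquet helpers live in parquetio; as orientation, here is a minimal sketch of writing a Snappy-compressed Parquet file directly with Hadoop's parquet-avro API (the schema, path, and values are illustrative, and this bypasses the repository's wrapper classes):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class SnappyParquetSketch {
    public static void main(String[] args) throws Exception {
        //Illustrative two-column schema; the real examples derive schemas from Catalog datasets
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Profile\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"string\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");

        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("/tmp/example.parquet"))
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY) //Snappy, as used by the examples
                .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", "1");
            record.put("name", "Stephen");
            writer.write(record);
        }
    }
}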

Building

Java 8 is required to build this project. It is a Maven project and can be built locally with:

mvn clean install

Once it is built, the following artifacts will be created:

ecosystem-refs/examples/target/ecosystem-examples.jar
ecosystem-refs/parquetio/target/parquet-io.jar
Usage

Building produces two JARs, ecosystem-examples.jar and parquet-io.jar. Both must be added as dependencies to the target project. Once the project is set up, the following snippets can be used to write your own code.

Authentication

Users must obtain authentication details from the Adobe.io authentication flow.

The following snippet can be used to generate an access token, which can then be used to call Adobe.io APIs:

Map<String, String> connectionAttributes = new HashMap<String, String>();
connectionAttributes.put(SDKConstants.CREDENTIAL_PRIVATE_KEY_PATH, path_where_secret_key_is_kept); //From Adobe.io auth flow
connectionAttributes.put(SDKConstants.CONNECTION_ENV_KEY, "prod");
connectionAttributes.put(SDKConstants.CREDENTIAL_SECRET_KEY, secret_key);   //From Adobe.io auth flow
connectionAttributes.put(SDKConstants.CREDENTIAL_CLIENT_KEY, client_id);   //From Adobe.io auth flow
connectionAttributes.put(SDKConstants.CREDENTIAL_TECHNICAL_ACCOUNT_KEY, technical_account_id);   //From Adobe.io auth flow
connectionAttributes.put(SDKConstants.CREDENTIAL_IMS_ORG_KEY, organization_id);   //From admin
connectionAttributes.put(SDKConstants.CREDENTIAL_META_SCOPE_KEY, "ent_dataservices_sdk");

//Initialize the SDK with the connection attributes before requesting a token
ConnectorSDKUtil.initialize(connectionAttributes);

//This will give you the access token string
String access_token = ConnectorSDKUtil.getInstance().getAccessToken();
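For orientation only: the token is a standard Adobe.io bearer token. A rough, hand-rolled sketch of using it on an API call follows (the endpoint is a placeholder; the SDK service classes used in the next sections handle this plumbing for you):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

//Hypothetical direct call with the token and credentials from the snippet above
URL url = new URL("https://platform.adobe.io/data/foundation/catalog/dataSets"); //placeholder endpoint
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Authorization", "Bearer " + access_token);
conn.setRequestProperty("x-api-key", client_id);             //From Adobe.io auth flow
conn.setRequestProperty("x-gw-ims-org-id", organization_id); //From admin
try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
    System.out.println(in.readLine());
}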
List Catalog Dataset Entities

The following snippet lists datasets. OFFSET can be used to page through the results:

CatalogService cs = CatalogFactory.getCatalogService();
List<DataSet> datasets = cs.getDataSets(ims_org_id, access_token, OFFSET, CatalogAPIStrategy.ONCE);
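A hedged paging sketch built on the call above (the numeric offset arithmetic, the empty-page stop condition, and the getId() accessor are assumptions, not confirmed SDK behavior):

//Fetch pages of datasets until an empty page comes back
CatalogService cs = CatalogFactory.getCatalogService();
int offset = 0;
List<DataSet> page = cs.getDataSets(ims_org_id, access_token, offset, CatalogAPIStrategy.ONCE);
while (page != null && !page.isEmpty()) {
    for (DataSet ds : page) {
        System.out.println(ds.getId()); //assumed accessor; replace with your own handling
    }
    offset += page.size();
    page = cs.getDataSets(ims_org_id, access_token, offset, CatalogAPIStrategy.ONCE);
}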
Get Catalog Dataset Entity

The following snippet gets a single dataset by ID:

CatalogService cs = CatalogFactory.getCatalogService();
DataSet ds = cs.getDataSet(ims_org_id, access_token, dataset_id);
Read data from Dataset

The following snippet reads data from the platform:

DataWiring dataWiring = new DataWiring(ims_org_id, dataset_object);
Map<String,String> readAttr = new HashMap<String,String>();

//Optional start - Helps in paginating amongst batches in catalog
readAttr.put(SDKConstants.CONNECTOR_READ_ATTRIBUTE_EPOCHTIME, "1523096834");
readAttr.put(SDKConstants.CONNECTOR_READ_ATTRIBUTE_DURATION, "86400000");
//Optional end

Reader platformReader = dataWiring.dataReaderFactory().getReader(readAttr);
JSONArray rows = null;
while(platformReader.hasMoreData()) {
    rows = platformReader.read(num_of_rows);
    process(rows);
}

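The process(rows) call above is left to the integrator. A minimal sketch, assuming the reader returns org.json JSONArray rows (the JSON library actually used by the SDK may differ):

import org.json.JSONArray;
import org.json.JSONObject;

//Hypothetical consumer for the rows returned by the reader above
void process(JSONArray rows) {
    for (int i = 0; i < rows.length(); i++) {
        JSONObject row = rows.getJSONObject(i);
        //Hand each record to your ETL pipeline; printed here for illustration
        System.out.println(row.toString());
    }
}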
Write data into Dataset

The following snippet writes data to the platform:

DataWiring dataWiring = new DataWiring(ims_org_id, dataset_object);
WriteAttributes writeAttributes = new WriteAttributes.WriteAttributesBuilder().
                withFlushStrategy(true).
                withSizeOfRecord(maximum_size_of_single_record).
                build();

Writer platformWriter = dataWiring.dataWriterFactory().getWriter(writeAttributes);
List<SDKField> sdkFields = new ArrayList<SDKField>();
//Add dataset fields to the sdkFields object. For a hierarchical schema in the dataset you can get flattened fields.
List<List<Object>> dataTable = new ArrayList<List<Object>>();
ArrayList<Object> dataRow = new ArrayList<Object>();
dataRow.add("1");
dataRow.add("Stephen");
dataRow.add("30");
dataRow.add("stephen@stephen891820.com");
dataRow.add("1");
dataTable.add(dataRow);

int returnStatus = platformWriter.write(sdkFields, dataTable);
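For larger loads, the same write(sdkFields, dataTable) call can be reused per batch. A hedged sketch follows (batchSize and allRows are illustrative names, not part of the SDK):

//Batch rows and flush each batch through the writer shown above
int batchSize = 1000; //arbitrary choice
List<List<Object>> batch = new ArrayList<List<Object>>();
for (List<Object> row : allRows) { //allRows: your full record set, assembled as above
    batch.add(row);
    if (batch.size() == batchSize) {
        platformWriter.write(sdkFields, batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    platformWriter.write(sdkFields, batch); //flush the remainder
}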
