Overview

This document explains how to configure AWS Athena plugin, generate CSV files using the feed reportAthenaETLFeed, configure S3 details, query report model and report params for Reporting service.

Note: A sample order data has been used in this document.

About AWS Athena Plugin

  • Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage.
  • Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. 
  • Athena scales automatically—executing queries in parallel—so results are fast, even with large datasets and complex queries.

Why Amazon Athena

With Athena, you can:

  • generate reports
  • explore data with business intelligence tools
  • connect SQL clients with a JDBC or an ODBC driver

Prerequisite

To use Athena, S3 and Athena should be in the same region and should have complete access. For obtaining complete access, contact the Infosys Equinox Infrastructure team.

Transform and Upload Order Data

Since Athena uses S3 as its data source, you need to upload the data to S3 first. You can use the Infosys Equinox Feed service to transform and upload order data to S3. Infosys Equinox accesses order data which is uploaded to S3 by querying through AWS Athena via the Report service using the following steps:

  • configure Infosys Equinox Feed to extract, transform into .csv files, and upload to S3.
  • create Report models in the Report service. The SQL query details are configured in the Report models.
  • use report service to connect with Athena and execute the query. The Athena service runs a query on S3 files in the defined path and displays the result.

Order Data Flow Diagram

This diagram illustrates the order data flow from Order DB to S3 and  Infosys Equinox Reporting service to AWS (Athena and S3):

Using the Feed reportAthenaETLFeed

The feed reportAthenaETLFeed extracts MongoDB document of Order service from omsdb and transforms it into the structured .csv files. It uploads the .csv files into AWS S3.

Transformation Process

The steps involved in the transformation process are:

  1. Delete the previously created files in the output location.
  2. Parse the serviceConfig field of the Order and Feed services to fetch the Order and Feed DB details. While creating the feed, you have to specify the Feed and Order services in the serviceConfig field.
  3. Read the last successful run time from the feed run history table of the Feed DB. If the feed runs for the first time, extract the last run time from the job parameter or the last run time is calculated as past 24-hours from the current feed run time.
  4. Using the last run time from the step 3, the feed job queries the OMS DB and fetches the orders created and updated since the last run time till the current feed run time, and transforms it into the .csv files with the specific fields. There are five .csv files created for this feed.
  5. Check whether the job created the new files or not. If no orders were created or updated in a specific period, it indicates no files were created. In this case, you should abort the job and wait for the next run. If the new files were created, you upload all the files to AWS S3 and wait for the next feed run.

Input Params

The following table lists job parameters required for the feed job reportAthenaETLFeed

ParameterDescription
ordercollectionIndicates the name of the order collection to fetch orders.
orderdbIndicates the name of the omsdb in that environment
s3AccessKeySpecifies the access key of the AWS Identity and Access Management (IAM) role to connect with S3.
s3bucketIndicates the name of the S3 bucket to upload the .csv files.
s3BucketRegionSpecifies the region of the S3 bucket.
s3pathSpecifies the S3 bucket path where the .csv files are uploaded.
s3SecretKeySpecifies the secret key of the AWS Identity and Access Management (IAM) role to connect with S3.

Creating CSV Files

The following files are created by the feed job reportAthenaETLFeed. Each segment of files considered a separate table. The files are appended with an underscore and the timestamp (syntax: _YYYYMMDDHHmmSS) followed by an index starting from 0 while writing in the local path. In a segment, each file contains 1000 records. If a segment has more than 1000 records, it will create a new file by increasing the index by one.

The CSV files are:

Order Summary

The syntax for the file name is ordersummary_<_YYYYMMDDHHmmSS>_<IDX>.

The following table lists fields that are available in the order summary file:

Field NameDesccription
uploadtimeSpecifies the time when the file was created.
orderidIndicates the unique ID of the order.
collectionidIndicates the collection ID under which the order was created.
createdtimeContains the time when the order was created.
updatedtimeSpecifies the time when the order was last updated.
createdbyIndicates the user who has created the order in theAdmin console.
storeidIndicates the store in which the order was placed.
useridSpecifies end user who has created the order.
totalsaleContains the total sale of the order calculated from Math.
totaldiscountIndicates the total discount of the order calculated from Math.
totaltaxIndicates the total tax of the order calculated from Math.

Order Details

The syntax for the file name is orderdetails_<_YYYYMMDDHHmmSS>_<IDX>.

The following table lists fields that are available in the order details file:

Field NameDescription
uploadtimeSpecifies the time when the file was created.
orderidSpecifies the unique ID of the order.
collectionidSpecifies the collection ID under which the order was created.
createdtimeSpecifies the time when the order was created.
updatedtimeSpecifies the time when the order was last updated.
createdbySpecifies the user who has created the order in the Admin console.
storeidIndicates the store ID in which the order was placed.
useridSpecifies the end user who has placed the order.
firstnameIndicates the first name of the user.
lastnameIndicates the last name of the user.
itemidIndicates the ID of the item in the order.
productidContains the ID of the product from the catalog.
productnameIndicates the name of the product.
skuSpecifies the SKU ID of the product.
quantityIndicates the quantity placed.
styleContains the style type.
colorIndicates the color of the product placed.
fitIndicates the fit type of the product.
sizeSpecifies the size of the product placed.
categoryidIndicates the category ID under which the product was placed.
vendorSpecifies the vendor name of the product.
itemstatusIndicates the status of the item placed.
paymentidIndicates the payment ID in the order.
paymentmodeIndicates the mode of payment.
paymentvalueIndicates the total amount in the payment.
paymenttypeIndicates the payment type.
currencySpecifies the currency type of the payment.

Order Payment

The syntax for the file name is orderpayment_<_YYYYMMDDHHmmSS>_<IDX>.

The following table lists fields that are available in the order payment file:

Field NameDescription
uploadtimeSpecifies the time when the file was created.
orderidContains the unique ID of the order.
collectionidIndicates the collection ID under which the order was created.
createdtimeIndicates the time when the order was created.
updatedtimeIndicates the time when the order was last updated.
createdbyIndicates the user who has created the order in Admin console.
storeidIndicates the store in which the order was placed.
useridSpecifies the end user who has created the order.
paymentidContains the payment ID in the order.
paymentmodeIndicates the mode of payment.
paymentvalueSpecifies the total amount in the payment.
paymenttypeIndicates the payment type.
paymentstatusContains the payment status.

Order Product

The syntax for the file name is orderproduct_<_YYYYMMDDHHmmSS>_<IDX>.

The following table lists fields that are available in the order product file:

Field NameDescription
uploadtimeSpecifies the time when the file was created.
orderidSpecifies the unique ID of the order.
collectionidIndicates the collection ID under which the order was created.
createdtimeSpecifies the time when the order has been created
updatedtimeIndicates the time when the order was last updated
createdbyIndicates the user who created the order in admin.
storeidIndicates the store in which the order was placed.
useridIndicates the end user ID who has created the order.
firstnameIndicates the first name of the user.
lastnameIndicates the last name of the user.
itemidIndicates the ID of the item in the order.
productidSpecifies the ID of the product from the catalog.
productnameSpecifies the name of the product.
skuSpecifies the SKU ID of the product.
quantityContains the quantity placed.
styleIndicates the style type.
colorSpecifies the color of the product placed.
fitIndicates the fit type of the product.
sizeContains the size of the product placed.
categoryidSpecifies the category ID under which the product was placed.
vendorContains the vendor name of the product.
unitpriceSpecifies the price of the product per unit.
salepriceContains the sale price in the order.
modeIndicates the payment mode.
totalsaleIndicates the total sale of the product.

Order Promotion

The syntax for the file name is orderpromotion_<_YYYYMMDDHHmmSS>_<IDX>.

The following table lists fields that are available in the order promotion file:

Field NameDescription
uploadtimeIndicates the time when the file has been created.
orderidSpecifies unique ID of the order.
collectionidIndicates the collection ID under which the order was created.
createdtimeContains the time when the order was created.
updatedtimeIndicates the time when the order was last updated.
createdbyContains the user who has created the order in Admin console.
storeidIndicates the store in which the order was placed.
useridIndicates the end user who has created the order.
firstnameIndicates the first name of the user.
lastnameSpecifies the last name of the user.
totalordersaleIndicates the total sale value of the order.
totalordertaxContains the total tax of the order.
totalorderdiscountIndicates the total discount in the order.
offeridSpecifies the offer ID.
promocodeIndicates the promocode used.
promonameSpecifies the promotion name.
promolevelIndicates the promotion level.
modeIndicates the mode.
totaldiscountIndicates the total discount by the promotion.

Configuring S3 Details

The feed reportAthenaETLFeed is scheduled to generate and upload .csv files to S3. The report service uses athena-report-plugin to connect with Athena and query the .csv files in S3. The S3 details required by athena-report-plugin are given in the collection properties. For more information, see Collection Properties – Reporting.

Query Details in Reporting Model

Find below a sample query details in the Report model:

{
“description”: “totalsales report”,
“downloadable”: false,
“name”: “totalsales2”,
“processorClassName”: “com.skava.athena.report.AthenaReportDataProcessor”,
“reportProperties”: [ {
“name”: “DDLQuery”,
“value”: “CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.ordersummary ( uploadtime bigint, orderid string, collectionid string,createdtime bigint, updatedtime bigint, createdby string, storeid string, userid string, totalsale double, totaldiscount double, totaltax double ) ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe’ WITH SERDEPROPERTIES (‘serialization.format’ = ‘,’, ‘field.delim’ = ‘,’ ) LOCATION ‘s3://<S3_BUCKET_NAME>/<FOLDER_PATH>/‘ TBLPROPERTIES (‘has_encrypted_data’=’false’,’skip.header.line.count’=’1′);”
},{
“name”: “query”,
“value”: “Select count(t.orderid) as count, sum(t.totalsale+t.totaltax) as total, avg(t.totalsale+t.totaltax) as average from sampledb.ordersummary t inner join (SELECT b.orderid, MAX(b.uploadtime) as max_date FROM sampledb.ordersummary b GROUP BY b.orderid) a on a.orderid = t.orderid and a.max_date = uploadtime”
} ]
}

  • The syntax for the S3 location is s3://<S3_BUCKET_NAME>/<FOLDER_PATH>/; for example, s3://assets-internal-ecom-skavacommerce-com/report/athena/orders/ordersummary/.
  • Note in the above model, the report properties have two queries:
    1. DDL Query: In Athena, you need to create a table to query the .csv files in S3. You need to specify all the fields in the file to identify the fields by the Athena query engine. This process can be done in one of the two ways:
      • In the Athena console, create a table (DDL Query) with the fields as available in the .csv file, manually.
      • Using Report microservice, create a table by proving the DDLQuery property as shown in the above model.
    2. DML Query: Aggregate query to present the order data in the dashboard.

Report Model and Report Params

While querying CSV, you may pass some data dynamically in the request call getreportdata API

. Also, while creating a report model, you can define the params that are going to be used. These params are surrounded with ‘?{‘ and ‘}’ (?{param_name}) in queries in the Report properties. For example, in the above DML queries, storeid, stattime, and endtime have been used as params. While creating the Report model, if you have used any params in the queries, you should also create relevant reportParam as well. 

You need to pass these params as query param in the JSON format in the request call getreportdata API.

ReportIdReport NameCreated Report ModelSample Params
1QUICKSALE_SUMMARY
{
"description":"quick sale summary",
"downloadable":false,
"name":"quicksalesummary",
"processorClassName":"com.skava.athena.report.AthenaReportDataProcessor",
"reportProperties":[
{"name":"DDLQuery","value":"QUICKSALE_SUMMARY_DDL"},
{"name":"query","value":"QUICKSALE_SUMMARY_DML"}
],
"reportParams":[
{"name":"datepart","sequence":1,"type":"STRING"},
{"name":"storeid","sequence":2,"type":"STRING"},
{"name":"starttime","sequence":3,"type":"STRING"},
{"name":"endtime","sequence":4,"type":"STRING"}
]
}
{"datepart":"day", "storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"}

The supported date parts are:

  • YEAR
  • QUARTER
  • MONTH
  • DAY
  • HOUR
2TOTAL_SALES
{
"description":"top sales",
"downloadable":false,
"name":"topsales",
"processorClassName":"com.skava.athena.report.AthenaReportDataProcessor",
"reportProperties":[
{"name":"DDLQuery","value":"TOTAL_SALES_DDL"},
{"name":"query","value":"TOTAL_SALES_DML"}
],
"reportParams":[
{"name":"storeid","sequence":1,"type":"STRING"},
{"name":"starttime","sequence":2,"type":"STRING"},
{"name":"endtime","sequence":3,"type":"STRING"}
]
}
{"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"}
3PAYMENT_TYPES
{
"description":"payment types",
"downloadable":false,
"name":"paymenttypes",
"processorClassName":"com.skava.athena.report.AthenaReportDataProcessor",
"reportProperties":[
{"name":"DDLQuery","value":"PAYMENT_TYPES_DDL"},
{"name":"query","value":"PAYMENT_TYPES_DML"}
],
"reportParams":[
{"name":"storeid","sequence":1,"type":"STRING"},
{"name":"starttime","sequence":2,"type":"STRING"},
{"name":"endtime","sequence":3,"type":"STRING"}
]
}
{"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"}
4PROMOTIONS
{
"description":"promotion details",
"downloadable":false,
"name":"promotions",
"processorClassName":"com.skava.athena.report.AthenaReportDataProcessor",
"reportProperties":[
{"name":"DDLQuery","value":"PROMOTIONS_DDL"},
{"name":"query","value":"PROMOTIONS_DML"}
],
"reportParams":[
{"name":"storeid","sequence":1,"type":"STRING"},
{"name":"starttime","sequence":2,"type":"STRING"},
{"name":"endtime","sequence":3,"type":"STRING"}
]
}
{"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"}
5TOP_PRODUCTS
{
"description":"top products",
"downloadable":false,
"name":"topproducts",
"processorClassName":"com.skava.athena.report.AthenaReportDataProcessor",
"reportProperties":[
{"name":"DDLQuery","value":"TOP_PRODUCTS_DDL"},
{"name":"query","value":"TOP_PRODUCTS_DML"}
],
"reportParams":[
{"name":"storeid","sequence":1,"type":"STRING"},
{"name":"starttime","sequence":2,"type":"STRING"},
{"name":"endtime","sequence":3,"type":"STRING"}
]
}
{"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"}
6TOP_USERS
{
"description":"top users",
"downloadable":false,
"name":"topusers",
"processorClassName":"com.skava.athena.report.AthenaReportDataProcessor",
"reportProperties":[
{"name":"DDLQuery","value":"TOP_USERS_DDL"},
{"name":"query","value":"TOP_USERS_DML"}
],
"reportParams":[
{"name":"storeid","sequence":1,"type":"STRING"},
{"name":"starttime","sequence":2,"type":"STRING"},
{"name":"endtime","sequence":3,"type":"STRING"}
]
}
{"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"}

Note:

  • You need to match the relavant DDL and DML queries from the previous table with the help of the query name.
  • Param should be URL encoded while sending request to the getreportdata API.

Store Dashboard UI

Following is the sample Store Dashboard UI of the Reporting service showing order details. Each created report is associated with one graph:

 

Revision History

2022-04-10 | JP – Fixed link and http issues.
2020-03-27 | AN – Minor copyedits.
2019-01-28 | JP – Content uploaded for January 2020 release.