Overview
This document explains how to configure AWS Athena plugin, generate CSV files using the feed reportAthenaETLFeed, configure S3 details, query report model and report params for Reporting service.
Note: A sample order data has been used in this document.
About AWS Athena Plugin
- Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage.
- Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL.
- Athena scales automatically—executing queries in parallel—so results are fast, even with large datasets and complex queries.
Why Amazon Athena
With Athena, you can:
- generate reports
- explore data with business intelligence tools
- connect SQL clients with a JDBC or an ODBC driver
Prerequisite
To use Athena, S3 and Athena should be in the same region and should have complete access. For obtaining complete access, contact the Infosys Equinox Infrastructure team.
Transform and Upload Order Data
Since Athena uses S3 as its data source, you need to upload the data to S3 first. You can use the Infosys Equinox Feed service to transform and upload order data to S3. Infosys Equinox accesses order data which is uploaded to S3 by querying through AWS Athena via the Report service using the following steps:
- configure Infosys Equinox Feed to extract, transform into .csv files, and upload to S3.
- create Report models in the Report service. The SQL query details are configured in the Report models.
- use report service to connect with Athena and execute the query. The Athena service runs a query on S3 files in the defined path and displays the result.
Order Data Flow Diagram
This diagram illustrates the order data flow from Order DB to S3 and Infosys Equinox Reporting service to AWS (Athena and S3):
Using the Feed reportAthenaETLFeed
The feed reportAthenaETLFeed
extracts MongoDB document of Order service from omsdb
and transforms it into the structured .csv
files. It uploads the .csv
files into AWS S3.
Transformation Process
The steps involved in the transformation process are:
- Delete the previously created files in the output location.
- Parse the
serviceConfig
field of the Order and Feed services to fetch the Order and Feed DB details. While creating the feed, you have to specify the Feed and Order services in theserviceConfig
field. - Read the last successful run time from the feed run history table of the Feed DB. If the feed runs for the first time, extract the last run time from the job parameter or the last run time is calculated as past 24-hours from the current feed run time.
- Using the last run time from the step 3, the feed job queries the OMS DB and fetches the orders created and updated since the last run time till the current feed run time, and transforms it into the
.csv
files with the specific fields. There are five.csv
files created for this feed. - Check whether the job created the new files or not. If no orders were created or updated in a specific period, it indicates no files were created. In this case, you should abort the job and wait for the next run. If the new files were created, you upload all the files to AWS S3 and wait for the next feed run.
Input Params
The following table lists job parameters required for the feed job reportAthenaETLFeed:
Parameter | Description |
---|---|
ordercollection | Indicates the name of the order collection to fetch orders. |
orderdb | Indicates the name of the omsdb in that environment |
s3AccessKey | Specifies the access key of the AWS Identity and Access Management (IAM) role to connect with S3. |
s3bucket | Indicates the name of the S3 bucket to upload the .csv files. |
s3BucketRegion | Specifies the region of the S3 bucket. |
s3path | Specifies the S3 bucket path where the .csv files are uploaded. |
s3SecretKey | Specifies the secret key of the AWS Identity and Access Management (IAM) role to connect with S3. |
Creating CSV Files
The following files are created by the feed job reportAthenaETLFeed. Each segment of files considered a separate table. The files are appended with an underscore and the timestamp (syntax: _YYYYMMDDHHmmSS) followed by an index starting from 0 while writing in the local path. In a segment, each file contains 1000 records. If a segment has more than 1000 records, it will create a new file by increasing the index by one.
The CSV files are:
Order Summary
The syntax for the file name is ordersummary_<_YYYYMMDDHHmmSS>_<IDX>.
The following table lists fields that are available in the order summary file:
Field Name | Desccription |
---|---|
uploadtime | Specifies the time when the file was created. |
orderid | Indicates the unique ID of the order. |
collectionid | Indicates the collection ID under which the order was created. |
createdtime | Contains the time when the order was created. |
updatedtime | Specifies the time when the order was last updated. |
createdby | Indicates the user who has created the order in theAdmin console. |
storeid | Indicates the store in which the order was placed. |
userid | Specifies end user who has created the order. |
totalsale | Contains the total sale of the order calculated from Math. |
totaldiscount | Indicates the total discount of the order calculated from Math. |
totaltax | Indicates the total tax of the order calculated from Math. |
Order Details
The syntax for the file name is orderdetails_<_YYYYMMDDHHmmSS>_<IDX>.
The following table lists fields that are available in the order details file:
Field Name Description
uploadtime
Specifies the time when the file was created.
orderid
Specifies the unique ID of the order.
collectionid
Specifies the collection ID under which the order was created.
createdtime
Specifies the time when the order was created.
updatedtime
Specifies the time when the order was last updated.
createdby
Specifies the user who has created the order in the Admin console.
storeid
Indicates the store ID in which the order was placed.
userid
Specifies the end user who has placed the order.
firstname
Indicates the first name of the user.
lastname
Indicates the last name of the user.
itemid
Indicates the ID of the item in the order.
productid
Contains the ID of the product from the catalog.
productname
Indicates the name of the product.
sku
Specifies the SKU ID of the product.
quantity
Indicates the quantity placed.
style
Contains the style type.
color
Indicates the color of the product placed.
fit
Indicates the fit type of the product.
size
Specifies the size of the product placed.
categoryid
Indicates the category ID under which the product was placed.
vendor
Specifies the vendor name of the product.
itemstatus
Indicates the status of the item placed.
paymentid
Indicates the payment ID in the order.
paymentmode
Indicates the mode of payment.
paymentvalue
Indicates the total amount in the payment.
paymenttype
Indicates the payment type.
currency
Specifies the currency type of the payment.
Order Payment
The syntax for the file name is orderpayment_<_YYYYMMDDHHmmSS>_<IDX>.
The following table lists fields that are available in the order payment file:
Field Name Description
uploadtime
Specifies the time when the file was created.
orderid
Contains the unique ID of the order.
collectionid
Indicates the collection ID under which the order was created.
createdtime
Indicates the time when the order was created.
updatedtime
Indicates the time when the order was last updated.
createdby
Indicates the user who has created the order in Admin console.
storeid
Indicates the store in which the order was placed.
userid
Specifies the end user who has created the order.
paymentid
Contains the payment ID in the order.
paymentmode
Indicates the mode of payment.
paymentvalue
Specifies the total amount in the payment.
paymenttype
Indicates the payment type.
paymentstatus
Contains the payment status.
Order Product
The syntax for the file name is orderproduct_<_YYYYMMDDHHmmSS>_<IDX>.
The following table lists fields that are available in the order product file:
Field Name Description
uploadtime
Specifies the time when the file was created.
orderid
Specifies the unique ID of the order.
collectionid
Indicates the collection ID under which the order was created.
createdtime
Specifies the time when the order has been created
updatedtime
Indicates the time when the order was last updated
createdby
Indicates the user who created the order in admin.
storeid
Indicates the store in which the order was placed.
userid
Indicates the end user ID who has created the order.
firstname
Indicates the first name of the user.
lastname
Indicates the last name of the user.
itemid
Indicates the ID of the item in the order.
productid
Specifies the ID of the product from the catalog.
productname
Specifies the name of the product.
sku
Specifies the SKU ID of the product.
quantity
Contains the quantity placed.
style
Indicates the style type.
color
Specifies the color of the product placed.
fit
Indicates the fit type of the product.
size
Contains the size of the product placed.
categoryid
Specifies the category ID under which the product was placed.
vendor
Contains the vendor name of the product.
unitprice
Specifies the price of the product per unit.
saleprice
Contains the sale price in the order.
mode
Indicates the payment mode.
totalsale
Indicates the total sale of the product.
Order Promotion
The syntax for the file name is orderpromotion_<_YYYYMMDDHHmmSS>_<IDX>.
The following table lists fields that are available in the order promotion file:
Field Name Description
uploadtime
Indicates the time when the file has been created.
orderid
Specifies unique ID of the order.
collectionid
Indicates the collection ID under which the order was created.
createdtime
Contains the time when the order was created.
updatedtime
Indicates the time when the order was last updated.
createdby
Contains the user who has created the order in Admin console.
storeid
Indicates the store in which the order was placed.
userid
Indicates the end user who has created the order.
firstname
Indicates the first name of the user.
lastname
Specifies the last name of the user.
totalordersale
Indicates the total sale value of the order.
totalordertax
Contains the total tax of the order.
totalorderdiscount
Indicates the total discount in the order.
offerid
Specifies the offer ID.
promocode
Indicates the promocode used.
promoname
Specifies the promotion name.
promolevel
Indicates the promotion level.
mode
Indicates the mode.
totaldiscount
Indicates the total discount by the promotion.
Configuring S3 Details
The feed reportAthenaETLFeed
is scheduled to generate and upload .csv files to S3. The report service uses athena-report-plugin to connect with Athena and query the .csv files in S3. The S3 details required by athena-report-plugin are given in the collection properties. For more information, see Collection Properties – Reporting.
Query Details in Reporting Model
Find below a sample query details in the Report model:
{ |
---|
- The syntax for the S3 location is s3://<S3_BUCKET_NAME>/<FOLDER_PATH>/; for example, s3://assets-internal-ecom-skavacommerce-com/report/athena/orders/ordersummary/.
- Note in the above model, the report properties have two queries:
- DDL Query: In Athena, you need to create a table to query the .csv files in S3. You need to specify all the fields in the file to identify the fields by the Athena query engine. This process can be done in one of the two ways:
- In the Athena console, create a table (DDL Query) with the fields as available in the .csv file, manually.
- Using Report microservice, create a table by proving the DDLQuery property as shown in the above model.
- DML Query: Aggregate query to present the order data in the dashboard.
- DDL Query: In Athena, you need to create a table to query the .csv files in S3. You need to specify all the fields in the file to identify the fields by the Athena query engine. This process can be done in one of the two ways:
Report Model and Report Params
While querying CSV, you may pass some data dynamically in the request call getreportdata API
. Also, while creating a report model, you can define the params that are going to be used. These params are surrounded with ‘?{‘ and ‘}’ (?{param_name}) in queries in the Report properties. For example, in the above DML queries, storeid, stattime, and endtime have been used as params. While creating the Report model, if you have used any params in the queries, you should also create relevant reportParam as well.
You need to pass these params as query param in the JSON format in the request call getreportdata API.
ReportId | Report Name | Created Report Model | Sample Params |
---|---|---|---|
1 | QUICKSALE_SUMMARY | { | {"datepart":"day", "storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"} The supported date parts are:
|
2 | TOTAL_SALES | { | {"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"} |
3 | PAYMENT_TYPES | { | {"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"} |
4 | PROMOTIONS | { | {"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"} |
5 | TOP_PRODUCTS | { | {"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"} |
6 | TOP_USERS | { | {"storeid":"store01","starttime":"1547382309000","endtime":"1578918309000"} |
Note:
- You need to match the relavant DDL and DML queries from the previous table with the help of the query name.
- Param should be URL encoded while sending request to the getreportdata API.
Store Dashboard UI
Following is the sample Store Dashboard UI of the Reporting service showing order details. Each created report is associated with one graph:
Revision History
2022-04-10 | JP – Fixed link and http issues.
2020-03-27 | AN – Minor copyedits.
2019-01-28 | JP – Content uploaded for January 2020 release.