Saturday, August 26, 2023

How to Develop Alteryx Custom Tool

Post Index  

2023-08-26


How to Develop Alteryx Custom Tool

with the help of Platform SDK

AYX UI SDK, AYX Python SDK v2, AYX Plugin CLI



The Platform SDK allows you to develop Alteryx custom tools and plugins.  Alteryx provides a lot of amazing tools, but it does not cover everything out of the box.  As in my previous posts on the QVD Input Tool and QVD Output Tool, Alteryx has no out-of-the-box capability to manipulate QVD files.  In the past, there were the Engine and HTML GUI SDKs; as mentioned in the Alteryx Platform SDK help, that outdated technology is limited for extension.  There is a need for newer technology that expands the opportunities for improvement.

For details of the Platform SDK, you may refer to the help.  This article focuses on the basics so you can understand how to develop a simple Alteryx custom tool.




Environment Setup

For basic requirements, please refer to the AYX Python SDK v2 help page.

Before you can start the development, there are a few steps required.

1. Install Miniconda3 (download it from the site and install).


2. Open the Anaconda Prompt (click on the Windows button and search for it).


3. Create a virtual environment with the command
    conda create -n [ENV_NAME] python=3.8.5

e.g. conda create -n test_env python=3.8.5

4. Activate the environment.
    conda activate [ENV_NAME]

e.g. conda activate test_env

5. Install the AYX Python SDK.
    pip install ayx-python-sdk


6. Install the AYX Plugin CLI.
    pip install ayx-plugin-cli


7. Create (or initialize) the AYX Plugin workspace.
    ayx_plugin_cli sdk-workspace-init

    Answer a few questions to initialize the configuration files.

    Package Name: [The name of the package]
    Tool Category [Python SDK Examples]: [The category of the tools]
    Description []: Description of the package.
    Author []: Who is the author of the package.
    Company []: The company name.
    Backend Language (python): python

8. Install Node.js.  It is required to develop the UI for the tool.
    conda install -c anaconda nodejs=16.13.1




9. Create the AYX Plugin.
    ayx_plugin_cli create-ayx-plugin --use-ui
    * --use-ui must be specified explicitly so that the UI part is created by the Plugin CLI.




    Answer a few questions to initialize the configuration files.

Tool Name: [The name of the tool]
Tool Type (input, multiple-inputs, multiple-outputs, optional, output, single-input-single-output, multi-connection-input-anchor) [single-input-single-output]: [the type of the tool]
Description []: [description of the tool]
Tool Version [1.0]: [version of the tool]
DCM Namespace []:  [DCM Namespace to be used]

Once it is done, you will have a folder structure created as below.

Backend Development (Python)

After the initialization of the environment, workspace and tool, a backend file is created in the path \backend\ayx_plugins.

Open this file in an editor or any IDE to start the tool development.  There are four important functions:

def __init__(self, provider: AMPProviderV2) -> None:
This initializes the tool.  It is usually used to set up the variables used in the backend code.

def on_incoming_connection_complete(self, anchor: namedtuple) -> None:
Called when an incoming anchor completes.  An input tool does not need to handle this.

def on_record_batch(self, batch: "Table", anchor: namedtuple) -> None:
Handles the records provided in batches.  An input tool does not need to handle this.

def on_complete(self) -> None:
Called once all batches have been processed; use it to free up resources and finalize the run.  For an input tool, all the logic happens here.

For each function, the comment is already in the backend plugin file. 
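To make the lifecycle concrete, below is a stripped-down sketch of how these four methods fit together in an input tool.  The real generated template imports AMPProviderV2 and Table from ayx-python-sdk; they are stubbed away here so the flow itself is visible, and the class name and sample data are illustrative only.

```python
# A stripped-down sketch of the generated backend plugin's lifecycle.
# The real template imports AMPProviderV2 and Table from ayx-python-sdk;
# they are stubbed away here, and the class name is illustrative.
class TestInputTool:
    def __init__(self, provider) -> None:
        # Initialize the state used by the rest of the backend code.
        self.provider = provider
        self.output_rows = []

    def on_incoming_connection_complete(self, anchor) -> None:
        # Called when an incoming anchor completes; an input tool has
        # no incoming anchors, so there is nothing to do here.
        pass

    def on_record_batch(self, batch, anchor) -> None:
        # Called for each incoming record batch; again unused for an
        # input tool.
        pass

    def on_complete(self) -> None:
        # For an input tool, all the logic happens here: build the data
        # and push it to the output anchor (in the real SDK, via the
        # provider rather than a plain list).
        self.output_rows = [("hello", 1)]
```

In the real template, on_complete would hand the built table to the provider's output anchor instead of keeping it on the instance.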

Frontend Development (Reactjs)

For front-end development, navigate to the path \ui\TestInputTool\src.  There is a file called index.tsx; this is the file to update.

The content is React (TypeScript).

More UI help can be found in the Alteryx UI help.


Create the YXI and Deploy to Alteryx Designer

Once the frontend, backend and configuration (icon, description, version, etc.) are complete, the tool can be packaged into a YXI file for sharing and installed into Alteryx Designer simply by running the command:
ayx_plugin_cli designer-install --use-ui
* --use-ui must be specified explicitly if a UI component is involved.

It will prompt a question.
Install Type (user, admin) [user]: [user installs into the user location for a single user, while admin installs into the admin location so that all users on the machine can use the tool]

user location: C:\Users\[user]\AppData\Roaming\Alteryx\Tools
admin location: C:\ProgramData\Alteryx\Tools


Test in Designer

Open up Alteryx Designer; the tool palette contains a new category named "Test Tools" where the Test Input Tool can be selected.  Once it is executed, there is a sample result.



This concludes how to develop an Alteryx custom tool.  For details about the Python code, React, etc., leave me a message if further discussion is needed.


Thank you for reading.  I hope it helps you.  Just leave a message if you have anything to discuss or share.



Thursday, August 17, 2023

Alteryx QVD Output Tool - Prototype


2023-08-17


Alteryx QVD Output Tool

Alteryx Custom Tool




In the previous post, the Alteryx QVD Input Tool prototype was shared; following that, in this post, the Alteryx QVD Output Tool will be introduced.

It reads the data from Alteryx and then converts it into QVD, i.e. it converts each column into a list of symbols and symbol indexes, and then compacts each record, by the symbol index, into the least bits required to store the record data.  The XML information is kept in memory during the process.  Once everything is ready, it flushes out the XML, the symbols, and the records.
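The column-to-symbols step described above can be sketched in a few lines of Python.  This is an illustrative reimplementation of the idea, not the tool's actual code, and build_symbols is a hypothetical helper name.

```python
# Sketch of the QVD Output Tool's core idea: turn each column into a
# unique-symbol list, then store every row as symbol indexes.
def build_symbols(rows):
    """Return per-column symbol lists and index-encoded records."""
    if not rows:
        return [], []
    n_cols = len(rows[0])
    symbols = [[] for _ in range(n_cols)]   # unique values, first-seen order
    lookup = [{} for _ in range(n_cols)]    # value -> symbol index
    records = []
    for row in rows:
        record = []
        for col, value in enumerate(row):
            if value not in lookup[col]:
                lookup[col][value] = len(symbols[col])
                symbols[col].append(value)
            record.append(lookup[col][value])
        records.append(record)
    return symbols, records
```

For rows [("2", "A"), ("1", "B"), ("1", "A"), ("2", "B")] this yields symbols [["2", "1"], ["A", "B"]] and index records [[0, 0], [1, 1], [1, 0], [0, 1]]; the remaining work is bit-packing the indexes and writing the XML header.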


Alteryx QVD Output Tool

The Alteryx QVD Output Tool is very simple.  It takes in the Alteryx input stream, reads all the content, and writes it out as a QVD file.  The input UI is as below.



* The new Alteryx SDK uses React, where browser security makes it impossible to obtain the full file path.  Thus, there is no button to pop up a dialog asking for the file location; instead, there is only a text field for inputting the path.  If you have any clue on how to get around this, it is welcome.  The prototype aims to show the possibility of integrating with QVD files.


The prototype

If you hope to try it, you can download it in my github.  https://github.com/kongson-cheung/Alteryx-QVD-Tools/blob/main/yxi/QVD%20Tools_v1.1.yxi

I have shared the core files used to create this Alteryx QVD Output Tool.  Since the SDK includes a large number of files, I did not upload them all.  If you need any help, feel free to drop me a message.


* Note: this is still a very early prototype.  It still requires a number of improvements before intensive use.


Next

I will try to summarize how to develop a custom Alteryx tool.


Thank you for reading.  I hope it helps you.  Your sharing is appreciated if you have anything to discuss.



Monday, August 7, 2023

Alteryx QVD Input Tool - Prototype


2023-08-07


Alteryx QVD Input Tool

Alteryx Custom Tool


With the findings from the previous post (QlikView Data File (QVD) - Reverse Engineering), I have developed a prototype of the Alteryx QVD Input Tool.  Alteryx is an extremely good tool for data wrangling and contains a bundle of tools that allow simple data transformation and predictive analysis.  However, it does not have the ability to deal with QVD; it only allows reading and writing QVX.

Alteryx, in fact, is a good upstream data source for QlikView and Qlik Sense.  It can help a business make clear use of data, with drag-and-drop capability to explore, transform and try new business logic.  QVD integration would be beneficial for Qlik at the lower end of the data cycle.  It sounds like an advertisement, but this conclusion comes from real project experience.  QVX does not work well under heavy usage; its performance is similar to CSV.  QVD is still the best for Qlik.


Alteryx QVD Input Tool

The Alteryx QVD Input Tool is very simple.  It just takes in a QVD file, reads all the content, and converts it into an Alteryx output stream.  The input UI is as below.


* The new Alteryx SDK uses React, where browser security makes it impossible to obtain the full file path.  Thus, there is no button to pop up a dialog asking for the file location; instead, there is only a text field for inputting the path.  If you have any clue on how to get around this, it is welcome.  The prototype aims to show the possibility of integrating with QVD files.


An Example

Taking a QVD file as an example.



The QVD file contains 2 columns named Num and Text, and a total of 4 records.  In Alteryx, the result runs as below.

If you hope to try it, you can download it in my github.  https://github.com/kongson-cheung/Alteryx-QVD-Tools/blob/main/yxi/QVD%20Tools_v1.0.yxi

I have shared the core files used to create this Alteryx QVD Input Tool.  Since the SDK includes a large number of files, I did not upload them all.  If you need any help, feel free to drop me a message.


* Note: this is still a very early prototype.  It still requires a number of improvements before intensive use.


Next

The Alteryx QVD Input Tool is built with the Platform SDK (the UI SDK and the Python SDK).  This is a first prototype for reading QVD.  I am exploring a prototype of an Alteryx QVD Output Tool that writes QVD as output.


Thank you for reading.  I hope it helps you.  Your sharing is appreciated if you have anything to discuss.




Monday, July 31, 2023

QlikView Data File (QVD) - Reverse Engineering


2023-07-31


QlikView Data File (QVD)

Reverse Engineering

Revealing what is inside a QVD file



QVD is a Qlik proprietary format that is widely used in Qlik products.  The format works very well within the Qlik environment, but without Qlik products it is not possible to convert data into the QVD format.

QVD is famous for its performance and compression.  Compared to CSV, Excel and other formats, it offers around a 10x performance gain because the format can, in fact, be loaded directly into memory and used directly by the Qlik products.

From a technical perspective, this article will discuss the QVD format in detail so that we can understand the amazing elements in this QVD file.


Note: I analyzed this for a project that hoped to read/write QVD using Alteryx.  I put some effort into this reverse engineering and developed some prototypes, but then a simpler method was used, and the information and effort seemed wasted.  Hopefully, with this article, everyone can appreciate the design and also learn a bit about how to do reverse engineering.  If I get enough time, I will release the prototype on GitHub.



QVD is a file that consists of three major parts:

1) XML

The XML provides the metadata information about the QVD.  It describes the QVD table and the QVD fields, along with a number of internally used elements.


2) Symbol

Symbol means a unique value of a field.  Each field has a list of symbols.  This is why QVD is highly compressed: each unique value in a field is only saved once.  Each symbol is indexed by a unique number, i.e. 0, 1, 2, ....


3) Record Data

Record data is stored in binary format.  Each record is stored with a fixed size, the record byte size.  Within the record bytes, each field value in a record is indicated by a bit offset and length.  This portion of binary data can then be converted into the indexes used to look up the symbol values.  The entire method makes use of bit operations and bit masks to reduce the bytes required.  This is one of the main reasons why the data volume is highly compressed.


XML

The QVD XML is illustrated below:


<QvdTableHeader>

    <QvBuildNo>...</QvBuildNo>

    <CreatorDoc>...</CreatorDoc>

    <CreateUtcTime>...</CreateUtcTime>

    <SourceCreateUtcTime>...</SourceCreateUtcTime>

    <SourceFileUtcTime>...</SourceFileUtcTime>

    <SourceFileSize>...</SourceFileSize>

    <StaleUtcTime>...</StaleUtcTime>

    <Fields>

        <QvdFieldHeader>

            <FieldName>...</FieldName>

            <BitOffset>...</BitOffset>

            <BitWidth>...</BitWidth>

            <Bias>...</Bias>

            <NumberFormat>

                <Type>...</Type>

                <nDec>...</nDec>

                <UseThou>...</UseThou>

                <Fmt>...</Fmt>

                <Dec>...</Dec>

                <Thou>...</Thou>

            </NumberFormat>

            <NoOfSymbols>...</NoOfSymbols>

            <Offset>...</Offset>

            <Length>...</Length>

            <Comment>...</Comment>

            <Tags>

                <String>...</String>

                <String>...</String>

            </Tags>

        </QvdFieldHeader>

    </Fields>

    <Compression>...</Compression>

    <RecordByteSize>...</RecordByteSize>

    <NoOfRecords>...</NoOfRecords>

    <Offset>...</Offset>

    <Length>...</Length>

    <Lineage>

        <LineageInfo>

            <Discriminator>...</Discriminator>

            <Statement>...</Statement>

        </LineageInfo>

    </Lineage>

    <Comment>...</Comment>

</QvdTableHeader>


Some of the core tags are explained:

QvBuildNo

The QVD version.

CreateUtcTime

The QVD file created date time.

TableName

The QVD table name.

QvdFieldHeader

The details about each field for the QVD parser.

QvdFieldHeader/FieldName

The field name.

QvdFieldHeader/BitOffset

Within the record bytes, the starting bit from which to extract the symbol index.

QvdFieldHeader/BitWidth

Within the record bytes, how many bits to extract, starting from the bit offset, in order to get the symbol index.

QvdFieldHeader/Bias

It is a special indicator for special handling.

QvdFieldHeader/NoOfSymbols

The number of symbols in the field.

RecordByteSize

The size in bytes required to store one record in this QVD dataset.

NoOfRecords

The number of records in the QVD.


Symbol

In each field, there is a list of symbols stored with the pattern of symbol type followed by symbol data.

The symbol type is 1 byte that indicates what kind of data follows and the pattern to parse.

The symbol data is the data content stored in the file.  It is unique within each field.



Symbol types 5 and 6 are where the DUAL data type is introduced.  Dual is a special data type in how Qlik stores data: a pair of number and text in the form (Text, Number).  All data in QVD, in fact, is in dual form.  For example, the integer 25 is stored as (NULL, 25).  The text "Hello" is stored as ("Hello", NULL).  A date is special in that it stores ("DATE-STRING", DATE_INT), and a datetime is stored as ("DATETIME-STRING", DATE_NUMBER).  In general, it can be any (Text, Number) pair, but typically DUAL is used for dates and datetimes.  Sometimes, colour codes also make use of dual, e.g. (RED, 1), (GREEN, 2), etc.


The known (as result of reverse engineering) symbol types are:

1. Symbol Type = 1

4-byte integer is in this type.


2. Symbol Type = 2

An 8-byte number is in this type.  This also includes decimal-point numbers.


3. Symbol Type = 4

Text is in this type, and a NULL char at the end indicates the end of the text.


4. Symbol Type = 5

A 4-byte integer along with text with a NULL end.  It is a date in the form (Text, 4-byte integer).  In fact, other than dates, it is possible to store any text/integer pair.


5. Symbol Type = 6

An 8-byte number along with text with a NULL end.  It is a datetime in the form (Text, 8-byte number).  In fact, other than datetimes, it is possible to store any text/number pair.


The order in which these symbols are read gives the corresponding symbol index.  For example, if "c" is the first symbol read for a field, it will have index = 0; if "a" is the second symbol read, it will have index = 1.  No particular sorting is required.
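Putting the symbol types together, a reader for one field's symbol area can be sketched as below.  Little-endian byte order and UTF-8 text are assumptions taken from this reverse engineering, not from an official specification, and parse_symbols is a hypothetical helper name.

```python
import struct

# Sketch of parsing one field's symbol area, following the symbol types
# described above (1: int, 2: double, 4: text, 5/6: dual).
def parse_symbols(buf, count):
    symbols, pos = [], 0
    for _ in range(count):
        stype = buf[pos]; pos += 1
        if stype == 1:                      # 4-byte integer
            (value,) = struct.unpack_from("<i", buf, pos); pos += 4
        elif stype == 2:                    # 8-byte number
            (value,) = struct.unpack_from("<d", buf, pos); pos += 8
        elif stype == 4:                    # NUL-terminated text
            end = buf.index(0, pos)
            value = buf[pos:end].decode("utf-8"); pos = end + 1
        elif stype == 5:                    # dual: 4-byte int + text
            (num,) = struct.unpack_from("<i", buf, pos); pos += 4
            end = buf.index(0, pos)
            value = (buf[pos:end].decode("utf-8"), num); pos = end + 1
        elif stype == 6:                    # dual: 8-byte number + text
            (num,) = struct.unpack_from("<d", buf, pos); pos += 8
            end = buf.index(0, pos)
            value = (buf[pos:end].decode("utf-8"), num); pos = end + 1
        else:
            raise ValueError(f"unknown symbol type {stype}")
        symbols.append(value)
    return symbols
```

For a buffer holding the text "A" (type 4) followed by the integer 25 (type 1), this returns ["A", 25], with each symbol's position in the list serving as its index.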

Moreover, special attention is required on how NULL is manipulated.  An example makes it easier to understand.  Take a table with three fields Num, Text and Dummy with the following data:


Num,Text,Dummy

2, A,

1, B,

1, A,

2, B,

,,


Num has 3 symbols: the values 2, 1 and NULL.

Text has 3 symbols: the values A, B and NULL.

Dummy has 1 symbol: the value NULL.


The symbols stored for Num will be

[Symbol Type = 5][1, "1"] => index = 0

[Symbol Type = 5][2, "2"] => index = 1

[Symbol Type = 5][NULL, NULL] => index = 2


It requires 8 bytes + 2 bytes (utf-8, 2 bytes for a char) for the symbol data.  To indicate 3 symbol indexes, it requires 2 bits.

As a result in the QVDFieldHeader, BitOffset is 0 and BitWidth is 2 and Bias is 0.


* The type is treated as 5 when the data type is not well known.  This happens when the data comes from a CSV without a data type specification.  If it comes from a DB, there is a mapping between the DB type and the type to be used in QVD.


The symbols stored for Text will be

[Symbol Type =4][A] => index = 0

[Symbol Type =4][B] => index = 1

[Symbol Type =4][NULL] => index = 2

It requires 2 bytes (utf-8, 2 bytes for a char) for the symbol data.  To indicate 3 symbol indexes, it requires 2 bits.

As a result in the QvdFieldHeader, BitOffset is 2, BitWidth is 6 and Bias is 0.  Since it is the last column with data, it takes up all remaining bits to form a full byte, i.e. 6 bits, even though the smallest required width is just 2 bits.  Interestingly, the last column means the last column with data; if a column is all NULL, it is not treated as the last column.


The symbol stored for Dummy will be

[Symbol Type =4][NULL] => index = 0

It requires nothing for the data itself, but the symbol type 4 entry still requires 1 byte, i.e. the NULL byte.

As a result in the QvdFieldHeader, BitOffset is 0, BitWidth is 0 and Bias is 0.  Offset, width and bias of zero indicate that no bits are required in the record.


Record Data

In QVD, each record does not store the exact field values.  Instead, it stores the symbol indexes of all the fields.

Taking the same example used in the symbol illustration:


Num,Text,Dummy

2, A,

1, B,

1, A,

2, B,

,,


First record (2, A): Num[index = 1], Text[index = 0] and Dummy nothing; the record is represented as [0001], 2 bits for Num and 2 bits for Text.

Second record (1, B): Num[index = 0], Text[index = 1] and Dummy nothing; represented as [0100].

Third record (1, A): Num[index = 0], Text[index = 0] and Dummy nothing; represented as [0000].

Fourth record (2, B): Num[index = 1], Text[index = 1] and Dummy nothing; represented as [0101].

Fifth record (NULL, NULL): Num[index = 2], Text[index = 2] and Dummy nothing; represented as [1010].

The first field is stored in the rightmost bits while the last field is stored in the leftmost bits.

Thus, the 5 records are stored as [0001], [0100], [0000], [0101], [1010].  Just enough bytes are used to hold this binary data, i.e. 8 bits, 1 byte.  They become [0000 0001], [0000 0100], [0000 0000], [0000 0101], [0000 1010] => 1, 4, 0, 5, 10.  These 5 integers represent a total of 15 field values.

To complete the QvdTableHeader description, the RecordByteSize is 1 and NoOfRecords is 5 because there are 5 rows.


* Bias, still, needs more investigation on the exact usage.
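The bit manipulation described above can be sketched as a small decoder: given each field's BitOffset and BitWidth from the QvdFieldHeader, it pulls the symbol indexes back out of a record's bytes.  This is an illustrative sketch based on this reverse engineering, not the actual Qlik implementation.

```python
# Decode one record's bytes into per-field symbol indexes, using each
# field's (BitOffset, BitWidth) from the QvdFieldHeader.
def unpack_record(record_bytes, fields):
    """fields: list of (bit_offset, bit_width); returns symbol indexes."""
    value = int.from_bytes(record_bytes, "little")
    indexes = []
    for bit_offset, bit_width in fields:
        if bit_width == 0:
            # A field with a single (NULL) symbol stores no bits at all.
            indexes.append(0)
        else:
            indexes.append((value >> bit_offset) & ((1 << bit_width) - 1))
    return indexes
```

With Num at (offset 0, width 2), Text at (offset 2, width 6) and Dummy at (0, 0), the record byte 0b0101 decodes to indexes [1, 1, 0], i.e. the row (2, B, NULL).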


Reverse Engineering - What have been done?

In order to further understand the format, a simple way is to generate a simple QVD file and look into the details, for example a single column with 2 rows of integers only.  By iterating with different data and reviewing the details, it is easy to spot the changes.  It may require an editor that can show invisible bytes like NULL, EOT, etc.; a good one is Notepad++.

Obviously, there might still be more handling in QVD, but this already showcases the beauty of the format and why it can process so fast and compress so much.

Personally, in terms of file operation, I seriously hope that Qlik can further expand this usage, because nowadays data usage is huge and files are everywhere.  A proper and manageable file format is important.  In particular, to support cloud computing, an enhanced QVD might do the trick as well.  If it can break through this area and become open source, it will be greatly beneficial to everyone dealing with data.





Wednesday, July 19, 2023

Qlik Sense Server Migration - API and CURL Detail


2023-07-19

How to use CURL and QRS API

Qlik Sense Server Migration Overview


Qlik Sense Server Migration

API and CURL Detail

Migrating Qlik Sense items using APIs

There are a number of QS items that must be migrated in the Qlik Sense server migration process.  This article focuses on two core components to illustrate the concept and steps required for migration.  First, it discusses the App migration, which involves the App, App Objects and App Content.  Then it discusses the Reload Task migration, which involves the Reload Task, Schema Event and Composite Event.

For other items, the concept is similar.  I am happy to help if you have questions.


General QRS API Call

More technically, Qlik Sense Server Migration involves a few types of QRS API calls:

1) GET

To obtain the metadata in JSON; this JSON is then modified or trimmed in order to POST/PUT into the new cloud server.

2) POST

The POST command generally creates a new item in the QS server.

3) PUT

The PUT command usually updates a QS item that has already been created in the server.


For POST and PUT, details usually need to be provided in the HTTP body.  It is much easier to GET the metadata directly from the on-premises server and modify the necessary components in the JSON for the POST and PUT APIs.  One obvious example is the modified date, which must be later than the last modified date.  More relevant to this article is the app ID: since the app ID is generated after upload, a mapping should be maintained in order to update the JSON for the POST and PUT APIs.
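As a sketch of this GET-modify-PUT pattern, the helper below remaps an app ID using the maintained mapping and stamps a fresh modifiedDate.  The JSON shape (an "app" sub-object, as seen in reload task metadata) and the helper name are illustrative assumptions; check the exact field names against your QRS version.

```python
import json
from datetime import datetime, timezone

# Hypothetical helper: take metadata fetched with GET, remap the
# on-premises app ID to the new one, and bump modifiedDate so the
# PUT is accepted.
def prepare_for_put(metadata_json, app_id_map):
    item = json.loads(metadata_json)
    old_id = item["app"]["id"]
    item["app"]["id"] = app_id_map.get(old_id, old_id)
    now = datetime.now(timezone.utc)
    # QRS-style timestamp with millisecond precision.
    item["modifiedDate"] = (now.strftime("%Y-%m-%dT%H:%M:%S.")
                            + f"{now.microsecond // 1000:03d}Z")
    return json.dumps(item)
```

In practice one helper like this per item type keeps the ID mapping in a single place while the rest of the migration script just GETs, prepares and PUTs.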


App Object ID vs Engine Object ID

These two IDs are tricky and easy to confuse.  When an app object is created, a unique GUID is created; this ID is used internally in QS and in QMC.  At the same time, the Engine Object ID is also created.  It is an in-app unique ID to distinguish the object, used within the qvf.

In the QRS API, the app object ID refers to the App Object ID.  However, QMC, and the object's link, show the Engine Object ID.


QS Application Migration

Before migrating QS applications, some QS items should be migrated beforehand, for instance data connections, extensions, content libraries, streams, etc.  This makes sure the application can make use of all these components.

In fact, migrating an application is similar to doing the following steps manually:

1) Export the app with data and cover all the base, shared and private app objects.

2) Import the app with data with all the exported app objects and maintain the ownership.

3) Publish the app to stream (if necessary)

4) Unapprove the app object

5) Unpublish the app object

6) Change the owner of the app object to maintain all the ownership


* If an app is exported without data, the "always one selected value" setting on fields is lost.  If you are certain that the application does not use this setting, it is recommended to export without data and reload afterwards.


Export App

There are three methods available to obtain a QS app with all its app objects:


1) Using the QRS API /qrs/app/exportapps (refer to the May 2023 version)

This is only available in the May 2023 version of Qlik Sense or above.  The export scope should be configured to "all".

2) Copy the app directly from the Qlik shared folder, i.e. \\server-name\QlikShare\Apps

This folder contains all the Qlik Sense applications on the Qlik Sense server.  The files in this folder are named by the QS Application ID, without extension.  These files are actually binary qvf files.


3) Publish/approve the sheets and then export the app.

This method has an impact on end users because the private and community app objects will be published and approved into base app objects.  After triggering the export, the app objects can simultaneously be unapproved and unpublished, but these operations still take a while.  If the server is frozen, this method is a good choice.


This method requires the following QRS API:

1) PUT /qrs/app/object/{id}/publish

To publish app objects into sharing state.

2) POST /qrs/app/object/{id}/approve

To approve app objects into base state.

3) POST /qrs/app/{id}/export

To create the download link and export the app.

4) GET /qrs/download/app/{id}/{exportTicketId}/{fileName}

To download the app based on the download link.

5) * If the app object state must be restored: POST /qrs/app/object/{id}/unapprove

To unapprove the app objects back to sharing state.

6) * If the app object state must be restored: PUT /qrs/app/object/{id}/unpublish

To unpublish the app objects back to private / personal state.
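One practical detail for all of these QRS calls: each request must carry a 16-character xrfkey, both as a query parameter and in the X-Qlik-Xrfkey header, or it is rejected.  A small sketch of building the request pieces (the server name is a placeholder and port 4242 is the usual QRS default; adjust for your setup):

```python
import secrets
import string

# Build a QRS URL and headers with a random 16-character xrfkey.
# The same key must appear in both the query string and the header.
def qrs_request_parts(server, path):
    xrf = "".join(secrets.choice(string.ascii_letters + string.digits)
                  for _ in range(16))
    url = f"https://{server}:4242{path}?xrfkey={xrf}"
    headers = {"X-Qlik-Xrfkey": xrf}
    return url, headers
```

For example, qrs_request_parts("qs-server", "/qrs/app/full") yields a ready-to-use URL and header pair; authentication (certificates or NTLM) still needs to be added on top.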


Import App

There are two APIs available for importing the QVF into the Qlik Sense server:

1) POST /qrs/app/import

This requires the qvf to first be placed in the app folder, C:\ProgramData\Qlik\Sense\App.


2) POST /qrs/app/upload

This method is recommended.

Apparently, manual upload is also possible.


Publish App to Stream

The next step is to publish the app into stream with the QRS API.

PUT /qrs/app/{id}/publish

Some of the applications might be in the work stream that does not require this step.


Unapprove App and Change Ownership

This step is a bit tricky because we are not using the unapprove API.  Instead, the app object PUT is used.

PUT /qrs/app/object/{id}

As a result, we can update both the unapprove flag and the app object ownership in one go.


Unpublish App

The last step is to unpublish the app objects that are private/personal.

PUT /qrs/app/object/{id}/unpublish

*** Note if binary QVF is used

Binary qvf files retain all the app objects, even deleted ones (reference).  So there should be a step to remove the redundant objects.  This can easily be cross-checked against the app object metadata, followed by a DELETE /qrs/app/object/{id}.

And there are a few more drawbacks:

i) If the application references in-app pictures, the links break.  One way to fix this is to make use of the Engine JSON API to replace the on-premises app ID with the cloud app ID.

ii) The app properties are also lost.  This needs to be fixed by PUT /qrs/app/{id}, with a JSON body that can be obtained from GET /qrs/app/{id}.

iii) The app content is lost, and it must be re-uploaded with POST /qrs/appcontent/{appid}/uploadfile.  The physical content files can be found in \\server-name\QlikShare\Static Content\App Content.

QS Reload Task Migration

The reload task migration is less complicated than the app migration.  The only tricky part is making sure the reload task points to the updated app ID.  A reload task always comes with triggers, where:

a) A schema event is based on a time schedule.

b) A composite event is used to chain up app reloads.



Overall, the API calls below are used to obtain the metadata.

GET /qrs/reloadtask/full

GET /qrs/schemaevent/full

GET /qrs/compositeevent/full


The API calls below are used to create the reload task, schema event and composite event.

POST /qrs/reloadtask

POST /qrs/schemaevent

POST /qrs/compositeevent


Thank you for reading.  I hope you find it useful.  See you in the next post :)