Friday, December 19, 2025

Analytical Calendar - Qlik Associative Model Generation

Post Index

Technical Discussion | Analytical Calendar The Story | The Index Page

2025-12-20

Analytical Calendar

Qlik Associative Model Generation


Thank you for following all the posts.  I am finally confident enough to release the code for public use.  Please leave your feedback, use it happily, and keep evolving your analytics.  The code is for Qlik and makes the best use of the associative data model to get the most out of the analytical calendar.


How to Configure the Analytical Calendar

You can customise the code yourself, but make sure you understand it well before doing so.  The easiest way is to import analytical_calendar.qvs into your Qlik script and start the configuration.  It needs two major steps:

1. Change the parameters (a configuration sketch follows step 2 below).

vStartDate: the start date of the calendar.
vEndDate: the end date of the calendar.
vMaxYRolling: the maximum number of rolling years.
vMaxQRolling: the maximum number of rolling quarters.
vMaxMRolling: the maximum number of rolling months.
vMaxWRolling: the maximum number of rolling weeks.
vMaxDRolling: the maximum number of rolling days.
vMaxYComparison: the maximum comparison range in years.
vMaxQComparison: the maximum comparison range in quarters.
vMaxMComparison: the maximum comparison range in months.
vMaxWComparison: the maximum comparison range in weeks.
vMaxDComparison: the maximum comparison range in days.


2. Pick one of the week definitions: either Incomplete Week (Broken Week) or Complete Week (Unbroken Week).

Just remove the one that you do not want to use.
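As a concrete sketch of step 1, here is roughly how the parameters could be set if they are declared as plain script variables; the actual declarations live in analytical_calendar.qvs, and the values below only mirror the 2022 to 2026 public release configuration (assuming the vMax* parameters are the upper bounds of those ranges), so adjust them to your own needs.

// Sketch only - the real declarations are in analytical_calendar.qvs; adjust the values there.
SET vStartDate = '2022-01-01';   // calendar start date
SET vEndDate   = '2026-12-31';   // calendar end date

SET vMaxYRolling = 3;            // rolling years (e.g. Rolling 2 - 3 Years)
SET vMaxQRolling = 3;            // rolling quarters
SET vMaxMRolling = 11;           // rolling months
SET vMaxWRolling = 26;           // rolling weeks
SET vMaxDRolling = 30;           // rolling days

SET vMaxYComparison = 3;         // comparison range in years
SET vMaxQComparison = 3;         // comparison range in quarters
SET vMaxMComparison = 11;        // comparison range in months
SET vMaxWComparison = 26;        // comparison range in weeks
SET vMaxDComparison = 30;        // comparison range in days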


Associating with the Analytical Calendar and How to Write the Expressions

Just load them into your dashboard and make sure you have a fact table with a %DATE_KEY field to associate with the analytical calendar.
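As a rough sketch of the association (the table, field and file names below are made up for illustration), the fact table only needs its transaction date aliased to %DATE_KEY, formatted the same way as the calendar's key:

// Hypothetical fact table - only the %DATE_KEY alias matters here
FACT_SALES:
LOAD
    Date(Floor(ORDER_DATE))  as %DATE_KEY,   // associates to the analytical calendar
    PRODUCT_ID,
    SALES_AMOUNT
FROM [lib://Data/fact_sales.qvd] (qvd);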

In general, you will just need two similar expressions (a worked sketch follows the list):

1. Analysis period: sum({<[Comparison Perspective]={'N/A'}>} Exp1)

2. Comparison period: sum(Exp1)
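For instance, assuming the hypothetical fact table above with SALES_AMOUNT standing in for Exp1, the two chart measures could look like this:

// Analysis period measure
Sum({<[Comparison Perspective] = {'N/A'}>} SALES_AMOUNT)

// Comparison period measure
Sum(SALES_AMOUNT)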


I have kept this one very short.  If you have any questions, feel free to drop me a note.


The same details can be found in my GitHub.  Thank you!


Thank you for reading.  I hope you enjoy using the analytical calendar.  Just leave a message if there is anything you would like to discuss or share.



* If you like my post, support me and buy me a coffee: https://buymeacoffee.com/kongsoncheung.

Wednesday, December 17, 2025

Analytical Calendar - Public Release for 2022 to 2026

   Post Index

Technical Discussion | Analytical Calendar The Story | The Index Page

2025-12-18

Analytical Calendar

Public Release

The Analytical Calendar is continuing its journey, with many versions developed in different forms, all aiming to reduce development effort and boost the overall experience in data analytics.  Each version has its own pros and cons, but after a lot of effort, I am now releasing a public version that I think is easy enough for everybody.


Analytical Calendar - Public Release for 2022 to 2026

The QVD files are the following (a loading sketch follows the list):

1. ANALYTICAL CALENDAR

2. DIM_CALENDAR

3. LINKED_TABLE_DATE_RANGE
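A minimal loading sketch, assuming the files are named after the tables and sit in a data connection called lib://Data (both of these are assumptions; point the paths at wherever you stored the files):

[ANALYTICAL CALENDAR]:
LOAD * FROM [lib://Data/ANALYTICAL CALENDAR.qvd] (qvd);

DIM_CALENDAR:
LOAD * FROM [lib://Data/DIM_CALENDAR.qvd] (qvd);

LINKED_TABLE_DATE_RANGE:
LOAD * FROM [lib://Data/LINKED_TABLE_DATE_RANGE.qvd] (qvd);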

Just load them into your dashboard and make sure you have a fact table with a %DATE_KEY field to associate with the analytical calendar.

In general, you will just need two similar expressions:

1. Analysis period: sum({<[Comparison Perspective]={'N/A'}>} Exp1)

2. Comparison period: sum(Exp1)


Analytical Calendar Review

Instead of selecting a year, month or day, the analytical calendar requires you to select three basic fields:

1. Analysis Perspective

i.e. what kind of time analysis you want to perform, e.g. year, year month, date, YTD, YTM, rolling, etc.

2. Unique Date

i.e. which period, e.g. 2025 (for year granularity), 2025-Q1 (for quarter granularity), etc.

3. Comparison Perspective

i.e. the comparison relative to the analysis period, e.g. 1 year before, 2 months before, etc.


Does it look user-unfriendly?  Yes, but it is just a matter of habit.  Once you start using it, you will understand that the traditional calendar requires a lot of effort to manipulate into a valid combination.  The analytical calendar has all the dependencies incorporated and well integrated.  You cannot build a time analysis in a strange way, neither in your selections nor in your mental model.

Analytical Calendar Configuration

It covers 5 years of data, spanning 2022-01-01 to 2026-12-31, with 5 levels of granularity: Year, Year Quarter, Year Month, Year Week and Date.

It has the following analysis perspectives and comparison perspectives:

  • Analysis Perspective
    • Year
    • Rolling Year (2 - 3 years)
    • Quarter
    • Year to Quarter
    • Rolling Quarter (2 - 3 quarters)
    • Month
    • Year to Month
    • Rolling Month (2 - 11 months)
    • Week
    • Year to Week
    • Rolling Week (2 - 26 weeks)
    • Date
    • Year to Date
    • Month to Date
    • Rolling Day (2 - 30 days)
  • Comparison Perspective
    • Comparison Year Before (1 - 3 years)
    • Comparison Quarter Before (1 - 3 quarters)
    • Comparison Month Before (1 - 11 months)
    • Comparison Week Before (1 - 26 weeks)
    • Comparison Day Before (1 - 30 days)
    • Comparison - N/A
* Week is complete week (unbroken week).



Thank you for reading.  I hope you enjoy using the analytical calendar.  Just leave a message if there is anything you would like to discuss or share.



* If you like my post, support me and buy me a coffee: https://buymeacoffee.com/kongsoncheung.


Saturday, February 15, 2025

Practice - Data Flow Practice

   Post Index

2025-02-15

Data Flow Practice

Generic Pattern for Reference and Discussion

The business analytics realm usually involves two core parts: data transformation and data visualization.  There are many terms for data transformation, such as data wrangling, data preparation and data processing; no matter what it is called, it is the process of preparing analytic data for visualization or presentation.

Many analytics teams use tools to help transform the data: data sources are brought into the tool, a series of data operations is performed, and the output is generated and saved to storage.  The data flow itself is often a headache for many data analysts because it is not easy to write a maintainable flow.

This article hopes to give you some concepts about the important areas of data flow and help you understand the practice.  Given how rapidly the world changes, it does not cover everything, but it is at least a set of concepts you might consider following.



Data Flow Process

Every data flow has three main components:

  • Input
  • Preparation
  • Output


Input is the process of taking the necessary data into the tool for processing.  Necessary data means the required columns and required rows.  Required columns are those needed in the output or used to derive new columns.  Required rows are the relevant records, for instance a rolling 4 years of data.  The input process is crucial for filtering out irrelevant data because it saves resources downstream by avoiding redundant operations.

Many data analysts do not pay much attention to this.  It is understandable: while the data flow is being implemented, the flow logic keeps changing rapidly.  In the end, however, the redundant input should be tidied up.  Obviously this is good for system performance, but it also increases readability and maintainability.


Preparation is the process of preparing the necessary business data for the output.  It generally involves tasks like data cleansing, creating new columns for business purposes, joining / union, grouping, etc.  It can be a complicated process depending on the analytic requirements.  Most commonly, it is the process of preparing a data model for the visualization tool.  The enterprise data is taken in, and the business units build on top of it to generate new value based on their operational or analytic purposes.  Without a proper concept, this easily leads to a messy flow with low performance and low maintainability.  In fact, there are some steps worth considering for a better flow.


Output is obviously the process of saving the data into storage.  It is also a process of tidying up the data for downstream usage, because it helps the downstream continue the flow to process the data and generate value.  The tidy-up usually involves column renaming; for example, columns are renamed from the database naming convention to the business naming convention to facilitate visualization.  Output is, in fact, the input of another data flow / process.  Both the upstream and downstream impact should be well understood for the best data flow development.


Single Table Pattern

Let's take a simple case where the data flow involves only one table.

All data operations start with a single table (* there is semi-structured data that is not in tabular format; the concept here is to deal with a single set of data initially).

Dealing with a single table involves the generic steps shown in the diagram below.



Let's break it down into the three core components of the data flow to explain what each step actually does.

The input process, as discussed, brings in the necessary columns and rows.  It specifies the column names and row filters.  No matter what kind of data, the first rule of thumb is to bring in only the relevant data.

The preparation process then manipulates the brought-in data with formulas or logic, such as data cleansing (trimming spaces, splitting text, changing date formats, etc.), deriving new columns for business analysis, and creating new join keys (when more tables are involved).  Once the required data processing is complete, the next step is to remove the irrelevant columns, i.e. keep only the columns that are required for the output.

The output process then renames all the relevant columns for downstream usage, most often applying the business naming convention.  Once that is complete, it writes the data out to the target storage.
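Since this blog mostly lives in the Qlik world, here is a minimal Qlik load-script sketch of the single table pattern; the source path, field names, derived column and the rolling 4-year filter are purely illustrative, not part of any real flow.

// INPUT - only the required columns and rows (rolling 4 years)
ORDERS_INPUT:
LOAD
    ORDER_ID,
    ORDER_DATE,
    AMOUNT
FROM [lib://Data/orders.qvd] (qvd)
WHERE ORDER_DATE >= AddYears(Today(), -4);

// PREPARATION - cleanse, derive business columns, keep only what the output needs
ORDERS_PREP:
LOAD
    ORDER_ID,
    Date(Floor(ORDER_DATE))               as ORDER_DATE,
    AMOUNT,
    If(AMOUNT >= 1000, 'Large', 'Small')  as ORDER_SIZE
RESIDENT ORDERS_INPUT;
DROP TABLE ORDERS_INPUT;

// OUTPUT - rename to the business naming convention, then write out
RENAME FIELD ORDER_ID TO [Order ID];
RENAME FIELD ORDER_DATE TO [Order Date];
RENAME FIELD AMOUNT TO [Order Amount];
RENAME FIELD ORDER_SIZE TO [Order Size];
STORE ORDERS_PREP INTO [lib://Data/ORDERS_PREPARED.qvd] (qvd);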

The pattern looks simple, but in reality you will see many data flows that do not achieve this simple practice.  The reason is understandable: with rapidly changing requirements and data, it can be difficult to keep to the practice.  However, if the practice is always maintained, it is far easier than not applying it.  Once applied, the benefit is obvious: you will be much happier reading and digesting the data flow details.


Two Table Pattern

Let's then take another simple case, this time involving two tables.

Dealing with two tables involves the generic steps shown in the diagram below.



Two tables mean two single tables.  Therefore, follow the same input and preparation practice for each single table individually before combining the data into a combined table.  Each single table should first complete all of its single-table operations, meaning it finishes creating any new derived fields that rely only on that table's data.  In this way, the data flow is easier to track and trace.

Let's then take a closer look at the combined table's preparation process and output process.

The preparation process is actually the same as for a single table, with an additional step to combine the data using a join or union.  After combining the tables, the data operations continue; these are the operations that require columns from both tables.  Once completed, the irrelevant columns are removed.

The output process is the same as in the single table pattern: it renames the columns to facilitate downstream operations before saving out the data.
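Again as an illustrative Qlik sketch (made-up tables, fields and paths), the two table pattern only adds the combine step between the single-table work and the output:

// Single table 1 - input and its own preparation
CUSTOMERS:
LOAD
    CUSTOMER_ID,
    Upper(Trim(REGION))  as REGION      // single-table cleansing
FROM [lib://Data/customers.qvd] (qvd);

// Single table 2 - input and its own preparation
ORDERS:
LOAD
    ORDER_ID,
    CUSTOMER_ID,
    AMOUNT
FROM [lib://Data/orders.qvd] (qvd)
WHERE AMOUNT > 0;

// COMBINE - join on the shared key, then run operations that need both tables
LEFT JOIN (ORDERS)
LOAD CUSTOMER_ID, REGION RESIDENT CUSTOMERS;
DROP TABLE CUSTOMERS;

// OUTPUT - rename for downstream usage and store
RENAME FIELD AMOUNT TO [Order Amount];
RENAME FIELD REGION TO [Customer Region];
STORE ORDERS INTO [lib://Data/orders_by_region.qvd] (qvd);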

Again, the pattern looks simple, but in reality you will see many data flows that do not achieve this simple practice.  You can imagine it as a step-by-step process: first deal with merely a single table, and then with two tables.


Three Table Pattern

Three tables are, unsurprisingly, very similar to two tables.  In the same way, the single tables are managed first and then the combined table.  The only difference is that there are two combining steps.

The generic steps are shown in the diagram below.



It simply adds a step to combine the third table with the already-combined two tables.  The general steps are the same.  The concept is very clear: deal with one table, then two, then three, and so on.

It sounds easy, so just try to keep the flow clear.


Data Flow Design Pattern

Data flow design is actually simple if it is executed step by step with this concept in mind.  The problem that is always faced is mixing up the steps, and mixing up the order of the steps can have an unbelievable impact.

For example, renaming that happens in the middle of the flow generates extra, untraceable data operations, or even leads to unnecessary and excessive renaming.  Firstly, it is difficult to find where the rename operations are; secondly, it forces your brain to memorize different naming conventions for the same thing.  A column name should be consistent from beginning to end so that it is easy to trace and use.

Another example is a single-table operation that happens among the multiple-table operations.  Yes, it works for the output.  But if you think about it more precisely, you will see that it wastes resources: the combined table typically has more rows than a single table, so the same calculation processes more rows.  Moreover, it raises the question of why the operation happens there, because there is no longer any clue as to when a single-table operation will take place.  This makes the flow harder to read and maintain.

Overall, the diagram below summarizes the general steps as a practice for dealing with data flow.  Yes, a data flow can be more complicated than this, but please bear in mind that, generically, it is simply these simple operations.  Isn't it?





Data Flow Optimization

Optimization is not a must-do task, but in general it has big advantages for enterprises working on data flows.  Optimization tends to reduce human readability, because it reshapes the data flow to help the computer run faster and more efficiently.  It is not hard to imagine that it saves money, because nowadays every piece of cloud resource is actual money.

So, when working on data flow, which is more important: human readability or computer efficiency?  The obvious answer is to strike a balance.  Frankly, though, the flow should first be human readable and only then run efficiently, because it is humans who maintain the data flow, unless there is a very strong, very technical team.  Otherwise, let's vote for human readability.  In these decades of digitalization, more business users are actually starting to deal with data.  Even though business users' data skill sets are increasing, it is still unlikely that they are more technical than IT specialists.  At least in my personal experience, enabling the business units to get into the data world brings far more benefit than just having a fast-running flow.  Do you agree?



Thank you for reading.  I hope you can build a proper data mindset and continue your data journey.  Just leave a message if there is anything you would like to discuss or share.



* If you like my post, support me and buy me a coffee: https://buymeacoffee.com/kongsoncheung.