The Truth About Sparsity: Part 1
Throughout my five-plus years at Anaplan, the conventional wisdom has been that we should eliminate sparsity in order to model efficiently. Over the last year, as part of the PLANS standards initiative, we have been critically re-evaluating all of the existing best practices and techniques, and sparsity was part of that review. In this article (part one), I will explain what sparsity is and why you shouldn’t be scared of it.
What is Sparsity?
Anaplan is a multi-dimensional planning platform, based on modules that hold data and calculations. These modules are primarily made up of lists (or dimensions) that describe the different aspects of the data. In most cases these modules contain more than one list, making them multidimensional.
Let’s use the following as an example. Assume we have a module containing the following dimensions:
- Customers
- Products
- Channels
- A monthly timescale covering 2 years
If every customer sold every product, in every channel, every month, we would have dense data: in this scenario, 100 percent dense. However, in the real world this situation is extremely rare. It’s very likely that some products are only sold by some customers, and that not all channels or months will contain data points.
In a multi-dimensional paradigm, the proportion of zero or null cells (the “gaps”) defines the level of sparsity. A sparse dataset (stored in a module) is one where there are more gaps than entered data or calculated results.
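To put a number on it, here is a minimal sketch (the figures are hypothetical, purely for illustration): sparsity is simply the share of the full dimensional cross-product that holds no data.

```python
# Hypothetical figures: a module whose dimensions multiply out to 50,000 cells,
# of which only 6,000 actually contain data or calculated results.
total_cells = 50_000
populated_cells = 6_000

sparsity = 1 - populated_cells / total_cells
print(f"Sparsity: {sparsity:.0%}")   # -> Sparsity: 88%
```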
Why is Sparsity Deemed a Problem?
So, taking our example above, let’s assume we have the following items for each dimension:
- 100 customers
- 1000 products
- 10 channels
- 26 time periods (2 * 12 months + 2 year totals)
A single line item using this combination will require 26 million cells (100 * 1000 * 10 * 26). The following shows the amount of memory those 26 million cells require for each data type:
- Boolean: 1 byte = 0.026 GB
- List, Date, Time: 4 bytes = 0.104 GB
- Numeric: 8 bytes = 0.208 GB
- Text: 8 bytes = 0.208 GB
These values are the same regardless of how big the number is or how many characters are in the list format or text string.
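If you want to check the arithmetic, here is a minimal sketch of the calculation above, using the byte sizes quoted in the list:

```python
# Cell count for one line item, and the memory that count implies per data type.
cells = 100 * 1_000 * 10 * 26            # customers * products * channels * time periods
bytes_per_cell = {"Boolean": 1, "List/Date/Time": 4, "Numeric": 8, "Text": 8}

for data_type, size in bytes_per_cell.items():
    print(f"{data_type}: {cells * size / 1_000_000_000:.3f} GB")
# -> Boolean: 0.026 GB, List/Date/Time: 0.104 GB, Numeric: 0.208 GB, Text: 0.208 GB
```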
The majority of the data in Anaplan modules is numeric, so you can see why, with a lot of data, the size of a model can increase quite dramatically as we add more items to a list or extend the model timescale. This is the primary reason sparse modules are deemed a problem: they are large!
Historically, proactively modeling for sparsity management was a key technique. As we evolved our thinking through deeper analysis and gained a better understanding of the way our customers used the Anaplan platform, we moved away from the approach of keeping model sizes as small as possible to reduce workspace usage.
In practice, this meant that instead of creating modules with a lot of sparsity, modelers created new structures by combining lists for only the valid combinations that existed in the dataset. The resulting hierarchy has many names—flattened, numbered, concatenated, and combined—but the result is the same and looks something like this:
Customer 1
- Product 1
  - Channel 1
  - Channel 3
- Product 2
  - Channel 5
  - Channel 2
- Product 3
  - Channel 4
  - Channel 10
Customer 2
- Product 4
  - Channel 7
  - Channel 9
- Product 5
  - Channel 6
  - Channel 8
This approach results in a much smaller cell count because the resulting module now only contains two dimensions (the combined hierarchy list and time). This combined hierarchy is dense as only valid combinations exist and it eliminates most of the zero values (although there may still be a little sparsity in the month dimension). This is an effective technique for reducing the size of a model, but it does come with some issues, which I will touch on in part two of this article.
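To make the size difference concrete, here is a minimal sketch; the number of valid combinations is an assumption for illustration, not a figure from the example above.

```python
# Full cross-product versus a combined hierarchy holding only valid combinations.
customers, products, channels, periods = 100, 1_000, 10, 26
valid_combinations = 40_000                               # assumed count of real combinations

full_cells = customers * products * channels * periods    # 26,000,000
combined_cells = valid_combinations * periods              # 1,040,000
print(f"Cell count reduction: {1 - combined_cells / full_cells:.0%}")   # -> 96%
```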
But first, let me dispel some other myths associated with sparsity.
Myth 1 – The bigger the model, the worse it will perform
This is totally untrue. Yes, the larger the model, the more data it will contain, but there is no direct correlation between size and performance. The performance of a model is based on the design of the model and its calculation structures, not simply its size. To give you some context, the worst-performing model we have seen in the field was less than 1 GB in size but took more than thirty minutes to open. We have many models that are more than 100 GB and open in under two minutes. Well-designed models perform at scale.
Myth 2 – Sparse modules are inefficient
I would partly agree with this, but only in relation to module size (see Myth 3). Sparse modules are not inefficient when it comes to calculations. Anaplan’s engine (the Hyperblock) is designed to work with multi-dimensional structures. At its heart, a Directed Acyclic Graph (DAG) indexes the data so that only what is needed is recalculated when upstream data points change. So, once the model is open and a full calculation has been performed, the calculations thereafter are very efficient. Anaplan doesn’t re-calculate zeros; there is nothing to do because nothing has changed.
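As a toy illustration of the idea (this is a sketch of dependency-driven recalculation in general, not the Hyperblock itself):

```python
# Each item lists the items it depends on; editing one value only forces the
# items downstream of it to recalculate, and everything else is left untouched.
depends_on = {
    "Revenue": ["Price", "Volume"],
    "Margin":  ["Revenue", "Cost"],
}

def needs_recalc(changed, graph):
    """Return every item that must recalculate after `changed` is edited."""
    hit = set()
    for item, inputs in graph.items():
        if changed in inputs:
            hit |= {item} | needs_recalc(item, graph)
    return hit

print(needs_recalc("Volume", depends_on))   # {'Revenue', 'Margin'}; Price and Cost untouched
```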
Myth 3 – Multi-dimensional modules are always bigger than those with a combined hierarchy list
This relates to the technique I referred to earlier. This is not as black and white as the previous two points, but, in certain circumstances, this premise is entirely false. Let me give you an example from the field.
One of the long-standing techniques for dealing with transactional data was to combine all “dimensions” into a single entry (or transaction) list. Using a unique key that includes the time period (e.g. month), the records are imported and stored in Anaplan as single rows in a one-dimensional module. The resulting module would look something like this:
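(A purely illustrative sketch; the key format and values below are assumed.)

```python
# Each imported record becomes one item in a transaction list, keyed on every
# dimension including the month, with the measure stored against that single key.
transactions = {
    "C001|P0456|Online|Jan 24": 1_250,
    "C001|P0456|Online|Feb 24": 1_900,
    "C002|P0013|Retail|Jan 24":   400,
}
# A one-dimensional module: one row per key, and no empty cells anywhere.
```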
The module and structure are 100 percent dense; job well done! Well, no.
We looked at the impact of storing data in a two-dimensional module by removing the timescale from the key and adding time as a dimension in the data module. The results were surprising!
Have a look at the size of the data:
- 2 years of data by month
- 3 elements to create uniqueness
- 80 percent sparse
- Model 1 used a transaction key that included the date
- Model 2 was multi-dimensional with time as a dimension
You can see that not only did Model 2 open more than 90 percent faster than Model 1, but it was also more than 75 percent smaller. How can that be, given the adage that sparse multi-dimensional modules take up more space than dense modules? Well, the reason is that list items themselves also make up part of the model size. I didn’t mention it earlier when discussing the cell count calculation because lists don’t add to the cell count itself, but each list item uses approximately 500 bytes, so the bigger the list, the more space used.
Look at the size of the unique transactions list in each of the two models. Removing the date from the key resulted in the creation of only 300K unique transactions rather than 7.3M. In turn, this results in a much smaller Transaction Details module. The sparse module calculates more efficiently than the large dense module, and, combined with the smaller list, we see improved model-opening time.
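As a rough back-of-the-envelope check on the list overhead alone, using the figures above and the approximate 500 bytes per list item:

```python
# List items cost roughly 500 bytes each, before a single cell is counted.
bytes_per_list_item = 500

model_1_list_gb = 7_300_000 * bytes_per_list_item / 1_000_000_000   # ~3.65 GB of list items
model_2_list_gb =   300_000 * bytes_per_list_item / 1_000_000_000   # ~0.15 GB of list items
print(f"Model 1 list: {model_1_list_gb:.2f} GB; Model 2 list: {model_2_list_gb:.2f} GB")
```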
Join me for part two, where I will discuss the pros and cons of multi-dimensional structures versus combined hierarchies.