New Line Item Text Format Type - SmallText

erobbins · September 2021

First, I should acknowledge that I do not know how the back end of Anaplan is built, so I am unsure whether this idea is possible or how easy it would be to implement.

The idea takes a concept from SQL Server. SQL Server has the following data types for storing integer values: bigint, int, smallint, and tinyint. The chart below shows the range of values and the storage requirement for each data type.

[Chart: SQL Server integer data types (bigint, int, smallint, tinyint) with the value range and storage size of each]
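The relationship in the chart between storage size and range can be reproduced from the byte widths alone. This is a minimal Python sketch, assuming two's-complement signed types and an unsigned tinyint, as SQL Server documents them; `value_range` is an illustrative helper, not part of any product.

```python
# SQL Server integer types: (name, storage in bytes, signed?)
# tinyint is the only unsigned one (0 to 255); the others are signed.
INT_TYPES = [
    ("bigint", 8, True),
    ("int", 4, True),
    ("smallint", 2, True),
    ("tinyint", 1, False),
]


def value_range(num_bytes: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable in num_bytes of storage."""
    bits = 8 * num_bytes
    if signed:
        # Two's complement: half the bit patterns are used for negative values.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for name, size, signed in INT_TYPES:
    low, high = value_range(size, signed)
    print(f"{name:>8}: {size} byte(s), {low:,} to {high:,}")
```

Each halving of storage shrinks the range, which is the trade-off the SmallText idea borrows for text.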

This idea applies the same concept to text: a new line item text format, SmallText, that would limit text input to 60 characters, reducing memory/storage requirements and improving model performance compared with the traditional Text format. Anaplan already enforces this limit for list member codes (codes cannot exceed 60 characters), so it seems that the platform code behind that limit, which is already in production, could be reused for the new SmallText line item format type. I am sure that is an oversimplification of the engineering required, but hopefully the idea is clear.
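A minimal sketch of the proposed input rule, in Python for illustration only: `SmallText` and `SMALLTEXT_MAX_LEN` are hypothetical names, and the check simply refuses anything longer than 60 characters, the same limit list member codes already have. It says nothing about how Anaplan would store such values internally.

```python
SMALLTEXT_MAX_LEN = 60  # hypothetical: same limit Anaplan applies to list member codes


class SmallText(str):
    """Hypothetical text value that refuses more than SMALLTEXT_MAX_LEN characters."""

    def __new__(cls, value: str) -> "SmallText":
        if len(value) > SMALLTEXT_MAX_LEN:
            raise ValueError(
                f"SmallText accepts at most {SMALLTEXT_MAX_LEN} characters, got {len(value)}"
            )
        return super().__new__(cls, value)


region = SmallText("North America")  # accepted: 13 characters
try:
    SmallText("x" * 61)
except ValueError as err:
    print(err)  # SmallText accepts at most 60 characters, got 61
```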

While minimizing text-formatted line items and their calculations is a best practice in Anaplan model building, some models need to use short text strings at scale.

Assumptions

Limiting the number of characters a text-formatted line item can accept will directly reduce that line item's memory/storage requirements, thus increasing model performance.

Benefits

A performance increase across the entire platform, for all models where line items can be converted from Text to SmallText:

Faster models
Quicker save state times
Decreased model sizes
Etc.

jprince · September 2021

I've had requests to limit the amount of text end users can enter into Anaplan; this could be a useful tool for some of our customers.

ben_speight · September 2021

We currently allocate 8 bytes of workspace per text-formatted cell to hold a reference, and internally we use techniques like deduplication (multiple appearances of "ABC" in a detail line item can share the same representation, as it is immutable) and reuse (e.g. "A" + "" or UPPER("A") returns the same "A" that was supplied to it) to keep the memory used by the representations themselves at a reasonable level. Just adding limits would help Anaplan's resource management a bit and prevent very long text cells from causing model performance issues, but if all text were already within bounds, it would not change the performance characteristics. We would need to think about how to deal with text that exceeds the limit. For detail data, the result could be either silent truncation or a data input/import error. If we introduced or changed the limit on an existing line item, would we first check that all values were within limits? For calculated data, it would be difficult to do anything other than silently truncate values that exceed the bounds, and users would not be able to tell whether this had happened unless we added additional support.
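The deduplication described above can be pictured as a small interning pool. This is a hypothetical Python sketch, not Anaplan's implementation: `TextColumn` is an invented name, each cell is charged the 8-byte reference mentioned above, and identical strings are stored once in a shared pool.

```python
class TextColumn:
    """Hypothetical sketch (not Anaplan's implementation): each cell holds a
    reference into a shared pool, so repeated values are stored only once."""

    REF_BYTES = 8  # per-cell reference size quoted in the reply above

    def __init__(self) -> None:
        self._pool: dict[str, str] = {}  # one canonical object per distinct value
        self.cells: list[str] = []       # each cell refers to a pooled object

    def append(self, value: str) -> None:
        # setdefault returns the existing canonical object if an equal value
        # was seen before; strings are immutable, so sharing it is safe.
        self.cells.append(self._pool.setdefault(value, value))

    def reference_bytes(self) -> int:
        """Workspace charged for the per-cell references alone."""
        return self.REF_BYTES * len(self.cells)

    def distinct_values(self) -> int:
        return len(self._pool)


col = TextColumn()
# The second value is built at run time, like a calculated result.
for value in ["ABC", "".join(["A", "B", "C"]), "XYZ", "ABC"]:
    col.append(value)
print(col.distinct_values(), col.reference_bytes())  # 2 32
print(col.cells[0] is col.cells[1])                  # True
```

The calculated "ABC" ends up pointing at the same stored object as the typed one, so only the references grow with the cell count.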
If we went further (and I have understood the proposal correctly), we could technically store the representations inline instead of referencing them indirectly. This would help resource allocation, but instead of 8 bytes per cell, Anaplan would need to count 2 * (1 + max length) bytes per cell, because we could not achieve the optimisations mentioned earlier. Some things, like calculations, could see a performance improvement, but every kind of text calculation/function would need to be re-implemented to get that improvement, and conversions between non-limited and limited text could easily negate it.
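To put the inline layout in numbers, using only the two figures from this reply (8 bytes per referenced cell and 2 * (1 + max length) bytes per inline cell): at a 60-character limit, an inline cell needs 2 * 61 = 122 bytes, roughly 15 times the 8-byte reference. The function names are illustrative, and the reference layout's pooled strings are deliberately left out of this sketch.

```python
REFERENCE_BYTES_PER_CELL = 8  # current layout: one reference per cell


def reference_layout_bytes(cells: int) -> int:
    """Per-cell references only; the deduplicated strings are not counted."""
    return REFERENCE_BYTES_PER_CELL * cells


def inline_layout_bytes(cells: int, max_len: int) -> int:
    """Inline storage, using the 2 * (1 + max length) bytes-per-cell figure."""
    return 2 * (1 + max_len) * cells


cells = 1_000_000
print(reference_layout_bytes(cells))   # 8000000
print(inline_layout_bytes(cells, 60))  # 122000000
```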
Supporting lower-precision/range numeric data would require much less work and add more value.
However, if the required behaviour can be pinned down, there are benefits in just imposing a limit, for example where data will be fed into external systems that impose similar limits themselves.
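The limit behaviours raised in this reply (an error on data input/import, silent truncation for calculated data, and checking existing values before a limit is introduced) can be sketched as follows. `OverflowPolicy`, `apply_limit`, and `find_over_limit` are hypothetical names, and returning a `was_truncated` flag is just one possible form of the additional support that would let users see that truncation happened.

```python
from enum import Enum


class OverflowPolicy(Enum):
    """Hypothetical choices for text that exceeds a line item's limit."""
    REJECT = "reject"      # e.g. data input / import: raise an error
    TRUNCATE = "truncate"  # e.g. calculated data: cut the value to the limit


def apply_limit(value: str, max_len: int, policy: OverflowPolicy) -> tuple[str, bool]:
    """Return (stored_value, was_truncated) for one cell."""
    if len(value) <= max_len:
        return value, False
    if policy is OverflowPolicy.REJECT:
        raise ValueError(f"{len(value)} characters exceeds the limit of {max_len}")
    return value[:max_len], True


def find_over_limit(values: list[str], max_len: int) -> list[int]:
    """Before introducing or lowering a limit, list the cells that would break it."""
    return [i for i, v in enumerate(values) if len(v) > max_len]


print(apply_limit("Northern Europe", 8, OverflowPolicy.TRUNCATE))  # ('Northern', True)
print(find_over_limit(["EMEA", "Northern Europe", "APAC"], 8))     # [1]
```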

MarkWarren · September 2021

From my analysis of models across the platform, most text is below 60 characters anyway.
I can't measure the length of every cell, though, so this is based on the average text length across cells.
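Because that figure is an average, it cannot show whether individual cells exceed 60 characters. A small hypothetical Python example, with invented data, shows how a low average can coexist with an over-limit cell; `length_summary` is an illustrative helper only.

```python
def length_summary(values: list[str], limit: int = 60) -> dict[str, float]:
    """Summarise text lengths: the average alone can hide over-limit cells."""
    lengths = [len(v) for v in values]
    return {
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "cells_over_limit": sum(1 for n in lengths if n > limit),
    }


# Illustrative data only: nine short codes and one long description.
cells = ["EMEA"] * 9 + ["x" * 100]
print(length_summary(cells))
# {'mean_length': 13.6, 'max_length': 100, 'cells_over_limit': 1}
```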
But in most models it is the volume of text rather than the length that is the problem, so we, as a team and as a community, need to look at strategies for avoiding using text, showing ways to model that don't require it.

erobbins · September 2021

@ben_speight, thank you for the thoughtful response from the engineering perspective. This proposal assumes that text is already within the current bounds, so I would like to steer the discussion toward your idea of storing text inline instead of referencing it indirectly. If this idea (or any others like it) could increase calculation performance, it would be good to know by how much, for models with low, medium, and high amounts of text and text line item calculations.

Even if the text calculations/functions would need to be re-implemented to get the improvement, I think it could be worth it given how costly text is to models at scale. For all of the use cases I can think of, I would not see a need to convert between limited and non-limited text, which would lock in the performance improvements.


New · Last Updated January 2023
