Format of export data (ISO-8859-1 or UTF-8)

Hi Product team,

 

What is the encoding of data exported as .CSV or .TXT? Is it ISO-8859-1 or UTF-8? What I have seen is that if the module is English-only, the output is ISO-8859-1, and if the module contains Chinese, the output is UTF-8. Can we have some clarity on this, please? Thanks in advance.

 

-Regimon Sebastian

Answers

  • @sebastian,

     

    This article in Anapedia explains which special characters you can import and those you should avoid. It also provides instructions on how to encode import files containing special characters.

     

    https://help.anaplan.com/anapedia/Content/Import_and_Export/Get_Started_with_Imports/Tips_for_Importing_Special_Characters.html

     

    Thanks

    Sathya

  • Hi @SathyaM ,

    I am not referring to import data. I am referring to export files.

    Is the exported data in ISO-8859-1 or UTF-8 format?

     

    Regimon

  • It is always in UTF-8. If a program reads it as a file it may try to autodetect the encoding, and if it only has ASCII characters it may think ISO-8859-1 is a good match.
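
As a small illustration of the point above (a sketch using only Python's built-in codecs, with made-up sample strings, not anything taken from Anaplan), ASCII-only text produces exactly the same bytes in UTF-8 and ISO-8859-1, so a detector has nothing to distinguish them by. The two encodings only diverge once a non-ASCII character appears:

```python
# ASCII-only text: the bytes are identical in UTF-8 and ISO-8859-1,
# so a detector cannot tell the two encodings apart.
ascii_text = "Region,Sales\nNorth,100\n"  # made-up sample data
utf8_bytes = ascii_text.encode("utf-8")
latin1_bytes = ascii_text.encode("iso-8859-1")
same_for_ascii = utf8_bytes == latin1_bytes

# Non-ASCII text: the two encodings produce different bytes.
accented = "Café"
accented_utf8 = accented.encode("utf-8")         # two bytes for the é
accented_latin1 = accented.encode("iso-8859-1")  # one byte for the é

# UTF-8 bytes read as ISO-8859-1 never raise an error; they decode
# to the wrong characters instead.
mojibake = accented_utf8.decode("iso-8859-1")

# ISO-8859-1 bytes are, in general, not valid UTF-8.
try:
    accented_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(same_for_ascii, accented_utf8, accented_latin1, mojibake, latin1_is_valid_utf8)
```

This is why an English-only export can be reported as ISO-8859-1 even though it was written as UTF-8: for such a file both labels describe the same bytes.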

  • That is what I thought too, but when I imported the same file I had exported (export format expected to be UTF-8), Anaplan auto-detected it as ISO-8859-1 instead of UTF-8. Is it the import or the export that is getting it wrong?

  • Hi Sebastian, 

     

    The character encoding determines how language-specific characters are saved in text files (CSV, TXT, etc.).

    For instance, if the file is saved as "ISO-8859-1" the Russian alphabet will not display correctly, but if the file is saved as "UTF-8" the special characters display correctly.

     

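    Another particularity: "UTF-8" can represent every character that "ISO-8859-1" can, and the two are byte-for-byte identical for plain ASCII text. They differ for accented characters, which take one byte in "ISO-8859-1" and two bytes in "UTF-8".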

     

    So, simplifying: you can always set up the Anaplan import action with "UTF-8". If the file that you import is saved as "ISO-8859-1" and contains only ASCII characters, it will still be read correctly and loaded into Anaplan without an issue.

     

    When creating an Anaplan import action, if the file is recognized as "ISO-8859-1" I would always change it manually to "UTF-8", for the simple reason that "UTF-8" covers every character in "ISO-8859-1".

    If the Anaplan action is saved using "ISO-8859-1" and you try to load a "UTF-8" file, the import will not work correctly: any non-ASCII characters will be garbled.

     

    UTF means "Unicode Transformation Format". 

     

    To get a UTF-8 file from Excel, use the "CSV UTF-8 (Comma delimited) (*.csv)" option when saving the file.

     

    However, if the file is saved from Excel using the "Unicode Text (*.txt)" option, the saved TXT file will be encoded as UTF-16LE. Even though it is still a UTF-type encoding, "UTF-16LE" and "UTF-8" are not compatible with each other.

     

    Ciao

    Alex

     

  • Hi Alex,

    As I had stated earlier, I am not referring to import data. I am referring to export file format.

     

    Regi

  • Sorry, 

     

    You are correct. As @ben_speight mentioned, the Anaplan export is always in UTF-8.

     

    I also ran some tests with simple data, and both the CSV and TXT exported files were always in UTF-8, even when the module contained only English characters.

    How did you verify the encoding of the exported file? 

  • Hi

    I imported the files back, and Anaplan detected some as ISO-8859-1 and some as UTF-8.

     

    rgds

    Regi

  • Hi Regi, 

     

    You are correct again. Interesting behavior indeed. 

     

    But if you check the file exported from Anaplan using Notepad++, in the "Encoding" menu you will notice that "Encode in UTF-8" is checked, even for files with only English characters.

     

    I always change the encoding of an Anaplan import action from "ISO-8859-1", or any Windows encoding that Anaplan detects automatically, to "UTF-8".

     

    Every character in "ISO-8859-1" and the other Windows encodings can also be represented in "UTF-8".

     

    However, if the file is encoded as "UTF-16LE", then the Anaplan action must be set to "UTF-16LE" as well.

     

    Regards, 

    Alex

     

     

  • The auto-detection of file encoding for uploaded files is based on a heuristic analysis of a sample taken from the first 64k of the file. Whilst this is effective for differentiating between, say, big- and little-endian UTF-16, it can produce varying results for encodings that share most common code units, depending on the contents of that first 64k. As Alex suggests, knowing what encoding a text file is in and checking it is correct on upload is the best policy for trouble-free imports.
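
To make the behaviour described above more concrete, here is a minimal sketch of a sample-based detector. It is purely illustrative: `guess_encoding` and `SAMPLE_SIZE` are hypothetical names, the rules are assumptions, and this is not Anaplan's actual algorithm, which is not documented in this thread. It only shows how looking at the first 64k can give different answers for files that were all written as UTF-8:

```python
import codecs

SAMPLE_SIZE = 64 * 1024  # assumption: mirrors the "first 64k" mentioned above


def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic; NOT Anaplan's actual detection logic."""
    sample = data[:SAMPLE_SIZE]

    # A byte order mark, when present, is the strongest signal.
    if sample.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if sample.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if sample.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"

    # Pure ASCII is valid in both UTF-8 and ISO-8859-1, so the choice is
    # arbitrary. This sketch picks ISO-8859-1 to mimic the thread's report.
    if all(byte < 0x80 for byte in sample):
        return "ISO-8859-1"

    # Non-ASCII bytes: accept UTF-8 only if the sample decodes cleanly.
    # final=False tolerates a multi-byte character cut off by the sample edge.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
        return "UTF-8"
    except UnicodeDecodeError:
        return "ISO-8859-1"


# Made-up sample files, all encoded as UTF-8.
english_only = "Region,Sales\nNorth,100\n".encode("utf-8")
chinese_early = "地区,销售额\n".encode("utf-8")
chinese_late = b"a" * SAMPLE_SIZE + chinese_early  # non-ASCII only after the sample

for name, payload in [("english_only", english_only),
                      ("chinese_early", chinese_early),
                      ("chinese_late", chinese_late)]:
    print(name, guess_encoding(payload))
```

Under these assumed rules, the English-only file and the file whose Chinese characters appear only after the sampled region are both labelled ISO-8859-1, while the file with Chinese characters near the start is labelled UTF-8. That matches the pattern reported earlier in the thread and is the reason to set the encoding explicitly on the import action rather than rely on detection.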