
Unicode Normalization

Unicode is a standard for representing characters from all of the world's languages in a single machine-readable format. Like most cloud services, the service uses Unicode when working with text. Because paths are particularly sensitive in a filesystem, the service follows a specific pattern for normalizing Unicode values in paths: it uses the "NFKC" (Normalization Form Compatibility Composition) algorithm for normalizing Unicode as part of path comparison.

Although the service normalizes Unicode for path comparison, it is Unicode preserving, meaning that the path name is stored using the actual Unicode representation used when the file or folder was first created.
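As a rough illustration of why normalization matters for comparison, the sketch below uses Python's standard `unicodedata` module to show two visually identical names that differ at the code point level but become equal under NFKC. The file names and the `nfkc` helper are made up for this example, and NFKC is only one step of the full comparison algorithm described later on this page.

```python
import unicodedata

# Two ways to write the same visible name (hypothetical example names).
composed = "caf\u00e9.txt"      # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301.txt"   # "e" followed by a combining acute accent (U+0301)
ligature = "\ufb01le.txt"       # begins with the "fi" ligature (U+FB01)


def nfkc(path: str) -> str:
    """Apply Unicode NFKC normalization to a path string."""
    return unicodedata.normalize("NFKC", path)


# The raw strings differ, so a byte-for-byte comparison treats them as different.
print(composed == decomposed)              # False
# After NFKC, both collapse to the same composed form.
print(nfkc(composed) == nfkc(decomposed))  # True
# Compatibility characters such as ligatures are expanded.
print(nfkc(ligature))                      # file.txt
```

Because the service is Unicode preserving, a client would keep the original string for storage and display, and use the normalized form only when comparing.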

Exact Algorithm For Path Normalization

The service uses two algorithms for path normalization. Our Normalize algorithm is applied to all paths provided to the service to remove noncompliance with our path requirements. If you are building an SDK or a manual API integration, we recommend that you implement this algorithm and apply it before sending any paths to the API, so that the server treats them exactly as you provided them.

Additionally, our Normalize For Comparison algorithm is used to compare two paths to determine whether they are the same. If you are building an SDK or a manual API integration that needs to determine whether two file paths are the same, we recommend that you also implement this algorithm.

The official SDKs implement both algorithms natively, and we encourage the use of our SDKs rather than implementing either of these algorithms by hand. For completeness, we describe the algorithms here. Sample code for the following algorithms can be found in our SDKs.

Normalize Algorithm

Convert the path to UTF-8

Remove any characters with byte value of 0

Convert any backslash \ characters to a forward slash /

Remove any trailing or leading slashes

Remove any path parts that are . or ..

Replace any duplicate forward slashes (such as ///) with a single forward slash /
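The steps above can be sketched in Python as follows. This is one interpretation, not the official SDK code: the function name `normalize_path` is an assumption, byte input is assumed to be UTF-8, and the last three steps are combined into a single split-and-filter pass, which also covers slashes left behind once "." and ".." parts are dropped.

```python
from typing import Union


def normalize_path(path: Union[str, bytes]) -> str:
    """Sketch of the Normalize algorithm (hypothetical helper, not SDK code)."""
    # Step 1: treat the path as Unicode text; bytes are assumed to be UTF-8.
    if isinstance(path, bytes):
        path = path.decode("utf-8")
    # Step 2: remove characters with a byte value of 0.
    path = path.replace("\x00", "")
    # Step 3: convert backslashes to forward slashes.
    path = path.replace("\\", "/")
    # Steps 4-6: splitting on "/" and dropping empty parts removes leading,
    # trailing, and duplicate slashes; "." and ".." parts are dropped as well.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return "/".join(parts)


print(normalize_path("\\docs\\.\\reports//2024/"))  # docs/reports/2024
print(normalize_path("a/../b"))                     # a/b
```

Note that ".." parts are removed rather than resolved against the parent folder, which follows the wording of the step list: "a/../b" becomes "a/b", not "b".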

Normalize For Comparison Algorithm

Run the path through the Normalize Algorithm

Unicode-normalize the path using the Unicode NFKC algorithm

Transliterate and remove accent marks by using the official transliteration map specified below. Any instance of the first character in the map should be replaced with the remaining characters.

Convert the path to lowercase using the case mapping found in Unicode 9.0. (Note: we are aware that this version of Unicode is fairly old and that many modern programming languages now implement Unicode 15.0. The only differences affect two very rare languages, and we have never seen these differences cause any actual issues in practice. We suggest using whichever version of Unicode your environment supports, as that will most likely be fine.)

Remove any trailing whitespace (\r,\n,\t or the space " " character)

Any two paths with the same resulting string from this algorithm are considered the same file on the service.

TRANSLITERATION_MAP = "ÀA,ÁA,ÂA,ÃA,ÄA,ÅA,ÆAE,ÇC,ÈE,ÉE,ÊE,ËE,ÌI,ÍI,ÎI,ÏI,ÐD,ÑN,ÒO,ÓO,ÔO,ÕO,ÖO,ØO,ÙU,ÚU,ÛU,ÜU,ÝY,ßss,àa,áa,âa,ãa,äa,åa,æae,çc,èe,ée,êe,ëe,ìi,íi,îi,ïi,ðd,ñn,òo,óo,ôo,õo,öo,øo,ùu,úu,ûu,üu,ýy,ÿy,ĀA,āa,ĂA,ăa,ĄA,ąa,ĆC,ćc,ĈC,ĉc,ĊC,ċc,ČC,čc,ĎD,ďd,ĐD,đd,ĒE,ēe,ĔE,ĕe,ĖE,ėe,ĘE,ęe,ĚE,ěe,ĜG,ĝg,ĞG,ğg,ĠG,ġg,ĢG,ģg,ĤH,ĥh,ĦH,ħh,ĨI,ĩi,ĪI,īi,ĬI,ĭi,ĮI,įi,İI,ĲIJ,ĳij,ĴJ,ĵj,ĶK,ķk,ĹL,ĺl,ĻL,ļl,ĽL,ľl,ŁL,łl,ŃN,ńn,ŅN,ņn,ŇN,ňn,ŉ'n,ŌO,ōo,ŎO,ŏo,ŐO,őo,ŒOE,œoe,ŔR,ŕr,ŖR,ŗr,ŘR,řr,ŚS,śs,ŜS,ŝs,ŞS,şs,ŠS,šs,ŢT,ţt,ŤT,ťt,ŨU,ũu,ŪU,ūu,ŬU,ŭu,ŮU,ůu,ŰU,űu,ŲU,ųu,ŴW,ŵw,ŶY,ŷy,ŸY,ŹZ,źz,ŻZ,żz,ŽZ,žz"
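A minimal Python sketch of the Normalize For Comparison steps might look like the following. The function names are hypothetical, and the map here is a short excerpt chosen for illustration; a real integration would use the full map above. NFC-normalizing the map string before parsing is a defensive assumption of this sketch, so that each entry starts with a single precomposed code point. Python's `str.lower()` uses the Unicode version bundled with the interpreter rather than Unicode 9.0 specifically.

```python
import unicodedata
from typing import Dict

# Abbreviated excerpt of the transliteration map, for illustration only.
TRANSLITERATION_MAP_EXCERPT = "ÀA,ÉE,ßss,àa,ée,ñn,öo,üu,ŁL,łl,ŒOE,œoe"


def build_transliteration_table(map_string: str) -> Dict[int, str]:
    """Parse "XY,..." entries: first character is the key, the rest replaces it."""
    map_string = unicodedata.normalize("NFC", map_string)
    return {ord(entry[0]): entry[1:] for entry in map_string.split(",") if entry}


def normalize_path(path: str) -> str:
    """Compact version of the Normalize algorithm for str input."""
    path = path.replace("\x00", "").replace("\\", "/")
    return "/".join(p for p in path.split("/") if p not in ("", ".", ".."))


def normalize_for_comparison(path: str, table: Dict[int, str]) -> str:
    path = normalize_path(path)                 # run the Normalize algorithm
    path = unicodedata.normalize("NFKC", path)  # Unicode NFKC normalization
    path = path.translate(table)                # transliterate, strip accents
    path = path.lower()                         # lowercase
    return path.rstrip("\r\n\t ")               # remove trailing whitespace


table = build_transliteration_table(TRANSLITERATION_MAP_EXCERPT)
a = normalize_for_comparison("/Docs/Café.TXT ", table)
b = normalize_for_comparison("docs\\cafe\u0301.txt", table)
print(a, a == b)  # docs/cafe.txt True
```

Transliteration runs before lowercasing, so "ß" is replaced with "ss" before the case mapping is applied.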
