Laverage

Oct 04 2020 at 12:30 GMT

Given a string of text, for example "Mañana", how can I encode it into a number and then decode that number back into the original string in JavaScript?

The reason I want to do this is that I want to write a simple implementation of the RSA algorithm, and in RSA the message to be encrypted must be a number.

Therefore, if I want to encrypt strings of text, I need to have a way to convert a string of text into a number so that I can encrypt it with the RSA algorithm. Then, when I decrypt the number, I want to be able to recover the original string from it.

So, how can I convert Unicode text to a number and then back in JavaScript?

Mike The Programmer

Oct 04 2020 at 13:41 GMT

Let's walk through the process of encoding the string `"Mañana"` into a number.

`const text = "Mañana";`

First of all, it would be nice if we only had to deal with ASCII characters rather than the whole range of Unicode characters.

We can get an equivalent string consisting only of ASCII characters by passing the original string to `encodeURIComponent`. It makes the string safe to use as part of a URL by replacing every character outside a small set of safe ASCII characters with `%XX` escapes of its UTF-8 bytes, so the result consists only of ASCII characters.

```
const asciiStr = encodeURIComponent(text);
// "Ma%C3%B1ana"
```
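To see that the `%XX` escapes really are the UTF-8 bytes of `"ñ"`, we can compare them with the output of `TextEncoder` (available in modern browsers and Node.js). This is only an illustrative check; the approach itself doesn't need `TextEncoder`.

```javascript
const text = "Mañana";
const asciiStr = encodeURIComponent(text); // "Ma%C3%B1ana"

// Every character of the encoded string is in the ASCII range (code < 128).
const allAscii = [...asciiStr].every((ch) => ch.charCodeAt(0) < 128); // true

// "ñ" (U+00F1) is encoded in UTF-8 as the two bytes 0xC3 0xB1,
// which is exactly what the "%C3%B1" escapes spell out.
const utf8Bytes = Array.from(new TextEncoder().encode("ñ"), (b) =>
  b.toString(16).toUpperCase()
); // ["C3", "B1"]
```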

This is useful because every ASCII character fits into a single byte, so it can be represented by exactly two hexadecimal digits. For example, the character `'a'` corresponds to the number `97`, which is `"61"` in hexadecimal notation.
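A quick check of that example, going from the character to its code, to hexadecimal, and back:

```javascript
const code = "a".charCodeAt(0); // 97
const hex = code.toString(16); // "61"

// Going back: parse the hex pair and turn the code into a character again.
const back = String.fromCharCode(parseInt(hex, 16)); // "a"

// Even the largest ASCII code, 127, still fits into two hex digits.
const maxAsciiHex = (127).toString(16); // "7f"
```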

Next, we split the string into an array, so we can work with the individual characters more easily.

```
const chars = asciiStr.split("");
// ["M", "a", "%", "C", "3", "%", "B", "1", "a", "n", "a"]
```

Now, let's convert each character to its character code and express that code in hexadecimal notation.

```
const hexChars = chars.map((ch) =>
  ch.codePointAt(0).toString(16).padStart(2, "0")
);
// ["4d", "61", "25", "43", "33", "25", "42", "31", "61", "6e", "61"]
```

Note that we use `.padStart(2, "0")` because we want every character to be expressed by exactly two hexadecimal digits. So, if a character code in hexadecimal notation is `"5"`, we pad it with a `"0"`, making it `"05"`. We do this to avoid ambiguity when converting back to the original string. For example, if we see `"55"`, is it two `"5"`s or a single `"55"`? With padding there is no such ambiguity, since two `"5"`s would be written as `"0505"`.
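Here is the ambiguity in code, using the hypothetical character codes `5` and `0x55` from the example (these are for illustration only). In this particular pipeline the padding never actually kicks in, since every character `encodeURIComponent` can output (letters, digits, `%` and `-_.!~*'()`) has a code of at least `0x21`; it's a cheap safeguard if other ASCII text is ever fed in.

```javascript
// Two hypothetical character codes, 5 and 5.
const codes = [5, 5];

const unpadded = codes.map((c) => c.toString(16)).join(""); // "55"
const padded = codes.map((c) => c.toString(16).padStart(2, "0")).join(""); // "0505"

// A single code 0x55 (the letter "U") produces the same "55" as the
// unpadded [5, 5], so without padding the two cases can't be told apart.
const single = (0x55).toString(16).padStart(2, "0"); // "55"
```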

Next, we form a single large number by joining all the characters expressed in hexadecimal notation.

```
const hexNumber = hexChars.join("");
// "4d61254333254231616e61"
```

It turns out that we get a huge number. In decimal notation, the number above is `93546045030950519871467105`.

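We can't represent such numbers exactly with the JavaScript `number` type: it is a 64-bit floating-point value, so it only represents integers exactly up to `Number.MAX_SAFE_INTEGER` (2^53 - 1). However, JavaScript has the newer `BigInt` type, which can represent integers of arbitrary size.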
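To make the precision problem concrete, the sketch below parses the same hexadecimal string once into a regular `number` (with `parseInt`, used here purely for illustration) and once into a `BigInt`:

```javascript
const hexNumber = "4d61254333254231616e61";

// Parsing into a regular number silently loses the low-order bits.
const asNumber = parseInt(hexNumber, 16);
const safe = Number.isSafeInteger(asNumber); // false

// BigInt parses the hex string exactly...
const exact = BigInt(`0x${hexNumber}`);

// ...and converting the float back to a BigInt shows it is a different value.
const same = BigInt(asNumber) === exact; // false
```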

We can create a `BigInt` from a number in hexadecimal notation by passing `BigInt` a string that starts with `0x`.

```
const m = BigInt(`0x${hexNumber}`);
// 93546045030950519871467105n
```

We just encoded the original string into a number!

Here's the `textToNumber` function that does all the steps we just went through:

```
function textToNumber(text) {
  const asciiStr = encodeURIComponent(text);
  const chars = asciiStr.split("");
  const hexChars = chars.map((ch) =>
    ch.codePointAt(0).toString(16).padStart(2, "0")
  );
  const hexNumber = hexChars.join("");
  const m = BigInt(`0x${hexNumber}`);
  return m;
}
```

Let's see how to decode the number we got back into the original string, that is, how to go from `93546045030950519871467105n` back to `"Mañana"`.

In short, we do the reverse of what we did to encode the text into a number.

First, we get the number expressed in hexadecimal notation.

```
let hexNumber = m.toString(16);
// "4d61254333254231616e61"
```

We have to be careful here, though. If the first character's two-digit hexadecimal code starts with a `0`, e.g. `"04"`, then the hexadecimal representation we get from `.toString(16)` will be missing that leading `0`, so we would get something like `"4..."` instead of `"04..."`. We can handle this case by checking whether the length of `hexNumber` is odd. If it is, we add the missing `0` to the front.

```
if (hexNumber.length % 2 === 1) {
  hexNumber = "0" + hexNumber;
}
```
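A small demonstration with a hypothetical hex string `"0461"` whose first pair starts with `0`. With `encodeURIComponent` output this can't actually happen, since every character it produces has a code of at least `0x21`, but the check is cheap.

```javascript
// Hypothetical hex string whose first pair starts with "0".
const original = "0461";
const m = BigInt(`0x${original}`); // 1121n

let hexNumber = m.toString(16); // "461" (the leading "0" is gone)

// Restore it: a valid encoding always has an even number of hex digits.
if (hexNumber.length % 2 === 1) {
  hexNumber = "0" + hexNumber;
}
// hexNumber is "0461" again
```

Note that the odd-length check only restores a single missing digit. If the first pair were `"00"` (a NUL character), it would disappear entirely; `encodeURIComponent` escapes NUL as `%00`, so that case doesn't arise here either.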

Next, we split the hexadecimal number into pairs of hexadecimal digits. We know that each pair corresponds to a character.

```
const hexChars = hexNumber.match(/\w{2}/g);
// ["4d", "61", "25", "43", "33", "25", "42", "31", "61", "6e", "61"]
```
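`\w` matches letters, digits and `_`, which covers every digit that `.toString(16)` can produce. An equivalent loop-based split is sketched below; one difference worth knowing is that `.match()` with the `g` flag returns `null` (not `[]`) when nothing matches, e.g. for an empty string.

```javascript
const hexNumber = "4d61254333254231616e61";

// The regex approach used above.
const viaRegex = hexNumber.match(/\w{2}/g);

// An equivalent split into two-character chunks using slice.
const viaSlice = [];
for (let i = 0; i < hexNumber.length; i += 2) {
  viaSlice.push(hexNumber.slice(i, i + 2));
}

// Both give ["4d", "61", "25", "43", "33", "25", "42", "31", "61", "6e", "61"]
```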

Next, we convert each hexadecimal pair back to a character.

```
const chars = hexChars.map((hex) =>
  String.fromCodePoint(parseInt(hex, 16))
);
// ["M", "a", "%", "C", "3", "%", "B", "1", "a", "n", "a"]
```

We concatenate all those characters to form the ASCII string.

```
const asciiStr = chars.join("");
// "Ma%C3%B1ana"
```
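The last two steps can also be collapsed into a single call. This is an alternative, not what the functions here use: since all the codes are below 128, `String.fromCharCode` behaves the same as `String.fromCodePoint`, and spreading the array builds the string at once. For very long messages, spreading may run into the engine's limit on the number of function arguments.

```javascript
const hexChars = ["4d", "61", "25", "43", "33", "25", "42", "31", "61", "6e", "61"];

// Hex pair -> character code.
const codes = hexChars.map((hex) => parseInt(hex, 16));
// [77, 97, 37, 67, 51, 37, 66, 49, 97, 110, 97]

// Build the whole ASCII string in one call.
const asciiStr = String.fromCharCode(...codes);
// "Ma%C3%B1ana"
```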

Finally, we can decode the URI component to get back the Unicode string.

```
const text = decodeURIComponent(asciiStr);
// "Mañana"
```
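One thing to keep in mind: `decodeURIComponent` throws a `URIError` when the `%XX` escapes don't form valid UTF-8. With RSA that could happen if a ciphertext is decrypted with the wrong key and yields some unrelated number. The `safeDecode` helper below is hypothetical, and returning `null` is just one possible way to signal the failure.

```javascript
// Hypothetical helper: decode, but report malformed input instead of throwing.
function safeDecode(asciiStr) {
  try {
    return decodeURIComponent(asciiStr);
  } catch (err) {
    if (err instanceof URIError) {
      return null; // not a valid percent-encoded UTF-8 string
    }
    throw err; // anything else is unexpected, so rethrow
  }
}

safeDecode("Ma%C3%B1ana"); // "Mañana"
safeDecode("Ma%C3ana"); // null: "%C3" starts a 2-byte sequence that never finishes
```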

Here's the `numberToText` function that does all the steps we just went through:

```
function numberToText(m) {
  let hexNumber = m.toString(16);
  if (hexNumber.length % 2 === 1) {
    hexNumber = "0" + hexNumber;
  }
  const hexChars = hexNumber.match(/\w{2}/g);
  const chars = hexChars.map((hex) =>
    String.fromCodePoint(parseInt(hex, 16))
  );
  const asciiStr = chars.join("");
  const text = decodeURIComponent(asciiStr);
  return text;
}
```
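As an aside, here's a different technique that gives smaller numbers for non-ASCII text: use the UTF-8 bytes directly as the digits of a base-256 number via `TextEncoder`/`TextDecoder`. This is a swapped-in alternative, not the approach above, and the names `textToBigInt`/`bigIntToText` are hypothetical. For `"Mañana"` it produces 7 bytes instead of 11, because `ñ` costs 2 bytes rather than the 6 characters of `%C3%B1`.

```javascript
// Encode: append each UTF-8 byte as the next base-256 digit.
function textToBigInt(text) {
  const bytes = new TextEncoder().encode(text);
  let m = 0n;
  for (const b of bytes) {
    m = (m << 8n) | BigInt(b);
  }
  return m;
}

// Decode: peel off the lowest byte until nothing is left.
function bigIntToText(m) {
  const bytes = [];
  while (m > 0n) {
    bytes.unshift(Number(m & 0xffn));
    m >>= 8n;
  }
  return new TextDecoder().decode(new Uint8Array(bytes));
}

textToBigInt("Mañana").toString(16); // "4d61c3b1616e61"
bigIntToText(textToBigInt("Mañana")); // "Mañana"
```

A few differences from the version above: the empty string maps to `0n` and back (whereas `BigInt("0x")` in `textToNumber` throws a `SyntaxError`), a leading NUL character would be lost, and `TextDecoder` by default replaces invalid bytes with U+FFFD instead of throwing.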

That's it! We can convert a string of text to a number using the `textToNumber` function and then go back from the number to the original string using the `numberToText` function!
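Since the original goal was RSA, here is a toy end-to-end sketch that feeds the encoded number through textbook RSA. Everything RSA-related here is an assumption for illustration: the helpers `modPow` and `modInverse` are my own, there is no padding, and the primes are the well-known Mersenne primes 2^89 - 1 and 2^107 - 1, so this is completely insecure. The encoding functions are condensed copies of the ones in this answer. The important constraint is that the encoded number `m` must be smaller than the modulus `n`; otherwise decryption returns `m mod n`, so longer messages need a bigger modulus or must be split into chunks.

```javascript
// Condensed copies of textToNumber / numberToText from this answer.
function textToNumber(text) {
  const hexNumber = encodeURIComponent(text)
    .split("")
    .map((ch) => ch.codePointAt(0).toString(16).padStart(2, "0"))
    .join("");
  return BigInt(`0x${hexNumber}`);
}

function numberToText(m) {
  let hexNumber = m.toString(16);
  if (hexNumber.length % 2 === 1) hexNumber = "0" + hexNumber;
  const asciiStr = hexNumber
    .match(/\w{2}/g)
    .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
    .join("");
  return decodeURIComponent(asciiStr);
}

// (base ** exp) % mod for BigInts, by square-and-multiply.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Inverse of a modulo m via the extended Euclidean algorithm.
function modInverse(a, m) {
  let [oldR, r] = [a, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const q = oldR / r; // BigInt division truncates
    [oldR, r] = [r, oldR - q * r];
    [oldS, s] = [s, oldS - q * s];
  }
  if (oldR !== 1n) throw new Error("not invertible");
  return ((oldS % m) + m) % m;
}

// Toy key pair from two publicly known primes (insecure, demo only).
const p = 2n ** 89n - 1n;
const q = 2n ** 107n - 1n;
const n = p * q;
const phi = (p - 1n) * (q - 1n);
const e = 65537n;
const d = modInverse(e, phi);

const m = textToNumber("Mañana");
if (m >= n) throw new Error("message too long for this modulus");

const c = modPow(m, e, n); // encrypt
const decrypted = numberToText(modPow(c, d, n)); // decrypt: "Mañana"
```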