This package is an efficient, native-array-based JavaScript trie implementation. The project includes Java-based tools to build your own native arrays from word lists.
Consider the following dictionary:
To save memory, we can compact the trie into a DAWG by combining shared suffixes.
This is more efficient, but a large word list can still take up a lot of memory. We can convert this structure to a flattened array.
Each element in the data array is 32 bits. SBTrie uses a typed array for speed and memory-saving purposes.
The first array is an integer version number corresponding to which version of the Java tool created the array.
The next element is a header bit set for the root of the trie. Header ints identify whether a node is a valid end-of-word, and which children a given node has. The end-of-word bit is the left-most bit in the integer. The children are represented in the right-most bits.
After each header element come child pointers. There are as many child pointers as the header node has children. Each pointer is an integer address to another element in the array.
This is enough to represent the entirety of our trie. Here's what our simple animal list might look like:
Index | Integer | 32 Bits | Notes |
---|---|---|---|
0 | 1 | 00000000000000000000000000000001 | Version 1 of SBTrie data structure |
1 | $_____ZYXWVUTSRQPONMLKJIHGFEDCBA 00000000000000000000000000101110 |
Root node has children 'B','C','D','F', and is not a valid end-of-word ($) | |
2 | 6 | Array index of the 'B' node | |
3 | Array index of the 'C' node | ||
4 | Array index of the 'D' node | ||
5 | Array index of the 'F' node | ||
6 | $_____ZYXWVUTSRQPONMLKJIHGFEDCBA 00000000000000000000000000000001 |
'B' node has child 'A', is not a valid end-of-word ($) | |
7 | 8 | Array index of the 'A' child node | |
8 | $_____ZYXWVUTSRQPONMLKJIHGFEDCBA 10000000000010000000000000000000 |
'BA' node has child 'T', is not a valid end-of-word ($) | |
9 | $_____ZYXWVUTSRQPONMLKJIHGFEDCBA 10000000000010000000000000000000 |
'BA' node has child 'T', is not a valid end-of-word ($) |