Which data structure is most suitable to implement a Dictionary?

سیف خان

I have to write a Dictionary program as a semester project for an undergraduate course on Data Structures and Algorithms, and I am expected to find the most suitable solution (Data Structure) to the problem.

I considered using either a hash table or a trie. Someone suggested treaps, but I have not been able to look into them yet.

My database has about 100k distinct words and their meanings. The basic operations the program is expected to provide are insert, update, remove, and search for a word/definition. If I manage to squeeze in auto-completion and spell correction, that'd be an added bonus.

So, my question is: keeping my requirements in mind, which data structure would be best suited for my purposes? By 'best', I mean the data structure with the best runtime complexity and low cost (memory requirements).

Also, I want an algorithm that returns all words starting with a given prefix. For example, a function call dictionary.getWordsStartingWith("fic") should return a list of all words that start with fic, such as fiction, fictitious, fickle, etc. I know I can do this if I implement my dictionary as a trie, but is it possible to do it with a hash table?

Nir Friedman

You almost certainly want a trie if you want to do auto completion/prefix matching. Hash tables don't really make this possible; in fact good hash functions are designed such that even very similar keys (e.g. same prefix) map to completely different parts of the array. For hashing purposes this is considered a feature.
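To make the hash table point concrete, here is a minimal Python sketch. It is only an illustration under assumptions: `bucket_of`, `words_starting_with`, the use of `zlib.crc32` as a stand-in hash function, the bucket count, and the placeholder meanings are all made up for this example. It shows that words sharing the prefix "fic" are assigned buckets with no regard to their shared prefix, so the only way to answer a prefix query from the table is to check every key.

```python
import zlib

def bucket_of(word, num_buckets=64):
    # Deterministic stand-in for a hash table's bucket choice.
    # crc32 is an assumption here; real tables use other hash functions.
    return zlib.crc32(word.encode("utf-8")) % num_buckets

def words_starting_with(table, prefix):
    # A hash table keeps no ordering between keys, so a prefix query
    # has to look at every key: O(n) in the number of stored words.
    return sorted(word for word in table if word.startswith(prefix))

# Placeholder entries, not real dictionary data.
dictionary = {
    "fiction": "invented narrative",
    "fictitious": "not real",
    "fickle": "changing often",
    "fig": "a fruit",
}

# Bucket numbers are computed from the whole key, not from its prefix.
for word in dictionary:
    print(word, "->", bucket_of(word))

print(words_starting_with(dictionary, "fic"))
```

With 100k words the full scan still works, but it costs a pass over the whole table for every query, which is what makes it a poor fit for auto-completion.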

Treaps are basically binary search trees that use randomness plus the heap property to do their balancing. In general the interface is the standard BST interface, so it's really just an implementation detail that leads to only moderately different properties from a red-black tree or an AVL tree.
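For anyone curious what "randomness plus the heap property" looks like in practice, here is a rough Python sketch of treap insertion and lookup. The names (`TreapNode`, `treap_insert`, `treap_search`, `in_order`) are hypothetical, and this is one common way to write it rather than a reference implementation: each node gets a random priority, keys obey BST order, and rotations restore the rule that a parent's priority is at least as high as its children's.

```python
import random

class TreapNode:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.priority = random.random()  # random heap priority
        self.left = None
        self.right = None

def rotate_right(node):
    # Lift the left child above node while keeping BST order.
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

def rotate_left(node):
    # Lift the right child above node while keeping BST order.
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot

def treap_insert(node, key, value):
    # Ordinary BST insert, then rotate the new node up while its
    # priority is higher than its parent's (max-heap on priority).
    if node is None:
        return TreapNode(key, value)
    if key == node.key:
        node.value = value  # update an existing word
    elif key < node.key:
        node.left = treap_insert(node.left, key, value)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = treap_insert(node.right, key, value)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node

def treap_search(node, key):
    # Exactly the same as searching any BST: follow inequalities down.
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None

def in_order(node):
    # Keys in sorted order, as with any BST.
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)

root = None
for word in ["fickle", "fig", "fiction", "apple", "fictitious"]:
    root = treap_insert(root, word, "meaning of " + word)

print(in_order(root))
```

Note that the search is identical to a plain BST search; only the insert differs, which is why the treap is mostly an implementation detail from the caller's point of view.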

BSTs aren't nearly as well suited to the problems you seem to be looking to solve as a trie is. BSTs tend to be all about following inequalities downward, whereas tries are about following equalities downward. When you're dealing with numeric data, inequality comparisons are everything because equality is very rare (since the space of possibilities is huge). With strings, each character has very few possibilities, so it makes more sense to exploit equalities, leading to optimizations like not actually storing keys at most nodes.
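Here is a minimal trie sketch in Python showing "following equalities downward". The class and method names (`TrieNode`, `Trie`, `get_words_starting_with`) and the sample meanings are assumptions made for illustration, not a definitive design. Notice that no node stores a whole key: a word is implied by the path of characters from the root, and only the node that ends a word holds its meaning.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # one edge per next character
        self.meaning = None  # set only on nodes that end a word

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, meaning):
        # Also serves as update: re-inserting overwrites the meaning.
        # Assumes meaning is never None (None marks "not a word").
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.meaning = meaning

    def _find(self, prefix):
        # Follow one equality match per character; None if the path ends.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._find(word)
        return node.meaning if node is not None else None

    def get_words_starting_with(self, prefix):
        # Walk to the prefix node, then collect every word below it.
        start = self._find(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack:
            node, word = stack.pop()
            if node.meaning is not None:
                results.append(word)
            for ch, child in node.children.items():
                stack.append((child, word + ch))
        return sorted(results)

trie = Trie()
trie.insert("fiction", "invented narrative")
trie.insert("fictitious", "not real")
trie.insert("fickle", "changing often")
trie.insert("fig", "a fruit")

print(trie.get_words_starting_with("fic"))  # ['fickle', 'fiction', 'fictitious']
```

Reaching the prefix node costs one step per character of the prefix regardless of how many words are stored; after that, the work is proportional to the size of the subtree below it. A dict per node is simple but not space-efficient, so a real project would likely look at more compact child representations.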

In summary, I'd recommend proceeding with tries. They're very heavily used for exactly this sort of thing, and you can find a ton of resources on optimizing them (especially for space), since they're particularly used for text input on mobile, where space and cycles are at a premium. It's also a very interesting data structure to learn IMHO, compared to BSTs, which a) you probably learned about heavily in freshman data structures, and b) aren't really that interesting a data structure; everything other than the balancing scheme is trivial, and the balancing schemes are more tedious than anything else (RB trees have something like 7 truly distinct cases for balancing, so it's pretty hard to code an RB tree and get them all exactly right).

The Wikipedia page has some good info: https://en.wikipedia.org/wiki/Trie. Bitwise tries look especially interesting.
