Language Vocabulary Sampling

You are given a stream of conversation in a mysterious language from planet Mars. Approximate the set of word vocabulary in this language.

Problem Statement

You are given a reference to an instance of a class called "WordStream" that has a method called "string GetNextMartianWord()", where Martian is a mysterious language from planet Mars. This method gives you a random word from the Martian language, picked with equal probability, and with replacement (i.e. it could pick the same word more than once). Given N is the maximum number of times that you can sample Martian language by calling this method, and the guarantee that N is at least an order of magnitude larger than the number of possible Martian words, write a pseudo-code algorithm to approximate the number of words Martians have in their language.

Note: The signature of your method is:

int ApproximateHowManyMartianWords (WordStream wordStream, int N)

Evaluation

  • Completeness of the pseudo-code algorithm (25%)
  • Correctness of the algorithm, demonstrated by quick manual unit tests (25%)
  • What is the time and space complexity in terms of N? (25%)
  • Describe how you would change your code to print the most frequently encountered 3-letter prefix before returning. (25%)

References

I designed this question

Recommended reading

Author
Amin Ariana
Published
January 2010