MapReduce Data Mining and Ranking
Given a Google Search Query, find the AdWords Categories best describing it
You have one file of comma-separated (query, URL, score) rows, in which each score rates how relevant the URL is to the query. You have another file of (URL, categories) rows, in which the categories are also comma-separated and describe the URL's content. Given a query, find the categories that best describe its results.
Sample (query, URL, score) rows:
search, www.google.com, 100
search, www.facebook.com, 20
social, www.google.com, 2
awesomeness, www.aminariana.com, 100

Sample (URL, categories) rows:
www.google.com, advertising, engineering, internet
www.facebook.com, advertising, php
- Writing an algorithm that outputs some meaningful results (25%)
- Being memory-efficient at scale (25%)
- Producing complete result coverage (25%)
- Producing percentages for category rankings (25%)
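One possible reading of the problem, sketched in Python: weight each category by the summed scores of the query's URLs that carry it, then report each category's share of the total weight as a percentage. The weighting rule, the function names `load_categories` and `rank_categories`, and the choice to hold the URL-to-categories map in memory are all assumptions made for illustration, not part of the original question.

```python
import csv
from collections import defaultdict
from io import StringIO


def load_categories(rows):
    """Build {url: [categories]} from (URL, category, category, ...) rows."""
    url_categories = {}
    for row in rows:
        fields = [f.strip() for f in row if f.strip()]
        if fields:
            url_categories[fields[0]] = fields[1:]
    return url_categories


def rank_categories(query, score_rows, url_categories):
    """Return [(category, percentage)] for one query, largest share first."""
    weights = defaultdict(float)
    for row in score_rows:  # rows are streamed, never held all at once
        if len(row) != 3:
            continue  # skip malformed rows
        row_query, url, score = (f.strip() for f in row)
        if row_query != query:
            continue
        # Assumed weighting: each category of the URL earns the URL's score.
        for category in url_categories.get(url, ()):
            weights[category] += float(score)
    total = sum(weights.values())
    if total == 0:
        return []
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(category, 100.0 * weight / total) for category, weight in ranked]


# The sample rows from the problem statement.
SCORES = """search, www.google.com, 100
search, www.facebook.com, 20
social, www.google.com, 2
awesomeness, www.aminariana.com, 100
"""
CATEGORIES = """www.google.com, advertising, engineering, internet
www.facebook.com, advertising, php
"""

url_categories = load_categories(csv.reader(StringIO(CATEGORIES)))
ranking = rank_categories("search", csv.reader(StringIO(SCORES)), url_categories)
for category, pct in ranking:
    print(f"{category}: {pct:.1f}%")
```

For the query "search" this ranks advertising first (about 35.3%), then engineering and internet (about 29.4% each), then php (about 5.9%). At real scale the in-memory dictionary would probably give way to a join keyed on URL, with the per-category sums computed in a reduce step.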
Sashi from Google
Google search results for the top 1000 highest ranking URLs
You have trillions of unique URLs, each with a score, stored in no particular order across numerous machines. Find the 1000 URLs with the highest scores.
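A common approach, sketched here under assumptions: each machine keeps only its own best k candidates in a bounded min-heap, and a final step merges those short lists. Any URL in the global top k must also be in its machine's local top k, so nothing is lost. The function names and the in-memory "shards" standing in for machines are hypothetical.

```python
import heapq
from itertools import chain


def local_top_k(scored_urls, k):
    """Map step: one machine keeps only its k best (score, url) pairs."""
    heap = []  # min-heap, so heap[0] is the weakest candidate kept so far
    for url, score in scored_urls:
        if len(heap) < k:
            heapq.heappush(heap, (score, url))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, url))
    return heap


def global_top_k(partials, k):
    """Reduce step: merge the per-machine candidates into the final top k."""
    best = heapq.nlargest(k, chain.from_iterable(partials))
    return [(url, score) for score, url in best]


# Three tiny "machines", with k = 2 instead of 1000.
shards = [
    [("a", 5), ("b", 1), ("c", 9)],
    [("d", 7), ("e", 3)],
    [("f", 8), ("g", 2)],
]
top = global_top_k((local_top_k(shard, 2) for shard in shards), 2)
print(top)
```

Each machine uses memory proportional to k rather than to the number of URLs it stores, and the merge step sees at most k entries per machine.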
MapReduce parallel processing of all known URLs for median length
You are given the data set of all Google-crawled URLs on the Internet, which is a very large set. Write an algorithm to find the median URL length.
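One way to avoid sorting the URLs at all is to note that lengths are small integers, so a histogram of length to count is tiny however many URLs there are. The sketch below assumes that approach: each mapper builds a histogram for its shard, the reducer adds them, and the median is read off the cumulative counts. The function names and sample shards are made up for illustration.

```python
from collections import Counter


def length_histogram(urls):
    """Map step: count how many URLs in this shard have each length."""
    return Counter(len(url) for url in urls)


def merge_histograms(histograms):
    """Reduce step: add the per-shard counts together."""
    total = Counter()
    for histogram in histograms:
        total.update(histogram)
    return total


def median_from_histogram(histogram):
    """Walk lengths in ascending order until the middle rank(s) are covered."""
    n = sum(histogram.values())
    if n == 0:
        raise ValueError("no URLs")
    # Zero-based ranks of the middle element(s); they coincide when n is odd.
    lower_rank, upper_rank = (n - 1) // 2, n // 2
    lower = upper = None
    seen = 0
    for length in sorted(histogram):
        seen += histogram[length]
        if lower is None and seen > lower_rank:
            lower = length
        if seen > upper_rank:
            upper = length
            break
    return (lower + upper) / 2


shards = [["a.com", "bb.com"], ["ccc.com"], ["dddd.com", "eeeee.com"]]
merged = merge_histograms(length_histogram(shard) for shard in shards)
median = median_from_histogram(merged)
print(median)
```

The data shuffled between machines is one small histogram per shard, not the URLs themselves.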
String to Byte Array Serialization and Deserialization
Serialize a list of strings into a byte array. Then deserialize it.
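A minimal sketch, assuming a length-prefixed layout: a 4-byte big-endian count, then for each string a 4-byte big-endian byte length followed by its UTF-8 bytes. The layout is an illustrative choice, not something the question prescribes. Length prefixes sidestep the need to escape delimiters, so strings may contain commas, newlines, or be empty.

```python
import struct


def serialize(strings):
    """Encode a list of str as: count, then (byte length, UTF-8 bytes) pairs."""
    parts = [struct.pack(">I", len(strings))]
    for s in strings:
        data = s.encode("utf-8")
        parts.append(struct.pack(">I", len(data)))
        parts.append(data)
    return b"".join(parts)


def deserialize(blob):
    """Inverse of serialize(); raises on truncated input."""
    (count,) = struct.unpack_from(">I", blob, 0)
    offset = 4
    strings = []
    for _ in range(count):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        if offset + length > len(blob):
            raise ValueError("truncated input")
        strings.append(blob[offset:offset + length].decode("utf-8"))
        offset += length
    return strings


original = ["", "a,b", "héllo", "line\nbreak"]
blob = serialize(original)
restored = deserialize(blob)
print(restored == original)
```

The 4-byte prefix caps each string at 2**32 - 1 bytes; a variable-length integer prefix would be a reasonable alternative when most strings are short.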
Language Vocabulary Sampling
You are given a stream of conversation in a mysterious language from planet Mars. Approximate the vocabulary (the set of distinct words) of this language.
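The question leaves "approximate" open, so the following is one interpretation only: keep a fixed-size sample of distinct words and estimate how many distinct words exist, without storing the whole vocabulary. The sketch uses a "k minimum values" scheme, which hashes every word to a number in [0, 1) and retains the k words with the smallest hashes. Repeated words hash identically, so frequency does not bias the sample. The class name, the use of SHA-1, and the assumption that the stream is already split into words are all illustrative.

```python
import hashlib
import heapq


def _unit_hash(word):
    """Map a word to a repeatable pseudo-random float in [0, 1)."""
    digest = hashlib.sha1(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class VocabularySketch:
    """Keeps the k distinct words with the smallest hash values."""

    def __init__(self, k):
        self.k = k
        self._heap = []        # max-heap via negated hash: (-hash, word)
        self._members = set()  # words currently in the sketch

    def add(self, word):
        if word in self._members:
            return
        h = _unit_hash(word)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (-h, word))
            self._members.add(word)
        elif h < -self._heap[0][0]:
            # Smaller than the largest retained hash: swap it in.
            _, evicted = heapq.heapreplace(self._heap, (-h, word))
            self._members.discard(evicted)
            self._members.add(word)

    def sample(self):
        """The retained words: a sample of the distinct words seen."""
        return sorted(self._members)

    def estimated_size(self):
        """Estimate the number of distinct words seen so far."""
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct words: exact
        kth_smallest = -self._heap[0][0]
        return (self.k - 1) / kth_smallest


# Hypothetical stream: 5000 distinct words, each heard three times.
vocabulary = [f"word{i}" for i in range(5000)]
sketch = VocabularySketch(k=256)
for _ in range(3):
    for word in vocabulary:
        sketch.add(word)
print(len(sketch.sample()), round(sketch.estimated_size()))
```

Memory stays proportional to k regardless of stream length, and the relative error of the size estimate shrinks roughly as 1/sqrt(k).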
- Amin Ariana
- April 2011