Map Reduce Data Mining and Ranking

Given a Google Search Query, find the AdWords Categories best describing it

Problem Statement

You have two comma-separated files. The first holds (query, URL, score) rows, where the score measures how relevant the URL is to the query. The second holds (URL, categories) rows, where the categories, themselves comma-separated, describe the content of the URL. Given a query, find the categories that best describe its results.

Example:

search, www.google.com, 100
search, www.facebook.com, 20
social, www.google.com, 2
awesomeness, www.aminariana.com, 100

and

www.google.com, advertising, engineering, internet
www.facebook.com, advertising, php
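
To make the task concrete, here is a minimal in-memory sketch in Python that joins the two example files on URL and ranks the categories for a single query. The weighting rule is an assumption, not part of the problem statement: every category of a URL receives that URL's full score, and the summed scores are normalized to percentages. The names `parse` and `rank_categories` are hypothetical helpers introduced for illustration.

```python
import csv
from collections import defaultdict
from io import StringIO

# The example rows from the problem statement.
QUERY_ROWS = """\
search, www.google.com, 100
search, www.facebook.com, 20
social, www.google.com, 2
awesomeness, www.aminariana.com, 100
"""

CATEGORY_ROWS = """\
www.google.com, advertising, engineering, internet
www.facebook.com, advertising, php
"""


def parse(text):
    """Yield each non-empty CSV row as a list of whitespace-stripped fields."""
    for row in csv.reader(StringIO(text)):
        if row:
            yield [field.strip() for field in row]


def rank_categories(query, query_rows, category_rows):
    """Return [(category, percent)] for one query, highest share first."""
    # URL -> list of categories (the small side of the join).
    url_categories = {url: cats for url, *cats in parse(category_rows)}

    # Assumption: each category of a URL is credited with the URL's full score.
    weights = defaultdict(float)
    for q, url, score in parse(query_rows):
        if q != query:
            continue
        for cat in url_categories.get(url, []):
            weights[cat] += float(score)

    total = sum(weights.values())
    if total == 0:
        return []  # no categorized URL matched this query
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(cat, 100.0 * w / total) for cat, w in ranked]


if __name__ == "__main__":
    for cat, pct in rank_categories("search", QUERY_ROWS, CATEGORY_ROWS):
        print(f"{cat}: {pct:.1f}%")
```

Under this rule the query "search" accumulates raw weights of 120 for advertising, 100 each for engineering and internet, and 20 for php. The query "awesomeness" yields an empty ranking because www.aminariana.com has no entry in the categories file.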

Evaluation

  1. Writing an algorithm that outputs some meaningful results (25%)
  2. Being memory-efficient at scale (25%)
  3. Producing complete result coverage (25%)
  4. Producing percentages for category rankings (25%)
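
One way to approach these criteria is a two-stage job built around a reduce-side join, sketched below in plain Python. This is an illustration under stated assumptions rather than a reference solution: `sorted()` stands in for the framework's shuffle, the function names are hypothetical, and each category of a URL is assumed to receive the URL's full score. In a real MapReduce run the first reducer would only need one URL's category list in memory at a time, and the second stage would be keyed by query.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_stage1(query_lines, category_lines):
    """Emit (url, tag, payload) records from both inputs, keyed by URL."""
    for line in query_lines:
        query, url, score = [f.strip() for f in line.split(",")]
        yield url, "Q", (query, float(score))
    for line in category_lines:
        url, *cats = [f.strip() for f in line.split(",")]
        yield url, "C", tuple(cats)


def reduce_stage1(records):
    """Join per URL and emit ((query, category), score) pairs."""
    # sorted() simulates the shuffle. Tag "C" sorts before "Q", so a URL's
    # categories arrive before the query rows that need them.
    shuffled = sorted(records, key=itemgetter(0, 1))
    for _url, group in groupby(shuffled, key=itemgetter(0)):
        cats = ()
        for _, tag, payload in group:
            if tag == "C":
                cats = payload
            else:
                query, score = payload
                for cat in cats:  # URLs without categories emit nothing
                    yield (query, cat), score


def reduce_stage2(pairs):
    """Sum scores per (query, category), then normalize within each query."""
    # A dict is used here for brevity; a real job would shuffle by query.
    sums = defaultdict(float)
    for key, score in pairs:
        sums[key] += score
    for query, group in groupby(sorted(sums.items()), key=lambda kv: kv[0][0]):
        group = list(group)
        total = sum(w for _, w in group)
        if total > 0:
            for (_, cat), w in group:
                yield query, cat, 100.0 * w / total


def run_job(query_lines, category_lines):
    """Chain both stages and return a list of (query, category, percent)."""
    joined = reduce_stage1(map_stage1(query_lines, category_lines))
    return list(reduce_stage2(joined))
```

Note that a query whose URLs have no categories produces no output in this sketch. Whether such queries should be reported, and how, is a design decision the coverage criterion leaves open.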

References

Sashi from Google

Author
Amin Ariana
Published
April 2011