LBSC708A Quiz 1
Section 0101
Fall 2000

The quiz should be completed in one continuous block of time. You may take up to two hours to complete this quiz, but I expect it to be completed within about 1 hour. Algorithms track students should complete the quiz and then take up to 1 extra hour for the algorithms track questions.

The quiz will be due at class time on Monday. You should email the doc file to Dr. Allen and keep a copy for yourself.

You may use a calculator. You should not browse the Web. You may use one page of handwritten notes, but you may not use your textbook or other printed materials. You may not communicate with anybody about the quiz until after you have completed it.

BEGIN QUIZ

NAME:
Email-Address:

1. (20 points) Short Definitions (1-2 sentences):

   a. TREC
   b. Stop List
   c. Proximity Operators
   d. Zipf's Law

2. (20 points) Medium-Length Definitions (no more than 2-4 sentences):

   a. PageRank
   b. Cluster Hypothesis

3. (30 points) Longer Discussions (about 2 paragraphs each):

   a. If you were developing a search engine for the Web, what factors about the structure of the Web would you need to consider? (Illustrate your points with examples from the class discussion and readings.)

   b. Compare and contrast the strengths of the Vector and LSI Models.

4. (30 points) Problems. I may give partial credit, so it's a good idea to show your work.

   a. Vector Model

             d1  d2  d3
           +---+---+---+
        t1 | 0 | 0 | 1 |
           +---+---+---+
        t2 | 4 | 3 | 6 |
           +---+---+---+
        t3 | 6 |   | 2 |
           +---+---+---+

      Calculate the TF*IDF matrix in which each element is computed as TF * IDF. Briefly explain what you are doing.

   b. Cosine Distance

      For the data in the previous table, calculate the cosine distance for a query composed of t2 and t3. Briefly explain what you are doing and why cosines are a good distance measure.

Algorithms Students Only

5. (20 points) Generate an Inverted File and B+ tree for the following string of words:

      The quick brown fox jumped.

6. (10 points) Explain the "Law of Surfing" and some of its implications.

END QUIZ