
DEGREES OF MSci, MEng, BEng, BSc, MA and MA (Social Sciences)

INFORMATION RETRIEVAL M

COMPSCI 5011

Wednesday 13 May 2020, 09:30 BST

SECTION A

1.    (a)

The following documents have been processed by an IR system where no stemming has been applied:

DocID    Text
doc1     france is world champion 1998
doc2     croatia and france played each other in the semifinal
doc3     croatia was in the semifinal 1998
doc4     croatia won the semifinal in russia 2018

(i)        Assume that the list of stopwords is the following: and, in, the, was, each, other. Build an inverted file for these documents, showing clearly the dictionary and posting list components. Your inverted file needs to store sufficient information for computing a simple tf*idf term weight, where w_ij = tf_ij * log2(N/df_i).              [5]

(ii)       Compute the term weights (w_ij) of the terms ‘world’ and ‘1998’ in doc1. Show your calculations.           [2]

(iii)      Assuming the use of a best match ranking algorithm, rank all documents after computing their relevance scores for the following query:

1998 semifinal

Show your calculations. Note that log2(0.75) = -0.4150 and log2(1.3333) = 0.4150.           [4]
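By way of illustration (a sketch, not the official model answer), the following Python program builds the inverted file and reproduces the tf*idf and best-match calculations asked for in parts (i)-(iii); the stopword list and document texts are taken directly from the question:

```python
import math
from collections import defaultdict

STOPWORDS = {"and", "in", "the", "was", "each", "other"}
docs = {
    "doc1": "france is world champion 1998",
    "doc2": "croatia and france played each other in the semifinal",
    "doc3": "croatia was in the semifinal 1998",
    "doc4": "croatia won the semifinal in russia 2018",
}

# Inverted file: the dictionary maps each term to its posting list;
# each posting list maps docID -> tf, which is enough for tf*idf
# (df is recovered as the length of the posting list).
postings = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        if term not in STOPWORDS:
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1

N = len(docs)

def weight(term, doc_id):
    """w_ij = tf_ij * log2(N / df_i)."""
    tf = postings[term].get(doc_id, 0)
    df = len(postings[term])
    return tf * math.log2(N / df)

# (ii) single term weights in doc1
print(weight("world", "doc1"))  # df=1 -> 1 * log2(4/1) = 2.0
print(weight("1998", "doc1"))   # df=2 -> 1 * log2(4/2) = 1.0

# (iii) best match: a document's score is the sum of the weights
# of the query terms it contains.
query = ["1998", "semifinal"]
for doc_id in sorted(docs, key=lambda d: sum(weight(t, d) for t in query),
                     reverse=True):
    print(doc_id, round(sum(weight(t, doc_id) for t in query), 4))
```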

(iv)      Typically, a log scale is applied to the tf (term frequency) component when scoring documents using a simple tf*idf term weighting scheme. Explain why this is the case, illustrating your answer with a suitable example in IR. Explain through examples how models such as BM25 and PL2 control the term frequency counts.           [4]
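For intuition, a small sketch (not part of the original paper) comparing raw tf, log-scaled tf, and BM25's saturating tf component; the k1 = 1.2 default is an assumption for illustration:

```python
import math

def log_tf(tf):
    # Log-scaled tf: a document repeating a term 100 times is not
    # treated as 100 times more relevant than one mentioning it once.
    return 1 + math.log2(tf) if tf > 0 else 0

def bm25_tf(tf, k1=1.2):
    # BM25's term-frequency component (length normalisation omitted):
    # it approaches k1 + 1 asymptotically, so extra occurrences saturate.
    return tf * (k1 + 1) / (tf + k1)

for tf in (1, 2, 10, 100):
    print(tf, round(log_tf(tf), 3), round(bm25_tf(tf), 3))
```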

(b)     Assume that the set of relevant documents for a query q is Rq, where:

Rq = {doc1, doc2, doc5, doc7, doc8}

An IR system ranks the top 10 results for this query as follows (the leftmost item is the top ranked search result):

doc1  doc12  doc7  doc6  doc4  doc5  doc2  doc8  doc9  doc3

(i)        Show the computed precision and recall values at each rank of the produced ranking for the query q.           [3]

(ii)       Show the calculation of the interpolated precision values at the 11 standard recall levels for the ranking produced by the system for the query q.          [2]

(iii)      Compute the average precision of query q for the returned ranking. Show your calculations.         [2]

(iv)      Based on your understanding of the average precision measure, explain whether the obtained average precision score for query q was expected. Justify your answer.                    [3]
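A minimal sketch (illustrative only) of how the precision, recall, and average precision values in parts (i) and (iii) can be computed for the given ranking:

```python
relevant = {"doc1", "doc2", "doc5", "doc7", "doc8"}
ranking = ["doc1", "doc12", "doc7", "doc6", "doc4",
           "doc5", "doc2", "doc8", "doc9", "doc3"]

hits = 0
precisions_at_hits = []
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        hits += 1
        precisions_at_hits.append(hits / rank)
    print(f"rank {rank}: P={hits / rank:.2f} R={hits / len(relevant):.2f}")

# Average precision: mean of the precision values at each relevant
# document, averaged over the total number of relevant documents.
print(f"AP = {sum(precisions_at_hits) / len(relevant):.3f}")
```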

(c)     In various DFR models, term frequency tf is replaced with a normalised term frequency tfn as follows:

tfn = tf * log2(1 + c * avg_l / l)

where l is the length of the document and avg_l is the average document length in the collection.

Using an example of at least two documents of varying length, explain why such a normalisation is needed. Describe in detail how the above formula will behave for your chosen example documents.

What is the role of the c parameter in the formula?          [5]
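Assuming the Normalisation 2 formula reconstructed above, a small sketch of how tfn behaves for a short and a long document; the document lengths below are made up for illustration:

```python
import math

def tfn(tf, doc_len, avg_len, c=1.0):
    """DFR normalised tf: boosts tf in short documents, dampens it in long ones."""
    return tf * math.log2(1 + c * avg_len / doc_len)

avg_len = 100
# The same raw tf=5 in a short (50-term) and a long (200-term) document:
print(round(tfn(5, 50, avg_len), 2))   # ~7.92: short doc, tf inflated
print(round(tfn(5, 200, avg_len), 2))  # ~2.92: long doc, tf deflated
```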

2.      (a)     Consider a corpus of documents C written in English, where the frequency distribution of words approximately follows Zipf’s law r * p(wr|C) = 0.1, where r = 1, 2, … , n is the rank of a word by decreasing order of frequency, wr is the word at rank r, and p(wr|C) is the probability of occurrence of word wr in the corpus C. Compute the probability of the most frequent word in C. Compute the probability of the 2nd most frequent word in C. Justify your answers.       [4]
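The arithmetic follows directly from the given law (a sketch, not the official solution): since r * p(wr|C) = 0.1, the probability at rank r is 0.1/r:

```python
def zipf_prob(rank, k=0.1):
    # r * p(w_r | C) = k  =>  p(w_r | C) = k / r
    return k / rank

print(zipf_prob(1))  # 0.1  (most frequent word)
print(zipf_prob(2))  # 0.05 (2nd most frequent word)
```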

(b)     Suppose we have a corpus of documents with a dictionary of 6 words w1, ..., w6.

Consider the table below, which provides the estimated language model p(w|C) using the entire corpus of documents C (second column) as well as the word counts for doc1 (third column) and doc2 (fourth column), where ct(w, doci) is the count of word w (i.e. its term frequency) in document doci. Let the query q be the following:

q = w1 w2

Word    p(w|C)    ct(w, doc1)    ct(w, doc2)
w1      0.8       2              7
w2      0.1       3              1
w3      0.025     2              1
w4      0.025     2              1
w5      0.025     1              0
w6      0.025     0              0
SUM     1.0       10             10

(i)        Assume that we do not apply any smoothing technique to the language model for doc1 and doc2. Calculate the query likelihood for both doc1 and doc2, i.e. p(q|doc1) and p(q|doc2) (do not compute the log-likelihood, i.e. do not apply any log scaling). Show your calculations.

Provide the resulting ranking of documents and state the document that would be ranked the highest.      [3]

(ii)       Suppose we now smooth the language model for doc1 and doc2 using Jelinek-Mercer smoothing with λ = 0.1. Recalculate the likelihood of the query for both doc1 and doc2, i.e. p(q|doc1) and p(q|doc2) (do not compute the log-likelihood, i.e. do not apply any log scaling). Show your calculations.

Provide the resulting ranking of documents and state the document that would be ranked the highest.           [4]
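An illustrative sketch of both calculations using the counts and p(w|C) values from the table, assuming maximum-likelihood document models and the Jelinek-Mercer interpolation p(w|d) = (1 - λ) * ct(w,d)/|d| + λ * p(w|C):

```python
p_C = {"w1": 0.8, "w2": 0.1, "w3": 0.025, "w4": 0.025, "w5": 0.025, "w6": 0.025}
counts = {"doc1": {"w1": 2, "w2": 3, "w3": 2, "w4": 2, "w5": 1, "w6": 0},
          "doc2": {"w1": 7, "w2": 1, "w3": 1, "w4": 1, "w5": 0, "w6": 0}}
query = ["w1", "w2"]

def likelihood(doc, lam=None):
    """p(q|doc) as a product over query terms.
    lam=None: unsmoothed maximum likelihood ct(w,d)/|d|.
    Otherwise Jelinek-Mercer: (1-lam)*ct(w,d)/|d| + lam*p(w|C)."""
    doc_len = sum(counts[doc].values())
    score = 1.0
    for w in query:
        p_ml = counts[doc][w] / doc_len
        score *= p_ml if lam is None else (1 - lam) * p_ml + lam * p_C[w]
    return score

for doc in ("doc1", "doc2"):
    print(doc, likelihood(doc), likelihood(doc, lam=0.1))
```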

(iii)      Explain which document should reasonably be ranked higher (doc1 or doc2), and why.               [3]

(c)

Assume that a user’s original query is ‘bargain books bargain DVDs really bargain books’. The user examines two documents, d1 and d2. She judges d1, with the content ‘books bargain bestsellers really bargain books’, as relevant, and d2, with the content ‘bargain great DVDs’, as non-relevant.

Assume that the IR system is using raw term frequency (with no IDF). Moreover, for relevance feedback, suppose the use of Rocchio’s Relevance Feedback (RF) algorithm, with alpha = 1, beta = 0.75, gamma = 0.25.

First, provide the original query vector, clearly stating its dimensions. Provide also the vector representations for d1 and d2. Next, provide the reformulated query that will be run by the IR system after the application of Rocchio’s RF algorithm.

Terms in the reformulated query with negative weights can be dropped, i.e. their weights can be changed back to 0 (please list all the vector dimensions in alphabetical order). Show your calculations.        [7]
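A sketch of the Rocchio computation with raw term frequencies (illustrative of the method rather than an official answer; dimensions are listed alphabetically as the question requires):

```python
from collections import Counter

alpha, beta, gamma = 1.0, 0.75, 0.25
query = "bargain books bargain DVDs really bargain books".split()
d1 = "books bargain bestsellers really bargain books".split()
d2 = "bargain great DVDs".split()

vocab = sorted(set(query) | set(d1) | set(d2), key=str.lower)

def vec(tokens):
    c = Counter(tokens)
    return [c[t] for t in vocab]

q, rel, nonrel = vec(query), vec(d1), vec(d2)
# Rocchio with one relevant (d1) and one non-relevant (d2) document;
# negative weights are clamped back to 0 as the question allows.
new_q = [max(0.0, alpha * qi + beta * r - gamma * n)
         for qi, r, n in zip(q, rel, nonrel)]
print(list(zip(vocab, new_q)))
```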

(d)     Explain, with a suitable example, why the terms added to a query that has been reformulated by a query expansion approach are weighted.        [4]

(e)     Using an appropriate analogy from Computer Science of your choosing, explain in detail why positive feedback is likely to be more useful than negative feedback to an information retrieval system. Illustrate your answer with an example from a suitable search scenario.           [5]

3.      (a)     A Web search engine receives a large number of ambiguous queries.  Briefly discuss three possible techniques a Web search engine might use to deal with such queries.       [4]

(b)     Using a suitable example, explain why Precision at Rank R (P@R) is not a good metric to use for learning an effective learning-to-rank model. What metric would you suggest to use instead and why?        [5]

(c)     A posting list for a term in an inverted index contains the following three entries:

id=3 tf=2    id=6 tf=2    id=8 tf=3

Applying delta compression to the ids, show the binary form of the posting list when compressed using unary compression for both ids and frequencies. What is the resulting (mean) compression rate, in bits per integer?      [5]
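A sketch of the encoding step; the unary convention assumed here encodes a positive integer n as n-1 one-bits followed by a terminating zero:

```python
def unary(n):
    """Unary code (assumed convention): n-1 one-bits followed by a zero."""
    return "1" * (n - 1) + "0"

postings = [(3, 2), (6, 2), (8, 3)]
prev = 0
bits = []
for doc_id, tf in postings:
    delta = doc_id - prev      # delta-gap the ids
    prev = doc_id
    bits.append(unary(delta) + unary(tf))

code = "".join(bits)
# 6 integers encoded (3 gaps + 3 frequencies).
print(code, len(code) / (2 * len(postings)), "bits per integer")
```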

(d)     Given the following hyperlink structure among five web pages, and assuming teleportation, show the step-by-step construction of the Google matrix for PageRank with a teleportation factor λ = 0.1. You need to write down all the matrices and intermediate results, but you do not need to compute the final PageRank scores.

NB: You can indicate matrices in plain text like a table, or write matrices row by row, e.g. ([a,b];[c,d]) shows a matrix with two columns and two rows where the first row is [a,b] and the second row is [c,d].

[Figure: hyperlink structure among five web pages]

Explain which web page you think should intuitively have the highest PageRank value, and why.         [6]
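Since the link-structure figure is not reproduced here, the sketch below uses a made-up five-page adjacency matrix purely to illustrate the construction steps: normalise rows to get the transition matrix, replace dangling rows with a uniform distribution, then interpolate with the uniform teleportation matrix:

```python
import numpy as np

# Hypothetical adjacency (the actual figure is not reproduced here):
# A[i, j] = 1 means page i links to page j.
A = np.array([[0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]], dtype=float)  # page 5 is dangling

n = A.shape[0]
P = A.copy()
for i in range(n):
    out = P[i].sum()
    # Rows with no out-links are replaced by a uniform distribution.
    P[i] = P[i] / out if out > 0 else np.full(n, 1.0 / n)

lam = 0.1  # teleportation factor
G = (1 - lam) * P + lam * np.full((n, n), 1.0 / n)  # the Google matrix
print(G)
```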

(e)     The University of Glasgow has deployed a new search engine to support students and staff accessing up-to-date information. The University’s search engine receives a large number of time-sensitive queries such as ‘exams timetable’ and ‘graduation ceremony’, as well as queries where the users aim to find a particular web page (e.g. ‘international student support appointments’ or ‘assessment scale’).

The University would like its search engine to be evaluated and appoints you to be in charge of this evaluation. Using a maximum of 750 words, explain how you will evaluate such a system. As part of your answer, you will need to describe a number of alternative evaluation approaches you will suggest to the University, what resources you will require from the University, and how you will practically deploy these approaches, describing their advantages and disadvantages as well as the evaluation metrics you will use.

Which evaluation approach would you recommend to the University, and why?         [10]

SECTION B

4.

(a) In the classical offline retrieval evaluation method, given a corpus of documents and a set of queries, the assessors judge the relevance of each document in relation to each query. The relevance of a given document is also assessed independently of other documents. Apart from the cost of developing a test collection, using suitable examples, briefly describe the three other main limitations of this type of evaluation.         [3]

(b) Consider the following TF*IDF ranking formula:

score(q, d) = Σ_{w ∈ q} ct(w, d) * IDF(w)

where ct(w,d) is the raw count of word w in document d (in other words, the term frequency TF of w in d), and IDF(w) is the IDF of word w.

(i)        Provide three reasons why the ranking formula above is unlikely to perform well.   [3]

(ii)       How would IDF(w) change (i.e., increase, decrease or stay the same) in each of the following cases: (1) adding the word w to a document; (2) making each document twice as long as its original length by concatenating the document with itself; (3) adding some documents to the collection.   [4]
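A quick numeric check of the three cases (a sketch; IDF(w) = log2(N/df(w)) is assumed, matching the weighting in Question 1, with made-up N and df values):

```python
import math

def idf(N, df):
    return math.log2(N / df)

N, df = 1000, 100
print(idf(N, df))       # baseline
# (1) adding w to a document that already contains it: df unchanged -> IDF same;
#     adding w to a document that lacked it: df + 1 -> IDF decreases.
print(idf(N, df + 1))
# (2) doubling each document by self-concatenation: N and df unchanged -> IDF same.
print(idf(N, df))
# (3) adding documents: N grows; df grows too only if they contain w.
print(idf(N + 200, df))
```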

(c)   Suppose that on a four-point relevance scale, we give a 0 score for a non-relevant document, 1 for a partially-relevant document, 2 for a relevant document, and 3 for a perfect document. Suppose also that a query q returns five results doc1, doc7, doc20, doc15, doc2 (in that order) that are assessed as: perfect, relevant, non-relevant, perfect, and relevant, respectively.

Assume that 1/log2(rank) is used as the discount factor. You might wish to note that:

log2 2 = 1; log2 3 = 1.59; log2 4 = 2; log2 5 = 2.32.

(i)        Provide the ideal DCG values for the perfect ranking for the given query q.       [2]

(ii)       Compute the Normalised Discounted Cumulative Gain (NDCG) at rank 5 for the given query q (provide NDCG to 2 decimal places). Show your calculations.         [3]
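A sketch of the DCG/NDCG arithmetic, assuming the common convention that rank 1 is undiscounted and the 1/log2(rank) discount applies from rank 2:

```python
import math

def dcg(gains):
    # Rank 1 is undiscounted; ranks >= 2 are divided by log2(rank).
    return gains[0] + sum(g / math.log2(r)
                          for r, g in enumerate(gains[1:], start=2))

gains = [3, 2, 0, 3, 2]              # perfect, relevant, non-relevant, perfect, relevant
ideal = sorted(gains, reverse=True)  # the perfect ranking: [3, 3, 2, 2, 0]
print(round(dcg(gains) / dcg(ideal), 2))  # NDCG@5
```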

(d)  Consider a query with two terms, whose posting lists are as follows:

term1 -> [id=3, tf=2], [id=8, tf=1], [id=12, tf=1]

term2 -> [id=3, tf=5], [id=12, tf=4]

Explain and provide the exact order in which the above two posting lists will be traversed by the TAAT and DAAT query evaluation strategies, and the memory requirements of both strategies for obtaining a result set of K documents from a corpus of N documents (K < N).       [5]
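A sketch of the two access patterns over the given posting lists; scores here are simply summed term frequencies, since the point is the traversal order and memory footprint (TAAT keeps an accumulator per scored document, DAAT only the current top-K):

```python
import heapq

term1 = [(3, 2), (8, 1), (12, 1)]
term2 = [(3, 5), (12, 4)]

# TAAT: exhaust term1's postings, then term2's, keeping an accumulator
# per document touched (memory grows with the number of scored documents).
acc = {}
for plist in (term1, term2):
    for doc_id, tf in plist:
        acc[doc_id] = acc.get(doc_id, 0) + tf
print("TAAT accumulators:", acc)

# DAAT: walk both lists in parallel in doc-id order (3, then 8, then 12),
# finishing each document's score before moving on; only a size-K heap is kept.
K = 2
heap = []
i = j = 0
while i < len(term1) or j < len(term2):
    d1 = term1[i][0] if i < len(term1) else float("inf")
    d2 = term2[j][0] if j < len(term2) else float("inf")
    doc = min(d1, d2)
    score = 0
    if d1 == doc:
        score += term1[i][1]; i += 1
    if d2 == doc:
        score += term2[j][1]; j += 1
    heapq.heappush(heap, (score, doc))
    if len(heap) > K:
        heapq.heappop(heap)   # drop the current lowest-scoring candidate
print("DAAT top-K:", sorted(heap, reverse=True))
```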

(e) Twitter is experiencing an increasing rate of queries during the COVID-19 pandemic, and seeks to provide an effective search functionality that better serves its customers’ information needs. Within Twitter, some data scientists are still contending that classical IR techniques such as TF*IDF or BM25 are holding up well and are sufficient as a basis for the search functionality, so there is no need for them to do more work. Other researchers within the company are advocating a new ranking functionality based on learning to rank. Twitter decides to recruit you as an IR consultant to help convince the data scientists that it is time to move to learning to rank and to help deploy the new search functionality. Discuss how you would convince the Twitter data scientists to move to learning to rank.

As part of your answer (750 word max):

-   Describe the major technical challenges for the deployment of a search functionality for Twitter.

-   What has to be changed in a classical TF*IDF-based system and why?

-   What type of learning to rank models will you use and why?

-   Provide three types of features the system should use to increase the effectiveness of searching for tweets, and illustrate your answer with suitable examples.

-   What other functionalities might you advise the company to add to their Twitter search functionality to support information seeking in relation to COVID-19 and why?        [10]



