COMP2271-WE01
DATA SCIENCE
2022
Section A Probability
Question 1
(a) The 26 letters of the alphabet (a, b, ..., z) are arranged in random order to produce a string S of length 26 (each permutation is equally likely).
i. Let A denote the event that S contains the word “durham”. Calculate the probability P(A). [3 Marks]
ii. Let B denote the event that S contains the word “end”. Calculate the probability P(B). [2 Marks]
iii. Calculate the conditional probability P(B | A). [3 Marks]
iv. Are the events A and B independent? Justify your answer. [2 Marks]
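The counting argument behind part (a) can be checked with exact arithmetic. The sketch below is one possible approach, not a model answer. It assumes that "contains the word" means the letters appear as a contiguous substring, and the helper name block_probability is hypothetical. Because all 26 letters are distinct, a word of k distinct letters can be glued into a single block, leaving 26 - k + 1 objects to arrange.

```python
from fractions import Fraction
from math import factorial

TOTAL = factorial(26)  # number of equally likely permutations of the alphabet


def block_probability(word_length: int) -> Fraction:
    """P(a fixed word of distinct letters appears contiguously in S)."""
    # Glue the word into one block: 26 - word_length + 1 objects remain.
    return Fraction(factorial(26 - word_length + 1), TOTAL)


p_a = block_probability(len("durham"))  # 21!/26!
p_b = block_probability(len("end"))     # 24!/26!

# "end" and "durham" share only the letter d, and each letter occurs once
# in S, so both words can occur only as the overlapping block "endurham".
p_a_and_b = block_probability(len("endurham"))  # 19!/26!
p_b_given_a = p_a_and_b / p_a

print("P(A)   =", p_a)
print("P(B)   =", p_b)
print("P(B|A) =", p_b_given_a)
print("independent:", p_a_and_b == p_a * p_b)
```

Comparing P(B | A) with P(B) is enough to settle part iv under these assumptions.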
(b) Log-ins at a server S1 can be modelled as a Poisson process with an arrival rate of 12 log-ins in 24 hours.
i. Let A be the event that on a particular day there will be no log-in during 1pm-2pm. Calculate P(A). [2 Marks]
ii. Let B be the event that on a particular day there will be two log-ins during 2pm–4pm. Calculate P(B). [2 Marks]
Another server S2 has 20 regular users. Each of those users logs in exactly once each day, at a time between 0:00 and 23:59:59 chosen uniformly at random, and nobody else uses the server S2.
iii. Let C be the event that on a particular day there will be exactly two log-ins during 2pm–3pm on this server S2. Calculate P(C). [3 Marks]
iv. Let X be a random variable that denotes the number of log-ins during 2pm–3pm on server S2. Calculate the expectation, variance and standard deviation of X. [3 Marks]
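The two server models in part (b) can be compared numerically. The following is a minimal sketch under stated assumptions: "two log-ins" in ii is read as exactly two, log-in times on S2 are treated as continuous so that each user falls in a given hour with probability 1/24, and users act independently (so X is binomial). The function and variable names are illustrative only.

```python
from math import comb, exp, factorial, sqrt


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


# Server S1: Poisson process with 12 log-ins per 24 hours.
RATE_PER_HOUR = 12 / 24

p_no_login = poisson_pmf(0, RATE_PER_HOUR * 1)    # one-hour window, 1pm-2pm
p_two_logins = poisson_pmf(2, RATE_PER_HOUR * 2)  # two-hour window, 2pm-4pm

# Server S2: 20 independent users, each in a given hour with probability 1/24.
N_USERS = 20
P_IN_HOUR = 1 / 24

p_c = comb(N_USERS, 2) * P_IN_HOUR**2 * (1 - P_IN_HOUR) ** (N_USERS - 2)
mean_x = N_USERS * P_IN_HOUR
var_x = N_USERS * P_IN_HOUR * (1 - P_IN_HOUR)
sd_x = sqrt(var_x)

print(f"P(A) = {p_no_login:.4f}, P(B) = {p_two_logins:.4f}, P(C) = {p_c:.4f}")
print(f"E[X] = {mean_x:.4f}, Var(X) = {var_x:.4f}, SD(X) = {sd_x:.4f}")
```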
(c) The joint density function of two continuous random variables X and Y is given by
i. Determine the value of c. [2 Marks]
ii. Determine the marginal densities fX of X and fY of Y. [4 Marks]
Note: If you need to refer to a Z-table to answer the following questions, you can find one on the following pages.
(d) The annual salaries of data scientists in the UK follow a normal distribution with standard deviation σ = 15,000. We choose a simple random sample of n data scientists and record their annual salaries.
i. Describe how to determine a two-sided confidence interval [a,b] so that, with 99% confidence, the true mean of the annual salaries of data scientists in the UK lies in the interval. [4 Marks]
ii. What is the minimum sample size n required for the confidence interval (with 99% confidence as in (i)) to have length at most £5,000? [4 Marks]
iii. We choose a sample size of n = 50, and the mean of the salaries in the sample is £52,000. Determine a one-sided confidence interval I = (−∞, b] that, with 99% confidence, contains the true mean of the annual salaries of data scientists in the UK. [3 Marks]
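The z-interval calculations in part (d) can be sketched in code. This is an illustrative check rather than a model answer: it assumes the known-σ interval x̄ ± z·σ/√n, takes quantiles from Python's statistics.NormalDist instead of a printed Z-table, and uses hypothetical function names.

```python
from math import ceil, sqrt
from statistics import NormalDist

SIGMA = 15_000           # known population standard deviation
STD_NORMAL = NormalDist()  # standard normal distribution


def two_sided_interval(sample_mean: float, n: int, confidence: float = 0.99):
    """Interval [a, b] = sample_mean -/+ z * sigma / sqrt(n)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * SIGMA / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def min_sample_size(max_length: float, confidence: float = 0.99) -> int:
    """Smallest n with interval length 2 * z * sigma / sqrt(n) <= max_length."""
    z = STD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return ceil((2 * z * SIGMA / max_length) ** 2)


def one_sided_upper_bound(sample_mean: float, n: int, confidence: float = 0.99) -> float:
    """Upper end b of the one-sided interval (-inf, b]."""
    z = STD_NORMAL.inv_cdf(confidence)  # all tail mass on one side
    return sample_mean + z * SIGMA / sqrt(n)


n_min = min_sample_size(5_000)
b = one_sided_upper_bound(52_000, 50)
print("minimum n:", n_min)
print(f"one-sided upper bound b: {b:.2f}")
```

A Z-table rounded to two decimals (z ≈ 2.58) yields a slightly larger minimum n than the unrounded quantile used here, so the answer to ii depends on the rounding convention.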
(e) It is commonly accepted that 10% of the users of a social media platform believe in conspiracy theories. We suspect that the proportion is actually higher. To investigate this further, we carry out a hypothesis test. We interview 100 randomly selected users and record whether each of them believes in conspiracy theories or not. We find that 15 of the 100 users in the sample believe in conspiracy theories.
i. Formulate a suitable null hypothesis and a suitable alternative hypothesis. [2 Marks]
ii. Is the test a two-tailed, left-tailed or right-tailed test? [1 Mark]
iii. What is the value of the test statistic z that we should calculate from the sample data? [2 Marks]
iv. What is the resulting p-value? [2 Marks]
v. If we choose a significance level of 0.01, how should we formulate the outcome of this hypothesis test? [2 Marks]
vi. Assume that the true proportion of users who believe in conspiracy theories is 20%. What is the probability β of a Type II error for our test procedure? [4 Marks]
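The quantities asked for in part (e) can be computed as follows. This is a hedged sketch, not a marking scheme: it assumes H0: p = 0.10 against H1: p > 0.10, a one-proportion z-test using the normal approximation without continuity correction, and a rejection rule of the form "reject when z exceeds the upper α quantile". Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

P0 = 0.10        # proportion under the null hypothesis
N = 100          # sample size
BELIEVERS = 15   # users in the sample who believe in conspiracy theories
ALPHA = 0.01     # significance level
P_TRUE = 0.20    # assumed true proportion for the Type II error calculation

p_hat = BELIEVERS / N
se_null = sqrt(P0 * (1 - P0) / N)       # standard error under H0

z_stat = (p_hat - P0) / se_null         # test statistic
p_value = 1 - STD_NORMAL.cdf(z_stat)    # right-tailed p-value
reject_null = p_value < ALPHA

# Type II error: H0 is not rejected although the true proportion is P_TRUE.
critical_p_hat = P0 + STD_NORMAL.inv_cdf(1 - ALPHA) * se_null
se_true = sqrt(P_TRUE * (1 - P_TRUE) / N)
beta = STD_NORMAL.cdf((critical_p_hat - P_TRUE) / se_true)

print(f"z = {z_stat:.4f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
print(f"beta = {beta:.4f}")
```

Note that the standard error is evaluated at P0 for the test statistic and the critical value, but at P_TRUE when computing β, since β is a probability under the alternative.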
Section B Computer Graphics
Question 2
This question relates to rendering the steam train model as shown in Figure 1.
Figure 1: Steam train model.
(a) Assume that when the train translates, its wheels also rotate. Draw a scene graph for the train model, with the aim of simplifying the graph hierarchy. Marks will be given based on:
i. Correct structure and organisation of model parts. [8 Marks]
ii. Correct transformation operations involved. [6 Marks]
(b) Assume that drawbox() and drawcircle() are given functions for you to render a box and a circle with normalised dimensions, respectively. Write a WebGL code segment based on the scene graph in (a). Marks will be given based on:
i. Correct overall program structure. [5 Marks]
ii. Correct usage of WebGL statements to model train parts and support train motion. [9 Marks]
(c) Suppose you have placed 10,000 steam train models throughout a 3D virtual environment for rendering through scan-conversion. When visualising the virtual environment, a user is expected to see some trains moving close to the user, while other trains may be far away from the user or temporarily out of the user’s sight.
i. Which component of the scan-conversion rendering pipeline contributes the most in supporting interactive rendering of the virtual environment? Justify your answer. [5 Marks]
ii. Analyse why it may still be difficult to render the virtual environment interactively in practice. Suggest a solution by modifying the implementation of the virtual environment to significantly improve its rendering speed. [7 Marks]
(d) To enhance rendering quality of the virtual environment in (c), both directional lighting and normal mapping are applied.
i. Suggest which shader is suitable for implementing directional lighting. Justify your answer. [4 Marks]
ii. Analyse whether applying directional lighting is sufficient to support normal mapping. If yes, justify your answer. Otherwise, suggest with explanation whether any extra lighting(s) is/are required. [6 Marks]