Data Science Assignment

Due: 5pm EST, 2/21/2025

1. n-gram

Given the training data

<s> John read another book </s>

<s> I read a different book </s>

(a) Calculate the bigram probabilities using maximum likelihood estimation (MLE) and fill in the table.

Bigram	Probability	Bigram	Probability
P(John | <s>)		P(another | read)
P(read | John)		P(book | another)
P(a | read)		P(</s> | book)
P(book | a)		P(I | <s>)
P(by | book)		P(read | I)
P(Jane | by)		P(different | a)
P(<s> | Jane)		P(book | different)

(b) Calculate the probability of the sentence <s> John read a different book </s> using only the MLE bigram estimates.

(c) Calculate the probability of the sentence <s> Jane read a book </s> using only the MLE bigram estimates.
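
As a rough illustration of the MLE bigram estimate, P(w | prev) = count(prev w) / count(prev), the sketch below counts bigrams over only the two training sentences shown above. The function names (train_bigram_mle, sentence_probability) are hypothetical, and treating an unseen history as probability 0 is an assumption of this sketch, not something the assignment specifies.

```python
from collections import Counter

def train_bigram_mle(sentences):
    """Return a function prob(word, prev) giving the MLE bigram estimate."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        # Each token except the last serves as the history of the next token.
        for prev, word in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        if history_counts[prev] == 0:
            return 0.0  # assumption: an unseen history gets probability 0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob

def sentence_probability(prob, sentence):
    """Multiply the bigram probabilities along the sentence."""
    tokens = sentence.split()
    result = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        result *= prob(word, prev)
    return result

training = [
    "<s> John read another book </s>",
    "<s> I read a different book </s>",
]
p = train_bigram_mle(training)
print(p("John", "<s>"))  # 0.5
print(sentence_probability(p, "<s> John read a different book </s>"))  # 0.25
```

Without smoothing, any sentence that contains a bigram never seen in training receives probability 0 under this estimate.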

2. Evaluation metrics on binary classification

Given the following classifier output:

Actual Label	Predicted Label
0	0
1	1
0	1
0	1
1	1
0	0
1	1
0	1
1	0
0	1

(a) Draw the confusion matrix.

(b) Calculate the Accuracy, Precision, Recall, and F1 score.

(c) Why might using accuracy as the only metric not be ideal?
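
The counting behind these metrics can be sketched as follows, using the ten label pairs from the table above and treating label 1 as the positive class (an assumption of this sketch). The helper name binary_metrics is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for convenience.

```python
def binary_metrics(actual, predicted, positive=1):
    """Count confusion-matrix cells and derive the standard metrics."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Convention assumed here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

actual    = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
predicted = [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(actual, predicted)
print(metrics["tp"], metrics["fp"], metrics["fn"], metrics["tn"])  # 3 4 1 2
print(metrics["accuracy"])  # 0.5
```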

3. Evaluation metrics on multiclass classification

Given the following confusion matrix of a multiclass classifier:

Columns: Truth. Rows: Classifier.

	A	B	C	D	E	F
A	95	1	13	0	1	0
B	0	1	0	0	0	0
C	10	90	0	1	0	0
D	0	0	0	34	3	7
E	0	1	2	13	26	5
F	0	0	2	14	5	10

(a) Calculate the precision, recall, and F1 score for each of the classes A-F.

(b) Calculate the micro-averaged precision, recall, and F1 score.

(c) Calculate the macro-averaged precision, recall, and F1 score.
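
To illustrate the difference between the two averaging schemes, here is a minimal sketch on a small hypothetical 3-class matrix (classes X, Y, Z, not the assignment data). It assumes rows are the classifier's predictions and columns are the true classes, and it takes macro-F1 to be the mean of the per-class F1 scores; some texts instead use the harmonic mean of macro precision and macro recall. The function name averaged_metrics is hypothetical.

```python
def averaged_metrics(labels, matrix):
    """matrix[i][j] = items predicted as labels[i] whose true class is labels[j]."""
    n = len(labels)
    per_class = {}
    total_tp = total_fp = total_fn = 0
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[k]) - tp                       # rest of the predicted row
        fn = sum(matrix[i][k] for i in range(n)) - tp  # rest of the truth column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
        total_tp, total_fp, total_fn = total_tp + tp, total_fp + fp, total_fn + fn

    # Micro-average: pool the counts over all classes, then compute once.
    micro_p = total_tp / (total_tp + total_fp)
    micro_r = total_tp / (total_tp + total_fn)
    micro = {"precision": micro_p, "recall": micro_r,
             "f1": 2 * micro_p * micro_r / (micro_p + micro_r)}
    # Macro-average: compute per class, then take the unweighted mean.
    macro = {name: sum(m[name] for m in per_class.values()) / n
             for name in ("precision", "recall", "f1")}
    return per_class, micro, macro

labels = ["X", "Y", "Z"]  # hypothetical classes
matrix = [
    [5, 1, 0],  # predicted X
    [2, 3, 1],  # predicted Y
    [0, 1, 7],  # predicted Z
]
per_class, micro, macro = averaged_metrics(labels, matrix)
print(per_class["Y"]["precision"])  # 0.5
print(micro["precision"])           # 0.75
```

When every item receives exactly one predicted label, the pooled false positives equal the pooled false negatives, so micro-averaged precision, recall, and F1 all coincide with accuracy. Macro-averaging weights each class equally, so rare classes influence it more.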

4. Text classification

The drug review dataset provides patient reviews of drugs, each with a positive or negative rating reflecting overall patient satisfaction. The dataset consists of two files: drug review train .csv for training and drug review test .csv for testing. Both files contain plain-text, UTF-8-encoded samples in a tab-separated format with the following columns:

BernoulliNB	true positive	false positive	false negative	precision	recall	F1-score
positive
negative

SGDClassifier	true positive	false positive	false negative	precision	recall	F1-score
positive
negative