


Data Science Assignment

Due: 5pm EST, 2/21/2025


1. n-gram

Given the following training data:

<s> John read a book by Jane </s>

<s> John read another book </s>

<s> I read a different book </s>
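For reference, here are the standard definitions. The MLE bigram estimate divides a bigram's count by the count of its first word. A bigram model then scores a sentence by multiplying these estimates from w_1 = <s> through w_n = </s>. For example, <s> John occurs twice and <s> begins three bigrams, so P(John | <s>) = 2/3.

```latex
P_{\text{MLE}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})},
\qquad
P(w_1 w_2 \cdots w_n) \approx \prod_{i=2}^{n} P_{\text{MLE}}(w_i \mid w_{i-1})
```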

(a) Calculate the bigram probabilities using maximum likelihood estimation (MLE) and fill in the table.

Bigram            Probability   Bigram                Probability
P(John | <s>)     ______        P(another | read)     ______
P(read | John)    ______        P(book | another)     ______
P(a | read)       ______        P(</s> | book)        ______
P(book | a)       ______        P(I | <s>)            ______
P(by | book)      ______        P(read | I)           ______
P(Jane | by)      ______        P(different | a)      ______
P(<s> | Jane)     ______        P(book | different)   ______

(b) Using only the MLE bigram probabilities, calculate the probability of the sentence <s> John read a different book </s>.

(c) Using only the MLE bigram probabilities, calculate the probability of the sentence <s> Jane read a book </s>.
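The counting behind (a) to (c) can be sketched in Python as follows. This is a minimal illustration that uses exact fractions from the standard library. The helper names (`bigram_counts`, `mle`, `sentence_prob`) are hypothetical and not part of the assignment.

The denominator is the number of bigrams that start with the history word. This equals the word's count for every token except </s>. A bigram never seen in training gets probability 0 under pure MLE, so any sentence that contains one also scores 0.

```python
from collections import Counter
from fractions import Fraction

# Training sentences from the assignment, already padded with <s> and </s>.
CORPUS = [
    "<s> John read a book by Jane </s>",
    "<s> John read another book </s>",
    "<s> I read a different book </s>",
]


def bigram_counts(sentences):
    """Count every bigram and how often each word appears as a bigram's first word."""
    bigrams, history = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for prev, curr in zip(tokens, tokens[1:]):
            bigrams[(prev, curr)] += 1
            history[prev] += 1
    return bigrams, history


def mle(word, prev, bigrams, history):
    """MLE estimate P(word | prev) = C(prev word) / C(prev); 0 if prev never starts a bigram."""
    if history[prev] == 0:
        return Fraction(0)
    return Fraction(bigrams[(prev, word)], history[prev])


def sentence_prob(sentence, bigrams, history):
    """Product of MLE bigram probabilities over the padded sentence."""
    tokens = sentence.split()
    prob = Fraction(1)
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= mle(curr, prev, bigrams, history)
    return prob


bigrams, history = bigram_counts(CORPUS)
for word, prev in [("John", "<s>"), ("a", "read"), ("</s>", "book")]:
    print(f"P({word} | {prev}) = {mle(word, prev, bigrams, history)}")
for s in ["<s> John read a different book </s>", "<s> Jane read a book </s>"]:
    print(s, "->", sentence_prob(s, bigrams, history))
```

Fractions keep the results exact, so they can be compared directly with hand calculations before converting to decimals.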


2. Evaluation metrics for binary classification

Given the following classifier output (one sample per row):

Actual Label    Predicted Label
0               0
1               1
0               1
0               1
1               1
0               0
1               1
0               1
1               0
0               1
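Take label 1 as the positive class. This is a common convention and is assumed here, since the assignment does not say. Under that convention, the quantities in (b) follow from the confusion-matrix counts TP, FP, FN, and TN:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FP + FN + TN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall} &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{aligned}
```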

(a) Draw the confusion matrix.

(b) Calculate the accuracy, precision, recall, and F1 score.

(c) Why might accuracy alone not be an ideal metric?
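Here is a minimal Python sketch of parts (a) to (c). It rests on two assumptions: label 1 is the positive class, and the ten (actual, predicted) pairs are read row by row from the table. The function names are illustrative only. The last lines use a hypothetical, deliberately imbalanced label set to show why accuracy alone can mislead. A classifier that always predicts 0 reaches 90% accuracy while finding no positives at all.

```python
# (actual, predicted) pairs, assuming the table is read row by row.
ACTUAL = [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
PREDICTED = [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]


def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for y, p in zip(actual, predicted):
        if p == positive and y == positive:
            tp += 1
        elif p == positive:        # predicted positive, actually negative
            fp += 1
        elif y == positive:        # predicted negative, actually positive
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def binary_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall, and F1; 0/0 cases are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


tp, fp, fn, tn = confusion_counts(ACTUAL, PREDICTED)
print("            predicted 1  predicted 0")
print(f"actual 1    {tp:11d}  {fn:11d}")
print(f"actual 0    {fp:11d}  {tn:11d}")
print(binary_metrics(ACTUAL, PREDICTED))

# Part (c), hypothetical data: 1 positive among 10 samples, and a model that
# always predicts 0. Accuracy is high, but recall on the positive class is 0.
skewed_actual = [1] + [0] * 9
always_zero = [0] * 10
print(binary_metrics(skewed_actual, always_zero))
```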


3. Evaluation metrics for multiclass classification

Given the following confusion matrix of a multiclass classifier (rows: classifier predictions; columns: true labels):

                              Truth
                   A     B     C     D     E     F
Classifier   A    95     1    13     0     1     0
             B     0     1     0     0     0     0
             C    10    90     0     1     0     0
             D     0     0     0    34     3     7
             E     0     1     2    13    26     5
             F     0     0     2    14     5    10
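For reference, here are the standard definitions for K classes (K = 6 here). They read the matrix with rows as predictions and columns as truth, as the Classifier and Truth labels suggest. Under that reading, TP_k is the diagonal entry for class k, FP_k is the rest of row k, and FN_k is the rest of column k.

Macro F1 is defined here as the mean of the per-class F1 scores. Some texts instead take the harmonic mean of macro precision and macro recall, which can give a different value.

```latex
\begin{aligned}
P_k &= \frac{TP_k}{TP_k + FP_k}, \quad R_k = \frac{TP_k}{TP_k + FN_k}, \quad F_{1,k} = \frac{2 P_k R_k}{P_k + R_k} \\
P_{\text{micro}} &= \frac{\sum_k TP_k}{\sum_k (TP_k + FP_k)}, \quad R_{\text{micro}} = \frac{\sum_k TP_k}{\sum_k (TP_k + FN_k)}, \quad F_{1,\text{micro}} = \frac{2 P_{\text{micro}} R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}} \\
P_{\text{macro}} &= \frac{1}{K} \sum_{k=1}^{K} P_k, \quad R_{\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} R_k, \quad F_{1,\text{macro}} = \frac{1}{K} \sum_{k=1}^{K} F_{1,k}
\end{aligned}
```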

(a) Calculate the precision, recall, and F1 score for each of the classes A-F.

(b) Calculate the micro-averaged precision, recall, and F1 score.

(c) Calculate the macro-averaged precision, recall, and F1 score.
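Below is a plain-Python sketch of the per-class, micro-averaged, and macro-averaged computations. It assumes rows are the classifier's predictions and columns are the true labels. Macro F1 here is the unweighted mean of per-class F1, which is what scikit-learn's `f1_score(..., average="macro")` computes. A class with no true positives (C in this matrix) gets precision, recall, and F1 of 0 through the hypothetical `safe_div` helper.

```python
LABELS = ["A", "B", "C", "D", "E", "F"]
# Assumed orientation: MATRIX[i][j] = samples the classifier labeled LABELS[i]
# whose true label is LABELS[j] (rows = predictions, columns = truth).
MATRIX = [
    [95, 1, 13, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [10, 90, 0, 1, 0, 0],
    [0, 0, 0, 34, 3, 7],
    [0, 1, 2, 13, 26, 5],
    [0, 0, 2, 14, 5, 10],
]


def safe_div(num, den):
    """Division that returns 0.0 when the denominator is 0."""
    return num / den if den else 0.0


def f1(p, r):
    return safe_div(2 * p * r, p + r)


def per_class(matrix):
    """(TP, FP, FN, precision, recall, F1) for each class, in LABELS order."""
    n = len(matrix)
    rows = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[k])                    # row total: predicted as k
        actual_k = sum(matrix[i][k] for i in range(n))  # column total: truly k
        p, r = safe_div(tp, predicted_k), safe_div(tp, actual_k)
        rows.append((tp, predicted_k - tp, actual_k - tp, p, r, f1(p, r)))
    return rows


def micro_average(matrix):
    """Pool TP/FP/FN over all classes, then compute P, R, F1 once."""
    rows = per_class(matrix)
    tp = sum(row[0] for row in rows)
    fp = sum(row[1] for row in rows)
    fn = sum(row[2] for row in rows)
    p, r = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
    return p, r, f1(p, r)


def macro_average(matrix):
    """Unweighted mean of the per-class precision, recall, and F1."""
    rows = per_class(matrix)
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in (3, 4, 5))


for label, (tp, fp, fn, p, r, f) in zip(LABELS, per_class(MATRIX)):
    print(f"{label}: TP={tp:3d} FP={fp:3d} FN={fn:3d}  P={p:.3f} R={r:.3f} F1={f:.3f}")
print("micro P/R/F1: %.3f %.3f %.3f" % micro_average(MATRIX))
print("macro P/R/F1: %.3f %.3f %.3f" % macro_average(MATRIX))
```

In single-label multiclass data, every misclassified sample counts once as an FP (for its predicted class) and once as an FN (for its true class). As a result, micro precision, micro recall, and micro F1 all equal accuracy.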


4. Text classification

The drug review dataset contains patient reviews of drugs, each with a positive or negative rating reflecting overall patient satisfaction. The dataset consists of two files: drug_review_train.csv for training and drug_review_test.csv for testing. Both files contain plain-text, UTF-8-encoded samples in tab-separated format with the following columns:

• Text

• Binary label (0 or 1)

(a) Use BernoulliNB to build a naïve Bayes classifier and fill in the table.

BernoulliNB   true positive   false positive   false negative   precision   recall   F1-score
positive
negative

(b) Repeat Task (a) using an SVM (SGDClassifier) model.

SGDClassifier   true positive   false positive   false negative   precision   recall   F1-score
positive
negative

(c) Upload the source code.
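Here is one possible scikit-learn sketch of Tasks (a) and (b). Several details are assumptions and are not specified by the assignment:

- the file names `drug_review_train.csv` and `drug_review_test.csv`;
- a file layout of one `text<TAB>label` line per sample, with no header row;
- binary bag-of-words features for BernoulliNB;
- TF-IDF features for SGDClassifier.

With `loss="hinge"` (its default), SGDClassifier fits a linear SVM by stochastic gradient descent. To fill in both rows of each table, each class is treated in turn as the positive class. The helper names are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Assumed file names; each line is assumed to be "text<TAB>label".
TRAIN_PATH = "drug_review_train.csv"
TEST_PATH = "drug_review_test.csv"


def load_tsv(path, skip_header=False):
    """Read (texts, labels), splitting each line on its last tab."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        if skip_header:
            next(f)
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            text, label = line.rsplit("\t", 1)
            texts.append(text)
            labels.append(int(label))
    return texts, labels


def per_class_table(y_true, y_pred):
    """TP/FP/FN/precision/recall/F1 with each class treated in turn as positive."""
    table = {}
    for name, c in (("positive", 1), ("negative", 0)):
        tp = sum(1 for y, p in zip(y_true, y_pred) if y == c and p == c)
        fp = sum(1 for y, p in zip(y_true, y_pred) if y != c and p == c)
        fn = sum(1 for y, p in zip(y_true, y_pred) if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        table[name] = (tp, fp, fn, precision, recall, f1)
    return table


def build_models():
    return {
        # Word presence/absence features match the Bernoulli event model.
        "BernoulliNB": make_pipeline(CountVectorizer(binary=True), BernoulliNB()),
        # Hinge loss makes this a linear SVM trained with SGD.
        "SGDClassifier": make_pipeline(
            TfidfVectorizer(), SGDClassifier(loss="hinge", random_state=0)
        ),
    }


def evaluate(train_texts, train_labels, test_texts, test_labels):
    """Fit each model on the training split and tabulate test-set results."""
    results = {}
    for name, model in build_models().items():
        model.fit(train_texts, train_labels)
        predictions = list(model.predict(test_texts))
        results[name] = per_class_table(test_labels, predictions)
    return results
```

To run it on the real data, call `results = evaluate(*load_tsv(TRAIN_PATH), *load_tsv(TEST_PATH))`, then read `results["BernoulliNB"]` and `results["SGDClassifier"]`. If the files begin with a header row, pass `skip_header=True` to `load_tsv`.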

