STSCI 4060/5045 Final Project
(Due: 4:30 PM, May 16, 2024)
Important: Read and follow this whole document carefully!
How to submit: submit your project report to the Canvas course website with a single zip file,
which combines all your files.
General instructions:
• Do your own work. Any cheating (for example, submitting code similar to
that of other students or copying code from an Internet source) may result in
serious consequences (e.g., getting zero points or failing the class). If you have a
question about the project, you should email your instructor directly.
• Start the project early. Programming is time-consuming; you will need a significant
amount of time and patience to code some portions of the project. Do not expect to
finish it on the due date.
• Test your code (especially the .cgi files) separately from other systems. When you have
multiple software systems connected, it is harder to debug.
• Add sufficient documentation to your code so that people understand your algorithm
and what your code does. This is a requirement of this project.
• Do not edit the raw data file in any way. Your results will be compared to the standard
solutions.
• Make sure that you have included all the components in your submission (see the
detailed list at the end of this document). Your grader will run your
programs on his/her computer; if something is missing, your programs will not run.
In this project you will have an opportunity to integrate Python programming, Oracle database,
database-driven dynamic web pages, and Python data analysis modules with Jupyter (IPython)
notebook using the data that are processed with the above integration. You are given a raw
data file, honeybee_gene_sequences.txt, which was downloaded from the NCBI web site. We
worked with protein data in class; genes, however, are a different kind of biomolecule.
Unlike proteins, which are composed of 20 amino acids, genes are formed from only four building
blocks: adenine (A), cytosine (C), guanine (G) and thymine (T). These are called nucleotides; a
sequence of them forms a gene, which in turn determines the sequence of a protein. Thus, the
compositions of the nucleotides and their relative frequencies, especially the combined relative
frequency of C and G (i.e., the sum of the percentages of C and G in a gene sequence), have
important biological (or medical) meanings. For this project, you will do the following:
1. Design a web page (using KompoZer or another similar program) to allow a user to enter
a file name (here honeybee_gene_sequences.txt) and the full path to the location where
the file is stored so that the user can upload the data file by clicking the Submit button
on the web page.
2. Write a specific .cgi file with Python to accept the user input from the web page, process
the data and store the processed data in an Oracle database table, which is also created
within the .cgi file using the Python-Oracle integration approach. In this .cgi file, you
need to at least include the following functions:
A. The main() function to receive the user input from the web page.
B. The processInput() function to do the following:
a) Read in the contents of the data file.
b) In order to extract the right nucleotide (or gene) sequences for all
possible cases (you can see that most times the nucleotide sequences
start right after the substring, mRNA, but not always), you are required to
insert the substring, _**gene_seq_starts_here**_, right before the
nucleotide sequences of every bee gene (or entry) through Python
programming when you read in (or process) the raw data line by line. In
this way, you will use the _**gene_seq_starts_here**_ substring as the
starting point to extract the nucleotide sequences later. Note: There are
different ways to extract the genes from the raw data. For the
requirement specified above, you should just treat it as a programming
requirement of this project.
c) Extract the gi number and nucleotide sequence of each gene (or entry).
d) Make sure that your Python program correctly reads in the gene (or
nucleotide) sequence of the last entry in the raw data file.
e) Calculate the relative frequencies of each nucleotide in every gene.
f) Calculate the combined relative frequency of the nucleotides G and C,
freq_GC, which is obtained by adding the relative frequencies of G and C.
g) Connect Python to the Oracle database system.
h) Create an Oracle table called beeGenes to store gi numbers, nucleotide
sequences, the relative frequencies of the four nucleotides and the
combined relative frequencies of the nucleotides G and C, freq_GC. So,
your beeGenes table has seven columns.
i) When you write the data to the database table, you are required to use
the Oracle bind variable approach and the batch writing method by
setting the bindarraysize to a certain number (refer to the lecture slides if
needed).
j) In order not to truncate any gene sequence, you need to find an
appropriate number for the sequence input size. Thus, you are required
to write a separate Python program (which should also be submitted for
grading) to determine the maximum number of nucleotides of all the
genes in the data file.
C. fileToStr() to return a string containing the contents of the named html file.
D. makePage() to make the final formatted string (or webpage) for displaying on a
web page.
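For steps 2.B(e) and 2.B(f), the frequency calculation might look roughly like the sketch below. It is only an illustration: the function name nucleotide_frequencies, the key names freq_C and freq_G, and the choice to ignore any character other than A, C, G and T when computing the denominator are assumptions, not requirements stated in this document.

```python
from collections import Counter


def nucleotide_frequencies(seq):
    """Return the relative frequencies of A, C, G, T and the combined freq_GC.

    Assumption: characters other than A, C, G and T are ignored, so the four
    single-nucleotide frequencies always sum to 1.
    """
    counts = Counter(seq.upper())
    total = sum(counts[n] for n in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    freqs = {"freq_" + n: counts[n] / total for n in "ACGT"}
    # freq_GC is defined in the handout as the sum of the G and C frequencies.
    freqs["freq_GC"] = freqs["freq_G"] + freqs["freq_C"]
    return freqs


# Made-up sequence for illustration, not taken from the data file.
example = nucleotide_frequencies("AACGTTGC")
print(example)
```

Returning a dictionary keeps the five values together under descriptive keys, which may be convenient when the rows are later assembled for the database table.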
3. Design a template web page to acknowledge that the uploading process was successful
and that the data were processed and stored in the database as planned. There is a
button on which a user can click if the user wants to see some results, retrieved from
the Oracle database table you just created.
4. Code another .cgi file with Python to retrieve data from the database table (beeGenes).
The functions you need are similar to those in the previous .cgi file, but in the
processInput() function, you are required to use a Python dictionary and the format
string mechanism when you extract data from beeGenes. In this function, you will run
queries against the beeGenes table to find the gi numbers of those bee genes that have
the highest relative frequencies of nucleotide A, C, G, or T so that you can display these
on the final web page when the user clicks the “Click to See Some Result” button on the
confirmation page of data submission. Note that you may have a situation in which multiple
genes meet the same condition. Your code should take care of this kind of situation
automatically. When that happens, you must list all the gi numbers in the same cell of
your webpage table, with one gi number per line.
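One possible reading of step 4 (a dictionary plus format strings, with ties listed one gi number per line) is sketched below. To keep the sketch runnable without a database server, an in-memory SQLite table stands in for the Oracle beeGenes table; this is a substitution, and the real project would use the Python-Oracle connection instead. The column names gi, freq_C and freq_G, the sample rows, and the HTML row template are all hypothetical.

```python
import sqlite3

# Stand-in for Oracle: an in-memory SQLite table with hypothetical columns.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE beeGenes "
    "(gi INTEGER, freq_A REAL, freq_C REAL, freq_G REAL, freq_T REAL)"
)
# Made-up rows (not real data); genes 101 and 102 tie for the highest freq_A.
cur.executemany(
    "INSERT INTO beeGenes VALUES (?, ?, ?, ?, ?)",
    [
        (101, 0.40, 0.10, 0.20, 0.30),
        (102, 0.40, 0.30, 0.10, 0.20),
        (103, 0.10, 0.20, 0.30, 0.40),
    ],
)

# Column names cannot be bind variables, so a format string fills them in.
# The names come from a fixed list in the code, never from user input.
query = (
    "SELECT gi FROM beeGenes "
    "WHERE {col} = (SELECT MAX({col}) FROM beeGenes) ORDER BY gi"
)

results = {}
for nucleotide in "ACGT":
    cur.execute(query.format(col="freq_" + nucleotide))
    gi_numbers = [str(row[0]) for row in cur.fetchall()]
    # All tied gi numbers go into one table cell, one per line.
    results[nucleotide] = "<br>".join(gi_numbers)
conn.close()

# The dictionary then fills a (hypothetical) template row via str.format.
row_template = "<tr><td>{A}</td><td>{C}</td><td>{G}</td><td>{T}</td></tr>"
row_html = row_template.format(**results)
print(row_html)
```

Because the subquery compares against MAX(...), every gene that shares the highest value is returned, so ties are handled without special-case code.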
5. Design another template web page to display the results gathered from the database.
Inserting a hyperlink of the nucleotides to another web page is optional.
6. Use the local server to run all the web services in this project, on port number
8081.
7. Write a Python program to run a query against the Oracle table beeGenes to show that
you earlier successfully extracted the gene sequence of the last entry of the raw data
file. To do so, you run a query for the gene sequence by providing the related gi number,
which is 147907436. Include both your Python code and the query result in your report.
8. Connect Python to the Oracle database and conduct a K-Means cluster analysis in a
Jupyter notebook. You should only use three columns in the beeGenes table: freq_A
(relative frequency of the nucleotide A), freq_T (relative frequency of the nucleotide T)
and freq_GC for this analysis due to some biological reasons.
In your Jupyter notebook, you should use three cells: the 1st cell is for importing all
the necessary Python modules for this analysis; the 2nd cell is to connect Python to
your Oracle database and create a numpy array containing the three columns of
data that are read from the beeGenes table in your Oracle database; and the 3rd cell
is for carrying out the K-Means analysis and plotting a 3D scatter plot using the three
columns of data based on the clusters identified by the K-Means analysis.
The K-Means settings are: n_clusters=7, init='random', n_init=10, max_iter=500,
tol=1e-4, and random_state=0. Then, you create a scatter plot with a total figure
size of 14 x 14. Use the same type of marker ('o') for all the clusters, set s to 20, set
labels to "Cluster 1" to "Cluster 7" for the cluster values of 0 to 6 that are found by
the K-Means algorithm, respectively. Set the colors as follows: red for Cluster 1, blue
for Cluster 2, aqua for Cluster 3, black for Cluster 4, purple for Cluster 5, magenta for
Cluster 6, and green for Cluster 7.
Mark the centroid of each cluster with a star: set s to 100, color to red and label to
Centroids. Give the title "K-Means" to the plot. The legends should be displayed in
the upper right corner of the plot.
After your code works correctly, run all the cells in your Jupyter notebook at once.
Submit the notebook file (.ipynb) and an HTML file of the same notebook (.html).
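The K-Means and plotting settings listed in step 8 could be wired together along the lines of the sketch below, which assumes scikit-learn and matplotlib are installed. It is an illustration only: the array X here is randomly generated placeholder data standing in for the freq_A, freq_T and freq_GC columns (in the project these come from the beeGenes table), the Agg backend line is only there so the sketch runs outside a notebook, and everything is shown in one block rather than the three notebook cells required above.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for running outside Jupyter; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Placeholder data (NOT real results): 200 random rows whose three values
# sum to 1, standing in for the columns (freq_A, freq_T, freq_GC).
rng = np.random.default_rng(0)
X = rng.dirichlet([1.0, 1.0, 1.0], size=200)

# K-Means with the settings given in the handout.
km = KMeans(n_clusters=7, init="random", n_init=10, max_iter=500,
            tol=1e-4, random_state=0)
labels = km.fit_predict(X)

colors = ["red", "blue", "aqua", "black", "purple", "magenta", "green"]
fig = plt.figure(figsize=(14, 14))
ax = fig.add_subplot(projection="3d")

# One scatter call per cluster so that each gets its own color and legend label.
for k, color in enumerate(colors):
    pts = X[labels == k]
    ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], color=color, marker="o",
               s=20, label="Cluster {}".format(k + 1))

# Centroids are drawn as red stars.
centers = km.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2], color="red",
           marker="*", s=100, label="Centroids")

ax.set_xlabel("freq_A")
ax.set_ylabel("freq_T")
ax.set_zlabel("freq_GC")
ax.set_title("K-Means")
ax.legend(loc="upper right")
```

Plotting each cluster separately, rather than passing the label array as a color map, is what makes the per-cluster legend entries straightforward.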
Your report should at least contain the following items: all your code, outputs and screenshots,
which must be combined into a single PDF file, arranged in the order they appear in the project.
You must mark all your items clearly. Moreover, your Python and html program files must be
submitted as separate files, which must be kept in the same folder (no subfolders) so that your
grader can run your programs easily. The following is a detailed list of the files/items to submit.
• All Python program files (with the .py extension), including the program to find the
maximum number of nucleotides in a gene sequence and the program to query the
database to confirm that you successfully extracted the gene sequence of the last
entry of the raw data file.
• All .cgi files, which are technically Python files but have the .cgi extension.
• All .html files, including the template and non-template .html files.
• The design window of your input web page.
• The design windows of your two template web pages.
• A screenshot of your input web page with the input value entered.
• A screenshot of your confirmation web page that displays that you have successfully
submitted the data, etc.
• A screenshot of your final web page that displays the results of database query
similar to the following screenshot (but it is only an example here, and the actual
results were erased).
• A screenshot of the local CGI server log.
• The result of Oracle table query for the gene sequence of the last entry, which
should be a Python shell screenshot (you may need more than one screen to display
the complete sequence).
• Your Jupyter notebook file (.ipynb).
• The Jupyter notebook HTML file (.html).
• The localCGIServer.py file.
• The raw data file, honeybee_gene_sequences.txt.