Python programming tutoring and explanation, data program walkthroughs for international students, Prolog tutoring | Java program analysis
3. Key tasks
● Task 1: you will be given the opportunity to apply the concepts of classes and methods in Python; in particular, you are required to define two user-defined data types – String and List – as classes.
● Task 2 : you will be given the opportunity to manipulate data from text files, to conduct some basic data analysis using Python external packages.
3.1. Task 1
3.1.2. Instructions & Requirements
In this task you will be assessed on how to implement two user-defined data types – String and List – and the associated methods for each class that are useful for data processing. You are required to implement the required methods for each class with your own algorithms, without using any of the built-in functions provided by Python. The data stored in these data types is organised in an array-based structure.
Part A: The String Class
In part A, you are required to implement a number of user-defined methods that are useful for processing or manipulating strings (data in textual form). Although Python has provided a good collection of string methods (that you would have used in the implementation of your first assessment), the purpose of part A is to assess your programming knowledge in developing algorithms for handling strings.
Your task is to create a Python class that contains a collection of methods defined for manipulating strings. An attribute or instance variable is required for this String class to represent each individual string data. It should be stored in an array-based structure. For the purpose of this part, you should define this instance variable by using a Python list to represent each string data. (With this implementation, we are assuming that each character in a string is represented as one element in the Python list.)
The following is a list of methods that can be applied on the strings represented by this String class. You are required to implement each of them. We have suggested the method header with the method name as well as the argument(s) that each method requires.
1. __init__(self, str_value) : This is the constructor method that is required for creating string objects from this String class. It takes the Python string ( str_value ) as an argument and assigns it to an instance variable defined by a Python list as mentioned above. (You may name the instance variable as str_data .)
2. search(self, target_char) : This method checks whether a specific character ( target_char ) exists within the string. As long as the character exists in the string (i.e. once you have found its first occurrence), return a True value; otherwise return a False value. You are expected to implement this using the “linear search” algorithm.
3. frequency(self, target_char) : This method works in a similar way to the search method, except that it returns the number of occurrences of the specific character ( target_char ) in the string.
4. replace(self, target_char, new_char) : This method will search for the specific character ( target_char ) within the string and replace all its occurrences with a new character ( new_char ).
5. lowercase(self) : This method converts or normalises the string to lower case. No arguments are required for this method. (Note that you should not attempt to use the built-in lower() method provided by Python.)
6. uppercase(self) : Similar to the lowercase method, this method converts the string to upper case. (Again, you should not attempt to use the built-in upper() method provided by Python.)
7. tokenise(self, the_delimiter) : This method tokenises/splits the string based on a specific character, referred to as the delimiter ( the_delimiter ). The delimiter could be a space character “ ” or a punctuation mark such as “,”. This method will return a list of tokens (sub-strings), where each token can be represented as either an individual Python list or an object of this String class.
8. __eq__(self, other) : This is one of the overloaded methods in Python that enables us to check for equality. Here, we want to check whether the data represented by the argument other is the same as the data represented in the current string object referred to by self . (Note that you will have to first ensure that the argument other is an object of this String class before attempting to compare the contents of the two objects.)
9. __str__(self) : This is another overloaded method that is useful for formatting the output of the string data represented in this String class. Re-build the instance variable ( str_data ) into a Python string and return it.
You should name the Python class as StringClass and save the Python source file as Task1_PartA.py .
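The method descriptions above can be sketched roughly as follows. This is a minimal illustration covering only some of the methods, not a complete or official solution. It assumes that lowercase() modifies the object in place, that a pair of lookup strings (the hypothetical names UPPER and LOWER) is an acceptable substitute for lower(), and that isinstance() is permitted for the type check in __eq__; how strictly “no built-in functions” is to be read is not settled by the brief.

```python
# Lookup tables used instead of the built-in lower() (assumed approach).
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"


class StringClass:
    def __init__(self, str_value):
        # One character per list element (array-based structure).
        self.str_data = [ch for ch in str_value]

    def search(self, target_char):
        # Linear search: stop at the first occurrence.
        for ch in self.str_data:
            if ch == target_char:
                return True
        return False

    def frequency(self, target_char):
        # Same scan as search, but count every occurrence.
        count = 0
        for ch in self.str_data:
            if ch == target_char:
                count += 1
        return count

    def lowercase(self):
        # Replace each upper-case letter by the letter at the same
        # position in LOWER; all other characters are left unchanged.
        converted = []
        for ch in self.str_data:
            new_ch = ch
            position = 0
            for upper in UPPER:
                if ch == upper:
                    new_ch = LOWER[position]
                    break
                position += 1
            converted = converted + [new_ch]
        self.str_data = converted

    def __eq__(self, other):
        # Only compare contents when `other` is also a StringClass.
        if not isinstance(other, StringClass):
            return False
        return self.str_data == other.str_data

    def __str__(self):
        # Rebuild a Python string from the character list.
        result = ""
        for ch in self.str_data:
            result += ch
        return result
```

For example, after s = StringClass("Hello World"), s.frequency("l") counts the three occurrences of “l”, and calling s.lowercase() changes the stored characters so that str(s) gives “hello world”.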
Part B: The StringList Class
In part B, you are required to implement another class that is responsible for handling a collection of strings, which is essentially a List abstract data type (ADT) that we have discussed. For the purpose of this part, we are (again) using the Python list to represent the string collection as an array-based structure. This means that each element in the Python list is holding an object from the String class defined in Part A. As such, we need to define a Python list as the instance variable (or attribute) for this StringList class that we are going to implement. You are required to implement each of the following methods for this StringList class. The method header as well as the argument(s) needed by each of the methods are suggested for your consideration.
1. __init__(self, size) : This is the constructor method that is required for constructing an initial empty list which will hold a collection of objects from the String class. You may name this instance variable as str_list . Given that we are adopting an array-based structure for the implementation, you should decide on an initial size for the collection, as indicated by the argument size.
2. add(self, new_item) : This method will add a new item which is a StringClass object to the collection represented in this StringList class (str_list). For the purpose of this task, the new item should just be added to the end of the collection. (Note that duplication of an existing item is allowed here; and you should not attempt to use the built-in append() method provided by Python.)
3. remove(self, target_item) : This method will remove all the occurrences of the specific item (as indicated by target_item ) from the string collection represented in this StringList class ( str_list ).
4. search(self, target_item) : The method is for searching for a specific item ( target_item ) in the string collection ( str_list ). Again, so long as an occurrence of the target item is encountered, return a True value; otherwise a False value. For the implementation of this method, you are required to use the “linear search” algorithm. (Note that the assumption here is that the string collection ( str_list ) has been sorted before this method can be applied to perform the search.)
5. __len__(self) : This is another overloaded method that is commonly implemented to return the number of items in the collection. (Again, you should not attempt to use the built-in len() method provided by Python.)
6. __str__(self) : This is again the overloaded method that is useful for formatting the output of the string collection represented in this StringList class. Construct a Python string which organises each item from the string collection ( str_list ) as a separate line in the output.
You should name the Python class as StringListClass and save the Python source file as Task1_PartB.py .
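One possible shape for this class is sketched below, under stated assumptions. The attributes capacity and count and the doubling growth policy are hypothetical choices, not part of the brief; range() and str() are used for brevity even though a strict reading of the restrictions may rule them out. Items are compared with ==, so the __eq__ method from Part A would be used when the items are StringClass objects.

```python
class StringListClass:
    def __init__(self, size):
        # Pre-allocate a fixed-size array; `count` tracks the used slots.
        self.str_list = [None] * size
        self.capacity = size
        self.count = 0

    def add(self, new_item):
        # Grow the array manually when it is full (no append()).
        if self.count == self.capacity:
            new_capacity = self.capacity * 2 if self.capacity > 0 else 1
            new_list = [None] * new_capacity
            for i in range(self.count):
                new_list[i] = self.str_list[i]
            self.str_list = new_list
            self.capacity = new_capacity
        self.str_list[self.count] = new_item
        self.count += 1

    def remove(self, target_item):
        # Shift every non-matching item toward the front, then clear the tail.
        kept = 0
        for i in range(self.count):
            if not (self.str_list[i] == target_item):
                self.str_list[kept] = self.str_list[i]
                kept += 1
        for i in range(kept, self.count):
            self.str_list[i] = None
        self.count = kept

    def search(self, target_item):
        # Linear search over the used slots only.
        for i in range(self.count):
            if self.str_list[i] == target_item:
                return True
        return False

    def __len__(self):
        # Return the tracked count instead of calling len() on the list.
        return self.count

    def __str__(self):
        # One item per line.
        result = ""
        for i in range(self.count):
            if i > 0:
                result += "\n"
            result += str(self.str_list[i])
        return result
```

Tracking count separately from the allocated capacity is what lets __len__ avoid the built-in len() on the underlying list. The class works with any items that support ==, so it can be tried out with plain Python strings before StringClass objects are plugged in.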
Part C: Creating Instances
In the final part of this task, you will be assessed on how to make use of the two user-defined classes implemented in the first two parts (i.e., Part A and Part B).
The task here is to construct a Python program for creating instances or objects of each class by importing the two classes. You should attempt to apply “all” the methods defined for each class on the corresponding objects to “test” the implementation for each of the methods. Note that the design and organisation of the program for this task is of your own decision.
You should name your program for this last part as Task1_PartC.py .
3.2. Task 2
3.2.1 Instructions & Requirements
Building upon the programming knowledge and skills acquired from Task 1, you will be assessed in this task on how to conduct pre-processing and formatting tasks on datasets given in a text-based file format. You are required to perform some basic data analysis on the cleaned datasets (i.e. after pre-processing) by using external Python packages (such as NumPy, SciPy, Pandas, and Matplotlib). In addition, you are also required to make your programs robust by handling any potential errors or exceptions.
The Dataset: Conti-Ramsden 4
Before you get started with any of the programming tasks, you should read through the description of the dataset that we will be using for this assignment. The dataset is known as the Conti-Ramsden 4 Corpus, a collection of narrative transcripts gathered for a clinical study carried out in the United Kingdom to study children with language disorders. Two sets of data were collected: the first set is from children diagnosed with Specific Language Impairment (SLI), one type of language disorder; the second set is from children with typical development (TD). A subset of the original corpus is used in this assignment, with 10 selected transcripts for each group of children.
Each of the narrative transcripts is a record of the story-telling task performed by each child (for both groups), under the supervision of an investigator. The story is based on the wordless 24-picture storybook authored by Mercer Mayer, ‘ Frog, where are you? ’. Below is an excerpt extracted from the transcript produced by a SLI child.
You should note that there are many details recorded in each of these transcripts. However, for the purpose of this assignment, the data required for processing and analysis is the narrative produced by the children, which consists of the statements (or lines) indicated by the label ‘ *CHI: ’ in the transcripts (as highlighted in the excerpt). As a side note, the format of the transcripts is based on the CHAT Transcription Format. You may want to refer to the manual [ http://talkbank.org/manuals/CHAT.pdf ] for an explanation of the various CHAT symbols, such as [//], [/], [*], (.), <…>, etc.
( Note: Please download the dataset (linked below under Assessment Resources ) before attempting the following parts. The SLI transcripts are organised under the folder ‘SLI’ and the TD transcripts are under the folder ‘TD’.)
Part A: Handling File Contents and Preprocessing
In this part, you will begin by reading in all the transcripts of the given dataset, for both the SLI and TD groups. We will then conduct a number of pre-processing tasks to extract only the relevant contents or texts needed for analysis in the subsequent part (i.e. Part B). Upon completing the pre-processing tasks, each of the cleaned transcripts should be saved as an output file. This is more efficient, since we can then work with the cleaned dataset without having to repeat the pre-processing.
As mentioned earlier, for the purpose of this assignment, the data required for processing and analysis is the narrative produced by the children, which consists of the statements (or lines) indicated by the label ‘ *CHI: ’ in the transcripts. The first step is, for each original transcript, to extract only the statements that are prefixed by or begin with ‘ *CHI: ’. (Note that some statements extend onto the next line; you should ensure that you take those into account.)
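The extraction step might look something like the sketch below. It assumes that a continuation line starts with whitespace (typically a tab) and that every other kind of line starts with ‘*’, ‘%’, or ‘@’; this should be checked against the actual transcripts. The function name extract_child_statements is hypothetical.

```python
def extract_child_statements(lines):
    """Return the text of every *CHI: statement, joining continuation lines."""
    statements = []
    in_child = False  # True while the most recent tier was a *CHI: line
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*CHI:"):
            # Start a new child statement without the speaker label.
            statements.append(line[len("*CHI:"):].strip())
            in_child = True
        elif in_child and line[:1] in (" ", "\t") and line.strip():
            # Assumed continuation of the previous *CHI: statement.
            statements[-1] += " " + line.strip()
        else:
            # Any other tier (*INV:, %mor:, @headers) ends the statement.
            in_child = False
    return statements
```

The in_child flag matters because dependent tiers (for instance lines starting with ‘%’) can also have continuation lines, and those must not be appended to a child statement.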
The next step is to perform a set of pre-processing or filtering tasks. We want to remove certain words (generally referred to as tokens) in each statement that have CHAT symbols as either prefixes or suffixes, while retaining certain symbols and words for analysis in Part B. For this part of the implementation, you should consider splitting each statement into a list of words or tokens before you begin the filtering process. Below is a list of symbols that you should filter out from each of the extracted child statements.
● Remove those words that have prefixes of ‘ & ’ or ‘ + ’
○ Example:
Before filtering: *CHI: and he fell out into &-er the window .
After filtering: and he fell out into the window .
● Remove those words that have either ‘ [’ as prefix or ‘] ’ as suffix but retain these three symbols: [//] , [/] , and [*]
○ Example:
Before filtering: *CHI: and he's [/-] the jar smashes [//] smashed
After filtering: and he's the jar smashes [//] smashed
● Retain those words that have either ‘ <’ as prefix or ‘> ’ as suffix but these two symbols should be removed
○ Example:
Before filtering: *CHI: <they were> [//] he goes to bed .
After filtering: they were [//] he goes to bed .
● Retain those words that have either ‘ (’ as prefix or ‘) ’ as suffix but these two symbols should also be removed
○ Example:
Before filtering: *CHI: a + boy an(d) a dog h(ad) (.)
After filtering: a + boy and a dog had (.)
Note: The ‘(’ and ‘)’ symbols could also appear as an infix, i.e. in the middle of a word; in that case you should also remove them from the word or token they are attached to. However, you should not remove the symbols (.), (..), and (...), as these should be retained for data analysis. Also note that a ‘+’ symbol that appears on its own between two words should be retained (i.e. do not remove that symbol).
Important: You should also take note of the following additional requirements.
1. Pauses:
○ In addition to (.), there are two other symbols indicating longer pauses: (..) and (...). Please don't remove these symbols.
2. Linked words:
○ An underscore '_' is used to indicate linked words. These words should be retained, thus no processing is needed.
3. Special form markers:
○ You may encounter words ending with the '@' symbol. Just retain these words, no processing is needed. (Refer to CHAT Manual Section 8.3 for more details.)
4. Special utterance terminators:
○ These special terminators should have been removed when you attempt to remove words prefixed with '+'. Hence, the statement delimiters are restricted to either a full stop ‘.’, a question mark ‘?’, or an exclamation mark ‘!’.
5. Nested symbols:
○ You may also want to pay attention to words that have more than one symbol.
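The filtering rules listed above can be sketched as a single pass over the tokens of one statement. This is an illustrative sketch, not a reference solution: it assumes the ‘*CHI:’ label has already been stripped, that angle brackets are removed before the prefix checks so that nested symbols are handled, and that a bare ‘+’ token counts as the retained in-between symbol. The names KEEP and filter_statement are hypothetical.

```python
# Symbols that must survive filtering unchanged.
KEEP = {"[//]", "[/]", "[*]", "(.)", "(..)", "(...)"}


def filter_statement(statement):
    """Apply the CHAT filtering rules to one child statement."""
    cleaned = []
    for token in statement.split():
        # Angle brackets are dropped, but the word they wrap is kept.
        token = token.replace("<", "").replace(">", "")
        if token in KEEP:
            cleaned.append(token)
            continue
        # Drop words prefixed with & or + (a lone "+" is retained).
        if token.startswith("&") or (token.startswith("+") and token != "+"):
            continue
        # Drop remaining square-bracket tokens such as [/-].
        if token.startswith("[") or token.endswith("]"):
            continue
        # Remove parentheses anywhere in the word: an(d) -> and.
        token = token.replace("(", "").replace(")", "")
        if token:
            cleaned.append(token)
    return " ".join(cleaned)
```

Checking the KEEP set before any other rule is what protects the pause markers and the three retained bracket symbols. Tokens that sit between an opening ‘[’ token and a closing ‘]’ token are not covered by this sketch, since the rules above only mention prefixes and suffixes.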
Finally, once you have completed the filtering process for all the unwanted symbols, you should now save each of the cleaned child transcripts as an output file. You should produce a separate output file for each cleaned transcript and each statement should be written as a separate line in the output file. You may also want to organise your cleaned dataset into two groups: save the cleaned SLI transcripts under a folder named ‘SLI_cleaned’, and the cleaned TD transcripts under another new folder named ‘TD_cleaned’.
One additional requirement for the implementation for this first part is that you should handle any potential errors or exceptions that might occur by implementing the appropriate handling code. You should consider using the try-except clauses and/or the assert statements.
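As a rough illustration of the saving step combined with error handling, the sketch below writes one statement per line and reports file-system problems instead of crashing. The function name save_cleaned and the choice to return None on failure are assumptions, not requirements from the brief.

```python
import os


def save_cleaned(statements, out_dir, filename):
    """Write one statement per line; return the output path, or None on failure."""
    # Guard against being handed something other than a list of statements.
    assert isinstance(statements, list), "statements must be a list of strings"
    try:
        # Create the output folder (e.g. 'SLI_cleaned') if it does not exist.
        os.makedirs(out_dir, exist_ok=True)
        out_path = os.path.join(out_dir, filename)
        with open(out_path, "w", encoding="utf-8") as handle:
            for statement in statements:
                handle.write(statement + "\n")
    except OSError as err:
        # Covers missing permissions, invalid paths, a full disk, and similar.
        print(f"Could not write {filename}: {err}")
        return None
    return out_path
```

Catching OSError rather than a bare except keeps programming mistakes visible while still handling the failures that file operations are likely to raise.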
You should name your program for this first part as Task2_PartA.py .
Part B: Working with Basic Data Analysis
In this part, we are going to perform some basic data analysis by using some of the external Python packages (such as NumPy, SciPy, Pandas, and Matplotlib). The main task is to produce a number of statistics for the two groups of children transcripts. The statistics might serve as good indicators for distinguishing between the children with SLI and the typically developed children.
Amongst the statistics of each child that we are interested in are the following:
● Length of the transcript – indicated by the number of statements
● Size of the vocabulary – indicated by the number of unique words
● Number of repetitions of certain words or phrases – indicated by the CHAT symbol [/]
● Number of retracings of certain words or phrases – indicated by the CHAT symbol [//]
● Number of grammatical errors made – indicated by the CHAT symbol [*]
● Number of pauses made – indicated by the CHAT symbols (.), (..), and (...)
(Note: Given that the length of each child transcript is indicated by the number of statements, the end of each statement can be determined based on the punctuation marks: a full stop ‘.’, a question mark ‘?’, or an exclamation mark ‘!’.)
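A simple way to gather these counts for one cleaned transcript is sketched below. It assumes tokens are separated by whitespace, that the vocabulary is case-insensitive, and that CHAT symbols, terminators, and a bare ‘+’ do not count as words; each of these choices is an assumption that the brief leaves open. The names SYMBOLS, TERMINATORS, and transcript_statistics are hypothetical.

```python
# Map each retained CHAT symbol to the statistic it contributes to.
SYMBOLS = {
    "[/]": "repetition",
    "[//]": "retracing",
    "[*]": "error",
    "(.)": "pause",
    "(..)": "pause",
    "(...)": "pause",
}
TERMINATORS = {".", "?", "!"}


def transcript_statistics(lines):
    """Count the six statistics for one cleaned transcript."""
    stats = {"length": 0, "vocabulary": 0, "repetition": 0,
             "retracing": 0, "error": 0, "pause": 0}
    vocabulary = set()
    for line in lines:
        for token in line.split():
            if token in SYMBOLS:
                stats[SYMBOLS[token]] += 1
            elif token in TERMINATORS:
                # Each terminator marks the end of one statement.
                stats["length"] += 1
            elif token != "+":
                # Everything else is treated as a word (case-folded).
                vocabulary.add(token.lower())
    stats["vocabulary"] = len(vocabulary)
    return stats
```

Returning a plain dictionary per transcript makes it straightforward to collect the results for a whole group into rows of a table later on.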
To begin the implementation, you should first read in the cleaned dataset that you have prepared in Part A. Implement a program that, for each of the cleaned child transcripts from both groups (SLI and TD), extracts the count for each of the statistics mentioned above. You should carefully consider a suitable data type or data structure from either Pandas or NumPy for representing the statistics extracted for each child group in your program. (Note that the data type/data structure chosen should allow you to represent the statistics in a tabular format, where the columns denote the statistic types and the rows denote the statistic counts of each child transcript.) Then, by using the functions provided by Matplotlib, create a visualisation to present these statistics for each child group.
In addition, you should produce the average or mean of these statistics for the two groups, and plot another graph to demonstrate the mean difference for each of the statistics considered. (You may want to consider the functions of Pandas for this part of the implementation.)
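One way to compute the group means with Pandas is sketched below, assuming each group is held in a DataFrame with one row per transcript and one column per statistic. The function name mean_comparison is hypothetical, and the numbers are placeholder values for illustration only; they are not taken from the corpus.

```python
import pandas as pd

COLUMNS = ["length", "vocabulary", "repetition", "retracing", "error", "pause"]


def mean_comparison(sli_stats, td_stats):
    """Return the per-statistic means of both groups and their difference."""
    means = pd.DataFrame({"SLI": sli_stats.mean(), "TD": td_stats.mean()})
    means["difference"] = means["TD"] - means["SLI"]
    return means


# Placeholder counts, two transcripts per group (NOT real corpus data).
sli = pd.DataFrame([[10, 40, 3, 2, 4, 5], [12, 50, 1, 4, 2, 3]], columns=COLUMNS)
td = pd.DataFrame([[20, 80, 1, 1, 0, 2], [22, 90, 3, 1, 2, 2]], columns=COLUMNS)

means = mean_comparison(sli, td)
print(means)
```

With Matplotlib installed, calling means[["SLI", "TD"]].plot(kind="bar") on the result should draw the two group means side by side for each statistic, which is one possible way to show the mean difference.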
You should name your program for this second task as Task2_PartB.py . Also, you should save all the graphs produced in this part for the submission.