Multi-Word Units in Imaginative and Informative Domains
Mustafa Aksan, Yeşim Aksan
Mersin University
1. Introduction
An important contribution of corpus analysis to the study of language is the
identification of recurrent forms in language use. In other words, corpus analysis
makes it possible to identify, from a collection of texts, textually significant
structures that function in the overall organization of discourse. It also provides
quantitative data that help derive qualitative conclusions about the defining
characteristics of the texts in question in a more concrete and reliable manner.
Recent developments in corpus analysis tools have brought new options for
identifying recurrent lexical structures and their distribution in a corpus. Further
analysis of lexical structures and their role in textual organization has required a
more detailed study of the so-called multi-word units (MWUs). The issues discussed
range from the identification and definition of multi-word units in discourse to their
role in minimizing the cognitive processing of information and in defining register
properties.
In this study, we present our initial observations on MWUs emerging in two
specially designed corpora of texts representing the imaginative and informative
domains in Turkish. The corpus representing the imaginative domain includes
samples from fictional prose, while the one built for the informative domain
comprises samples from informative texts. Following a previous study on MWUs
(Biber, Conrad and Cortes, 2004), we first describe the structural patterns found in
both corpora and then present the functional types of multi-word expressions as
they appear in the specialized corpora. Our analysis concentrates on quantitative
aspects of the MWUs in the two registers in Turkish and highlights their
distributional differences. A more detailed analysis of the MWUs in question and
their specific discourse functions demands a different type of study.
2. What is a MWU?
In a very simple definition provided by Biber, Conrad and Cortes (2004:371),
multi-word units are “the most frequent sequences of words in a register.” In most
cases, a multi-word unit, also called a lexical bundle, a chunk or an n-gram, is not a
complete grammatical unit. Usually, a MWU is part of a well-defined grammatical
phrase or clause in which some constituent is missing. In other words, MWUs are
formed by fragments of lexical sequences or by syntactically incomplete but
meaningful strings (e.g. süre sonra ‘after time’, başta olmak üzere ‘being in the
first’), though semantically full expressions are also automatically retrieved as
lexical chunks (e.g. ne de olsa ‘after all’).
In the context of corpus analysis, the patterns of lexical structures are
interpreted in a special manner. The choice of a lexical structure is not determined
merely by grammatical patterns in which a syntactic position is filled through
formal choices; rather, its occurrence is governed by systematic patterns of use.
As Sinclair (1991:108) observes, “By far the majority of text is made
of the occurrence of common words in common patterns, or in slight variants of
those common patterns. Most everyday words do not have an independent meaning,
or meanings, but are components of a rich repertoire of multi-word patterns that
make up a text. This is totally obscured by the procedures of conventional
grammar.”
Lacking a complete grammatical form, MWUs do not allow for a complete or
compositional semantic interpretation. However, they have identifiable discourse
functions. As argued in recent research, they are an important part of the
communicative repertoire of speakers and writers, even though they do not
correspond to well-formed structures. For example, it is possible to determine the
genre or register properties of a text or text excerpt simply by looking at its
recurrent MWUs, as they are prefabricated expressions that are specialized or
conventionalized for a particular text type and are used over and over again (Wray &
Perkins, 2000; McCarthy & Carter, 2006; Hyland, 2008; O’Keeffe, McCarthy &
Carter, 2007; Breeze, 2013). Thus, different registers tend to rely on different sets
of lexical sequences.
This study addresses three basic questions: 1. What are the structural types
of MWUs in Turkish? 2. What are the functional categories of MWUs? 3. Do MWUs
distinguish one domain/register from another?
In our analysis we mainly follow the approach presented by Biber, Conrad &
Cortes (2004). In simple terms, their analysis adopts a frequency perspective.
Having identified MWUs and their relative distribution in different text types, they
show how such recurrent lexical bundles, through their distinct uses, contribute to
the study of particular registers.
3. Data and method
To investigate the domain/register-based use of MWUs, two equal-sized sub-corpora
covering a period of 20 years (1990-2009) were constructed from the databases of
the Turkish National Corpus (TNC) (Aksan, Aksan, Koltuksuz et al., 2012). These
are the Corpus of Contemporary Turkish Fiction (CCTF) and the Corpus of
Contemporary Turkish Informative Prose (CCTIP). Including a wide range of texts
through equally sized samples ensures the representativeness and balance of both
corpora. CCTF is a 1 million-word corpus consisting of samples from the novels and
short stories of contemporary Turkish authors. CCTIP, on the other hand, contains
samples of informative texts compiled from social sciences, applied sciences, world
affairs, commerce and finance, art, belief and thought, and leisure. Both corpora
include samples taken from 200 different texts.
The Ngram Statistics Package software tool (Banerjee & Pedersen, 2003) was
used to generate rank-ordered frequency lists of MWUs, i.e., n-grams. The cut-off
for including n-grams in the frequency lists was set at 1 occurrence per million
words. For the comparative and detailed analysis of n-grams across the registers,
the cut-off was set at 100 for bi-grams and 14 for tri-grams. That is, the lists
contain bi-grams used at least 100 times per million words and tri-grams used at
least 14 times per million words.
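The frequency lists and cut-offs described above can be illustrated with a minimal sketch. It is not the Ngram Statistics Package itself: the function names, the whitespace tokenization and the toy token list are assumptions made purely for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def frequent_ngrams(tokens, n, min_per_million):
    """Return (n-gram, raw count) pairs at or above a per-million cut-off,
    ranked by descending count (ties broken alphabetically)."""
    if not tokens:
        return []
    counts = ngram_counts(tokens, n)
    scale = 1_000_000 / len(tokens)  # converts raw counts to per-million rates
    kept = [(" ".join(gram), count) for gram, count in counts.items()
            if count * scale >= min_per_million]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Invented toy token list (hypothetical); a real run would pass the tokenized
# one-million-word corpus with cut-offs of 100 (bi-grams) and 14 (tri-grams).
toy_tokens = "bir süre sonra ya da bir süre sonra ya da belki de".split()
toy_bigrams = frequent_ngrams(toy_tokens, 2, min_per_million=100)
```

On a corpus of exactly one million tokens the scaling factor is 1, so the raw counts and the per-million rates coincide.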
To identify and analyze the functions of n-grams, concordance lines extracted
and sorted with AntConc 3.2.5 (Anthony, 2010) were examined. These concordance
lines show the extended discourse context of the searched n-grams. N-grams that
are part of larger n-grams were ignored. For instance, ya da çok ‘or more’ was
ignored since it is part of the four-word expression az ya da çok ‘more or less’. A
total of 240 multi-word units of meaning, consisting of 130 bi-grams and 110
tri-grams, were analyzed.
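A concordance line of the kind examined here can be approximated with a short keyword-in-context routine. This is only a sketch of the general technique, not AntConc's implementation; the function name and the window size are assumptions, and the sample sentence is the one quoted as example (2) later in this paper, with punctuation removed.

```python
def kwic(tokens, ngram, window=5):
    """Return (left context, match, right context) for each occurrence
    of a space-separated n-gram in a list of tokens."""
    target = ngram.split()
    n = len(target)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + n:i + n + window])
            lines.append((left, ngram, right))
    return lines


# Sentence from example (2), punctuation stripped for this illustration.
sample = ("Adam bu yazıyı belki de intihara karar vermeden önce "
          "yazıhanesine bırakmıştı").split()
concordance = kwic(sample, "belki de", window=3)
```

Widening the window shows more of the extended discourse context on which the functional classification relies.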
4. Quantitative findings
The following tables show the observed frequencies of bi-grams and tri-grams
across the imaginative and the informative domains. As seen in Table 1, the use of
2-word n-grams is almost the same in the two corpora (11%).
Table 1. Frequency of bi-grams in imaginative and informative domains
Domains              Frequency   %
Imaginative domain   106,673     11
Informative domain   107,389     11
The use of 3-word sequences is slightly higher in the informative domain (6%)
than in the imaginative domain (4%).
Table 2. Frequency of tri-grams in imaginative and informative domains
Domains              Frequency   %
Imaginative domain   41,041      4
Informative domain   60,968      6
In terms of rank frequency, the 20 top-ranked bi-grams and tri-grams are
listed on the basis of their observed frequencies in the imaginative and the
informative domains. Tables 3 and 4 below show similarities and differences in the
occurrence of n-grams.
Table 3. The 20 top-ranked bi-grams in imaginative and informative domains
Rank  Imaginative domain                Freq.  Informative domain           Freq.
1     bir şey ‘something’               1200   ya da ‘or’                   1712
2     ya da ‘or’                        1036   bir şey ‘something’          710
3     ben de ‘me too’                   676    böyle bir ‘such a’           522
4     belki de ‘maybe’                  622    ne kadar ‘how much’          497
5     ne kadar ‘how much’               608    ve bu ‘and this’             467
6     bir süre ‘for a while’            503    büyük bir ‘something big’    451
7     o da ‘s/he/it either’             484    başka bir ‘something other’  432
8     başka bir ‘something other than’  451    bir de ‘one more’            422
9     bir de ‘one more’                 451    bir başka ‘another’          389
10    o kadar ‘that much’               451    yeni bir ‘something new’     362
11    bir an ‘for a moment’             426    bir süre ‘for a while’       350
12    bir gün ‘one day’                 413    ben de ‘me too’              344
13    değil mi ‘isn’t it’               403    daha çok ‘much more’         337
14    bu kadar ‘this much’              400    o kadar ‘that much’          321
15    o zaman ‘then’                    399    belki de ‘maybe’             313
16    büyük bir ‘something big’         372    o zaman ‘then’               302
17    böyle bir ‘such a’                363    sonra da ‘and then’          300
18    her şey ‘everything’              361    gibi bir ‘something like’    291
19    bir şeyler ‘something’            361    bir şekilde ‘in a way’       285
20    sonra da ‘and then’               353    bu arada ‘meanwhile’         278
The top two bi-grams are the same in the two domains. While 13 bi-grams
overlap in the 20 top-ranked list, 7 different bi-grams are observed in different rank
orders. Different bi-grams in each domain are indicated in bold typeface.
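The overlap reported for Table 3 can be checked with a few lines of code. The sketch below simply re-enters the two top-20 lists from Table 3; the variable names are illustrative assumptions.

```python
# Top-20 bi-grams per domain, in rank order, as listed in Table 3.
imaginative_top20 = [
    "bir şey", "ya da", "ben de", "belki de", "ne kadar", "bir süre", "o da",
    "başka bir", "bir de", "o kadar", "bir an", "bir gün", "değil mi",
    "bu kadar", "o zaman", "büyük bir", "böyle bir", "her şey", "bir şeyler",
    "sonra da",
]
informative_top20 = [
    "ya da", "bir şey", "böyle bir", "ne kadar", "ve bu", "büyük bir",
    "başka bir", "bir de", "bir başka", "yeni bir", "bir süre", "ben de",
    "daha çok", "o kadar", "belki de", "o zaman", "sonra da", "gibi bir",
    "bir şekilde", "bu arada",
]

informative_set = set(informative_top20)
imaginative_set = set(imaginative_top20)

# Bi-grams found in both lists, and those found in only one of them.
shared = [g for g in imaginative_top20 if g in informative_set]
only_imaginative = [g for g in imaginative_top20 if g not in informative_set]
only_informative = [g for g in informative_top20 if g not in imaginative_set]

print(len(shared), len(only_imaginative), len(only_informative))
```

Run on the lists above, the script reports 13 shared bi-grams and 7 domain-specific bi-grams in each list.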
Table 4. The 20 top-ranked tri-grams in imaginative and informative domains
Rank  Imaginative domain                Freq.  Informative domain               Freq.
1     bir kez daha ‘once more’          134    bir süre sonra ‘after a while’   134
2     bir yandan da ‘and besides’       132    ne var ki ‘however’              103
3     başka bir şey ‘something else’    108    bir kez daha ‘one more time’     88
4     bir süre sonra ‘after a while’    108    başka bir şey ‘something else’   86
5     bir an önce ‘immediately’         93     bir yandan da ‘and besides’      78
6
63
ya da bir
7
‘nothing
exists’
her zamanki gibi ‘as usual’
61
68
8
ne yazık ki
61
9
ne de olsa
10
51
12
böyle bir şey ‘something
like this’
ama yine de ‘but
still/again’
ne var ki
‘however’
her ne kadar ‘no matter
how’
kısa bir süre ‘for a
moment’
ne olursa olsun ‘in any
case’
bir başka deyişle ‘in other
words’
ne yazık ki ‘unfortunately’
49
13
ya da bir
42
14
‘or a/ one
thing’
ne olursa olsun ‘in any case’
15
kısa bir süre
39
16
belki de bu
39
çok önemli bir ‘a very
important’
bu nedenle de ‘and
therefore’
her şeyden önce ‘first and
foremost’
ama yine de ‘but
still/again’
bir an önce ‘immediately’
17
en ufak bir
38
bir şey yok
38
18
işte o zaman
11
bir şey yok
‘unfortunately’
‘after all’
‘for a
moment’
‘maybe
this’
‘a smallest’
60
48
44
41
‘or a/one’
‘nothing
exists’
o kadar çok ‘that much’
73
62
60
59
58
47
45
43
39
‘now at this 36
38
time’
19    öyle değil mi ‘isn’t it so’       35     çok büyük bir ‘a very big’       38
20    ne kadar çok ‘the more’           33     daha sonra da ‘and then’         37
12 tri-grams overlap in the top 20 at various ranks, and 8 different tri-grams,
displayed in bold typeface in Table 4, appear at various ranks in the imaginative
and informative domains.
5. Structural types of MWUs in Turkish
Our initial observations suggest that MWUs in Turkish are not very different from
the MWUs identified in corpus studies of English (among many others, see Biber,
Conrad & Cortes, 2004; Carter & McCarthy, 2006; Hyland, 2008). What emerges
from a frequency-driven analysis is mostly noun phrases (NPs) or NP fragments, a
situation similar to English (see Biber, 2009). The types that we have identified
below are almost exclusively NPs, yet we have set up more categories to underline
their special role in the text, as reflected in their respective frequencies. For
example, degree expressions and quantifiers as well as demonstratives are in fact
NP elements. Similarly, those that combine with conjunctions are also part of the
following NP or NP fragment. Furthermore, some n-grams appear with identical
items in alternative orders. As noted in previous studies, some bi-grams appear in
tri-grams, or some tri-grams are expansions of bi-grams (Cortes, 2004). The
following classification shows the structural types of n-grams retrieved from the
corpora.
Type 1 bi-grams: (Generic) NPs or NP fragments
1a. Indefinite Article+Head Noun (full phrases): bir adam ‘a man’
1b. Quantifier+Head Noun (full phrase): her gün ‘everyday’
1c. Adjective/modifying expression+indefinite article (NP fragment): güzel
bir ‘something good’, önemli bir ‘something important’
1d. Postpositional phrases: bir anda ‘in a minute’, bir yandan ‘on one hand’
Type 2 bi-grams: Conjunctions with fragments of conjuncts
Type 2a. bi-grams: 1st conjunct fragment+conjunction: sonra da ‘and then’
Type 2b. bi-grams: conjunction+2nd conjunct fragment: ve bir ‘and a/one’
Type 2c. bi-grams: Connectives: yazık ki ‘unfortunately’, biraz da ‘just a bit’
Type 3 bi-grams: Degree / quantification expressions
Type 3a. bi-grams: Degree expression+Adjective: çok önemli ‘very
important’, daha iyi ‘better’, en az ‘the least’
Type 3b. bi-grams: Quantifier+Degree/Degree+Quantifier: biraz daha ‘some’
Type 4 bi-grams: Postpositional Phrases/fragments
Type 4a. bi-grams: Full Postpositional Phrases: benim için ‘for me’
Type 4b. bi-grams: PP fragments: süre sonra ‘after time’
Type 1 tri-grams: NPs or NP fragments
Type 1a. tri-grams: Full NPs: her geçen gün ‘day by day’
Type 1b. tri-grams: NP fragments (Quantifier+Adjective+Indefinite article):
çok büyük bir ‘a very big’, en küçük bir ‘a smallest’
Type 2 tri-grams: Conjunctions
Type 2a. tri-grams: Conjunctions followed by phrases or fragments: ve bu
arada ‘and meanwhile’, ve sonra da ‘and then’
Type 2b. tri-grams: Conjunctions preceded by phrases or fragments: daha
önce de ‘as before’, diğer yandan da ‘and besides’
Type 2c. tri-grams: ya da forms: ya da başka ‘or another’, ya da daha ‘or
more’, ya da bir ‘or a/one’
Type 2d. tri-grams: Ne-forms: ne de olsa ‘already known’, ne var ki
‘however’, ne yazık ki ‘unfortunately’, her ne kadar ‘no matter how’
Type 3 tri-grams: Postpositions (with complements or fragments)
Type 3a. tri-grams: Full Phrases: her şeyden önce ‘first and foremost’, bir
süre sonra ‘after a while’, her zamanki gibi ‘as usual’
Type 3b. tri-grams: Postposition+fragments: gibi bir şey ‘something like’
Type 3c. tri-grams: Secondary (NPs with oblique cases): bir başka deyişle ‘in
other words’
Type 4 tri-grams: Olarak nominalizations
bir bütün olarak ‘as a whole’, bunun sonucu olarak ‘as a result of this’
Type 5 tri-grams: Clauses
bir şey yok ‘nothing exists’, bir şey var ‘something exists’
Excluding light verb constructions, one can rarely find a MWU that contains a
verbal element. This is probably related to the nature of function words in Turkish:
elements that accompany the verb are generally bound affixes rather than free
words in written form. All bi-grams and tri-grams are composed either entirely or
partially of function words. Those items that are not function words undergo
semantic bleaching and form non-compositional formulaic expressions.
6. Functional categories of MWUs in Turkish
The functions of n-grams and their sub-categories observed in fictional prose and
informative texts are determined on the basis of the classes proposed by Biber,
Conrad & Cortes (2004), Cortes (2004), Carter & McCarthy (2006) and Hyland
(2008). We employ the three primary function groups proposed by Biber, Conrad &
Cortes (2004: 384) for MWUs in English and then extend the sub-categories in line
with the above-mentioned studies. The three major functions are referential
expressions, discourse organizers and stance expressions. Referential expressions
make direct reference to physical and abstract entities, either to identify the entity
or to single out some particular aspect of it as important. Discourse organizers
signal relationships between prior and upcoming discourse, and stance expressions
convey the writer’s attitudes and evaluations. In addition to these categories,
n-grams serve a set of functions, such as reporting and questioning, that are
typically found in conversational interactions. We group these functions under the
category of conversational features. As noted by Biber, Conrad & Cortes (2004), a
single n-gram can have multiple functions even in a single occurrence.
The bi-grams and tri-grams obtained from the imaginative and informative texts
are then classified according to the functions they perform in their extended
contexts. Table 5 presents samples of the MWUs with respect to their major
functional categories along with their relevant sub-categories.
Table 5. MWUs classified according to their functions in context
Category                  Sub-category           N-gram
Referential expressions   Time reference         daha sonra ‘later’
                          Place reference        bir yer ‘a place’
                          Person reference       ben de ‘me too’
                          Vague expression       gibi bir şey ‘something like’
                          Quantification         çok daha fazla ‘a lot more’
                          Description            iyi bir ‘something good’
Text organizers           Transitional signals   ne olursa olsun ‘no matter how’
                          Resultative signals    bunun sonucu olarak ‘as a result of this’
                          Focusing signals       bu da ‘and this’
                          Framing signals        söz konusu olan ‘the given’
Stance expressions        Epistemic stance       belki de ‘maybe’, belki de en ‘maybe the most’
Conversational features   Interactional markers  öyle değil mi ‘isn’t it so’
                          Reporting              dedi kendi kendine ‘said to himself/herself’
                          Questioning            var mı ‘is there’
From the frequency distribution of the functions of bi-grams and tri-grams, we
observe that referential expressions are the most frequent discourse function,
accounting for 75% of bi-grams and 67% of tri-grams, as seen in Tables 6 and 7.
Tri-grams functioning as text organizers have a slightly higher share (23%) than
bi-grams with the same function (19%).
Table 6. Functions of bi-grams
Functions                 Frequency   %
Referential Expressions   135         75.41
Text Organizers           35          19.55
Stance Expressions        2           1.11
Conversational Features   7           3.91
Total                     179         100
Table 7. Functions of tri-grams
Functions                 Frequency   %
Referential Expressions   101         67.78
Text Organizers           35          23.48
Stance Expressions        4           2.68
Conversational Features   9           6.04
Total                     149         100
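The percentage columns of Tables 6 and 7 follow directly from the raw frequencies. The sketch below recomputes them; the dictionary and function names are illustrative assumptions, and the computed shares may differ from the printed ones in the last decimal place because of rounding.

```python
def shares(frequencies):
    """Convert raw category frequencies into percentage shares of their total."""
    total = sum(frequencies.values())
    return {category: 100 * count / total
            for category, count in frequencies.items()}


# Raw frequencies as reported in Table 6 (bi-grams) and Table 7 (tri-grams).
bigram_functions = {
    "Referential Expressions": 135,
    "Text Organizers": 35,
    "Stance Expressions": 2,
    "Conversational Features": 7,
}
trigram_functions = {
    "Referential Expressions": 101,
    "Text Organizers": 35,
    "Stance Expressions": 4,
    "Conversational Features": 9,
}

bigram_shares = shares(bigram_functions)
trigram_shares = shares(trigram_functions)

for name, value in bigram_shares.items():
    print(f"{name}: {value:.2f}")
```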
7. Domain/register specific MWUs
The observed frequencies of bi-grams and tri-grams in the imaginative and
informative domains were found to differ significantly in a proportion test
conducted with Minitab 16. According to the proportion analysis, a low ratio
(between 0 and 30%) between the uses of an n-gram in the two domains signals a
difference; in other words, it indicates a domain-specific preference for a
multi-word unit. When the ratio between the uses of an n-gram across the domains
is 50%, the occurrence of the MWU in one domain is half of that in the other
domain. When the ratio is high (between 80 and 100%), it indicates a similarity in
the use of the MWU across the domains.
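The three ratio bands can be made concrete with a small sketch. It assumes that the ratio is the smaller of the two domain frequencies divided by the larger one, which is consistent with the figures reported in Tables 8 to 10, and that "50%" means 0.50 after rounding to two decimals. It does not reproduce the Minitab proportion test, and the function names and band labels are illustrative.

```python
def usage_ratio(freq_a, freq_b):
    """Smaller frequency divided by the larger one (between 0.0 and 1.0).
    Returns 0.0 when both frequencies are zero."""
    larger = max(freq_a, freq_b)
    return min(freq_a, freq_b) / larger if larger else 0.0


def classify(ratio):
    """Map a ratio to the bands described in the text (labels are assumed)."""
    if ratio <= 0.30:
        return "domain-specific"
    if round(ratio, 2) == 0.50:
        return "half"
    if ratio >= 0.80:
        return "similar"
    return "unclassified"  # the text does not describe the remaining ranges


# Frequencies (imaginative, informative) taken from Tables 8, 9 and 10.
examples = {
    "diye sordu": (234, 34),
    "belki de": (622, 313),
    "gibi bir": (301, 291),
}
for ngram, (imaginative, informative) in examples.items():
    ratio = usage_ratio(imaginative, informative)
    print(ngram, f"{ratio:.2f}", classify(ratio))
```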
In this respect, we find that diye düşündü ‘s/he thought like that’, diye
sordu ‘s/he asked like that’, biliyor musun ‘do you know’ and sen de ‘you too’ are
some of the bi-grams that seem to reflect the characteristic properties of the
fictional prose constituting the imaginative domain. Fictional prose includes
direct/indirect speech representation of characters; to that effect, a variety of
bi-grams carry out the discourse function of reporting (example 1). In creating a
fictional world, pronominal reference is an essential part of thematic development.
While the person-reference bi-gram sen de ‘you too’ occurs 240 times in fictional
prose, it is used far less in the informative texts, just 66 times, as seen in
Table 8.
Table 8. Bi-grams specific to imaginative domain
N-gram                         Imaginative frequency   Informative frequency   Ratio
diye düşündü ‘s/he thought’    137                     0                       0.00
diye sordu ‘s/he asked’        234                     34                      0.15
biliyor musun ‘do you know’    128                     25                      0.20
sen de ‘you too’               240                     66                      0.28
(1) “Nişanlı değil miyiz biz? Zaten evlenecek değil miydik,” diye sordu. (Ayşe
Kulin-Gece Sesleri)
“‘Aren’t we engaged? Weren’t we going to get married anyway?’ he asked.”
There are cases in which the frequency of a MWU in one domain is half of
the figure in the other domain. The figures in the following table are interpreted as
such. For instance, while the bi-gram bir şekilde ‘in a way’ is used 285 times in the
informative domain, it is used 142 times in the imaginative domain. Table 9 lists
some of the relevant examples.
Table 9. Bi-grams representing the half of the figure across the domains
N-gram                         Imaginative frequency   Informative frequency   Ratio
belki de ‘maybe’               622                     313                     0.50
yine de ‘even though’          324                     163                     0.50
bir anda ‘in a moment’         155                     78                      0.50
bir şekilde ‘in a way’         142                     285                     0.50
(2) Adam bu yazıyı belki de, intihara karar vermeden önce yazıhanesine
bırakmıştı... (Erhan Bener-Gece Gelen Ölüm)
“Maybe the man had left this note in his office before he decided to commit
suicide.”
In some other cases, the imaginative and informative domains exhibit similarity
in the use of bi-grams. Some of these bi-grams are listed in Table 10. Examples
(3) and (4) are excerpts taken from the Corpus of Contemporary Turkish Fiction
and the Corpus of Contemporary Turkish Informative Prose, respectively.
Table 10. Bi-grams similar across the domains
N-gram                         Imaginative frequency   Informative frequency   Ratio
gibi bir ‘like a’              301                     291                     0.97
ama bu ‘but this’              217                     210                     0.97
ve o ‘and that’                133                     131                     0.98
kısa bir ‘a short’             116                     114                     0.98
(3) Buna bile bir diyeceğim olmaz. Ama bu bina Hürriyet'te çalışanlara
servet filan kazandırmamıştı.
(Azize Bergin-Babalide Topuk Tıkırtıları)
“I have nothing to say even about this. But this building had not brought the
Hürriyet employees a fortune or anything of the kind.”
(4) Maliye Bakanı için Kordon'da ayrılan eve annesini yerleştirmiş ve o evi
alacağını söylemişti.
(Betül Uncular-Dünden Bugüne Lacililer)
“He had settled his mother in the house set aside for the Finance Minister in
Kordon and had said that he would buy that house.”
In our corpus data, tri-grams are deployed to express referential links, and they
achieve discourse organization via a multitude of sub-categories that are found to
be specific to the informative domain. In this respect, lexical sequences
functioning as transitional signals and framing signals, such as bir başka deyişle
‘in other words’ and başta olmak üzere ‘being in the first’, are likely to index an
informative text. Some sample tri-grams typical of the informative domain are
given in Table 11.
Table 11. Tri-grams specific to informative domain
N-gram                                   Informative frequency   Imaginative frequency   Ratio
bir başka deyişle ‘in other words’       59                      1                       0.02
başta olmak üzere ‘being in the first’   25                      2                       0.07
bunun sonucu olarak ‘as a result’        24                      2                       0.08
bu tür bir ‘of this kind’                27                      3                       0.11
bu nedenle de ‘because of this’          47                      5                       0.11
bir bütün olarak ‘as a whole’            18                      2                       0.11
kendine özgü bir ‘a unique’              17                      4                       0.24
söz konusu olan ‘the given’              16                      3                       0.31
(5) Kanadalılar daha sonra misyonerlik de başta olmak üzere birçok şey
yapacaktı altın işletmeciliği adı altında. (İbrahim Türkhan-Tanrı Dağlarının
Yankısı)
“The Canadians would later do many things, missionary work above all, under
the name of gold mining.”
Tri-grams sensitive to the imaginative domain again reflect the register
properties of fictional prose. Seven top-ranking lexical chunks, used for reporting,
highlighting narrative time sequence and emphasizing interactional links, are given
in Table 12.
Table 12. Tri-grams specific to imaginative domain
N-gram                                      Imaginative frequency   Informative frequency   Ratio
dedim kendi kendime ‘said to myself’        18                      1                       0.06
dedi kendi kendine ‘said to her/himself’    17                      1                       0.06
bir hali vardı ‘as if s/he was’             16                      1                       0.06
tam o sırada ‘at that moment’               33                      4                       0.12
bir an için ‘just for a moment’             30                      5                       0.17
öyle değil mi ‘isn’t it so’                 35                      8                       0.23
her zamanki gibi ‘as always’                61                      18                      0.30
(6) Hepsi bir ağızdan konuşup durdular. Kelebeğime baktığımda, bir ölüye
benzemiyor, dedim kendi kendime. (Şebnem İşigüzel-Öykümü Kim
Anlatacak)
“They all kept talking at once. When I looked at my butterfly, it does not look
like a dead one, I said to myself.”
Finally, on the basis of the proportion analysis, we find that informative and
imaginative texts display similarity in the use of some tri-grams, presented in
Table 13.
Table 13. Tri-grams similar across domains
N-gram                               Imaginative frequency   Informative frequency   Ratio
bir o kadar ‘at least that much’     18                      17                      0.94
her geçen gün ‘day by day’           19                      18                      0.95
ne yazık ki ‘unfortunately’          61                      58                      0.95
diye bir şey ‘something like that’   23                      22                      0.96
(7) Yüzü aşkın ölü,
yaralı.
(Uğur Kökden-Bin dokuz Yüze Veda)
“More than a hundred dead, and at least as many injured.”
(8) Her kırmızı ışıkta durulacak
yok tabii de, ama ille durulması gereken bir durum olduğunda, n'apılacak?
(Ferhan Şensoy-Rum Memet)
“Of course, it is not as if one has to stop at every red light, but when there is
a situation in which one really must stop, what is to be done?”
8. Conclusion
In this short paper, we have presented initial results of our analysis of MWUs over
data from two specially constructed corpora. We aimed (i) to provide a preliminary
typology of MWUs identified in the corpora, and (ii) to discuss their particular
functions in different domains representing two registers, namely fictional prose and
informative texts.
We observe that, compared to English, the structural types of multi-word units
in Turkish are far fewer in number. The main reason for this difference may be
attributed to the morphological properties of the two languages. Apart from these
quantitative differences, similar structural types of MWUs are found in both
languages.
As for their respective functions, the distribution of MWUs and their recurrent
use help define domain/register differentiation. Lexical structures that appear more
frequently than others contribute to the sense and distinctiveness of a register, as
suggested by the proportion test implemented in this study.
References
Aksan, Y., Aksan, M., Koltuksuz, A. et al. 2012. Construction of the Turkish National Corpus
(TNC). Proceedings of the Eighth International Conference on Language Resources and
Evaluation (LREC2012).
Anthony, L. 2010. AntConc (Version 3.2.2.5w) [Computer Software]. Tokyo, Japan: Waseda
University. http://www.antlab.sci.waseda.ac.jp/
Banerjee, S. & Pedersen, T. 2003. The design, implementation and use of the (N)gram
(S)tatistic (P)ackage. Proceedings of the fourth international conference on intelligent text
processing and computational linguistics, 370-381.
Biber, D., Conrad, S. and Cortes, V. 2004. If you look at ... : Lexical bundles in university
teaching and textbooks. Applied Linguistics 25, 371–405.
Biber, D. 2009. A corpus-driven approach to formulaic language in English. International
Journal of Corpus Linguistics 14, 275-311.
Breeze, R. 2013. Lexical bundles across four legal genres. International Journal of Corpus
Linguistics 18, 229-253.
Carter, R. A. & McCarthy, M. J. 2006. Cambridge grammar of English. Cambridge:
Cambridge University Press.
Cortes, V. 2004. Lexical Bundles in Published and Student Disciplinary Writing: Examples
from History and Biology. English for Specific Purposes 23, 397–423.
Hyland, K. 2008. As can be seen: Lexical bundles and disciplinary variation. English for
specific purposes 27, 4–21.
McCarthy, M.J. & Carter, R.A. 2006. This that and the other: Multi-word clusters in spoken
English as visible patterns of interaction. In: McCarthy, M.J. (ed.) Explorations in corpus
linguistics. Cambridge: Cambridge University Press, 7-26.
Minitab 16 Statistical Software. 2010. [Computer Software]. State College, PA: Minitab, Inc.
(www.minitab.com).
O’Keeffe, A., McCarthy, M.J. & Carter, R.A. 2007. From corpus to classroom. Cambridge:
Cambridge University Press.
Sinclair, J. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.
Wray, A. & Perkins, M.R. 2000. The functions of formulaic language: An integrated model.
Language and Communication 20, 1-28.