This paper argues that graded reading, or extensive reading, is an indispensable part of every language program. To make the case for an extensive reading component, it is useful to distinguish between two kinds of learning. The first is learning to use language. The second is studying about language.
Learning to use language means learning to use a language feature such as a verb, a grammatical construction or a lexical item fluently and automatically in communicative situations. In order to do this, the learner should not be bogged down with form. If the learner has to think mid-sentence about how a tense or a phrase should be constructed, then attention shifts from fluent communication to the language items themselves. Studying about language involves finding out how the language items work, such as learning the grammar and vocabulary through course books, a teacher’s presentation, or a reading passage. The learners are introduced to a piece of language in, say, a reading or listening passage and are then asked to analyze it and find out its form and function. For example, the learners may learn the difference between make and do, or between the past simple tense and the present perfect tense, and so on. Typically, in course books and lessons, this presentation phase is followed by activities that check that the item is understood and can be manipulated and controlled at the form, meaning and pragmatic levels, such as a drill, a gap-fill, a sentence completion activity, or a test. All this learning about language is fine, but how much language do the learners need to learn?
1. The amount of language to be learnt
Let us first look at the vocabulary. We know from vocabulary research that English is made up of a very few extremely common words which comprise the bulk of the language. In written text, we know that about 2000 word families cover about 85-90% of the running words in general texts and that 50% of any text will be function words (Nation 2001). We also know that to read a native novel, a newspaper or a magazine with 98% vocabulary coverage, a learner would need to know about 8000-9000 word families. But how should these words be learnt? And what do we mean by “learning”? And how do we define a “word”?
One of the few things language researchers can agree about is that learners can learn words from reading provided the reading is comprehensible. They may, though, disagree over the uptake rates and the types of texts to be used. Determining uptake rates is a vital component in the overall picture of vocabulary learning because these rates affect how much text learners need to meet, and over what time period the learning should take place. Over the last decade or so we have been able to patch together a picture of the rate at which incidental vocabulary learning can occur from second language reading. However, the estimates vary, sometimes considerably. For example, Dupuy and Krashen (1993) state that 25% of their target words were learnt, and in other studies the figures range from 20% (Horst – Cobb – Meara 1998), to 6.1% (Pitts – White – Krashen 1989), and to 5.8% (Day – Omura – Hiramatsu 1991). More recent estimates put the uptake rate at 25% and 4% (Waring – Takaki 2003), depending on the type of test used to measure gains.
One of the reasons for this variation is that uptake rates vary widely depending on a range of factors. Among these factors are learnability, the criteria for learning and the opportunity for successful learning. One of the main factors affecting learnability is the ratio of unknown to known words in a text. The denser a text is (that is, the more unknown words it contains), the less likely it is that incidental learning will occur. Liu Na and Nation (1985) and Hu and Nation (1999) suggest that the optimal known-word coverage rate is about 95-99% for there to be a good chance that learning can take place. Learnability is also affected by other factors, such as whether a word is concrete or abstract, whether it is a cognate or not, whether it appears with highly redundant co-text, whether it appears in a transparent or opaque context, and so forth. Laufer (1989), Nation (2001) and many others have shown that unless we have about 98-99% coverage of the other words in the text, the chance that an unknown word will be learnt is minimal. This means that, for the right conditions for incidental vocabulary learning, there should be no more than one new word in every 40 or 50 running words. The coverage figures needed for learning from listening appear to be even higher due to the transitory nature of listening (Brown – Waring – Donkaewbua 2008).
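The relationship between known-word coverage and the spacing of unknown words is simple arithmetic, and it can be sketched as follows. The function name is hypothetical and the sketch assumes that unknown words are spread evenly through the text, which real texts will only approximate.

```python
def running_words_per_unknown_word(known_coverage_percent: float) -> float:
    """Average number of running words that contain one unknown word.

    Assumes unknown words are evenly spread through the text
    (an illustrative simplification, not a claim from the cited studies).
    """
    if not 0 <= known_coverage_percent < 100:
        raise ValueError("coverage must be at least 0 and below 100 percent")
    return 100 / (100 - known_coverage_percent)


# Coverage levels discussed in the text, plus 97.5% for the "1 in 40" figure.
for coverage in (95, 97.5, 98, 99):
    interval = running_words_per_unknown_word(coverage)
    print(f"{coverage}% known words: about 1 unknown word in {interval:.0f}")
```

On this reckoning, one new word in 50 corresponds to 98% coverage and one new word in 40 to 97.5% coverage.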
The criteria for learning refer to the measures used to assess learning. Waring and Takaki (2003) have shown that some test types are easier to complete than others. For example, a simple word recognition test (Have you seen this word?) requires only recognition of the orthographic string of letters and does not require the word’s meaning to be known. By contrast, an L1 to L2 translation test will require considerably more knowledge. Supporting this, Brown, Waring and Donkaewbua (2008) found that multiple choice tests consistently return higher scores than translation tests because they require less knowledge for completion. One of the reasons for this is that a four-option multiple choice test will naturally generate a score of 25% if subjects guess randomly. Thus, depending on one’s criteria, the acquisition rates will vary considerably, and researchers should be careful to select appropriate measures.
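The effect of random guessing on multiple choice scores can be illustrated with a short sketch. The function names are hypothetical, and the rescaling shown is a common textbook adjustment offered here as an assumption; it is not the scoring procedure of any of the studies cited above.

```python
def chance_score_percent(num_options: int) -> float:
    """Expected percentage score from pure random guessing."""
    if num_options < 2:
        raise ValueError("a multiple choice item needs at least two options")
    return 100 / num_options


def score_adjusted_for_chance(observed_percent: float, num_options: int) -> float:
    """Rescale an observed score so that chance level maps to 0 and a perfect score to 100.

    Illustrative assumption only; not taken from the cited studies.
    """
    chance = chance_score_percent(num_options)
    return (observed_percent - chance) / (100 - chance) * 100


print(chance_score_percent(4))  # chance level for a four-option test
print(score_adjusted_for_chance(62.5, 4))  # an observed 62.5% after rescaling
```

An observed score equal to the chance level rescales to zero, which makes the point that a raw 25% on a four-option test need not reflect any word knowledge at all.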
Uptake rates also depend on the opportunities for learning, that is, the number of times an unknown word appears in a given text and how closely spaced its occurrences are, so that knowledge can be retained in memory before it is lost. It is pertinent to look at the opportunity that learners have for learning from natural text because this can tell us how words are spaced in the language. Moreover, these data, combined with the uptake rates stated above, can help us determine whether incidental learning of vocabulary from reading is efficient enough to be a major vocabulary learning strategy.
Table 1 shows the frequency at which words occur in a 50 million word sub-corpus (both written and spoken) of the British National Corpus (BNC) of English. The corpus was analyzed using Range (Nation – Wheatley 2000), whereby words were counted in word families by type. For example, all instances of the verb use and its inflected and derived family members (used, user, usefully, uselessness) are counted under a single word family. The table can be read as follows. The most frequent word in English (the) covers 5.839% of any general English text (i.e. it occurs once in every 17 words) (see (1) in the table). The 2000th most frequent word in English covers 0.00432% of any general English text and occurs once every 23,103 words (2). Note that when the learner meets the 2000th most frequent word in English, this means that all the previous 1999 words have also been met at least once.
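The arithmetic behind the derived columns of Table 1 (C = 100 / B, and D = x times C) can be sketched as follows. The function names are hypothetical. Because the percentages quoted in the text are rounded, results computed from them may differ slightly from the figures printed in the table.

```python
def running_words_to_meet_once(coverage_percent: float) -> float:
    """Column C: running words needed to meet a word once (100 / B)."""
    if coverage_percent <= 0:
        raise ValueError("coverage must be a positive percentage")
    return 100 / coverage_percent


def text_volume_needed(coverage_percent: float, meetings: int) -> float:
    """Column D: running words needed to meet a word `meetings` times (x times C)."""
    return meetings * running_words_to_meet_once(coverage_percent)


# Coverage of the most frequent word (the), as quoted in the text.
the_coverage = 5.839
print(round(running_words_to_meet_once(the_coverage)))  # words per occurrence of "the"

# Recurrence rates used in the table's column D.
for meetings in (5, 10, 20, 50):
    volume = text_volume_needed(the_coverage, meetings)
    print(f"{meetings} meetings: about {volume:.0f} running words")
```

The same two functions apply to any row of the table once that row's percentage coverage is supplied.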
Table 1: A statistical analysis of the number of English words needed to meet at given occurrence rates to ‘learn’ that number of words
Column A: Word rank
Column B: Percentage of English this word covers
Column C (= 100 / B): Number of running words needed to meet all these words once
Column D (= x times C): Volume of text needed to be read to meet the words at these recurrence rates (x = 5, 10, 20 and 50 times)
1st most frequent (the)