In Part Five of our series on the Cambridge English Delta Module One exam, we move on to Paper Two of the written exam and look at the requirements of Task One - the "testing" question. There's quite a lot of terminology involved in this area; this article takes it for granted that you've covered it on your course and know what is meant.
This article is written by Sue Swift, who was involved with Cambridge English Diploma schemes for nearly 50 years, both as a tutor and assessor.
Paper Two, Task One asks you to demonstrate your knowledge of the features that might make a test suitable or unsuitable for a specific learner. You're given a description of the learner, then the test itself, and have to identify six positive and negative points about the test. Your answer must include both positive and negative points, but the balance doesn't matter - it might be 4/2, 3/3 or whatever.
There are 18 marks available for this task: each point you make gains you two marks, plus an additional mark for stating the effect it might have on the learner - so six points at three marks each.
Common mistakes? Getting obsessed by the terminology and letting this rule the answer. Don't. Look at the consequences that the test might have for the learner and start from there. You're looking at issues such as: would the learner actually be able to do the test? Why or why not? How would knowledge of an upcoming test, or the results of a previous test, shape the learner's study of the language? How might it motivate or demotivate her? What consequences would it have for her life outside the classroom (ability to do a job, promotion to a higher-level course, etc)?
So what do you have to take into consideration? Here are a few things you might think about as you look at the test:
Administration, Level, Instructions, Test Type, Length, Evidence, Fresh starts, Activity Type, Content, Age, Timing, Item Type, Needs, Marking, Adaptability, Language, Format, Imagination vs. Communication.
That's not intended to be an exhaustive list, but it does cover many of the main areas you need to consider. Why that order? You may like to reorganise it into categories, but as it is, you'll see that the initial letters (taking both words of categories like Test Type and Activity Type where needed) spell A little fat cat in Malfi - which may help you to remember the categories. Use it, or organise the list more logically, as you find easiest.
So what might you need to say about each of these? Let's take the following example of a learner and the test she is given:
L. is at the start of a two-week intensive course at a language school in Britain. Her reason for taking the course is that her daughter has recently married an Englishman and is now expecting her first child. L wants to be able to speak to her son-in-law and grandchild, and to cope practically and socially on visits to England. On the form she completed when enrolling for the course, she self-assessed her level as pre-intermediate. She is given the following test as the speaking skills part of the placement test, which is administered on the first day. Each student is interviewed separately by a teacher, who gives a mark based on their overall impression of the learner's speaking ability.
In the exam you will then see the test she is given. However, for our purposes just imagine that it's a picture story, showing a man who oversleeps, gets ready for work in a tremendous hurry, drives to the station, and buys a newspaper while he's waiting for the train. When he gets the newspaper he realises that it's Sunday, and that he doesn't need to go to work at all. So he goes back home and goes back to bed.
What might you say about the categories in the list above in relation to this test? You might like to think about it before you read on. Try to find three positive points and three negative points, as you would need to in the exam.
Administration: This category concerns the effect that the way the test is administered may have on the learner. Here, the test is given one-to-one, so if a large number of students were starting courses on the same day, it could mean a certain amount of waiting around for the learner, and could create dissatisfaction with the course even before she started it. It also means that, if she comes from a culture where teachers are held in high esteem, the power-distance she assumes to exist between herself and the teacher may lead her to wait for the teacher to take the lead in the conversation, answering questions or talking about what she is told to, but without making any attempt to control the discourse. (This format might be a particular problem with young learners too, who could "freeze" if asked to talk to a strange adult in what is clearly a test situation.) All of this could affect the reliability of the result - the picture that emerged of the learner's ability might not reflect her true competence in peer-to-peer interaction. And this could result in her being placed in a lower-level class than the one which is actually most suitable for her.
Level: Any test needs to be "doable" at the learner's level. If it's too difficult, it may result in learner demotivation, whereas if it is too easy it may not give the teacher, the learner or other stakeholders the evidence they need to make decisions regarding further study (should she progress to the next level? what needs to be revised?) or use of English outside the classroom (should she be promoted, or given a place on a university course?). With some tests the general level is known - a progress test, for example, will be based on what the learner has been studying, and the level of difficulty will be determined by such factors as how many "tricky" items are included. But a placement test like this one needs to be "doable" by learners at a variety of levels - by definition, the exact level of the learners taking the test is unknown. This test would seem to be adaptable to a variety of levels: more competent learners would be able to narrate the story using a variety of past verb forms, sequencing devices and so on; less competent learners would be able to describe it picture by picture using the present continuous; and low-level learners would be able to answer simple questions put by the teacher (Where is the man? What's the time? etc). So whether L. was right when she self-assessed her level, or had over- or underestimated it, she would find the test "doable", giving reliable evidence of her actual level of communicative competence and leading to her being placed in the correct class.
Instructions: How clear are the instructions for the activity? Is there an example? If the learner is not completely clear about what she has to do in the test, she may do it incorrectly for reasons unrelated to her linguistic competence, again making the result unreliable. Here the teacher is on hand to explain, so the problem shouldn't arise, and because of the need for flexibility mentioned above, the instructions may not need to be too watertight. Clarity is particularly important if the activity (or task) type is one which is liable to be unfamiliar to the learner - an example might be a sentence transformation activity of the type where the learner has to complete a second sentence so that it means the same as a given first sentence, eg: It's too expensive / It costs........ If the learner has never done this type of activity before, at least one completed example would be necessary to ensure s/he fully understood what was required.
Test Type and Adaptability: Is it a placement test, a diagnostic test, a progress test, an achievement test or a proficiency test? Each of these has different objectives, and will therefore need a different format, content and so on if it is to achieve those objectives. We have seen one example already - the placement test, which needs to be "doable" at a range of different levels. An example of a test with a different purpose, and therefore needing different qualities, would be a progress test (eg an end-of-unit test). This has a formative purpose - it aims to evaluate how well the learner has assimilated the language items, subskills etc which she has been taught, in order to allow both the teacher and the learner herself to see what needs to be revised, to allow the teacher to see whether she is teaching at the correct pace for the learners, and so on. A progress test will therefore not need to be "doable" at different levels - on the contrary, it will need to be "doable" only by those learners who have studied that particular section of the course, and will provide reliable results only if it recycles the lexis, structural/functional areas, subskills etc which have just been taught. A progress test which brought in items the learners had not studied would be lacking in content validity. This might happen if, for example, the test presented a structure that had been taught, but in a context different from those the learners had encountered on the course. The lexical items needed might then block the learners from performing well, even though they had actually assimilated the structure.
Length : Test length is always problematic. Too short and there will be inadequate coverage of language items and skills, leading to unreliable results. Too long and there will be problems of practicality. Doubling the test time also means doubling the administration and marking time, with resultant consequences for staffing costs. And when you're testing, you can't be teaching - in our placement test example, the more time testing takes away from the course, the less time there is to cover the learner's needs.
Evidence: Every test is intended to provide evidence of some sort for the stakeholders. We've seen that a placement test has to provide reliable evidence in order to ensure the learner is correctly placed, while a progress test has a formative aim and should answer the question What do we need to do next? Sometimes the evidence is necessary not for the teacher, the learner or the institution, but for an external stakeholder such as an employer or university. If the evidence produced by the test is reliable, the correct decisions may be made. If not, they can't be - probably to the detriment of the learner. For example, if a progress test focusing on a specific structure contains only multiple choice items, it may tell the teacher that the learner recognises the correct form and use of the structure, but it gives no evidence of whether the learner would actually use the structure spontaneously and accurately when speaking or writing. The teacher who took the test result as evidence that no further work was necessary on the structure might therefore later find the student was still avoiding it. Or again, an employer who wanted to know if the learner could deal with simple enquiries on the phone might make the wrong decision about appointing her if the test she was given consisted of situations involving responding to complaints - a far more challenging task.
Fresh starts: Fresh starts are another feature that can make test results more reliable, and the Delta itself is a good example. A long time ago, when Delta was still DTEFLA, the written exam consisted of three one-hour essay questions. This meant that after a course focusing on a wide range of topics (ask yourself how many different topics you've covered on your Module One course - ours must include between eighty and a hundred), your final grade was determined by your ability to write about just three of them. If none of the topics which came up happened to be your "speciality", or if you had a general knowledge of everything but in-depth knowledge of nothing, you might do less well than someone who in fact knew relatively little, but just happened to know a lot about the three topics she had to, or chose to, answer questions on. With the new format, this can't happen. With eight questions, many of which have a number of different sections, and all of which are marked on each individual point made, there are now numerous "fresh starts". If you don't know the first term defined in Paper 1.1, you may still know the second; if you can't analyse the phonology of the phrases specified in Paper 1.5, you may still be able to analyse the form and meaning, and so on. Fresh starts therefore lead to a much more reliable result - there is far less chance of the result being swayed by a specific strength or weakness. Applying this to our placement test: this test has no "fresh starts". Presuming that the learner starts narrating the story using past verb forms, her final mark is liable to be dominated by her accuracy and fluency in using those. If this is high, it may hide the fact that when talking about the future she relies exclusively on will + infinitive. Or, if it is low, it may not reveal that in general social conversation using the present simple she is both accurate and fluent.
Activity Type: Is the activity a direct test of speaking, writing, listening or reading, or an indirect test? Direct tests are generally preferred, but both can have disadvantages. For example, imagine that we wanted to know if a learner could make polite requests. We could test this indirectly with a gap-fill activity: Would you mind .... (open) the window?
This tells me whether the learner knows that the -ing form must be used after mind (form), but nothing about whether she really understands the use, or what form she would use spontaneously. If, on the other hand, I ask the learner to roleplay a situation where she needs to make polite requests (a direct test) and she continuously uses can + infinitive, I have no clear evidence of whether she knows the form/use of would you mind.... The indirect test forces the learner into using the language I want to check, but tells me nothing about his/her own use of the item. The direct test tells me what items s/he spontaneously uses, but not which ones she also knows. In a progress test, therefore, after I'd been focusing on polite requests in class, I might want to use both indirect and direct tests - the first would tell me if the learner could control the target forms when "pushed" into it; the second whether they were now used spontaneously or still avoided.
Activity Type: Also under activity type, you might consider whether the activity type is valid - does it test what it is supposed to be testing? For example, consider the implications of including a dictation as part of a listening test. A dictation is generally a written text and a monologue, which is read fairly slowly, with phrases repeated so that students can write down every word. However, this is not what listening is all about. When people listen, they don't usually have the chance to hear what is said more than once. They don't retain the exact words that are said, but rather the overall meaning. In many ways, then, what students do when taking down a dictation is not the same as what they do in the real listening situation. The test can therefore be said to lack construct validity - what they need to do in the test is not the same as our theory of what they need to do in the real situation.
Content: There is a multitude of things you could talk about here. If the test is a progress or achievement test, does the content reflect what has been taught? If the test is intended to have predictive validity (ie to indicate whether the learner would be capable of performing adequately in a communicative situation outside the classroom), does the content mirror the learner's communicative needs? One of the examples given above illustrates this: if we want to find out whether the learner could deal with simple enquiries regarding her company's services, a test focusing on her ability to deal with complaints will not provide reliable evidence.
Age: Is the test suitable for the learner's age group? A 12-year-old might find a reading comprehension intended for adults too cognitively challenging, but a text intended for young children too "babyish".
Timing: How long does the learner have to complete the test? Is this sufficient? If she feels rushed and doesn't have time to finish, she may feel the test was unfair - that it tested her ability to work under pressure rather than her actual knowledge of the language. The test would therefore lack face validity for the learner.
Needs: What are the learner's communicative needs, and how are these reflected by the test? We've already seen that a test may need to reflect needs in order to provide reliable evidence, but a test which does so is also liable to have greater face validity for the learner (she will feel that the test truly reflects her ability to use the language outside the classroom). In the case of an achievement or proficiency test at the end of the course, a test related to learner needs will also have positive backwash - knowledge of what is coming up in a later test often shapes the teacher's choice of the content and activity types in the course. If the test did not reflect the learner's needs, the teacher might therefore be tempted to spend course time working on areas which were actually irrelevant to her.
Language: Closely related to the concept of needs is the type of language that the learner will need to use in the test. The fact that the activity type or topic reflects her needs doesn't necessarily mean that the language it involves will be the same. Take our example learner: she's asked to tell a story based on pictures - which is one of the things she might want to do with her grandchild. But the style of language she will need to talk to a child and talk about pictures in a story book (caretaker talk or motherese) is not the same as the language she will probably produce in response to this decontextualised task. Without a context (why is she telling this story? who to? in what setting?) there is much less evidence of her ability to communicate in a given situation.
Item Type and Marking: How will the test be marked? If it consists of discrete-point items, then there will be "right answers" and marking should be objective. This will probably mean, though, that the task types are indirect tests. If we want to use direct testing, then marking will often have to be subjective.
In our example situation the test was marked on the basis of the teacher's "overall impression" of the learner's ability, rather than on the basis of agreed criteria (eg so many marks for grammatical accuracy; so many for fluency and the use of coping strategies; so many for intelligibility of pronunciation; etc). This creates the risk that one teacher might mark "harder" than another, or over-emphasise one particular category - eg marking down a learner who was grammatically inaccurate without taking the other categories into consideration. The test result might therefore not be reliable, possibly resulting in our learner being placed in an inappropriate class.
Familiarity: How familiar the learners are with a task type will often affect how well they do it - the sentence transformation task mentioned above is a case in point. Learners who have done this type of task frequently will know the "tricks" (eg if the first sentence contains not + adj + enough, the transformation will use too + adj), and therefore stand to do better than students meeting the task type for the first time. This can affect the reliability of the result and, again in the case of an achievement or proficiency test, may create negative backwash - the teacher spends course time teaching the "tricks" of the task type rather than improving students' general language competence.
Imagination vs. Communication: A task where the learners have to invent content may test their imagination rather than their ability to use the language. Imagine a writing task where the learner was asked to write an email to a customer explaining the reasons for a delivery delay. Learners who had experience of customer service and could write from experience would clearly find the task easier than others who had to work purely from imagination. If the latter couldn't think of what to say, they would do badly on the test because of lack of ideas, and not necessarily because of lack of communicative ability. One of the plus points about the test in our example situation is that the learner doesn't have to invent anything. The content of the story is given by the pictures, and the task just tests her ability to communicate the given meanings. From this point of view, therefore, it should give a reliable result and lead to her being placed in an appropriate class.
So - what would be my three plus points for this test?
1. The fact that it is "doable" at a variety of levels of competence (see above for why) means that it is "fit for purpose" as a placement test. The learner will be able to talk about the story in some way or other regardless of her proficiency, but what she is and isn't able to say should give reliable evidence of her level, meaning that she is placed in an appropriate class. Being able to perform to the best of her ability in the test will also make her feel that it was fair (face validity), and she will not be demotivated by feelings of failure.
2. The pictures illustrate the story and the learner is asked to talk about what she sees. The task therefore tests her ability to communicate given ideas in English - not how creative she is or how quickly she can invent something (communication not imagination). This will increase the reliability of the result, again meaning she is more likely to find herself in an appropriate class. It will also leave her feeling satisfied that she has said all she could without being blocked by non-linguistic factors.
3. This is a direct test and will therefore give clear evidence of a variety of elements involved in the speaking construct: range and accuracy of grammar and lexis; intelligibility of pronunciation; ability to express meaning through stress and intonation; fluency and the ability to use coping strategies such as circumlocution. The teacher should therefore be able to assess her ability in each of these areas, and her overall competence, accurately.
And the negative points?
1. The lack of specific criteria for marking means that the results may not be reliable - they may be influenced by the teacher's particular "hobbyhorse" categories, or even by the mood she is in on the day of the test. This may lead to the learner being placed in an inappropriate class, and to lack of face validity - she may disagree with the result and feel it was unfair.
2. The fact that the test consists of one task only, and that there are no "fresh starts", means that it will not give a clear picture of her general competence but only of her strength or weakness in a specific area - the ability to narrate past events. There is no evidence of other areas - talking about future events, making requests or offers, agreeing and disagreeing, and so on. The test will therefore not give a full picture of her ability, so the results may again be unreliable, possibly resulting in her being placed in an inappropriate class.
3. The format of the test also means that it tests only her ability to produce a monologue, rather than her ability to interact (which lowers its construct validity, as interactive skills are part of the speaking construct). No evidence will therefore be gained of her ability to negotiate topics, deal with communication problems, respond spontaneously to what other people say, and so on. Again, the test will not give a full picture of her competence, and L. may feel that it is unrelated to her need for social interaction - ie it will again lack face validity.
Missed some of the other articles in this series? You'll find links to all of them here - just scroll down the page. And if you're preparing for the Delta Module One, don't forget that you'll find a lot more information about all the tasks in the exam - with sample questions and answers, plus advice for tackling the questions - in the Handbook for Tutors and Candidates, as well as in the sample papers (from June 2016) and the accompanying exam report published by Cambridge. Click on the links to download them.