A good test needs to be fit for purpose - in other words, it needs to match the requirements of the situation in which it is being used. This means it must demonstrate a number of qualities associated with effective testing in general - reliability, transparency, coverage, the various types of validity, practicality, etc - while also matching the needs of the learners with whom it is being used, which might include such factors as age, level, culture, communicative needs and so on.
Some examples:
- if a progress test or an achievement test did not test the items taught in the preceding course, the test would lack content validity and would not be "fit for purpose". It would fail to show whether learners had assimilated and retained those items, and would therefore not accurately reflect their progress or achievement.
- if a placement test did not include a number of items likely to be known by learners at each level of linguistic competence (say, the CEFR levels A1 - C2), its coverage would not discriminate effectively between learners. If, for example, it had very few A2 items and all of them came from the upper end of that level, a learner who was actually at B1 level could be placed in an A2 class (because she had correctly answered the A1 items but not the "hard" A2 items or the items from B1 and beyond). However, that learner might well have a good grasp of "easier" A2 items that were not included in the test and actually be ready to start a B1 class - where revision of "harder" A2 items would probably be included anyway. The test would therefore not be "fit" for its intended purpose of placing learners in the correct class.
- similarly, if learners wanting to enrol at a language school were told that the placement test (taken before they have made the final decision to enrol) would take three hours of their time, they might well walk straight out and go to the school's nearest competitor. This lack of practicality would mean the test was not "fit" for its purpose as a sales tool.
- if a practice test for an exam such as the Cambridge Flyers test (intended for 7-12 year olds) were administered to a group of teenagers, they might feel it was too "babyish" and be demotivated. The content would not be age-appropriate and would therefore not be "fit for purpose".
- if a learner were taking a one-to-one Business English course in order to improve their ability to give sales presentations, and in the diagnostic test for the course were just asked to chat to the teacher about what they had done at the weekend, the test would only give evidence of their ability to participate in short turns on conversational topics. It would not show whether they were able to organise and deliver a long turn using lexis specific to their product or service, or the persuasive language necessary in a sales context. As well as the problem of lack of evidence of their true communicative needs, the test might also lack face validity: the learner would not see the point of having spent time on something irrelevant to their work, and this might result in demotivation at the beginning of the course - whereas one purpose of a diagnostic test should be to assure learners that the course will cover exactly what they need and want it to. Both of these factors would therefore mean that the test was not "fit for purpose".
- if the same test were given at the end of the course, it would not only lack content validity (presuming that the course had focused on sales presentations) but also predictive validity, as it would not produce evidence for learners or their companies as to whether they would now be able to meet the demands of their jobs when speaking to clients. Again, the test is not "fit for purpose".
- if a direct test such as a writing test had no specific set of marking criteria ensuring that a certain number of marks were awarded to specific areas (eg layout, organisation of ideas, grammatical range and accuracy, lexical range and accuracy, stylistic appropriacy, etc), different markers might emphasise some areas over others. This would mean that the marks awarded to any text depended as much on who happened to mark the test as on the actual content of the text itself. This lack of marking reliability would mean that the test did not accurately discriminate between the learners in the cohort (Learner A might gain 70% because her text was marked by Teacher X, while Learner B gained only 40% because Teacher Y was using different criteria), and this would again mean the test was not "fit for purpose".
These are only a few examples, but they serve to show how, while all of these factors are important for all tests, in specific circumstances some will be more important than others in determining exactly how "fit for purpose" a test is.