An ELT Notebook : Corpus / Corpora

A corpus (plural: corpora) is a database of samples of real language (either written or spoken), stored on a computer, which can be used to investigate language use. For example, the British National Corpus includes samples of late 20th-century British English amounting to 100 million words, while the Corpus of Contemporary American English includes samples amounting to 450 million words.

Corpora can be searched for specific words and expressions using a concordancer. This makes it possible to research, for example, how frequently a word occurs, which words it commonly collocates with, and how its use differs between varieties of the language.
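To make the idea concrete, here is a minimal sketch in Python of the three kinds of search a concordancer supports: a frequency count, keyword-in-context lines, and a simple collocate count. It is only an illustration and not how any particular concordancer is implemented. The sample sentences are invented rather than taken from a real corpus, and the function names (tokenize, frequency, concordance, collocates) are hypothetical.

```python
import re
from collections import Counter

# Toy "corpus": a few invented sentences, NOT real corpus data.
SAMPLE_TEXT = (
    "She made a decision quickly. He made a mistake yesterday. "
    "They made a decision together. We took a break after lunch."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency(tokens, word):
    """Count how many times `word` occurs in the token list."""
    return Counter(tokens)[word]


def concordance(tokens, word, width=3):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def collocates(tokens, word, window=2):
    """Count the words that appear within `window` tokens of each hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


tokens = tokenize(SAMPLE_TEXT)
print("Frequency of 'made':", frequency(tokens, "made"))
for line in concordance(tokens, "made"):
    print(line)
print("Most common neighbours:", collocates(tokens, "made").most_common(2))
```

In this toy sample, "made" is followed by "a decision" twice, which is the kind of pattern a teacher might look for when checking collocations. A real corpus would be far larger, and real concordancers usually rank collocates with statistical measures rather than raw counts.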

Lextutor is an example of a concordancer available for online use.

Further reading

Friginal, E. Corpus Linguistics for Language Teachers. Routledge.
