An ELT Notebook: Corpus / Corpora



A corpus (plural: corpora) is a database of samples of real language, either written or spoken, stored on a computer and used to investigate how the language is actually used. For example, the British National Corpus contains about 100 million words of late 20th-century British English, while the Corpus of Contemporary American English contains more than 450 million words.

Corpora can be searched for specific words and expressions with a concordancer, a program that lists every occurrence of a search term together with its surrounding context. This makes it possible to research, for example, how frequently a word occurs, which words it commonly collocates with, and how its use differs between varieties of the language.
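
To make the idea of a concordancer more concrete, here is a minimal sketch in Python of the three kinds of search mentioned above: word frequency, key-word-in-context lines, and a simple collocation count. It is an illustration only, not how any particular corpus tool works. The sample text is invented, and the function names (`tokenize`, `concordance`, `right_collocates`) are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# A tiny invented sample standing in for a real corpus (illustrative only).
SAMPLE_TEXT = (
    "She made a strong cup of tea. He likes strong coffee in the morning. "
    "A strong wind blew all night. They drank tea and coffee together."
)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return key-word-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def right_collocates(tokens, keyword):
    """Count the words that immediately follow the keyword."""
    return Counter(
        tokens[i + 1]
        for i, tok in enumerate(tokens[:-1])
        if tok == keyword
    )


tokens = tokenize(SAMPLE_TEXT)
frequency = Counter(tokens)  # how often each word occurs in the sample

if __name__ == "__main__":
    print("Occurrences of 'strong':", frequency["strong"])
    for line in concordance(tokens, "strong", width=2):
        print(line)
    print(right_collocates(tokens, "strong").most_common())
```

Real concordancers work on millions of words and usually add features such as part-of-speech searches and sorting by left or right context, but the principle is the same: find every occurrence of the search term and show it together with the words around it.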

See here for an example of a concordancer available for online use.