How it works

The test estimates your receptive vocabulary: the number of words you recognize when reading or listening. The only way to measure it exactly would be to take a really thick dictionary and check, one by one, whether you know every word. Nobody wants to do that. Fortunately, there is a better way thanks to Item Response Theory (IRT), a modern paradigm for the design, analysis, and scoring of tests. Under this paradigm, we assume that your vocabulary is a latent trait, or ability, that can be expressed as a number and measured. The measurement consists of a series of test words of varying difficulty, each of which you mark as known or unknown. For example, the word “cat” has low difficulty, while “recusant”, by contrast, has very high difficulty. The difficulty scale is closely related to how frequently we see, hear, or use these words. IRT provides a mathematical procedure for calculating one’s ability from responses to a set of test items of varying difficulty, and that is exactly how we do it.
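
To make the idea of calculating an ability from known/unknown responses concrete, here is a minimal sketch in Python. It assumes the simplest IRT model, the one-parameter logistic (Rasch) model, and a maximum-likelihood estimate found with Newton's method. The page does not say which model or estimator the test actually uses, so the function names, the difficulty values, and the logit scale are all illustrative assumptions.

```python
import math


def p_known(theta, difficulty):
    """Rasch (1PL) probability that a person with ability theta knows a word."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def estimate_ability(responses, max_iterations=100):
    """Maximum-likelihood ability estimate from (difficulty, known) pairs.

    Returns (theta, standard_error). The Rasch model is an assumption made
    for this sketch; difficulties are expected on a small logit scale.
    """
    known_flags = [bool(known) for _, known in responses]
    if all(known_flags) or not any(known_flags):
        # With only "known" or only "unknown" answers the likelihood has no
        # finite maximum, so there is nothing to estimate yet.
        raise ValueError("need at least one known and one unknown response")

    theta = 0.0
    for _ in range(max_iterations):
        probs = [p_known(theta, b) for b, _ in responses]
        # First derivative of the log-likelihood with respect to theta.
        gradient = sum(k - p for k, p in zip(known_flags, probs))
        # Negative second derivative, which is also the test information.
        information = sum(p * (1.0 - p) for p in probs)
        # Newton step, clamped so one update never moves theta by more than 1.
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
        if abs(step) < 1e-10:
            break

    probs = [p_known(theta, b) for b, _ in responses]
    information = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)


# Hypothetical responses: two easy words known, two hard words unknown.
responses = [(-2.0, True), (-0.5, True), (0.5, False), (2.0, False)]
theta, standard_error = estimate_ability(responses)
```

Because the hypothetical responses are symmetric around zero, the estimate lands at approximately 0 on this made-up scale. Turning such an ability value into a word count would require a further mapping that is not described here.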

To make the test quick and precise, we use Computerized Adaptive Testing (CAT), another standard technique in modern testing. We recalculate your vocabulary estimate after each response to a test word. Then we choose the next test word so that it is neither too easy nor too hard for you; in this way we maximize the amount of information each test item contributes. The precision of the estimate improves with each step, and the test stops once it reaches a certain threshold.
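
The two adaptive decisions described above, which word to show next and when to stop, can be sketched as follows. This is an illustration only: it again assumes the Rasch model, under which the most informative item is the one whose difficulty is closest to the current ability estimate. The function names and the stopping threshold of 0.3 are hypothetical, not values taken from the test.

```python
import math


def item_information(theta, difficulty):
    """Fisher information of one item under the Rasch model: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)


def pick_next_item(theta, remaining_difficulties):
    """Choose the unused item that is most informative at the current estimate.

    Under the Rasch model this is the item whose difficulty is closest to
    theta, that is, a word that is neither too easy nor too hard.
    """
    return max(remaining_difficulties, key=lambda b: item_information(theta, b))


def standard_error(theta, administered_difficulties):
    """Standard error of the estimate: 1 / sqrt(total information so far)."""
    total = sum(item_information(theta, b) for b in administered_difficulties)
    if total == 0:
        return math.inf  # no items administered yet
    return 1.0 / math.sqrt(total)


def should_stop(theta, administered_difficulties, se_threshold=0.3):
    """Stop once the estimate is precise enough (threshold is hypothetical)."""
    return standard_error(theta, administered_difficulties) <= se_threshold


# With a current estimate of 0.4, the word of difficulty 0.5 is chosen next.
next_difficulty = pick_next_item(0.4, [-2.0, 0.5, 3.0])
```

Each well-targeted item adds close to the maximum possible information (0.25 per item under this model), which is why an adaptive test needs fewer words than a fixed list to reach the same precision.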

Dictionary

We chose the Dictionary of Standard Modern Greek (Triantafyllidis Dictionary) as our reference for how many words Modern Greek has and for what counts as a word (as opposed to an inflected form). The dictionary contains 45,000 headwords, so we used this number. We selected this dictionary because it is, first, authoritative and widely accepted as a reference, and second, focused on commonly used language (it does not include many narrow scientific terms, archaic words, slang, etc.).

Frequency data

We collected frequency data for each dictionary headword from the Hellenic National Corpus of Greek Language (HNC).

How easy is it to fool the test?

There are two types of checks. First, some non-words are mixed in among the test items. Second, if you mark a test word as known, you may be asked to confirm its meaning by choosing among four definitions. At the end, we calculate an attention index with a simple formula, (x+y)/(ax+ay), where x is the number of non-words marked as unknown, ax is the total number of non-words presented, y is the number of multiple-choice questions answered correctly, and ay is the total number of multiple-choice questions presented. The attention index does not affect the final vocabulary estimate; it is used only to decide whether the response data are valid and can be used in our research.
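
As a worked example of the formula (x+y)/(ax+ay), here is a short Python sketch. The function and parameter names are made up for illustration, and raising an error when no checks were presented is an assumption, since the page does not say how that case is handled.

```python
def attention_index(nonwords_rejected, nonwords_shown, mc_correct, mc_shown):
    """Attention index (x + y) / (ax + ay).

    x  = nonwords_rejected: non-words marked as unknown
    ax = nonwords_shown:    non-words presented
    y  = mc_correct:        multiple-choice questions answered correctly
    ay = mc_shown:          multiple-choice questions presented
    """
    total_checks = nonwords_shown + mc_shown
    if total_checks == 0:
        # Assumption: with no checks presented the index is undefined.
        raise ValueError("no attention checks were presented")
    return (nonwords_rejected + mc_correct) / total_checks


# Hypothetical session: 4 of 5 non-words rejected, 8 of 10 definitions correct.
index = attention_index(4, 5, 8, 10)  # (4 + 8) / (5 + 10)
```

An index of 1 means every check was passed. The cutoff below which response data are treated as invalid is not stated here.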

The team

Grigory Golovin: myVocab platform, programming, data analysis.
Alex Terekhov: test word selection.