Studying passive vocabulary by means of the internet

L.A. Ashkinazi, G.V. Golovin

This article was prepared for the 3rd ISA Forum of Sociology (Vienna, Austria, 2016), "The Futures We Want: Global Sociology and the Struggles for a Better World".

We developed the first adaptive passive vocabulary test for the Russian language. For this purpose, a full frequency dictionary of the Russian language was built. The test is based on assessing the respondent's knowledge of words of varying frequency. The validity, accuracy and reliability of the test were checked. Based on an analysis of more than 150 000 questionnaires, we studied how passive vocabulary depends on the respondents' age, education, reading habits, parents' attitude to reading, and TV and Internet usage.


Passive and active vocabulary is the basis of a person's ability to communicate. It determines whether a person can be included in a society. It is a “fingerprint” of society on a person: we use it to identify the social stratum to which the person belongs and to choose how to communicate with him or her further. Passive vocabulary therefore affects the ability to communicate and, more generally, determines the possibility of acculturation, for example through reading books.

The Internet is a suitable tool for many kinds of sociological research: for studying a society's “footprints” (i.e. texts and images) and its structure (i.e. particular communities and relations), and for conducting traditional public opinion polls. In all these cases one has to take into account the specific problems of the Internet as a source of information: for example, online versions of publications differ from paper ones both in their texts and in their availability. Data are easy to obtain when polls are conducted via the Internet, but the differences between the Internet audience and the general one are usually reduced to age and sex (differences that decrease over time) and, at best, to differences in regional structure.

However, these are not the main differences. The main one is that the ease of answering questions via the Internet does not mean that everyone who sees a survey will respond to it. For example, on the web site (the largest online Russian-language library) fewer than 1 in 1 000 readers rate the texts, even though it takes just two clicks, i.e. a few seconds. Some web sites try to force the audience to respond: for example, the results of a “test” are reported to the user only after he or she completes a questionnaire or makes a payment. However, it is unclear how this affects the response rate: the effect may be the opposite of the intended one, and uneven across the sample. Thus, when conducting polls via the Internet, the main thing is to understand who responds to our questions and how this actual sample can be characterized.


We studied how the passive vocabulary of Russian-speaking Internet users relates to a variety of their characteristics. The parameters of the sample are listed below, but its main feature is that it consists of people who are interested in their vocabulary. The respondents' demographic parameters are therefore of interest not as a means of building a representative sample but, on the contrary, as an answer to the question of which people are interested in their vocabulary. There was no special advertising, so the sample was initially composed of people who were to some extent interested in the subject; after that, the information spread in the way usual for the Internet. The respondents were then self-selected: they were people ready to spend about 5 minutes on the test and about the same time on the questionnaire. Nobody was forced to answer, and as a result 85% of the respondents who took the test also completed the questionnaire; this value can be regarded as a measure of their level of interest.

Data for the study were obtained from the web site where the first adaptive vocabulary test for the Russian language is implemented. A detailed description of the methodology is also published there. The test included several trap words, which do not exist but look similar to real ones. In our analysis we used data only from respondents who did not mark any trap word as known, which means that they were stricter with themselves. Including people who made one such mistake would have increased the sample by 16%, but would have overestimated the results by 2% on average.
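
The filtering by trap words described above can be illustrated with a minimal sketch. The record layout, the function name filter_by_traps and all the numbers in it are hypothetical and chosen only for illustration; they are not the study's data or the authors' actual procedure.

```python
from statistics import mean

# Hypothetical records: (estimated_vocabulary, trap_words_marked_as_known).
# All values are invented for illustration and are NOT taken from the study.
respondents = [
    (82_000, 0),
    (75_000, 0),
    (91_000, 1),
    (68_000, 0),
    (88_000, 2),
]


def filter_by_traps(records, max_false_detections=0):
    """Keep respondents who marked at most `max_false_detections` trap words as known."""
    return [r for r in records if r[1] <= max_false_detections]


# Strict sample: no trap word accepted. Relaxed sample: one mistake allowed.
strict = filter_by_traps(respondents, 0)
relaxed = filter_by_traps(respondents, 1)

strict_mean = mean(v for v, _ in strict)
relaxed_mean = mean(v for v, _ in relaxed)

# Relative growth of the sample and relative shift of the mean estimate
# when respondents with one mistake are admitted.
sample_growth = len(relaxed) / len(strict) - 1
estimate_shift = relaxed_mean / strict_mean - 1
print(f"sample grows by {sample_growth:.0%}, mean estimate shifts by {estimate_shift:.1%}")
```

The same two quantities, computed on the real data, would correspond to the 16% and 2% figures quoted in the text.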


At the preliminary stage (123 000 respondents) we asked about the respondents' sex, age and education, and obtained the following results.

To determine the effect of education, we separately analyzed the results for the group older than 30 years (65 000 respondents). The following results were obtained for this group.

The resulting values of passive vocabulary and of the rate at which it grows seem exaggerated. This is because the test counted all derivative words separately (for example, "the work" and "to work", or "the city" and "city" [in the sense of “urban”]). In addition, as noted above, our sample consists of active Internet users with a high level of education who are interested in their vocabulary.

At the next stage of the research, two questionnaires were placed on the web site one after the other; they were completed by more than 10 000 and 20 000 respondents, respectively. Let us start with a description of the sample, i.e. with an analysis of the characteristics of people interested in their vocabulary.

Thus, our respondents are active Internet users, people who have or are getting higher education, who have books at home and read them, and who watch little or no TV. This is a portrait of people who are interested in their vocabulary. Especially interesting and unexpected are the tenfold higher proportion of students, the twentyfold higher number of people who do not watch TV, and the twofold higher number of active Internet users.

We now continue to consider the characteristics of people “interested in their vocabulary”.

Among Internet users in general, the age groups 0-12, 13-18, 19-35, 36-55 and 56-120 years account for 10% - 10% - 45% - 30% - 5%, respectively, whereas in our study the shares are 0.5% - 9% - 58% - 25% - 8%. It seems that interest in one's vocabulary awakens after high school, and the increased proportion of the oldest group reflects a preserved tradition of respect for the culture of the word.
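
The comparison of the two age distributions can be made explicit by dividing each share in the study by the corresponding share among Internet users in general. The sketch below uses the percentages quoted in the text; the label "36-55" for the fourth group is an assumption based on the age groups used later in the article, and the function name representation_ratio is hypothetical.

```python
# Age groups and shares (in percent) as quoted in the text.
# The "36-55" label is assumed from the age groups used elsewhere in the article.
age_groups = ["0-12", "13-18", "19-35", "36-55", "56-120"]
internet_share = [10, 10, 45, 30, 5]
sample_share = [0.5, 9, 58, 25, 8]


def representation_ratio(sample, reference):
    """Ratio of each group's share in the sample to its share in the reference population."""
    return [s / r for s, r in zip(sample, reference)]


ratios = representation_ratio(sample_share, internet_share)
for group, ratio in zip(age_groups, ratios):
    label = "over" if ratio > 1 else "under"
    print(f"{group:>7}: {ratio:.2f} ({label}-represented)")
```

A ratio above 1 means that the group is over-represented among people interested in their vocabulary; with these figures this holds for the 19-35 and 56-120 groups.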

Women make up about 40% of Internet users, whereas in our study their share ranges from 50% to 63% in different age groups and is about 55% on average.

Memories of the number of books in the parents' home are important to the respondents: two thirds of respondents or slightly more, in all groups, remember that there were a lot of books in their parents' home. The parents' attitude to reading was in keeping with this: more than a third of respondents answered that their parents encouraged them strongly, the same share answered that they encouraged them moderately, and only about 5% were advised to protect their eyesight. This attitude also showed in the fact that, in half of the cases or more in all groups, parents read to the respondents at bedtime always, almost always or quite often when they were children.

Attitude to reading, however, varies considerably between generations: the share of positive answers to the question “Did you love to read in your childhood?” differs across age groups, and the younger the group, the lower this share. For the groups aged 13-18, 19-35, 36-55 and 56-120 years it is 32% - 42% - 66% - 69%, respectively. Interest in school studies changes between generations in a similar way: although more respondents found school interesting than boring, this interest is lower in the younger generations. For the same age groups the share of those who were interested is 28% - 32% - 35% - 48%, and the ratio of “interesting” to “boring” answers is 1.6 - 1.6 - 2 - 4. While the latter change can be partly attributed to the evolution of the educational system, attitude to reading is a more personal characteristic.

It is possible that, at least in this social group, the place of books and school is partly taken by parents and the Internet. In particular, to the question “Did you often talk to your parents on “abstract topics”, “about life in general”, “about different difficult topics”, etc. in your youth?” the answers “often” or “regularly” were given in the above-mentioned age groups by 31% - 26% - 20% - 18% of women and 23% - 16% - 13% - 11% of men. That is, parents tend to talk more with girls, and such communication becomes somewhat more common over time: the youngest age group has the largest proportion of respondents who report regular communication with their parents. The answer “from time to time” was given by 41% - 47% of respondents in all groups, without any observable pattern.

Now let us turn to the factors that affect passive vocabulary.

The size of the home library increases passive vocabulary monotonically, without saturation: between <50 books and >1000 books, vocabulary grows by 24-27% in all groups.

Reading (of paper and electronic books) increases vocabulary monotonically, with saturation: between ≤ 1 book per week and 5-10 books per week, vocabulary grows by 18-20% in all groups, and at the last step, from 5-10 books per week to > 10 books per week, it no longer increases.

Reading and the size of the home library affect vocabulary independently but are, naturally, correlated: among those with < 50 books, the largest number of respondents read ≤ 1 book per week, whereas for all other library sizes the largest number read 2, 3 or 4 books per week.

Education increases vocabulary evenly across its levels: from unfinished secondary to secondary education vocabulary increases by 14%, then to incomplete higher education by 16%, then to higher education by 12%, and to a scientific degree by a further 12%.
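
The step-by-step increments for education can be combined into a single cumulative figure. The sketch below assumes that each percentage is measured relative to the previous education level, so that the steps multiply; the text does not state this explicitly, and the function name cumulative_growth is hypothetical.

```python
# Education steps and the relative vocabulary increase at each step, as quoted in the text.
steps = [
    ("unfinished secondary -> secondary", 0.14),
    ("secondary -> incomplete higher", 0.16),
    ("incomplete higher -> higher", 0.12),
    ("higher -> scientific degree", 0.12),
]


def cumulative_growth(increments):
    """Compound a sequence of relative increases into one overall relative increase.

    Assumption: each increase is relative to the previous level, not to the first one.
    """
    factor = 1.0
    for inc in increments:
        factor *= 1.0 + inc
    return factor - 1.0


total = cumulative_growth(inc for _, inc in steps)
print(f"cumulative increase: {total:.0%}")
```

Under this assumption the vocabulary of respondents with a scientific degree is about 66% larger than that of respondents with unfinished secondary education. If instead all four percentages refer to the lowest level, they simply add up to 54%.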

TV decreases vocabulary monotonically, without saturation: between “do not watch TV” and “5-10 hours per day”, vocabulary decreases by 10% in the age group “more than 30 years old”, by 17% in the group “18-30 years old” and by 26% in the group “less than 18 years old”, which means that a growing person is more vulnerable.

Communication (“How many hours per day do you communicate with other people?”), Internet usage (“How many hours per day do you use the Internet not for work?”) and the respondent's gender do not appreciably affect vocabulary. Since the Internet, like television, takes up time, one may suppose that TV and the Internet differ in linguistic content and/or that new words are assimilated with different efficiency when listening and when reading.

A home library partly protects vocabulary from the influence of TV: with a library of 200-500 books or more, the reduction in vocabulary with increasing TV time disappears. However, it is possible that library owners watch TV content with a different vocabulary.

Surprisingly, reading does not protect vocabulary from the impact of TV. We can hypothesize that people who have a library at home have a tradition of reading, both in their parents' families and in general; reading without a home library and without a family tradition of reading, on the other hand, might be just “chewing gum for the eyes”, because it adds nothing to vocabulary.

Whether there were a lot of books in the home where the respondent grew up (scale: I remember a lot - probably yes, but I cannot answer properly - it seems a little) has a significant influence on vocabulary. The effect diminishes with age but persists for life: a change from many to few reduces vocabulary by 20% in the age group “less than 18 years old”, and still by 10% in all other groups.

The attitude to reading of the people who brought the respondent up (scale: encouraged me strongly - encouraged me moderately - neutral - advised me to protect my eyesight) affects vocabulary only weakly: in all age groups, the transition from “encouraged me strongly” to “advised me to protect my eyesight” reduces vocabulary by 5%. Our hypothesis is that it is example and opportunity that teach people, rather than preaching.

The respondent's love of reading in childhood (scale: yes, I preferred it - rather yes than no - rather no than yes - there were other interesting things) strongly affects vocabulary. Although the effect weakens with age, it persists for life: the change from “preferred it” to “there were other interesting things” reduces vocabulary by 30% in the age group “less than 18 years old”, and still by 20% in all other groups.

Whether the people who brought the respondent up read to him or her at bedtime in childhood (scale: always or almost always - often - rarely or never) affects vocabulary only weakly: in all age groups, the transition from “always or almost always” to “rarely or never” reduces vocabulary by 5%.

Communication with parents, namely “Did you often talk to your parents on “abstract topics”, “about life in general”, “about different difficult topics”, etc. in your youth?” (scale: often, regularly - from time to time, it happened - rarely or never), influences vocabulary significantly: moving from rare to often reduces vocabulary by 10% in the group “less than 18 years old”, and by 5% in all other groups.

The extent to which the respondent found studying at school interesting (scale: rather interesting - sometimes interesting, sometimes boring - rather boring) affects vocabulary as well. The transition from “rather interesting” to “rather boring” does not affect vocabulary in the age group “over 56 years old”, reduces it by 5% in the group “18-55 years old” and by 10% in the group “less than 18 years old”.

The similarity of the two effects mentioned above (the attitude to reading of the people who brought the respondent up, and their practice of reading to him or her at bedtime) and their weakness (although a 5% difference in a sample of more than 10 000 respondents is quite real) suggest that this may be an artifact: people with a larger vocabulary are more inclined to attribute it to the influence of their childhood. The other effects, however, are real. The reality of the library size effect is supported by the formal character of this parameter, and the reality of the effects of love of reading and interest in studying is supported by the fact that they decrease with age (an artifact would grow).


We can conclude that one's own vocabulary is of greater interest to active Internet users with higher education and to students, and to people who have a lot of books at home, who read and who do not watch TV. Most of these people remember that there were a lot of books in their parents' homes, that they were encouraged to read in childhood, that their parents read to them and that they liked to read by themselves, and that they found studying at school interesting.

The main factors determining vocabulary size (for our sample, i.e. for respondents who are interested in it) are age and education. In addition, vocabulary increases with the size of the library in the home where the respondent grew up and of the respondent's current home library, and with reading both in childhood and at present; it decreases with TV usage and does not depend on Internet usage.


One of the authors (L.A.) is grateful to B.V. Dubin and M.L. Gayner for the opportunity to learn respect for the book and the word from them.


Last edited: May 18, 2016