Language & Cognition Lab IITK

Resources

Ongoing studies open for participation

If you require any assistance or guidance to participate in these experiments, please feel free to reach out to the concerned experimenter.

ShabdGyaan: Hindi Vocabulary Game [ link ] [contact: Niket]
Word Rating Survey [ link ] [contact: Vivek]
Indian Pictures Database [ link ] [contact: Irfan]

Our research team has developed the following tools and resources for research on language and cognition:

ENRO - ENglish Reading Online (in collaboration with Victor Kuperman and Noam Siegelman): Dataset of reading and listening comprehension in English as a second language (L2), [OSF]; [paper].

MECO - Multilingual Eyetracking COrpus (in collaboration with Victor Kuperman and Noam Siegelman): International database of eye movements during text reading in English and Hindi, along with subjective and objective measures of language comprehension, [OSF]; [paper 1], [paper 2].

Hindi Word Familiarity Norms - The Hindi Word Familiarity Norms provide subjective familiarity ratings for a large set of Hindi words. These ratings indicate how well known or how frequently encountered a word is among Hindi speakers. The resource is intended to support research on Hindi word recognition, lexical processing, psycholinguistics, cognitive science, and computational language studies. *Coming Soon*

Hindi Age of Acquisition Norms - The Hindi Age of Acquisition Norms provide estimated ages at which Hindi speakers typically learn or acquire words. These norms help researchers examine how early-acquired and late-acquired words influence language processing, word recognition, reading, and memory. This resource is useful for psycholinguistic, developmental, cognitive, and language-related research. *Coming Soon*

Hindi Word Concreteness Norms - The Hindi Word Concreteness Norms provide subjective ratings of how concrete or abstract Hindi words are perceived to be. These ratings help distinguish words referring to tangible objects, actions, or experiences from more abstract concepts. The resource is intended to support research in psycholinguistics, cognitive science, language processing, and computational modeling. *Coming Soon*

IndiEn - An Indian English corpus based on YouTube comments. *Coming Soon*

Indian Pictures Database - A standardized set of 300 pictures sourced from the Indian cultural setting, which we aim to make freely available for research with the Indian population in fields such as psycholinguistics, cognitive psychology, and neuropsychology. The pictures were drawn by one of our associates and are available in both color and line-drawing formats.

We plan to collect norms for these stimuli in six Indian languages: Hindi, Gujarati, Punjabi, Telugu, Kannada, and Marathi. The stimulus set will also be rated on seven dimensions: object familiarity, image agreement, visual complexity, L1 (Indian language) name, L1 age of acquisition, L2 (English language) name, and L2 age of acquisition. *Coming Soon* For assistance or inquiries, please contact Dr. Ark Verma (arkverma@iitk.ac.in).

ShabdGyaan: Hindi Vocabulary Game

[Link]

Large-scale crowdsourced vocabulary data collected through an online game. Designed to estimate Hindi vocabulary knowledge in the general population. For inquiries, contact Niket Agrawal (niket_net@rocketmail.com).

HiLex (Hindi lexical proficiency test)

[Link]

A quick proficiency test using a lexical decision task in L1 and L2 Hindi speakers. Designed for psycholinguistic researchers working on Hindi. For inquiries, contact Niket Agrawal (niket_net@rocketmail.com).

Shabd (Newspaper-based psycholinguistics corpus)

[Link]

Shabd is a psycholinguistic database for Hindi that provides normative lexical and psycholinguistic information. It includes measures such as word frequency, word length, number of aksharas, number of matras, total length based on aksharas and matras, number of phonemes, number of syllables, phoneme IPA transcription, POS tags with POS frequencies, contextual diversity, and orthographic Levenshtein distance (OLD20). These measures are intended to support linguistic, psycholinguistic, and cognitive research.
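As an illustration of one of the listed measures, OLD20 is conventionally defined as the mean Levenshtein distance from a word to its 20 closest orthographic neighbours in a lexicon. The sketch below is not the Shabd implementation. It assumes each Unicode code point is one edit unit (an akshara-based unit would give different values for Devanagari), and the names `levenshtein`, `old_n`, and `toy_lexicon` are hypothetical.

```python
from typing import Iterable, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (insertions, deletions, substitutions) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, unit_a in enumerate(a, start=1):
        current = [i]
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def old_n(word: str, lexicon: Iterable[str], n: int = 20) -> float:
    """Mean distance from `word` to its n closest other words in `lexicon`."""
    distances = sorted(
        levenshtein(word, other) for other in set(lexicon) if other != word
    )
    nearest = distances[:n]
    if not nearest:
        raise ValueError("lexicon has no words other than the target")
    return sum(nearest) / len(nearest)


# Toy lexicon for illustration only; real norms need a full word list and n=20.
toy_lexicon = ["कल", "कम", "काम", "नाम", "दाम"]
print(old_n("कम", toy_lexicon, n=2))
```

With a lexicon smaller than `n + 1` words, the function averages over whatever neighbours are available, so values from toy lists are not comparable to published OLD20 norms.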

For any assistance, clarification, or collaboration inquiries, please contact Dr. Ark Verma (arkverma@iitk.ac.in) or Vivek Sikarwar (sikarwar@iitk.ac.in).

Not-a-shabd (Indian languages pseudoword generator)

[Link]

A pseudoword generator for producing nonwords based on Hindi bigram frequency distributions, intended for use in psycholinguistic experiments. For inquiries, contact Niket Agrawal (niket_net@rocketmail.com).
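The general idea of generating nonwords from bigram frequencies can be sketched as follows. This is a minimal illustration and not the tool's actual algorithm: it assumes character-level (code point) bigrams with word-boundary markers, samples each next character in proportion to how often it follows the current one, and rejects any candidate that is a real word. The function names, the boundary markers, and the toy word list are hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical word-boundary markers


def train_bigrams(words):
    """Count how often each character follows another, including word boundaries."""
    counts = defaultdict(Counter)
    for word in words:
        symbols = [START, *word, END]
        for left, right in zip(symbols, symbols[1:]):
            counts[left][right] += 1
    return counts


def generate_pseudoword(counts, known_words, rng, max_len=8, max_tries=1000):
    """Sample characters in proportion to bigram counts; reject real words.

    Assumes `counts` was trained on at least one non-empty word.
    Returns None if no acceptable candidate is found within max_tries.
    """
    for _ in range(max_tries):
        chars, current = [], START
        while len(chars) <= max_len:
            followers = counts[current]
            current = rng.choices(list(followers), weights=list(followers.values()))[0]
            if current == END:
                break
            chars.append(current)
        candidate = "".join(chars)
        # Accept only candidates that ended naturally and are not real words.
        if current == END and candidate and candidate not in known_words:
            return candidate
    return None


words = ["कमल", "कलम", "नमक", "मकान"]  # toy list for illustration only
model = train_bigrams(words)
print(generate_pseudoword(model, set(words), random.Random(0)))
```

Because every transition is sampled from attested bigrams, the output only contains character sequences that occur in the training words. A code-point model can still produce strings that look unnatural in Devanagari, which is one reason an akshara-level unit may be preferable in practice.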

Shabd-2.0: A Social Media-Based Psycholinguistic Database for Hindi

*Coming Soon*

Shabd-2.0 is a social media-based psycholinguistic database for Hindi, developed from Hindi YouTube comments and Twitter data. It provides lexical and psycholinguistic information derived from naturally occurring social media language. The database includes measures such as word frequency, contextual diversity, word length, number of aksharas, number of matras, total length based on aksharas and matras, number of phonemes, number of syllables, phoneme IPA transcription, and orthographic Levenshtein distance (OLD20). These measures are intended to support research on Hindi language processing, digital communication, psycholinguistics, and cognitive science.
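Two of these measures, word frequency and contextual diversity, can be illustrated with a short sketch. Contextual diversity is usually defined as the number of distinct contexts (documents) in which a word occurs. Treating each comment or tweet as one context is an assumption made here for illustration, as are the function name and the whitespace-tokenised toy input; this is not the Shabd-2.0 pipeline.

```python
from collections import Counter


def frequency_and_diversity(documents):
    """Return (token counts, document counts, frequency per million tokens).

    `documents` is an iterable of token lists, one list per context.
    """
    token_counts, doc_counts = Counter(), Counter()
    for tokens in documents:
        token_counts.update(tokens)
        doc_counts.update(set(tokens))  # each context counts once per word
    total = sum(token_counts.values())
    per_million = {w: c * 1_000_000 / total for w, c in token_counts.items()}
    return token_counts, doc_counts, per_million


# Toy comments for illustration only.
comments = ["यह गाना बहुत अच्छा है", "बहुत बढ़िया गाना", "यह बहुत बहुत अच्छा है"]
tokens, contexts, fpm = frequency_and_diversity(c.split() for c in comments)
print(tokens["बहुत"], contexts["बहुत"])
```

The word "बहुत" illustrates why the two measures differ: a word repeated within one comment raises its frequency but not its contextual diversity.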

For any assistance, clarification, or collaboration inquiries, please contact Dr. Ark Verma (arkverma@iitk.ac.in) or Vivek Sikarwar (sikarwar@iitk.ac.in).

Hindi English EyeTracking Corpus (HEET)

*Coming Soon*

We aim to develop a large-scale Hindi–English bilingual self-paced reading corpus with eye-tracking data from participants with diverse language backgrounds. The dataset will include annotations at the word, trial, participant, and language levels, along with subjective ratings and objective measures of language comprehension, including reading ability, spelling, vocabulary, and comprehension. The corpus will also make it possible to test whether key reading measures, such as first fixation duration, gaze duration, total fixation duration, number of fixations, refixations, saccade length, and skipping rate, differ between Hindi (an abugida, nonlinear script) and English (an alphabetic, linear script).
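Several of the reading measures named above can be derived from an ordered list of fixations. The sketch below uses common textbook definitions: first fixation duration is the first fixation on a word during first-pass reading, gaze duration is the sum of first-pass fixations before the eyes leave the word, and a word counts as skipped if a later word is fixated before it. The input format of `(word_index, duration_ms)` pairs and the function name are assumptions for illustration, not the planned HEET data format.

```python
def reading_measures(fixations, n_words):
    """Compute per-word measures from (word_index, duration_ms) pairs in temporal order."""
    rows = []
    for word in range(n_words):
        first_pass = []
        for index, duration in fixations:
            if index == word:
                first_pass.append(duration)
            elif first_pass or index > word:
                # The eyes left the word, or moved past it before landing on it.
                break
        all_durations = [d for i, d in fixations if i == word]
        rows.append({
            "word": word,
            "skipped_first_pass": not first_pass,
            "first_fixation_duration": first_pass[0] if first_pass else None,
            "gaze_duration": sum(first_pass) if first_pass else None,
            "total_fixation_duration": sum(all_durations),
            "n_fixations": len(all_durations),
        })
    return rows


# Toy trial: word 2 is skipped on first pass and fixated later via a regression.
trial = [(0, 200), (1, 180), (1, 120), (3, 250), (2, 150), (1, 100)]
for row in reading_measures(trial, n_words=4):
    print(row)
```

Skipping rate for a word is then the proportion of trials or participants for which `skipped_first_pass` is true.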

For inquiries, contact Niket Agrawal (niket_net@rocketmail.com).

Hindi Lexicon Project (HiLP)

*Coming Soon*

The Hindi Lexicon Project is a large-scale lexical resource developed to support research on Hindi word recognition and language processing. It provides behavioral and psycholinguistic data for Hindi words, including lexical decision task measures such as reaction time and accuracy, along with word-level variables such as frequency, length, familiarity, concreteness, and other psycholinguistic features. This resource is intended to aid research in psycholinguistics, cognitive science, linguistics, and computational language studies.
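Item-level lexical decision measures of this kind are typically obtained by aggregating trials per word. The following sketch assumes a hypothetical trial format of `(word, reaction_time_ms, is_correct)` and computes accuracy over all trials and mean reaction time over correct trials only, which is a common convention rather than the project's documented procedure.

```python
from collections import defaultdict
from statistics import mean


def summarise_lexical_decision(trials):
    """Aggregate (word, reaction_time_ms, is_correct) trials into per-word measures."""
    by_word = defaultdict(list)
    for word, rt, correct in trials:
        by_word[word].append((rt, correct))
    summary = {}
    for word, rows in by_word.items():
        correct_rts = [rt for rt, ok in rows if ok]
        summary[word] = {
            "accuracy": sum(ok for _, ok in rows) / len(rows),
            # Mean RT is undefined when no response to the word was correct.
            "mean_rt_correct": mean(correct_rts) if correct_rts else None,
        }
    return summary


# Toy trials for illustration only.
trials = [("घर", 520, True), ("घर", 580, True), ("घर", 900, False), ("कलम", 610, True)]
print(summarise_lexical_decision(trials))
```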

For any assistance, clarification, or collaboration inquiries, please contact Dr. Ark Verma (arkverma@iitk.ac.in) or Vivek Sikarwar (sikarwar@iitk.ac.in).
