The series consists of three 2-week workshops spread over different periods of the year (mostly during the summer and winter vacations). The first workshop will be at the ‘basic’ level, the second at the ‘intermediate’ level and the third at the ‘advanced’ level. There are no prerequisites for the ‘basic’ level, and anybody interested in the course may enrol. For the ‘intermediate’ level, participants will need to demonstrate basic proficiency in the language. Similarly, for the ‘advanced’ level, they will need to demonstrate intermediate proficiency. The first round will offer only the ‘basic’ level, the next will offer the ‘basic’ and ‘intermediate’ levels, and all subsequent rounds will host workshops at either two or all three levels.
The target audience for the ‘basic’ workshop will be linguists working in any sub-field. It will cover the basic concepts of programming and then move on to discussing methods of collecting data (mainly from the web) and using it for basic analysis.
The ‘intermediate’ workshop will be targeted at researchers and students who would like to use statistical methods in their research. It will focus on statistical analysis of data, with a substantial amount of time devoted to exploring and understanding ‘pandas’, a Python library for data analysis. It will also cover reading and writing files in different formats, such as formatted text (.docx), spreadsheets, JSON, CSV, PDF and XML, using different libraries, as well as developing web applications.
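To give a flavour of the kind of exercise this level might include, here is a minimal sketch, not taken from the actual course material. It assumes pandas is installed; the word counts, column names and variable names are invented purely for illustration. It reads a small CSV table, computes simple descriptive statistics per part of speech, and converts the same data to JSON and back.

```python
import io

import pandas as pd

# Hypothetical toy data (invented for illustration): counts of a few words.
csv_text = (
    "word,pos,count\n"
    "house,NOUN,12\n"
    "run,VERB,7\n"
    "tree,NOUN,5\n"
    "sing,VERB,3\n"
)

# Read the CSV text into a DataFrame (a file path would work the same way).
df = pd.read_csv(io.StringIO(csv_text))

# Basic descriptive statistics: total and mean count per part of speech.
summary = df.groupby("pos")["count"].agg(["sum", "mean"])
print(summary)

# Write the same table out as JSON records and read it back in.
json_text = df.to_json(orient="records")
df_again = pd.read_json(io.StringIO(json_text))
print(df_again)
```

The in-memory strings stand in for files on disk so that the example can be run without any external data.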
The ‘advanced’ workshop will be most suitable for researchers and students interested in computational linguistics, as it will largely focus on machine learning toolkits such as scikit-learn and PyTorch (for deep learning) and on the development of different kinds of NLP applications.