3. Test Impact and Test Design: Insights from the Syrian National Baccalaureate Examination of English
-Mai Mohamad, Madan M. Sarma & Debasish Mohapatra
Abstract Testing in the Syrian educational system has been growing in the past six years with the average number of tests that schools and colleges set every year increased three-folds. This test inflation paved the way to the birth of a ‘testocracy’ that brought about new challenges for stakeholders and test developers. Of all the tests that Syrian students take, the National Baccalaureate Examination (NBE from here onwards) is the most critical. In the present research we try to shed light on one part of this test, namely the NBE of English language. Within the broad lines of language testing, we aimed to investigate the possibility of predicting certain facets of test impact via close examination of the test template in isolation from other factors in the teaching/learning environment. Keywords:Test impact, washback, test design, high-stakes tests, EFL IntroductionThe effect that examinations have on teaching and learning in a language program is broadly known as test impact, and more specifically as ‘washback’ (Hughes 2003). For the present study, the terms ‘test impact’ and ‘test washback’ will be used interchangeably. Nowadays, test impact is doubtlessly a well-established area of research with a massive bulk of literature produced and studies conducted to investigate it since early 1990s, including; Alderson and Wall (1993); Messick (1996); Bailey (1996); Watanabe (2004); Bachman and Palmer (2010); Wall (1996, 2000, 2005,2012); and Cheng (2005,2008,2015). Before the 1990s, test impact was not considered solely apart from other issues in language testing, however, a new wave of studies and research design started to emerge from that time onwards (Wall 2000). Hughes (1993) and Bailey (1996) distinguished between washback on the participants, including learners, teachers and administration, and washback on the language program or the educational system at large. Later studies asserted that investigating test impact is rather a complicated task that requires involving all the factors surrounding the test environment in order to identify the nature and magnitude of its impact (Alderson and Wall 1993; Wall 1996, 2000; Brindley 2002; Cheng 2005). Thus, the majority of influential research in language testing was revolving around the educational factors affecting the use of tests, such as the stakes of the tests (Cheng 2005) and teacher training (Watanabe 1996); which allow for washback to occur. What has been presented in many studies (Alderson and Wall 1993; Alderson and Hamp-Lyons 1996; Cheng and Watanabe 2004 among others) is the claim that it is not an easy task to detect, describe or anticipate washback due to its being connected with multiple factors and not only limited to the test design. However, these studies did not deny the possibility of detecting washback through careful examination of the test format since this impact is inevitably present. There has been no “study that has actually examined the effects of test design and test use in order to better understand influences of testing on teaching and learning” (Xie and Andrews 2013, 50). Hence, the foci of the present study are to show how certain aspects of test impact can be predicted with careful examination of its design. Syrian National Baccalaureate Examinations of English (SNBEE)In Syria, English is one of the three foreign languages taught at schools yet enjoying higher status than French and Russian. NBE is taken for all subject, including English, at the end of grade 12. Within two hours, a student is required to answer between 40 to 50 questions divided into 11 parts, the majority of them is in multiple-choice format. English language accounts for 10.34 percent of the total grade. The student is required to score a minimum of 40 percent to pass the subject in either the basic or the make-up examination. The importance of the NBE stems from its being the only criterion that determines the students’ educational future. In 2017, 296573 students set this exam in all general and technical streams, where 170314 of them passed (57.34 percent) (SANA 2017). The Ministry of Education, Government of Syria (MoE from here onwards) issued a fixed question template which teachers are supposed to adopt and control their mock tests. This template is being treated as part of the curriculum. Teacher trainers and education advisors, who monitor teachers through regular visits to schools, request teachers to strictly follow this template and mark this adherence as an indicator of their good performance. Research objectives1- To identify the major skills and sub-skills of Syrian NBE of English test. 2- To validate the claim that undesired (negative) test impact can be predicted from this design.3- To specify the nature of this impact and indicate to what extent it might affect test fairness and quality of education. MethodologyFor this study, qualitative analysis of the test samples used from 2013 till 2017 results in specifying the skills and sub-skills being tested. Following Miles and Huberman (1994), analysis started by generating a pre-set list of codes that was revised later with new codes being added and others eliminated. Later on, codes were rearranged to fit under themes as shown in Appendix 1. Findings and discussionTest format: what are we particularly testing?In essence, there are six categories (themes) in the test templets studied (Appendix 2 is a sample of one format). As shown in Chart 1, questions are designed to test the reading skill, grammar, general-usage of lexicon, language function, translation and writing skill. Some parts of the templets test more than one skill or form of language competence, such as the follow up questions of each reading text that are designed to test the examinee’s skimming or scanning skills through finding information in the text, in addition to testing their grammatical knowledge of how to build a correct statement. Clearly, testing grammatical knowledge enjoys the lion’s share with 51.28 percent of questions, and the students’ ability to memorize vocabulary items and use them appropriately in particular contexts has the second highest percentage of questions with 25.64 percent. 17.94 percent of questions test the students’ reading skill by making them answer comprehension questions through skimming and scanning the texts given and reading for specific details. The examinee is also required to be aware of various functions of language (12.82 percent) and build upon this knowledge to express needs and wishes, analyse information, and judge the truth or falsity of statements. Regardless of the fact that translation is extensively taught and used in the Syrian language classroom (Rajab 2013), the student’s ability to interpret the meaning in two languages is measured through 2 questions. In the last question in the template, the student is asked to write a composition of at least 80 words on a given topic. Each question in the test template is treated as one unit of analysis (one unit for that triggers code(s)), as all questions are assigned relatively the same marks. Chart 1. Distribution of testing categories (themes) in the Syrian NBE of English. Test template’s impact predictedAs stated previously, test impact is recognized in literature as an inevitable shortcoming (in case of undesired impact) of language tests. Being aware of this fact, developers attempt to design tests that create the minimum negative impact possible. In the Syrian NBE of English, this impact can be foreseen in the following:1. Syrian NBE of English is incompatible with the objectives of the communicative curriculum According to the Standards for English Curriculum in Syria, students who pass grade 12 are expected to have the ability to fluently use English as a language of interaction, both orally and in writing, in order to communicate daily in the immediate target language environment (MoE 2016). The Syrian MoE assures that these standards echo the testing criteria adopted in NBE. The results of analysis, however, reveal that the foucus of testing is not the learner‘s ability to use langauge for communciaction, but their grammatical accuracy, reading skill, sumative skills and knowledge of vocabulary; a knowledge that is not enough for the learner to use language. Carroll (1961) and Oller (1973) argue that the validity of a test is determined by the quality of the language sample it includes and most importantly by its ability to anticipate the learners’ ability in practicing language skills and using them successfully in the social context desired. With the limited range of skills and sub-skills that it tests, the test template used in Syria seems to be foreign to the communicative curriculum introduced in 2004. 2. Syrian NBE of English narrows the curriculum and reinforces undesired teaching methods Although the CLT-based English curriculum was introduced in 2004 as an attempt to improve the quality of foreign language education in the country (Rajab 2013), the test design does not meet this claim. Given the course time-limitations, the stakes of the examination, and teacher’s attitude and understanding of their students’ needs, the existence of a fixed test format allows for limiting the content taught and narrowing the curriculum. To guarantee the students’ passing with good results, the test format urges teachers to select important texts for reading (based on their recurrences in previous tests), isolate important grammatical rules, list a number of topics for writing, and ignore the rest of the text book. Consequently, the most adequate teaching method that coincides with such selection is the Grammar Translation method rather than any CLT-based approach. Predicting this selection of materials could have been feasible if previous research on similar environments was taken into account. In the Sri Lankan context studied by Wall and Alderson (1993), one of the findings was that test design innovation brought about changes in the content taught with more focus on writing and reading. In other words, test formats stifle teachers’ and students’ efforts to learn beyond what is required for the test. 3. The loopholes in the Syrian NBE of English lead to the flourish of private educationThe fact that the same test format has been used for six years with questions copied from the textbooks, and the limited range of questions involved in this design encouraged the establishment of a number of cram schools to train students to pass the NBE of English ‘effortlessly’. The lack of novelty in the questions encourages students to memorize long lists of potential questions and the use of testwiseness; skills that they can master by relying on coaching outside the school. The studied test format is one of the various factors that foster distorted learning habits that curriculum developers claim to discourage. The large number of cram schools that sprouted up in the last five years in Syria and the popularity gained by shadow education over government education is not surprizing. Students find it more satisfactory to go to an institution that provides them with all the potential questions set for the a given format. According to Messick (1996) and Winke (2011) test designers and administrators are fully aware that the results of grand tests affect their reputation. Cheng, Sun and Ma (2015) assert Madaus (1955, 1988) view that policy-makers and test designers ‘manipulate’ tests in order to control what is being taught and how it is taught. In the Syrian scenario, we can assume that test designers are conscious of test impact and strive to preserve it as it helps to keep pass percentages high and compatible with social demands. This lax attitude of the Syrian MoE has definitely adverse effect on test fairness and quality of education at large. Conclusion Examining the test design of the Syrian NBE of English shows that the negative washback that can be predicted is massive to the extent that it leads to putting the test validity and reliability at stake, and also questioning the quality of education as a whole. Educational reform policy should start from the testing arena and leads to curriculum development and teacher training. Endorsing constant development and elevation in the teaching and learning spheres and neglecting testing by assuming that its success is a foregone conclusion of what comes before-hand is a deadly mistake in any language program. In addition to that, a profound and systematic change in the testing system in general, and in test design in particular must be taken into consideration in Syria. ReferencesAlderson, J.C. & Wall, D. (1993). Does washback exist? Applied Linguistics, 14: 115-29.Bachman, L., & Palmer, A. (2010). Language assessment in practice. New York: Oxford University Press.Bailey, K. (1996). Working for washback: A review of the washback concept in language testing. Language Testing, 13: 257-279.Brindley, G. (2002). Issues in language assessment. In The Oxford Handbook of Applied Linguistics. Ed. Robert B. Kaplan. New York: Oxford University Press, 459- 470. Carroll, J. B. (1961). Fundamental considerations in language testing. In Language Testing and Assessment. Ed. A. J. Kunnan. New York: Rutledge, 43- 51.Cheng, L. (2005). Changing language teaching through language testing: A washback study. Cambridge: Cambridge University Press.Cheng, L. (2008). Washback, impact and consequences. In Encyclopaedia of Language and Education. Ed. E. Shohamy and N.H. Hornberger, 2nd Ed. Language Testing and Assessment, 7. New York: Springer Science + Business Media LLC, 349-364.Cheng, L., Sun, Y. & Ma J. (2015). Review of washback research literature within Kane's argument-based validation framework. Language Testing, 48 (4): 436-470.Hughes, A. (1993). Backwash and TOEFL 2000. Unpublished manuscript, University of Reading.Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press. Madaus, G. F., (1988). The influence of testing on the curriculum. In Critical Issues in Curriculum: Eighty-Seventh Yearbook of the National Society for the Study of Education. Ed. Tanner, L.N. Chicago: University of Chicago Press, 83-121.Madaus, G. F. (1985). Public policy and the testing profession: You’ve never had it so good? Educational Measurement: Issues and Practice, 4: 5–11.Messick, S. (1996). Validity and washback in language testing. Language Testing, 13: 241–256.Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis. London: Sage.MOE. (2016). Syrian ministry of education: annual educational review (report): Part 1. Damascus: MOE Archive Department.Rajab, T. (2013). Developing whole-class interactive teaching: meeting the training needs of Syrian EFL secondary school teachers (Doctoral Dissertation). Retrieved from http://etheses.whiterose.ac.uk/id/eprint/3868. (Accessed: 12 Jan, 2016).SANA. (2017). http://www.sana.sy/?p=20108 (Accessed: 14 July, 2017).Oller, J. W. Jr. (1973). Discrete-point tests versus tests of integrative skills. In Language Testing and Assessment. Ed. A. J. Kunnan. New York: Rutledge, 60- 87.Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language Testing, 10: 41–69.Wall, D. (1996). Introducing new tests into traditional systems: insights from general education and from innovation theory. Language Testing, 13: 334–357. Wall, D. (2000). The impact of high-stakes testing on teaching and learning: can this be predicted or controlled? System, 28:499-509.Wall, D. (2005). The impact of high-stakes examinations on classroom teaching: A case study using insights from testing and innovation theory. Cambridge: University of Cambridge ESOL Examinations and Cambridge University Press.Wall, D. (2012). Washback. In The Routledge Handbook of Language Testing. Eds. Glenn F. and Fred D. New York: Routledge University Press, 79-92.Watanabe, Y. (1996). Does grammar translation come from the entrance examination? Preliminary findings from classroom-based research. Language Testing, 13(3): 318–333.Watanabe, Y. (2004). Methodology in washback studies. In Washback in Language Testing. Eds. L. Cheng and Y. Watanabe. Mahwah: Lawrence Erlbaum Associates, 19-36. Winke, P. (2011). Evaluating the validity of a high-stakes ESL test: why teachers’ perceptions matter. TESOL Quarterly,45 (4): 628-660.Xie, Q. and Andrews, S. (2013). Do test design and uses influence test preparation? Testing a model of washback with structural equation modeling. Language Testing, 30 (1): 49 –70. Appendixes Appendix 1. Appendix 2. Test Sample