In today's world, information and communication technology has become integrated with almost every sphere of human work. The computer, together with the internet, has become one of the most important tools of our time; it has sparked a revolution and turned the current era into a digital age. The integration of computers into language learning and teaching is now one of the most commonly practiced modes of language education worldwide. Computer-assisted language testing (CALT) employs computer applications for eliciting and evaluating test takers' performance in a second language.

CALT encompasses computer-adaptive testing (CAT), the use of multimedia in language test tasks, and automatic response analysis (Chapelle & Douglas, 2006). Assessment is an essential part of learning and teaching a language, especially a foreign language. The three main motives for using technology in language testing are efficiency, equivalence, and innovation. This paper aims to provide a detailed description of CALT and its various dimensions, applications, and methods. It will also throw light on assessing English for Specific Purposes (ESP). Since integrating technology with language teaching and learning is a challenging task, the paper will also explore the challenges of CALT, with reference to Indian classrooms.


Key-words: CALT, Learning, Teaching, Issues and Challenges

1. Introduction to CALT

José Noijons (1994) defines CALT as an integrated procedure in which language performance is elicited and assessed with the help of a computer. CALT encompasses computer-adaptive testing (CAT), the use of multimedia in language test tasks, and automatic response analysis (Chapelle & Douglas, 2006). Chapelle (2010) distinguishes three main motives for using technology in language testing: efficiency, equivalence, and innovation. Efficiency is achieved through computer-adaptive testing and analysis-based assessment that utilizes automated writing evaluation (AWE) or automated speech evaluation (ASE) systems. Equivalence refers to research on making computerized tests equivalent to the paper-and-pencil tests that are considered "the gold standard" in language testing. Innovation, the area where technology can create a true transformation of language testing, is revealed in the reconceptualization of the L2 ability construct in CALT as "the ability to select and deploy appropriate language through the technologies that are appropriate for a situation" (Chapelle & Douglas, 2006, p. 107).

Table 1.1 Framework for Computer-Assisted Language Tests

Attribute               Categories
1. Directionality       Linear, adaptive, and semi-adaptive testing
2. Delivery format      Computer-based and web-based testing
3. Media density        Single medium and multimedia
4. Target skill         Single language skill and integrated skills
5. Scoring mechanism    Human-based, exact answer matching, and analysis-based scoring
6. Stakes               Low stakes, medium stakes, and high stakes
7. Purpose              Curriculum-related (achievement, admission, diagnosis, placement, progress) and non-curriculum-related (proficiency and screening)
8. Response type        Selected response and constructed response
9. Task type            Selective (e.g., multiple choice), productive (e.g., short answer, cloze task, written and oral narratives), and interactive (e.g., matching, drag and drop)

CALT: Origin and Development

The use of computers in the field of assessment and testing dates back to 1935, when the IBM Model 805 was used for scoring objective tests in the United States of America to reduce the labour-intensive and costly business of scoring the millions of tests taken each year.

The 1980s, however, proved to be a crucial decade that led to many advancements in the area of CALT. During that decade, microcomputers came within reach of many applied linguists, and item response theory (IRT) emerged at the same time, making it possible to use the new technology to innovate existing assessment and testing practice. In 1985, Larson and Madsen developed the first CAT at Brigham Young University in the USA, a technologically advanced assessment measure (Dunkel, 1999). They developed a large pool of test items for test delivery by computer. In the computer-adaptive test they designed, the program selected and presented items in a sequence based on the test taker's response to each item. If a student answered an item correctly, a more difficult item was presented; conversely, if an item was answered incorrectly, an easier item was given. In short, the test "adapted" to the examinee's level of ability.
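This up-or-down selection logic can be illustrated with a minimal sketch. The item pool, difficulty scale, and stopping rule below are illustrative assumptions for exposition, not details of Larson and Madsen's actual program:

```python
import random

def run_adaptive_test(item_pool, ask, max_items=10):
    """Administer items adaptively: harder after a correct answer,
    easier after an incorrect one.

    item_pool -- dict mapping a difficulty level (1 = easiest) to a list of questions
    ask       -- callable taking a question and returning True if answered correctly
    """
    levels = sorted(item_pool)
    level = levels[len(levels) // 2]  # start at medium difficulty
    correct = 0
    for _ in range(max_items):
        if not item_pool[level]:
            break  # no unused items left at this level
        question = item_pool[level].pop(random.randrange(len(item_pool[level])))
        if ask(question):
            correct += 1
            level = min(level + 1, levels[-1])  # present a harder item next
        else:
            level = max(level - 1, levels[0])   # present an easier item next
    return level, correct  # the final level approximates the examinee's ability

# Example: simulate a test taker who can handle items up to difficulty 3.
pool = {d: [f"item {d}.{i}" for i in range(5)] for d in range(1, 6)}
print(run_adaptive_test(pool, lambda q: int(q.split()[1].split(".")[0]) <= 3))
```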

The computer's role was to evaluate the student's response, select an appropriate succeeding item, and display it on the screen. The computer also notified the examinee of the end of the test and of his or her level of performance (Larson, 1989: 278). Larson and Madsen's (1985) CAT served as an impetus for the construction and development of many more computer-adaptive tests throughout the 1990s (e.g., Kaya-Carton, Carton & Dandonoli, 1991; Burston & Monville-Burston, 1995; Brown & Iwashita, 1996; Young, Shermis, Brutten & Perkins, 1996), which helped language teachers make more accurate assessments of test takers' language ability and attracted many practitioners, as it appeared to hold immense potential for both language teachers and learners.

As item response theory and the computer software for calculating item statistics and providing adaptive control of item selection, presentation, and evaluation advanced, the use of computer technology in language assessment and testing started to become an inevitable reality, though the challenges of infrastructure availability and of the cross-disciplinary knowledge required in the field hampered its progress for some time at the early stage.

Today the use of computer technology in the field of language assessment and testing has become so widespread and so inclusive that it is regarded as an inseparable part of the education system.
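The IRT machinery referred to above can be stated compactly. The formulation below is the standard two-parameter logistic (2PL) model from the psychometrics literature, given here as general background rather than as the specific model of any system discussed in this paper:

```latex
% Probability that a test taker of ability \theta answers item i correctly,
% given the item's discrimination a_i and difficulty b_i (the 2PL model):
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}

% Fisher information of item i at ability \theta; adaptive tests commonly
% administer the unused item with the largest I_i(\hat{\theta}) and stop
% once the standard error 1/\sqrt{\sum_i I_i(\hat{\theta})} is small enough.
I_i(\theta) = a_i^2 \, P_i(\theta)\bigl(1 - P_i(\theta)\bigr)
```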

The web of useful computer-adaptive tests (CATs) as well as web-based tests (WBTs) is constantly growing, and computers are used not only for test delivery but also for the evaluation of complex types of test responses. Even the large testing companies, which showed little interest in the field at its early stage, have stepped in and are producing and administering CATs as well as WBTs. The administration and delivery of highly popular and useful tests such as TOEFL, IELTS, and DIALANG, to mention a few, speak volumes about the role played by computer technology in the field of language assessment today.

Prominent Testing Services

The realm of CALT is constantly expanding, encompassing even the field of scoring and rating. Today computers are used not just to score objective test tasks but also to assess and rate much more complex task types such as essays and spoken English.

The Educational Testing Service's (see http://www.ets.org) automated systems known as Criterion (see http://www.criterion.ets.org) and e-rater (see http://www.ets.org/erater), which rate extended written responses based on aspects of NLP analysis; Vantage Laboratories' (see http://www.vantage.com) IntelliMetric; Pearson Knowledge Technologies' (see http://www.knowledge-technologies.com) Intelligent Essay Assessor (IEA); and Pearson's Versant (see http://www.versanttest.com), a computer-scored test of spoken English for non-native speakers that uses NLP technology, all indicate how rapidly the realm of CALT is growing and how it is reshaping, innovating, and revolutionizing the field of language assessment and testing by adapting successfully to new challenges in technology and assessment practice.

Evaluation in CALT

Testing and evaluation are the most important part of language learning, because without a learning process there can be no test. Systematic evaluation can be carried out by recognising the influence on learning of three main perspectives (software designer, teacher, and student) and taking into account three sets of interactions between them:

· Teacher-student: a two-way direct interaction. One of the main variables here is the teacher's role, which may be 'resource provider', 'manager', 'coach', 'researcher', or 'facilitator'.
· Designer-student: primarily a one-way influence, although the designer's perception of the student's learning characteristics will implicitly be of help.
· Designer-teacher: again, primarily a one-way influence, with the designer's perception of the teacher having some influence.

This framework assists the evaluator in identifying the key issues on which judgements must be made in the particular context of the proposed use (predictive evaluation) or the actual use (interpretive evaluation) (Soromic, 2010).

CALT in ESP Classrooms

The application of technology in the realm of English for Specific Purposes (ESP) has gained tremendous popularity among English as a Foreign Language (EFL) researchers and scholars (Arno, 2012; Butler-Pascoe, 2009; Jarvis, 2009; Plastina, 2003). ESP instruction is goal-oriented and based on the specific needs of students (Robinson, 2003). Corpora help to test communicative ability and efficiency. Content, language, grammar, and vocabulary knowledge are assessed, and the curriculum and instructional materials are themselves constantly evaluated. The most important part of testing involves language usage for a specific purpose (business, medicine, law, science and technology, etc.) and the usage of its vocabulary.
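As a toy illustration of corpus-informed vocabulary assessment in ESP, one might measure how much of a learner's response draws on a domain word list. The word list, the sample answer, and the coverage metric below are invented for exposition; operational ESP tests rely on large specialised corpora rather than a handful of words:

```python
# Hypothetical domain word list for a business-English task (illustrative only).
BUSINESS_WORDS = {"invoice", "equity", "merger", "liability",
                  "revenue", "audit", "stakeholder", "dividend"}

def domain_coverage(response: str, domain_words: set) -> float:
    """Fraction of distinct words in the response that belong to the target domain."""
    tokens = {w.strip(".,;:!?").lower() for w in response.split()}
    tokens.discard("")
    return len(tokens & domain_words) / len(tokens) if tokens else 0.0

answer = "The merger increased revenue but also raised audit concerns."
print(f"Domain coverage: {domain_coverage(answer, BUSINESS_WORDS):.0%}")
```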

The assessment of curriculum development is the primary task.

Challenges in CALT

Views regarding the current status and the future of CALT vary slightly among researchers, with some being more concerned about the severity of existing problems than others. Ockey (2009), for instance, believes that due to numerous limitations and problems "CBT has failed to realize its anticipated potential" (p. 836), while Chalhoub-Deville (2010) contends that "L2 CBTs, as currently conceived, fall short in providing any radical transformation of assessment practices" (p. 522). Meanwhile, other researchers (e.g., Chapelle, 2010; Douglas, 2010) appear to be somewhat more positive about the transformative role of CALT and stress that despite existing unresolved issues technology remains "an inescapable aspect of modern language testing" and its use in language assessment "really isn't an issue we can reasonably reject—technology is being used and will continue to be used" (Douglas, 2010, p. 139).

Still, everyone seems to acknowledge the existence of challenges in CALT, maintaining that more work is necessary to solve the persisting problems. In particular, a noticeable amount of discussion in the literature has been dedicated to the issues plaguing computer-adaptive testing, which, according to some researchers, have led to a decline in its popularity, especially in large-scale assessment (e.g., Douglas & Hegelheimer, 2007; Ockey, 2009). Of primary concern for CATs is the security of test items (Wainer & Eignor, 2000). Unlike a linear CBT, which presents the same set of tasks to a group of test takers, a computer-adaptive language test presents different questions to different test takers. To limit the exposure of items, CATs require a significantly larger item pool, which makes the construction of such tests more costly and time-consuming. Ockey (2009) suggests that one way to avoid problems associated with test takers' memorization of test items is to create computer programs that generate questions automatically. Some test developers suggest starting a CAT with easy items, whereas others recommend beginning with items of average difficulty.

Additionally, no consensus has been reached on how the algorithm should proceed with the selection of items once a test taker has responded to the first question, nor are there agreed-upon rules on when exactly an adaptive test should stop (Thissen & Mislevy, 2000). Nonetheless, research is being carried out to address this issue, and new methods of item selection in computer-adaptive testing, such as the Weighted Penalty Model (see Shin, Chien, Way, & Swanson, 2009), have recently been proposed.

Another major problem with computer-adaptive tests concerns their reductionist approach to the measured L2 constructs. Canale (1986) was one of the first to argue that the unidimensionality assumption deriving from the IRT models used in CATs poses a threat to the L2 ability construct, making it unidimensional as well. The main argument is that the L2 ability construct should be multidimensional and consist of multiple constituents that represent not only the cognitive aspects of language use, but also knowledge of language discourse and the norms of social interaction, the ability to use language in context, the ability to use metacognitive strategies, and, in the case of CALT, the ability to use technology. Hence, Chalhoub-Deville (2010) asserts that, because of the multidimensional nature of the L2 ability construct, measurement models employed in CBTs must be multidimensional as well, a requirement that many adaptive language tests do not meet. Finally, the unidimensionality assumption of IRT also precludes the use of integrated language tasks in computer-adaptive assessment (Jamieson, 2005).

As a result of some of these problems, ETS, for instance, decided to abandon the computer-adaptive mode that was employed in the TOEFL CBT and return to the linear approach in the newer TOEFL iBT. The limitations of the adaptive approach prompted some researchers to move toward semi-adaptive assessment (e.g., Winke, 2006). The advantages of this type of assessment include a smaller number of items (compared to linear tests) and the absence of the necessity to satisfy IRT assumptions. Thus, Ockey (2009) argues that semi-adaptive tests can be the best compromise between the adaptive and linear approaches and predicts that they will become more widespread in medium-scale assessments.

Automated scoring is another contentious area of CALT. One of the main issues with the automated scoring of constructed responses, both for writing and for speaking assessment, is that computers look only at a limited range of features in test takers' output. Even though research studies report relatively high correlation indices between the scores assigned by AWE systems and those assigned by human raters (e.g., Attali & Burstein, 2006), Douglas (2010) points out that it is not clear whether the underlying basis for these scores is the same. Specifically, he asks, "are humans and computers giving the same score to an essay but for different reasons, and if so, how does it affect our interpretations of the scores?" (Douglas, 2010, p. 119). He thus concludes that although "techniques of computer-assisted natural language processing become more and more sophisticated, . . . we are still some years, perhaps decades, away from being able to rely wholly on such systems in language assessment" (Douglas, 2010, p. 119). Since machines do not understand ideas and concepts and are not able to evaluate meaningful writing, critics contend that AWE "dehumanizes the writing situation, discounts the complexity of written communication" (Ziegler, 2006, p. 139) and "strikes a death blow to the understanding of writing and composing as a meaning-making activity" (Ericsson, 2006, p. 37).
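The "limited range of features" criticism can be made concrete with a deliberately simplistic sketch of feature-based scoring. The features and weights below are invented for illustration; real AWE systems such as e-rater derive far richer feature sets from NLP analysis:

```python
import re

def shallow_essay_score(essay: str) -> float:
    """Score an essay from a handful of surface features.

    Everything here is a proxy: nothing in this function understands
    ideas, coherence, or content, which is precisely the critics' point.
    """
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    length = len(words)                         # fluency proxy
    variety = len(set(words)) / length          # lexical-diversity proxy
    avg_sentence_len = length / len(sentences)  # syntactic-complexity proxy
    # Linear combination with made-up weights, clipped to a 0-6 scale.
    raw = 0.01 * length + 4.0 * variety + 0.05 * avg_sentence_len
    return max(0.0, min(6.0, raw))

print(shallow_essay_score("Testing with computers is fast. It is also cheap. "
                          "But machines do not grasp meaning."))
```

Two essays with identical surface statistics but very different argumentative quality would receive the same score from such a system, which is one way of restating Douglas's question about humans and computers scoring "for different reasons".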

The automatic scoring of speaking skills is even more problematic than that of writing. In particular, speaking assessment involves an extra step that writing assessment does not have: recognition of the input (i.e., speech). Unlike writing assessment, the assessment of speaking also requires the evaluation of segmental features (e.g., individual sounds and phonemes) and suprasegmental features (e.g., tone, stress, and prosody). Automated evaluation systems cannot yet perform at the level of human raters, nor can they evaluate coherence, content, and logic the way humans do.
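The extra pipeline stage that distinguishes speaking assessment can be sketched structurally. Every function below is a hypothetical stub standing in for proprietary components (ASR, phone-level alignment, prosody analysis); none of this reflects how Versant or any other named product actually works:

```python
from dataclasses import dataclass

@dataclass
class SpeakingScore:
    segmental: float       # accuracy of individual sounds and phonemes
    suprasegmental: float  # tone, stress, and prosody
    overall: float

def recognize_speech(audio: bytes) -> str:
    """The extra step writing assessment lacks: audio -> transcript."""
    return "stub transcript"  # placeholder for a real ASR component

def score_segmentals(audio: bytes, transcript: str) -> float:
    return 0.8  # placeholder: would align phones against reference models

def score_suprasegmentals(audio: bytes) -> float:
    return 0.7  # placeholder: would analyse pitch, stress, and rhythm contours

def assess_speaking(audio: bytes) -> SpeakingScore:
    transcript = recognize_speech(audio)
    seg = score_segmentals(audio, transcript)
    supra = score_suprasegmentals(audio)
    # Conspicuously absent: any evaluation of coherence, content, or logic,
    # which is the gap between such systems and human raters noted above.
    return SpeakingScore(seg, supra, overall=(seg + supra) / 2)

print(assess_speaking(b"\x00\x01"))
```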

Other challenges faced by CALT relate to task types and design, namely the use of multimedia and of integrated tasks. Although the use of multimedia input is believed to result in a greater level of authenticity in test tasks by providing more realistic content and contextualization cues, it remains unclear how the inclusion of multimedia affects the L2 construct being measured by CBTs (Jamieson, 2005). Some researchers even question the extent to which multimedia enhances the authenticity of tests (e.g., Douglas & Hegelheimer, 2007), since comparative studies on the role of multimedia in language assessment have yielded mixed results (see Ginther, 2002; Wagner, 2007; Suvorov, 2009). With regard to integrated tasks, their implementation in CBTs is generally viewed favourably because such tasks seem to better reflect what test takers would be required to do in real-life situations. The use of integrated tasks is therefore believed to increase the authenticity of language tests (Fulcher & Davidson, 2007). However, Douglas (2010) warns that the interpretation of integrated tasks can be problematic because, if the test taker's performance is inadequate, it is virtually impossible to find out whether such performance is caused by one of the target skills or by their combination. This concern is more relevant in high-stakes testing than in low-stakes testing.

Conclusion

To sum up, all the negative aspects and caveats associated with CALT mentioned so far are worthy of concern and research, but they should not lead to suspicion towards CALT. Technology can be instrumental in the expansion and innovation of language testing.

Since its advent, CALT has changed and innovated existing testing practices, bringing them in line with the needs of the twenty-first-century e-generation of second language learners by making them more flexible, innovative, individualized, efficient, and fast. The realization of these embedded benefits and of their implications is making CALT an integral part of today's education system, both to make testing practice more flexible, innovative, dynamic, efficient, and individualized, and to enhance the quality and standard of education. In the form of CALT, we are witnessing opportunities that we need to reflect on and capitalize on.

References:

Alderson, J. C. (1988). Innovations in language testing: Can the microcomputer help? Special Report No. 1, Language Testing Update. Lancaster, UK: University of Lancaster.

Alderson, J. C. (1990). Learner-centered testing through computers: Institutional issues in individual assessment. In J. de Jong & D. K. Stevenson (Eds.), Individualizing the assessment of language abilities. Clevedon, UK: Multilingual Matters.

Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.

Burston, J., & Monville-Burston, M. (1995). Practical design and implementation considerations of a computer-adaptive foreign language test: The Monash/Melbourne French CAT. CALICO Journal, 13(1), 26-46.

Center for Applied Linguistics: www.cal.org (accessed 10 June 2012).

Chapelle, C. A., & Douglas, D. (2006). Assessing language through computer technology. Cambridge: Cambridge University Press.

Howell, S. L., & Hricko, M. (Eds.). (2006). Online assessment and measurement: Case studies from higher education, K-12 and corporate. Hershey, PA: Idea Group.

Noijons, J. (1994). Testing computer assisted language tests: Towards a checklist for CALT. CALICO Journal, 12(1), 37-58.

Pearson: www.market-leader.net, www.ecollege.com, & www.myenglishlab.com (accessed 15 June 2012).

Reid, J. (1986). Using the Writer's Workbench in composition teaching and testing. In C. Stansfield (Ed.), Technology and language testing (pp. 167-88). Washington, DC: TESOL Publications.
