von Davier A.A., Educational Testing Service
Psychometrika | Year: 2013
In this paper, an overview of the observed-score equating (OSE) process is provided from the perspective of a unifying equating framework (von Davier in von Davier (Ed.), Statistical models for test equating, scaling, and linking, Springer, New York, pp. 1-17, 2011b). The framework includes all OSE approaches. Issues related to the test, common items, and sampling designs and their relationship to measurement and equating are discussed. Challenges to the equating process, model assumptions, and approaches to equating evaluation are also presented. The equating process is illustrated step-by-step with a real data example from a licensure test. © 2013 The Psychometric Society.
Educational Testing Service | Date: 2015-03-26
Systems and methods described herein automate imposture detection in, e.g., test settings based on voice samples. Based on user instructions, a processing system may determine at least one set of appointments, each having voice samples and a voice print, and a comparison plan for comparing the appointments. The comparison plan defines a plurality of appointment pairs. For each appointment pair, the system compares the associated first and second appointments by, e.g., comparing the first appointment's voice samples to the second appointment's voice print and generating corresponding raw scores, which may be used to compute a composite score. If the composite score satisfies a predetermined threshold condition for fraud, the system may determine whether flagging/holding criteria are satisfied by the raw scores. If the criteria are satisfied, a flag or hold notice may be associated with the appointment pair to trigger an appropriate system/human response (e.g., withholding the appointments' test results).
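The two-stage decision described above (a composite-score threshold followed by a check of the raw scores) can be illustrated with a minimal sketch. The abstract does not say how the composite is computed or what the flagging criteria are, so the mean composite, the cutoff values, the "higher score means more suspicious" convention, and the function names below are all assumptions made only for illustration.

```python
from itertools import combinations
from statistics import mean


def comparison_plan(appointment_ids):
    """Hypothetical plan: compare every appointment with every other one."""
    return list(combinations(appointment_ids, 2))


def should_flag(raw_scores, composite_threshold=0.8, raw_cutoff=0.9, min_hits=2):
    """Return True if an appointment pair should receive a flag/hold notice.

    Assumptions (not from the source): the composite is the mean of the
    raw scores, higher scores are more suspicious, and the flagging
    criterion is "at least min_hits raw scores reach raw_cutoff".
    """
    composite = mean(raw_scores)
    # Stage 1: the composite score must satisfy the threshold condition.
    if composite < composite_threshold:
        return False
    # Stage 2: the raw scores must satisfy the flagging/holding criteria.
    hits = sum(1 for score in raw_scores if score >= raw_cutoff)
    return hits >= min_hits


if __name__ == "__main__":
    for pair in comparison_plan(["A", "B", "C"]):
        print(pair)
    print(should_flag([0.95, 0.92, 0.70]))
```

Separating the composite check from the raw-score check mirrors the order given in the abstract: a pair whose composite never reaches the threshold is not examined further.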
Educational Testing Service | Date: 2015-06-01
Systems and methods are provided for identifying one or more target words of a corpus that have a lexical relationship to a plurality of provided cue words. The cue words and statistical lexical information derived from a corpus of documents are analyzed to determine candidate words that have a lexical association with the cue words. The statistical information includes numerical values indicative of probabilities of word pairs appearing together as adjacent words in a well-formed text or appearing together within a paragraph of a well-formed text. For each candidate word, a statistical association score between the candidate word and each of the cue words is determined. An aggregate score for each of the candidate words is determined based on the statistical association scores. One or more of the candidate words are selected to be the one or more target words based on the aggregate scores.
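As a rough illustration of the scoring steps described above, the sketch below scores each candidate word against every cue word and ranks candidates by an aggregate score. The abstract does not name the association measure or the aggregation rule; pointwise mutual information (PMI) and a plain sum are assumptions here, as are the function names and the toy probabilities in the usage example.

```python
import math


def association(pair_prob, word_prob, cue, candidate):
    """PMI-style association score (an assumed measure).

    pair_prob maps frozenset({w1, w2}) to a co-occurrence probability;
    word_prob maps a word to its marginal probability. Unobserved pairs
    score 0.0 so they neither help nor hurt a candidate.
    """
    p_joint = pair_prob.get(frozenset((cue, candidate)), 0.0)
    if p_joint == 0.0:
        return 0.0
    return math.log(p_joint / (word_prob[cue] * word_prob[candidate]))


def rank_candidates(cues, candidates, pair_prob, word_prob, top_k=1):
    """Aggregate per-cue scores by summing and return the top_k candidates."""
    scored = []
    for cand in candidates:
        total = sum(association(pair_prob, word_prob, cue, cand) for cue in cues)
        scored.append((total, cand))
    # Highest aggregate score first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cand for _, cand in scored[:top_k]]


if __name__ == "__main__":
    # Toy probabilities invented for this example only.
    word_prob = {"bread": 0.1, "knife": 0.1, "butter": 0.1, "cloud": 0.1}
    pair_prob = {
        frozenset(("bread", "butter")): 0.05,
        frozenset(("knife", "butter")): 0.04,
        frozenset(("bread", "cloud")): 0.001,
    }
    print(rank_candidates(["bread", "knife"], ["butter", "cloud"], pair_prob, word_prob))
```

A real system would presumably keep separate tables for adjacent-word and same-paragraph co-occurrence, as the abstract distinguishes the two; the sketch collapses them into one table for brevity.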
Educational Testing Service | Date: 2015-03-06
Systems and methods are provided for computing a score that measures an essay's usage of source material provided in at least one written text and an audio recording. Using one or more data processors, a determination is made of a list of n-grams present in a received essay. For each of a plurality of present n-grams, an n-gram weight is determined, where the n-gram weight is based on a number of appearances of that n-gram in the at least one written text and a number of appearances of that n-gram in the audio recording, and an n-gram sub-metric is determined based on the presence of the n-gram in the essay and the n-gram weight. A source usage metric is determined based on the n-gram sub-metrics for the plurality of present n-grams, and a scoring model is used to generate a score for the essay based on the source usage metric.
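The n-gram pipeline described above can be sketched as follows. The abstract states only that the weight depends on the n-gram's counts in the written text and in the audio recording; the specific weight (the sum of the two counts), the averaging of sub-metrics, whitespace tokenization, and the use of a text transcript in place of the audio recording are all assumptions made for illustration. The final scoring model is left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def source_usage_metric(essay, written_text, audio_transcript, n=2):
    """Average weight of the distinct n-grams present in the essay.

    Assumed weighting: count in the written text plus count in the
    audio transcript. An essay n-gram absent from both sources gets
    weight 0, so it lowers the average.
    """
    essay_grams = set(ngrams(essay.lower().split(), n))
    text_counts = Counter(ngrams(written_text.lower().split(), n))
    audio_counts = Counter(ngrams(audio_transcript.lower().split(), n))

    sub_metrics = []
    for gram in essay_grams:
        weight = text_counts[gram] + audio_counts[gram]
        # The n-gram is present in the essay, so its sub-metric is 1 * weight.
        sub_metrics.append(weight)

    return sum(sub_metrics) / len(sub_metrics) if sub_metrics else 0.0


if __name__ == "__main__":
    metric = source_usage_metric(
        "the lecture disputes the reading",
        "the reading claims birds migrate",
        "the lecture disputes the reading claims",
    )
    print(metric)
```

In a complete system, this metric would be one input feature to the scoring model that produces the essay score.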
Agency: NSF | Branch: Continuing grant | Program: | Phase: DISCOVERY RESEARCH K-12 | Award Amount: 1.37M | Year: 2016
There is widespread recognition in the educational literature that academic discourse is important for supporting students' developing understanding in the disciplines of science and mathematics. College and career-ready standards also call for attention to supporting students' learning of how to think and communicate like disciplinary experts. The teaching practice of orchestrating classroom discussion is intended to support students in obtaining higher levels of academic achievement but also to support students' participation in a democratic society. However, research has found that teachers, particularly novice teachers, struggle to orchestrate discussion effectively for science and mathematics. The investigators of this project hypothesize that opportunities to 1) practice orchestrating discussions in simulated classroom environments; 2) receive constructive feedback on their practice; and 3) reflect on that feedback and their experiences with peers and teacher educators, develop preservice teachers' abilities to lead productive classroom discussion. This may allow them to be more effective at orchestrating discussion when they begin teaching real students in science and mathematics classrooms. The project team, which includes investigators from Educational Testing Service (ETS) and software engineers at Mursion, will develop, pilot, and validate eight discussion-oriented performance tasks that will be embedded in an online simulated classroom environment. The resulting research and development products could be used nationwide in teacher preparation and professional development settings to assess and develop teachers' ability to support classroom discussion in science and mathematics.
The Discovery Research K-12 (DRK-12) program seeks to significantly enhance the learning and teaching of science, technology, engineering and mathematics (STEM) by preK-12 students and teachers, through research and development of innovative resources, models, and tools. Projects in the DRK-12 program build on fundamental research in STEM education and prior research and development efforts that provide theoretical and empirical justification for proposed projects. This Early Stage Design and Development project will 1) iteratively develop, pilot, and refine eight science and mathematics discussion-oriented performance tasks (six formative, two summative), scoring rubrics, and rater training materials; 2) deploy the intervention in four university sites, collecting data from 240 prospective teachers in both treatment and business-as-usual courses; and 3) use data analyses and expert review to build a five-part argument for the validity of the assessment and scoring rubrics. Data sources include prospective teachers' background and demographic information, cognitive interviews, surveys, scores on content knowledge for teaching (CKT) instruments, performance and scores on the developed performance tasks, discussion scores on Danielson's Framework for Teaching observation protocol, and case study interviews with prospective teachers. The project team will also conduct interviews with teacher educators and observe classroom debrief sessions with prospective teachers and their teacher educators. The research will examine each teacher's scores on two summative performance tasks administered pre- and post-intervention and will look for evidence of growth across three formative tasks. Linear regression models will be used to understand relationships among teachers' CKT scores, pre-intervention performance task scores, group assignment, and post-intervention performance task scores.
A grounded theory approach to coding qualitative data of 24 case study teachers, observations of debrief sessions, and interviews with teacher educators will generate descriptive use cases, illustrating how the tools can support prospective teachers in learning how to facilitate discussions focused on science and mathematics argumentation. Mursion will develop a webpage on its website dedicated to this project that will allow the team to post the new performance-based tasks, scoring rubrics, and examples of performance in the simulated environment for teacher educators, educational researchers, and policy makers and collect feedback from them that can be used as another information source for refining tools and their use. Research findings will also be disseminated by more traditional means, such as papers in peer-reviewed research and practitioner journals and conference presentations.