Elephant Delta day 1 – Professor Tim Dunne from UCT on The Rasch Model for test outcomes and related item requirements

Blogging from The Tenth Southern Hemisphere Conference on the Teaching and Learning of Undergraduate Mathematics and Statistics

Prof Tim Dunne – UCT (photograph taken from the Elephant Delta website)

Professor Dunne will be discussing the Rasch model; background on it can be found on Wikipedia, which describes it as follows:
The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between (a) the respondent’s abilities, attitudes or personality traits and (b) the item difficulty.
Live blogging: these are notes I’ve taken live, and I will edit them into a more readable format today. I want to put them up straight away, though, to see whether I have any obvious misunderstandings. The equations will also be put into a more readable format ASAP.

David Andrich: a pioneer of the Rasch method.

Intelligence doesn’t seem to be well explained in practice by a bell-shaped curve – what is the measure of intelligence?

The Rasch model seems to be a better match to reality – because we don’t have true measurement, but only really ordinal data.

A test tries to distinguish between different levels of ability.

Test validity: formative (as students are learning) and summative (what has been learned) assessment. Need an explicit scoring scheme.

Scoring schemes must be revisable. Need to have an adequate spread of items.

N persons and K test items (want 5K<N<15K).

Smaller N can be instructive

Large N sharpens item precision in high stakes

Large K sharpens person precision

We should be using the Gamma statistic for ordinal 2-way tables.

Gamma=(P-Q)/(P+Q)

Gamma=1: perfect correlation

Gamma=-1: perfect anti-correlation.

Can check the Gamma statistics of one item with another item, or with all other items. Prefer Gamma closer to 1. But that means that one or other element is redundant.
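As a rough illustration of the Gamma statistic, here is a minimal Python sketch. It assumes that P and Q are the numbers of concordant and discordant pairs of persons (the usual Goodman-Kruskal definition; the talk did not spell this out), and the function name and toy data are my own.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Gamma = (P - Q) / (P + Q) for two equal-length ordinal sequences.

    P counts concordant pairs, Q counts discordant pairs (assumed definition).
    Pairs tied on either variable are ignored.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical 0/1 scores of six persons on two items.
item_a = [0, 0, 1, 1, 1, 0]
item_b = [0, 1, 1, 1, 0, 0]
gamma_ab = goodman_kruskal_gamma(item_a, item_b)
```

For these made-up scores there are 4 concordant and 1 discordant pairs, so Gamma is 0.6. The same function could be applied to one item against the total on all the other items.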

Data array for 0,1 items: item versus person (correct=1, incorrect=0). Can order by easy items and difficult items. Can order by achievement of individuals.

Distinguish people into groups based on how many items they got correct.
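The ordering of the data array can be sketched in a few lines of Python. This is only my own illustration on made-up data (the function names are hypothetical): columns are sorted from the easiest to the hardest item, rows from the highest to the lowest person total, and persons are then grouped by total.

```python
from collections import defaultdict


def order_data_array(data):
    """Sort a persons-by-items 0/1 array.

    Columns go from the easiest item (most correct answers) to the hardest,
    rows from the highest to the lowest person total.
    Returns (sorted_rows, item_order, person_order).
    """
    n_items = len(data[0])
    item_totals = [sum(row[i] for row in data) for i in range(n_items)]
    item_order = sorted(range(n_items), key=lambda i: -item_totals[i])
    person_order = sorted(range(len(data)), key=lambda n: -sum(data[n]))
    sorted_rows = [[data[n][i] for i in item_order] for n in person_order]
    return sorted_rows, item_order, person_order


def group_by_total(data):
    """Map each total score to the list of persons who achieved it."""
    groups = defaultdict(list)
    for n, row in enumerate(data):
        groups[sum(row)].append(n)
    return dict(groups)


# Hypothetical responses: 4 persons (rows) by 3 items (columns).
data = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
sorted_rows, item_order, person_order = order_data_array(data)
groups = group_by_total(data)
```

In this toy case the sorted array is perfectly triangular, which is the deterministic pattern real data will only approximate.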

Adding a new person may not follow the pattern of all other people: They might get easy questions wrong and hard questions right.

There are likely to be elements which violate the Guttman pattern.

Look at patterns for individual persons: 16 items, people who got 8 right:

1110110110100000: Modelled/Ideal (a little noise)

1111111100000000: Guttman/Deterministic

0000000011111111: Check for miscode

1010101010101010: Check for miskey
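One way to put a number on how far a pattern is from the Guttman ideal is to count inversions. The sketch below assumes the items in each string are ordered from easiest to hardest and counts every pair in which an easier item is wrong while a harder item is right. This counting convention and the function name are my own choice, not something given in the talk.

```python
def guttman_errors(pattern):
    """Count (easier item wrong, harder item right) pairs in a 0/1 string.

    Assumes the characters are ordered from the easiest to the hardest item.
    """
    errors = 0
    zeros_seen = 0
    for ch in pattern:
        if ch == "0":
            zeros_seen += 1
        elif ch == "1":
            # Every earlier (easier) wrong answer forms an inversion with this one.
            errors += zeros_seen
        else:
            raise ValueError("pattern must contain only '0' and '1'")
    return errors


patterns = {
    "modelled": "1110110110100000",
    "guttman": "1111111100000000",
    "miscode": "0000000011111111",
    "miskey": "1010101010101010",
}
error_counts = {name: guttman_errors(p) for name, p in patterns.items()}
```

For the four patterns above this gives 9, 0, 64 and 28 inversions respectively, so the slightly noisy pattern sits close to the Guttman one while the other two stand out.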

Nature of Rasch model: Probability model for a complete context and its internal inferences, not a random sample.

Context: Persons and original item responses

Inferences are internal findings

use subsets of items or subsets of respondents

Purpose: give us something that is measurement like to escape ordinality.

People are so used to thinking of marks as units. An arithmetic of scores is not inherently valid.

How can we measure change? How can we use context to make diagnostic input?

Rasch model also works for partial credit scores (not just 0/1 for each item).

The model claims the outcome between a person and the item is governed by ability of candidate+difficulty of item, and nothing more. This must hold for every item in the test.

This means that we are fair across all variables. (race, gender, language, etc.) – not clearly true.

Contrast Rasch model with Item Response theory.

Measurement in science is possible because units make sense. We measure one attribute at a time. The instrument must interact with that attribute in an invariant fashion: Ratio comparison.

The human universe is less well-ordered. Variables are unobservable but partially accessible to comparative probes.

Ordinal scales of measurement do not support the mathematical operations needed to calculate means and standard deviations.

All person-item scores $x_{ni}$ (person n and question i) result from interaction of only person ability and item difficulty.

Data array is NxK

N person totals $T_n=\sum^K x_{ni}$ from 0 to $max=\sum^K m_i$ where $m_i$ is the maximum score for question $i$ .

$T_n$ is a sufficient statistic for person performance between 0 and max

The $(m_i+1)$ item scores: 0 to $m_i$ reflect increasing difficulties and $m_i$ thresholds for all N persons.

Allow ourselves to doctor the data to fit a measurement procedure.

Rasch measurement(-like) models:

Want to be able to locate on a single continuum all the items ordered by difficulty level. Want to be able to relate all the persons on a single dimension from the least to the most able. Want all distances between different locations in the 2d table to be meaningful.

The model claims that the outcome of an encounter between a person and an item is governed by the product $A.E$ of the ability $A=e^\beta$ and the easiness $E=e^{-\delta}$

$\pi_{ni}=\frac{e^{\beta_n}}{e^{\delta_i}+e^{\beta_n}}$

is the conditional probability of a correct response to item $i$ for a person located at $\beta_n$ in terms of ability.

difference between person ability and difficulty is all that matters:

$\pi_{ni}$ monotonic in $\beta_n$ and $\delta_i$

This is a Bernoulli process.

$mean=\pi_{ni}$ and $variance=\pi_{ni}(1-\pi_{ni}).$

$\ln\frac{\pi_{ni}}{1-\pi_{ni}}=\beta_n-\delta_i=\ln \frac{\pi_{ni1}}{\pi_{ni0}}$

Uni-dimensional item locations and person locations. Want the item and persons to be on the same scale. The origin is the mean of the item locations. If $\beta_n-\delta_i=0$ then it’s 50/50, at -3 the prob is 5%, at 3 it’s 95%.
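To check the numbers quoted above, here is a minimal Python sketch of the dichotomous model. The function names and the choice of an item sitting at the origin are my own assumptions for illustration.

```python
import math


def rasch_probability(beta, delta):
    """P(correct) = e^beta / (e^delta + e^beta), written in logistic form."""
    return 1.0 / (1.0 + math.exp(delta - beta))


def log_odds(p):
    """Logit: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


delta = 0.0  # hypothetical item placed at the origin of the scale
table = []
for beta in (-3.0, 0.0, 3.0):
    p = rasch_probability(beta, delta)
    variance = p * (1.0 - p)  # Bernoulli variance
    table.append((beta, p, variance))
    print(f"beta - delta = {beta - delta:+.0f}: p = {p:.3f}, variance = {variance:.3f}")
```

The probabilities come out near 0.047, 0.5 and 0.953, in line with the rounded 5%, 50/50 and 95% figures, and applying `log_odds` to any of them recovers the difference between ability and difficulty.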

Note non-gaussian probabilities

Rasch Rating scale model:

One can add in the fact that a single item can have partial credit: Essentially look at the likelihood of getting different partial scores based on item difficulty and person’s ability. Can check that an item is or isn’t working in terms of distributions of partial credit scores for a given question.
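The talk did not give a formula for the partial credit case, so the following is only a sketch under an assumption: it uses the standard partial credit parameterisation, in which an item with maximum score $m_i$ has $m_i$ thresholds and the probability of score $x$ is proportional to $\exp\sum_{k=1}^{x}(\beta_n-\delta_{ik})$. The function name and the threshold values are hypothetical.

```python
import math


def partial_credit_probabilities(beta, thresholds):
    """Return [P(score=0), ..., P(score=m)] for one person and one item.

    beta: person ability; thresholds: the m threshold locations of the item.
    Assumes the standard partial credit form (not taken from the talk).
    """
    # Cumulative sums of (beta - threshold_k); the empty sum for score 0 is 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (beta - tau))
    top = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with maximum score 3 and a person at ability 0.5.
probs = partial_credit_probabilities(0.5, [-1.0, 0.0, 1.5])
```

With a single threshold this reduces to the 0/1 model above. Comparing such modelled probabilities with the observed distribution of partial scores is one way to see whether an item is working.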

RUMM program:

Items which everyone gets right or wrong are eliminated. Persons with 0 or perfect scores are eliminated.
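The elimination step can be sketched as follows for 0/1 data. This is my own illustration and not RUMM's actual procedure; in particular, repeating the pass until nothing changes is an assumption, made because removing an item can turn a person's score into a zero or perfect one.

```python
def drop_extremes(data):
    """Repeatedly drop all-right/all-wrong items and zero/perfect-score persons.

    data: list of rows (persons), each a list of 0/1 item scores.
    Returns (kept_person_indices, kept_item_indices).
    """
    persons = list(range(len(data)))
    items = list(range(len(data[0]))) if data else []
    changed = True
    while changed and persons and items:
        # Keep items that at least one remaining person got right and one got wrong.
        keep_items = [
            i for i in items
            if 0 < sum(data[n][i] for n in persons) < len(persons)
        ]
        # Keep persons whose total on the remaining items is neither 0 nor perfect.
        keep_persons = [
            n for n in persons
            if 0 < sum(data[n][i] for i in keep_items) < len(keep_items)
        ]
        changed = keep_items != items or keep_persons != persons
        persons, items = keep_persons, keep_items
    return persons, items


# Hypothetical responses: 5 persons (rows) by 4 items (columns).
data = [
    [1, 1, 1, 1],  # perfect score
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],  # zero score
]
kept_persons, kept_items = drop_extremes(data)
```

In this toy example only persons 1 and 2 and items 1 and 2 survive, because dropping the two extreme persons makes items 0 and 3 extreme in turn.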

Results can lead to classroom or individual interventions.

Are there features of Simpson’s paradox?

Can differentiate given items with respect to external variables, e.g. language.

About the Author: Jonathan Shock
