Blogging from The Tenth Southern Hemisphere Conference on the Teaching and Learning of Undergraduate Mathematics and Statistics
Professor Dunne will be discussing the Rasch model, about which some information can be found here. Quoting from Wikipedia:
The Rasch model, named after Georg Rasch, is a psychometric model for analyzing categorical data, such as answers to questions on a reading assessment or questionnaire responses, as a function of the trade-off between (a) the respondent’s abilities, attitudes or personality traits and (b) the item difficulty.
Live blogging: Note that these are notes I’ve taken live, but will edit this today into a more readable format. I want to put this up straight away though to see if I have any obvious misunderstanding. Equations will also be put into more readable format ASAP.
David Andrich is a pioneer of the Rasch method.
Intelligence doesn’t seem to be well explained in practice by a bell-shaped curve – what is the measure of intelligence?
The Rasch model seems to be a better match to reality – because we don’t have true measurement, only ordinal data.
A test tries to distinguish between different levels of ability.
Test validity: formative (as students are learning) and summative (what has been learned) assessment. An explicit scoring scheme is needed.
Scoring schemes must be revisable. Need to have an adequate spread of items.
N persons and K test items (want 5K<N<15K).
A smaller N can still be instructive.
A large N sharpens item precision in high-stakes testing.
A large K sharpens person precision.
We should be using the Gamma statistic for ordinal 2-way tables.
Gamma = (P - Q)/(P + Q), where P is the number of concordant pairs and Q the number of discordant pairs.
Gamma=1: perfect correlation
Gamma=-1: perfect anti-correlation.
Can check the Gamma statistic of one item against another item, or against all other items. Prefer Gamma closer to 1, though a Gamma of exactly 1 means that one or other item is redundant.
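This (P - Q)/(P + Q) form is the Goodman–Kruskal gamma. As a rough illustration, here is a minimal Python sketch that computes it for an ordinal two-way contingency table; the table of counts is made up for the example, not taken from the talk.

```python
import numpy as np

def goodman_kruskal_gamma(table):
    """Goodman-Kruskal gamma for an ordinal two-way contingency table.

    table -- 2D array of counts, rows and columns in increasing order.
    Returns (P - Q) / (P + Q), where P counts concordant pairs and
    Q counts discordant pairs.
    """
    t = np.asarray(table, dtype=float)
    rows, cols = t.shape
    P = Q = 0.0
    for i in range(rows):
        for j in range(cols):
            # Cells strictly below-and-right of (i, j) are concordant with it
            P += t[i, j] * t[i + 1:, j + 1:].sum()
            # Cells strictly below-and-left of (i, j) are discordant with it
            Q += t[i, j] * t[i + 1:, :j].sum()
    return (P - Q) / (P + Q)

# Example: made-up cross-tabulation of two dichotomous (0/1) items
print(goodman_kruskal_gamma([[30, 10],
                             [ 5, 25]]))   # 0.875
```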
Data array for 0/1 items: items versus persons (correct = 1, incorrect = 0). Items can be ordered from easy to difficult, and persons by their achievement (a small sketch of this ordering follows the example patterns below).
Distinguish people into groups based on how many items they got correct.
A newly added person may not follow the pattern of all the other people: they might get easy questions wrong and hard questions right.
There are likely to be elements which violate the Guttman pattern.
Look at patterns for individual persons: 16 items, people who got 8 right:
1110110110100000: Modelled/Ideal (a little noise)
1111111100000000: Guttman/Deterministic
0000000011111111: Check for miscode
1010101010101010: Check for miskey
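Here is a minimal Python sketch of the ordering and pattern check described above, using a made-up 0/1 data array; the guttman_errors helper (counting easy-item-wrong / harder-item-right pairs) is my own illustrative addition, not something from the talk.

```python
import numpy as np

# Toy 0/1 data array: rows = persons, columns = items (made-up scores).
X = np.array([
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],   # becomes a perfect Guttman pattern once items are sorted
    [0, 1, 0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
])

# Order items from easiest (most correct answers) to hardest,
# and persons from lowest to highest total score.
item_order = np.argsort(-X.sum(axis=0))
person_order = np.argsort(X.sum(axis=1))
ordered = X[person_order][:, item_order]

def guttman_errors(row):
    """Count (easy item wrong, harder item right) pairs for one person,
    assuming the items are already ordered from easy to hard."""
    return sum(1
               for e in range(len(row))
               for h in range(e + 1, len(row))
               if row[e] == 0 and row[h] == 1)

for person, row in zip(person_order, ordered):
    print(f"person {person}: {''.join(map(str, row))}  "
          f"Guttman errors: {guttman_errors(row)}")
```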
Nature of the Rasch model: a probability model for a complete context and its internal inferences, not a random sample.
Context: Persons and original item responses
Inferences are internal findings.
Use subsets of items or subsets of respondents.
Purpose: give us something that is measurement-like, to escape ordinality.
People are so used to thinking of marks as units. An arithmetic of scores is not inherently valid.
How can we measure change? How can we use context to provide diagnostic input?
Rasch model also works for partial credit scores (not just 0/1 for each item).
The model claims the outcome of an encounter between a person and an item is governed by the ability of the candidate and the difficulty of the item, and nothing more. This must hold for every item in the test.
This means that we are fair across all variables (race, gender, language, etc.) – not clearly true.
Contrast the Rasch model with Item Response Theory.
Measurement in science is possible because units make sense. We measure one attribute at a time. The instrument must interact with that attribute in an invariant fashion: ratio comparison.
The human universe is less well-ordered. Variables are unobservable but partially accessible to comparative probes.
Ordinal scales of measurement do not support the mathematical operations needed to calculate means and standard deviations.
All person-item scores (person n and question i) result from interaction of only person ability and item difficulty.
The data array is N×K.
There are N person totals $r_n$, ranging from 0 up to the sum of the $m_i$, where $m_i$ is the maximum score for question $i$.
$r_n$ is a sufficient statistic for person performance, between 0 and the maximum.
The item scores 0 to $m_i$ reflect increasing difficulties and thresholds, for all N persons.
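As a tiny Python sketch, assuming a hypothetical N×K score array, the person totals are just row sums and the item totals column sums (in the Rasch model these are the sufficient statistics for ability and item difficulty respectively):

```python
import numpy as np

# Hypothetical N x K score array (0/1 here, but partial-credit scores also work).
X = np.array([
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
])

r = X.sum(axis=1)   # person totals r_n: sufficient statistics for ability
s = X.sum(axis=0)   # item totals: sufficient statistics for item difficulty
print("person totals r_n:", r)
print("item totals:      ", s)
```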
Allow ourselves to doctor the data to fit a measurement procedure.
Rasch measurement(-like) models:
Want to be able to locate all the items, ordered by difficulty level, on a single continuum. Want to be able to relate all the persons on a single dimension from the least to the most able. Want all distances between different locations in the 2D table to be meaningful.
The model claims that the outcome of an encounter between a person and an item is governed by the product of the ability and the easiness
$P(x_{ni}=1)$ is the conditional probability of a correct response when the person is located at $\beta_n$ in terms of ability and the item at $\delta_i$ in terms of difficulty.
The difference between person ability and item difficulty is all that matters: $P(x_{ni}=1) = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}$.
This is monotonic in $\beta_n$ and $\delta_i$.
This is a Bernoulli process: $P(x_{ni}=0) = \frac{1}{1 + e^{\beta_n - \delta_i}}$, and $P(x_{ni}=1) + P(x_{ni}=0) = 1$.
If $\beta_n - \delta_i = 0$ then it’s 50/50; at $\beta_n - \delta_i = -3$ the probability is about 5%, and at $+3$ it’s about 95%.
Note the non-Gaussian probabilities.
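A small Python sketch of this curve, just to verify the quoted numbers (the function name and values are mine, for illustration):

```python
import numpy as np

def rasch_probability(beta, delta):
    """P(correct) for a person of ability beta on an item of difficulty delta."""
    return 1.0 / (1.0 + np.exp(-(beta - delta)))

# The probability depends only on the difference beta - delta.
for diff in (-3.0, 0.0, 3.0):
    print(f"beta - delta = {diff:+.0f}: P(correct) = {rasch_probability(diff, 0.0):.2f}")
```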
Rasch Rating scale model:
One can add in the fact that a single item can have partial credit: essentially, look at the likelihood of getting the different partial scores based on item difficulty and the person’s ability. One can check whether an item is or isn’t working by looking at the distributions of partial credit scores for a given question.
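For concreteness, here is a sketch of partial-credit probabilities under the Rasch partial credit model (Masters-style thresholds); the threshold values and function name are my own illustrative assumptions, not anything presented in the talk.

```python
import numpy as np

def pcm_probabilities(beta, thresholds):
    """Partial-credit probabilities for one person on one item.

    beta        -- person ability (scalar)
    thresholds  -- threshold difficulties delta_1 .. delta_m
    Returns an array of P(score = 0 .. m).
    """
    # Cumulative sums of (beta - delta_k); the empty sum for score 0 is 0.
    cum = np.concatenate(([0.0], np.cumsum(beta - np.asarray(thresholds))))
    expcum = np.exp(cum)
    return expcum / expcum.sum()

# Example: a three-category item (scores 0, 1, 2) with made-up thresholds -1 and +1
print(pcm_probabilities(0.5, [-1.0, 1.0]))
```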
RUMM program:
Items which everyone gets right or wrong are eliminated. Persons with 0 or perfect scores are eliminated.
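That elimination step is straightforward to mimic; below is a rough Python sketch (not RUMM’s actual algorithm) that drops all-correct/all-incorrect items and zero-score/perfect-score persons, repeating because removing one kind of extreme can create the other.

```python
import numpy as np

def drop_extremes(X):
    """Remove items everyone got right/wrong and persons with 0 or perfect scores.

    X is an N x K array of 0/1 scores; repeat until no extremes remain,
    since dropping persons can create new extreme items and vice versa.
    """
    X = np.asarray(X)
    while True:
        keep_items = (X.sum(axis=0) > 0) & (X.sum(axis=0) < X.shape[0])
        X = X[:, keep_items]
        keep_persons = (X.sum(axis=1) > 0) & (X.sum(axis=1) < X.shape[1])
        X = X[keep_persons]
        if keep_items.all() and keep_persons.all():
            return X

# Made-up example: the last item is all-wrong; once it goes,
# the last person has a perfect score and is dropped too.
X = np.array([
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 1, 1, 0],
])
print(drop_extremes(X))   # leaves a 4 x 4 array
```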
Results can lead to classroom or individual interventions.
Are there features of Simpson’s paradox?
Can differentiate given items with respect to external variables, e.g. language.