

That degree of difficulty within a level needs to be taken into account when building the test. The result of that effort will show that not all intermediate-low items are created equal: some items at the same level are harder than others. The analysis will put each item on a spectrum from easy to hard. The more consistently an item separates test-takers below its level from those at or above it, the better it is at differentiating test-takers’ language skill. In other words, if it is an intermediate-low item, novice-level test-takers will consistently get it wrong, and test-takers at intermediate and above will consistently get it right.
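To make the idea of an easy-to-hard spectrum concrete, here is a minimal sketch in Python. It assumes classical item statistics: the proportion of test-takers who answer an item correctly as its difficulty, and the correlation between an item and the rest of the test as its ability to differentiate. The text above does not say which statistics a given test developer uses, so treat this as one common approach, not the method. The response data and the function names are invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either has no variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def item_analysis(responses):
    """responses: one row per test-taker, one 0/1 entry per item (1 = correct)."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    results = []
    for i in range(n_items):
        scores = [row[i] for row in responses]
        # Proportion correct: values near 1 mean an easy item, near 0 a hard one.
        difficulty = sum(scores) / len(scores)
        # Score on the rest of the test, excluding this item.
        rest = [total - score for total, score in zip(totals, scores)]
        # Positive values mean stronger test-takers tend to get the item right.
        discrimination = pearson(scores, rest)
        results.append(
            {"item": i, "difficulty": difficulty, "discrimination": discrimination}
        )
    return results

# Hypothetical data: six test-takers ordered from weakest to strongest, three items.
responses = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
]

results = item_analysis(responses)
for row in sorted(results, key=lambda r: r["difficulty"], reverse=True):
    print(row)
```

Sorting by difficulty gives the spectrum from easy to hard. An item whose discrimination is close to zero or negative is not separating weaker from stronger test-takers and would be a candidate for review.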

For computer-scored questions (items) in reading and listening, a test developer needs to conduct a statistical analysis of the items. This process is called psychometric analysis. The analysis is conducted on data from a number of test-takers, who ideally have a wide range of skill levels. If the item is a good one, the analysis will confirm that it consistently discerns the accurate level of the test-taker. Simply stated, reliability means that if you give the same test to the same student again, he or she will get essentially the same score.
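The plain-language definition of reliability can also be put into numbers. A minimal sketch, assuming the test-retest approach: the same group takes the same test twice, and the correlation between the two sets of scores serves as the reliability estimate. This is only one of several ways reliability is estimated, and the scores and the function name below are hypothetical.

```python
from math import sqrt

def retest_correlation(first, second):
    """Pearson correlation between scores from two sittings of the same test."""
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(var_1 * var_2)

# Hypothetical scores for five students on two sittings of the same test.
first_sitting = [12, 18, 25, 31, 40]
second_sitting = [13, 17, 26, 30, 41]

reliability = retest_correlation(first_sitting, second_sitting)
print(round(reliability, 3))
```

A value close to 1 means students keep nearly the same standing from one sitting to the next. A much lower value means the scores shift for reasons other than the students' skill.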
