Standard setting is the process of identifying cut scores on a test or in an educational program that indicate whether a student has achieved the intended goals of an academic achievement test or a professional licensure test. Such tests use different item formats, such as multiple-choice, open-ended, essay, or Likert scale items, and each item format requires an appropriate standard setting method.
Although standard setting and cut scores for tests with different item types have been examined, most studies have explored appropriate standard setting methods for academic achievement tests consisting mainly of multiple-choice, open-ended, and essay items. However, in educational and psychological fields, Likert scale tests are also commonly used for surveys, and their cut scores are determined simply by summing the response score points on a 7-point, 6-point, or 5-point scale rather than by applying other standard setting methods.
This study explored various standard setting methods for a Likert scale test in order to prevent inadequate test reports and interpretations resulting from inappropriate cut scores. To that end, it not only compared three standard setting methods (Extended Angoff, Body of Work, and the empirical method) for a Likert scale test but also examined the cut scores under different response formats (7-point, 6-point, and 5-point) and numbers of items (81 items and 57 items) in the same test.
Two research questions were investigated: 1) Do the cut scores from the three standard setting methods differ from one another within the same response format? 2) Do the cut scores from the three response formats differ from one another within the same standard setting method?
This study applied three standard setting methods (Extended Angoff, which evaluates individual items; Body of Work, a holistic approach; and the empirical method, which combines the response scores attached to the labels) to three response formats (7-point, 6-point, and 5-point) and two numbers of items (81 items and 57 items) of a Likert scale test. The data were analyzed in three stages. First, cut scores were set with the three standard setting methods. For the Extended Angoff and Body of Work methods, the cut scores were taken from the third round of panelist ratings.
Afterwards, the empirical method was conducted, in which the response scores were summed to obtain the cut score for each performance level. Second, the differences in the cut scores and impact data for each performance level across the three response formats were examined. Finally, the differences among the nine combinations were investigated using factorial ANOVA.
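To make the notion of impact data concrete, the following is a minimal sketch of how the percentage of respondents falling into each performance level could be computed from total scores and a set of cut scores. The function name `impact_data`, the three-level layout, and the rule that a score equal to a cut score reaches the higher level are illustrative assumptions, not details taken from the study.

```python
from bisect import bisect_right
from collections import Counter


def impact_data(total_scores, cut_scores):
    """Return {level: percentage of respondents} for the given cut scores.

    Assumptions (hypothetical, for illustration only):
    - cut_scores is sorted in ascending order;
    - a total score at or above the k-th cut score reaches level k + 1,
      so two cut scores define Levels 1, 2, and 3.
    """
    # bisect_right counts how many cut scores are <= the total score.
    levels = Counter(bisect_right(cut_scores, score) + 1 for score in total_scores)
    n = len(total_scores)
    return {
        level: 100.0 * levels.get(level, 0) / n
        for level in range(1, len(cut_scores) + 2)
    }


# Made-up total scores and cut scores, not data from the study.
print(impact_data([100, 150, 200, 250], [150, 250]))
# -> {1: 25.0, 2: 50.0, 3: 25.0}
```

Running the same function with the cut scores from each method and response format would give one set of percentages per combination, which is the kind of comparison the second analysis stage describes.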
The results showed that the cut scores differed across the nine combinations, as shown in Table 1. Specifically, the impact data from the Body of Work method were lower for Level 2 and higher for Level 3 than those from the other methods. Regarding response formats, the 7-point format yielded relatively higher impact data for Level 2 and lower impact data for Level 3. A three-way ANOVA indicated that the differences among the nine combinations were statistically significant.
This study compared the cut scores from different standard setting methods and response formats for a Likert scale test. The findings will help test developers and psychometricians consider an appropriate standard setting method based on the characteristics of Likert scale tests. Additionally, by comparing the similarities and differences among the results of the standard setting methods, this study provides useful information for further studies examining standard setting methods for Likert scale tests. Most Likert scale tests in the social sciences determine cut scores by adding up the response score attached to each label (the empirical method). For example, a 5-point scale test treats 1 (strongly disagree) and 2 (disagree) as a low level and 4 (agree) and 5 (strongly agree) as a high level. However, using the response score as the basis for the cut score might cause problems because the items in a test contribute to the total score with different weights. This study took the weights of the items into account in the standard setting so that practitioners can set cut scores for Likert scale tests with care.
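As a worked illustration of the empirical method described above, the sketch below multiplies a label's response score by the number of items. This is one plausible reading of "adding up the response score in each label", not a procedure confirmed by the study. The function name `empirical_cut_score` is hypothetical. The boundary labels 2 and 4 come from the 5-point example in the text; the corresponding labels for the 6-point and 7-point formats are not stated and are therefore not shown.

```python
def empirical_cut_score(n_items: int, label_score: int) -> int:
    """Total score obtained by choosing `label_score` on every one of `n_items` items.

    Assumption: every item counts equally toward the total score, which is
    exactly the simplification the text warns about.
    """
    return n_items * label_score


# 5-point format: label 2 (disagree) bounds the low level,
# label 4 (agree) starts the high level.
for n_items in (81, 57):
    low_bound = empirical_cut_score(n_items, 2)
    high_bound = empirical_cut_score(n_items, 4)
    print(f"{n_items} items: low level up to {low_bound}, high level from {high_bound}")
# 81 items: low level up to 162, high level from 324
# 57 items: low level up to 114, high level from 228
```

Because the calculation gives every item the same weight, it cannot reflect items that contribute differently to the total score, which is the limitation that motivates the panel-based methods.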
Keywords
#Standard setting
#Extended Angoff method
#Body of Work
#MSLQ