Test Development – Steps, Constructs, and Item Analysis
Test development is the process of establishing the conceptual framework that describes the skills and knowledge to be assessed by a test (Irwing & Hughes, 2019). The process follows five steps: test conceptualization, test construction, test tryout, test analysis, and test revision (Cohen et al., 2022).
Integral to test development is the developers’ ability to build a good test. A good test is easy to read, repeatable, and purposeful. Purposeful tests are particularly useful in clinical practice because they add to the clinical knowledge already available. A good test also measures what it claims to measure; a test developed for depression, for example, should contain items aligned with the specific aspects and features of depression. It has a clear scoring method and provides clear, understandable instructions (Cohen et al., 2022). Its results should also be interpretable, so that the reason for a given outcome can be stated. For instance, a test for depression should indicate whether the test taker screens positive or negative for the disorder. Finally, a good test has high validity and reliability (Cohen et al., 2022).
Validity is the extent to which a test measures what it is intended to measure (Erlinawati & Muslimah, 2021). Reliability, on the other hand, is the extent to which a test measures it consistently (Matheson, 2019). Developers can support validity by making the test specific, so that its content corresponds to the objective at hand. This can be achieved by using frameworks and tools suited to that objective. Vague or ambiguous items can cause confusion and, in turn, false positives. Developers can also draw on the FIRST principles when developing the test, a set of guidelines stating that good tests are fast, independent, repeatable, self-validating, and timely. Similarly, developers can support reliability by ensuring that the test items are based on the same theoretical framework and are measured in the same way. Reliability is further supported by clearly defining the criteria for what is being measured (Dou et al., 2019).
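The idea that reliable test items should measure the same thing in the same way can be illustrated with an internal-consistency coefficient. The sketch below computes Cronbach's alpha, which is one common reliability estimate and is an assumption of this example rather than a method named in the sources cited above. The function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate internal consistency (Cronbach's alpha).

    responses: list of rows, one per test taker; each row holds that
    person's numeric score on every item (all rows the same length).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two test takers")

    # Transpose so each tuple holds all test takers' scores on one item.
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)

    # Variance of each test taker's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical data: five test takers answering four items rated 1 to 5.
example_responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(example_responses), 3))
```

Values closer to 1 suggest that the items vary together, which is what would be expected if they were formulated on the same theoretical framework. The coefficient says nothing about validity: a set of items can be highly consistent and still measure the wrong construct.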
There are several cultural, environmental, and ethical considerations when creating a test. Test development should account for cultural variation among the audiences to which the test will be applied. Maintaining cultural awareness by keeping the language simple and making the test expressive can considerably increase the applicability of the test across groups (Baugh & Baugh, 2021). Test development should also be cognizant of variation in the environments where the test will be used; a good test should be usable in a dynamic environment. A good test should also conform to the normative principles of justice and beneficence (Varkey, 2020). In this respect, the test should be usable across all populations, and its findings should be able to bring about positive change in the population on which it is used.
A test can be normed through either norm referencing or criterion referencing. In norm referencing, the test taker’s score is compared with the scores of a norming group. The norming group can vary on several characteristics and usually corresponds to the population under study. Norm referencing seeks to rank individuals relative to others within a group. It is based on the assumption that most of the group will score in the average range and that only a few will score above or below it. A test is normed by administering it to a large sample drawn from the target population. An example of a norm-referenced test is the IQ test: there is no set criterion for passing, and an individual’s score is compared against the average score. Criterion-referenced tests differ in what the score is compared with. In a criterion-referenced test, the test taker’s score is compared against a set of criteria rather than against other test takers. This standard is usually predetermined and known to both the tester and the test taker before the test.
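The difference between the two interpretations can be sketched in a few lines of code. In this minimal example, the norming sample, the cutoff, and all function names are hypothetical and chosen only for illustration; the mid-rank percentile formula is one of several conventions in use.

```python
from statistics import mean, stdev


def z_score(raw_score, norm_scores):
    """Norm-referenced: distance from the norm-group mean in SD units."""
    return (raw_score - mean(norm_scores)) / stdev(norm_scores)


def percentile_rank(raw_score, norm_scores):
    """Norm-referenced: percentage of the norm group scoring below the
    raw score, counting ties as half (mid-rank convention)."""
    below = sum(s < raw_score for s in norm_scores)
    equal = sum(s == raw_score for s in norm_scores)
    return 100 * (below + 0.5 * equal) / len(norm_scores)


def meets_criterion(raw_score, cutoff):
    """Criterion-referenced: compare against a predetermined standard,
    ignoring how anyone else performed."""
    return raw_score >= cutoff


# Hypothetical norming sample and cutoff.
norm_group = [10, 12, 14, 16, 18]
cutoff = 15

score = 14
print(z_score(score, norm_group))
print(percentile_rank(score, norm_group))
print(meets_criterion(score, cutoff))
```

The same raw score therefore receives two different readings: its standing relative to the norm group, and whether it reaches the fixed standard. A test taker can sit exactly at the group average and still fall short of the criterion.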
Self-reported tests and administered tests can be compared in several ways. Both are useful in assessing a particular phenomenon. Self-reported tests are data collection tools completed by the person in whom the phenomenon is being assessed. They can take the form of open-ended self-descriptions, direct self-ratings, or indirect self-reports. Self-reported tests have several strengths. To begin with, they are efficient: most can be completed in less than 15 minutes, which allows a large number of tests to be administered within a short time. They are also cost-effective, allowing broad coverage with limited resources. However, self-reported tests are susceptible to deception (Zimmerman, 2024).
Administered tests, on the other hand, are conducted by another party on the person in whom the phenomenon is being assessed. A strength of an administered test is that the person administering it can detect deception and assess intrapersonal factors that may be influencing responses. It also allows the test taker to seek clarification and feedback during the test. It is, however, time-consuming and may be less cost-effective.
References
Baugh, R. F., & Baugh, A. D. (2021). Cultural influences and the objective structured clinical examination. International Journal of Medical Education, 12, 22–24. https://doi.org/10.5116/ijme.5ff9.b817
Cohen, R. J., Schneider, W. J., & Tobin, R. M. (2022). Psychological testing and assessment: An introduction to tests and measurement. McGraw Hill.
Dou, H., Zhao, Y., Chen, Y., Zhao, Q., Xiao, B., Wang, Y., Zhang, Y., Chen, Z., Guo, J., & Tao, L. (2019). Development and testing of the reliability and validity of the Adolescent Haze Related Knowledge Awareness Assessment Scale (AHRKAAS). BMC Public Health, 18(1). https://doi.org/10.1186/s12889-018-5638-8
Erlinawati, E., & Muslimah, M. (2021). Test validity and reliability in learning evaluation. Bulletin of Community Engagement, 1(1), 26. https://doi.org/10.51278/bce.v1i1.96
Irwing, P., & Hughes, D. J. (2019). Test development. The Wiley Handbook of Psychometric Testing, 1–47. https://doi.org/10.1002/9781118489772.ch1
Matheson, G. J. (2019). We need to talk about reliability: Making better use of test-retest studies for study design and interpretation. PeerJ, 7. https://doi.org/10.7717/peerj.6918
Varkey, B. (2020). Principles of clinical ethics and their application to practice. Medical Principles and Practice, 30(1), 17–28. https://doi.org/10.1159/000509119
Zimmerman, M. (2024). The value and limitations of self‐administered questionnaires in clinical practice and epidemiological studies. World Psychiatry, 23(2), 210–212. https://doi.org/10.1002/
Question
What are the steps to developing a test?
Identify a few types of test constructs and identify the advantages and disadvantages of each.
How are items developed (e.g., item analysis)?
What are norms?
Development and Assessment of the Psychometric Properties of a Compassionate Care Questionnaire for Nurses
This resource provides medical professionals’ perspectives on the concept of compassionate care.