Statistical Discussion - Comparing The Validity And Reliability Of Tests

Question One

Test A appears to be the stronger of the two instruments, given its reliability and validity evidence. It has good test-retest reliability (.81) and acceptable internal consistency for the Total Score (.75); its subscale alphas range from .57 to .78, with only the Social Self-Esteem scale (.57) falling short. Test B is also reasonably reliable but slightly less so: test-retest reliabilities for its scales range from .65 to .71, and internal consistency within each scale ranges from .71 to .77. Furr (2018) explains that values above .80 for test-retest and .70 for internal consistency indicate suitable reliability in a research context. By that standard, Test A meets both benchmarks except on the Social Self-Esteem scale, whereas Test B meets the internal consistency benchmark but not the test-retest benchmark.

In terms of validity, Test A has more evidence supporting the various aspects of validity. Its content validity is strong, with a clearly documented test development process (construct definitions, a table of specifications, and expert review of items). It shows convergent validity through its correlation with an established instrument, Coopersmith’s Self-Esteem Inventory (r = .41); discriminant validity through its near-zero correlation with the Beck Depression Inventory (r = .05); and structural validity through factor analysis. Test B, on the other hand, offers some convergent evidence (r = .25 with the Self-Concept and Motivation Inventory and r = .45 with the Eysenck Personality Inventory), but its structure rests on a theoretical model rather than on empirical analysis. Its discriminant evidence is also weaker than Test A’s, because its correlation with a depression measure is higher (r = .19 with the Hamilton Depression Inventory versus r = .05).
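The reliability comparison can be made explicit with a short check. The following is only an illustrative sketch: it applies the .80 and .70 benchmarks cited from Furr (2018) to the coefficients reported in the test descriptions. The dictionary layout, the labels, and the function name `below_benchmark` are assumptions made for illustration, and because only ranges are reported for Test B, the range endpoints stand in for its individual scales.

```python
# Benchmarks as cited from Furr (2018) in the discussion above.
TEST_RETEST_MIN = 0.80
ALPHA_MIN = 0.70

# Coefficients as reported in the two test descriptions.
# Labels are illustrative; Test B reports ranges only, so the
# endpoints of each range are used in place of individual scales.
test_a = {
    "test_retest": {"reported value": 0.81},
    "alpha": {"Total": 0.75, "General": 0.78, "Social": 0.57, "Personal": 0.72},
}
test_b = {
    "test_retest": {"lowest scale": 0.65, "highest scale": 0.71},
    "alpha": {"lowest scale": 0.71, "highest scale": 0.77},
}


def below_benchmark(test):
    """Return labels for every coefficient that falls short of its benchmark."""
    flagged = []
    for label, r in test["test_retest"].items():
        if r < TEST_RETEST_MIN:
            flagged.append(f"test-retest, {label} ({r:.2f})")
    for label, alpha in test["alpha"].items():
        if alpha < ALPHA_MIN:
            flagged.append(f"alpha, {label} ({alpha:.2f})")
    return flagged


print("Test A:", below_benchmark(test_a))
print("Test B:", below_benchmark(test_b))
```

Under these assumptions, the check flags only the Social Self-Esteem alpha for Test A, while both ends of Test B's test-retest range fall below the benchmark.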

Question Two

Additional information that would strengthen the final decision includes details of the norming and standardization samples for both tests, to judge whether they fit the target population and purpose; fuller information on Test A’s factor structure and how its subscales relate to the global construct; effect sizes and significance tests for the key validity coefficients; and information on how each test handles social desirability or faking. Guidance on test administration, scoring, and interpretation would also be important for implementing the tests in practice. Knowing how much weight each form of validity should carry for the intended use is necessary to weigh the different pieces of evidence. Finally, the original validation study reports for both tests may provide further background and methodological detail.
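To illustrate why sample details and effect sizes matter, the sketch below computes the shared variance (r squared) and an approximate 95% confidence interval for each reported validity coefficient, using the Fisher z-transformation. The sample sizes of 30 and 300 are purely hypothetical, since the test descriptions do not report n, and the function name `r_confidence_interval` is likewise an assumption for illustration.

```python
import math


def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r using the Fisher z-transformation."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Validity coefficients as reported in the test descriptions.
coefficients = {
    "Test A convergent (Coopersmith)": 0.41,
    "Test A discriminant (Beck)": 0.05,
    "Test B convergent (Self-Concept and Motivation)": 0.25,
    "Test B convergent (Eysenck)": 0.45,
    "Test B discriminant (Hamilton)": 0.19,
}

# Hypothetical sample sizes: the descriptions do not report n.
for n in (30, 300):
    for label, r in coefficients.items():
        low, high = r_confidence_interval(r, n)
        print(f"n={n:>3}  {label}: r={r:.2f}, r^2={r * r:.3f}, "
              f"95% CI [{low:.2f}, {high:.2f}]")
```

With these assumed sample sizes, the interval for Test B's discriminant coefficient (r = .19) includes zero at n = 30 but not at n = 300, so the same coefficient could support different conclusions depending on the size of the validation sample.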

References

Furr, R. M. (2018). Psychometrics: An introduction. Sage Publications.

Question

Study the following descriptions of two tests and answer the two questions that follow the test descriptions. Then post replies to classmates.

Comparing the Validity and Reliability of Two Tests

Test A: 40 items
Description: Measure of self-esteem
Scales: Total Score, General Self-Esteem, Social Self-Esteem, Personal Self-Esteem
Reliability: Test-retest r = .81; coefficient alphas for the Total Score, General Self-Esteem, Social Self-Esteem, and Personal Self-Esteem scales are .75, .78, .57 and .72, respectively.
Validity: Content—developed construct definitions for self-esteem, developed a table of specifications, wrote items covering all content areas, and used experts to evaluate items. Convergent—correlated with Coopersmith’s Self-Esteem Inventory (r = .41). Discriminant—correlated with Beck Depression Inventory (r = .05). Factor analysis revealed that the three subscales (General Self-Esteem, Social Self-Esteem, Personal Self-Esteem) are dimensions of self-esteem. Homogeneity—correlations between the scales indicate the General scale correlated with the Social scale at .67, the Personal scale at .79, and the Total scale at .89.

Test B: 117 items
Scales: Global self-esteem, competence, lovability, likability, self-control, personal power, moral self-approval, body appearance, body functioning, identity integration, and defensive self-enhancement.
Reliability: Test-retest for each scale ranges from .65 to .71. Coefficient alphas range on each scale from .71 to .77.
Validity: Content—based on a three-level hierarchical model of self-esteem. Convergent—correlated with the Self-Concept and Motivation Inventory (r = .25) and with the Eysenck Personality Inventory (r = .45). Discriminant—correlated with the Hamilton Depression Inventory (r = .19).

Please ANSWER the following questions:

Given this technical information, which of the above instruments would you select?
What additional information would you want to have to make your decision?
