An Overview of Reliability and Validity

Introduction

In understanding research methodologies, validity and reliability must be considered at several stages of the research framework and process. In the preparatory or design phase, consideration must be given to how the data will be treated. Information in the literature review must be treated with similar care. In addressing the research question, data must be collected, and how it is treated remains important because it lends credibility to how that data is interpreted in the findings portion of the research.

How data is treated requires a review of whether or not it belongs in the study. This involves assessing the validity and reliability of the data. This narrative answers the following questions:

  • What is the difference between reliability and validity? Which is more important, and why?
  • What are the different ways of assessing reliability?
  • What are the different ways of assessing validity?
  • What are the different ways of obtaining validity evidence?

Validity

Professor Sekaran, an expert in business research, stated that validity determines how well something measures the object or concept it is intended to measure (Sekaran 2003a). Trochim, the author of the website Research Methods Knowledge Base, stated that validity refers to the conclusions we reach in our research and the quality of those conclusions (Trochim 2006e). Professor Zikmund’s definition is consistent with Sekaran’s: “Validity is the ability of a measure (for example, an attitude measure) to measure what it is supposed to measure” (Zikmund 2003b). Trochim divided validity into two areas, theory and observation, and then into four subgroups, while Sekaran mentioned internal and external validity before describing three types of validity. Cooper and Schindler, in Business Research Methods, stated that external validity “is the data’s ability to be generalized across persons, settings, and times” (Cooper and Schindler 2006b). They then described internal validity in a fashion similar to Sekaran, Trochim, and Zikmund, as the ability of a research instrument to measure what it is supposed to measure (p. 318). Zikmund, however, differentiated validity into several subtypes (p. 303-304), and Cooper and Schindler’s divisions are similar to Sekaran’s subtypes (p. 318-321).

     Sekaran’s description of validity falls under three broad headings: content validity, criterion-related validity, and construct validity (p. 206). Content validity ensures “that the measure included an adequate and representative set of items that tap the concept” (p. 206). Put another way, the more representative the scale items are of the universe of the concept being measured, the greater the content validity. Sekaran noted that criterion-related validity is based on differentiating cases on a criterion the measure is expected to predict, and that it is established through one of two subtypes, concurrent validity or predictive validity (p. 206). Concurrent validity is established when the scale differentiates cases that are supposed to be different. Predictive validity is the ability to differentiate on the basis of an event in the future (p. 206-207). Construct validity tests how well the results obtained “fit the theories around which the test is designed,” and is assessed through convergent and discriminant validity (p. 207). Convergent validity is established when the scores from two different instruments measuring the same concept are highly correlated. Discriminant validity is established when two variables that are predicted to be distinct and uncorrelated are found to be so. These descriptors help to establish the “goodness of measure,” which assists the researcher in determining how accurately the item was measured.
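Because convergent and discriminant validity are both described in terms of correlation, a short computation may make the distinction concrete. The sketch below is illustrative only: the scores are invented, the names scale_a, scale_b and unrelated_scale are hypothetical, and the Pearson coefficient is assumed as the correlation measure, which the sources summarized here do not specify.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores for six respondents (not taken from any study).
scale_a = [4, 5, 2, 3, 5, 1]          # instrument A, measuring one concept
scale_b = [5, 5, 1, 3, 4, 2]          # instrument B, measuring the same concept
unrelated_scale = [8, 7, 7, 9, 9, 8]  # a measure of a theoretically distinct concept

# Convergent validity: two instruments for the same concept should correlate highly.
convergent_r = pearson(scale_a, scale_b)

# Discriminant validity: measures of distinct concepts should be close to uncorrelated.
discriminant_r = pearson(scale_a, unrelated_scale)

print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

With these invented numbers the first coefficient is high and the second is near zero, which is the pattern the two definitions describe. What counts as "high" or "near zero" in practice is a judgment the sketch does not settle.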

     Trochim divided validity into the key areas of theory and observation, with four subcategories: 1) conclusion, 2) internal, 3) construct, and 4) external (Trochim 2006e). Trochim presented these as cumulative, each building on the question before it like stair steps. He noted that conclusion validity is often misused and misunderstood (Trochim 2006a). He continued, “Conclusion validity is the degree to which conclusions we reach about relationships in our data are reasonable” (Trochim 2006a). In other words, conclusion validity is the degree to which the determination we reach is believable or plausible. Trochim cautioned researchers about threats to conclusion validity, naming two common errors: to “conclude that there is no relationship when in fact there is (you missed the relationship or did not see it),” or to “conclude that there is a relationship when in fact there is not (you’re seeing things that aren’t there!)” (Trochim 2006g). Trochim noted that internal validity, although important, has zero generalizability and is almost always relevant only to the study itself (Trochim 2006d). He summarized that internal validity means “that you have evidence that what you did in the study (i.e., the program) caused what you observed (i.e., the outcome) to happen” (Trochim 2006d). Trochim noted that construct validity, unlike internal validity, involves generalizing: rather than concerning only the study itself, it generalizes from the measurement to the concept it represents, joining the hypothetical to our observation of reality (Trochim 2006b). The intent is to blend theory and observation. Trochim discussed external validity as “the degree to which the conclusions in your study would hold for other persons in other places and at other times” (Trochim 2006c). He noted that critics of external validity may deny the generality of a claim on the basis of specific persons, places, or times and thus attempt to invalidate the generalization (Trochim 2006c).

     Zikmund described validity as measuring what is intended to be measured (Zikmund 2003b). He noted several variants of validity, including face validity, criterion validity, concurrent validity, and construct validity (p. 302-304). Several of these variants have subcategories. Zikmund described face validity as “referring to the subjective agreement among professionals that a scale logically appears to reflect accurately what it proposes to measure,” and that “the content of the scale appears to be adequate” (p. 302). Furthermore, Zikmund delineated criterion validity as the ability of a measure to correlate with another measure of the same construct. Sekaran described criterion validity similarly, with the two subtypes of concurrent validity and predictive validity. Zikmund also described construct validity in terms similar to Sekaran’s, with convergent and discriminant validity, defining it as the “degree to which a measure confirms a network of related topics” (p. 303). This means that a measure has construct validity if it behaves the way it is supposed to in relation to a variety of other variables. Zikmund completed his description of validity by introducing the concept of sensitivity, which he described as the instrument’s ability to accurately measure variability in stimuli or responses.

     Cooper and Schindler described three areas of validity in their exploration of the subject: content, criterion-related, and construct (Cooper and Schindler 2006b). Content validity is the degree to which the content of the items adequately represents the universe of all relevant items; methods for assessing it include judgment and panel judgment using a content validity ratio (p. 319). Cooper and Schindler, like Sekaran, divided criterion-related validity into concurrent and predictive validity (p. 319). They noted that criterion-related validity is the degree to which the predictor adequately captures the relevant aspects of the criterion, and that it is assessed by correlation (p. 319). Concurrent validity describes the present: the criterion data are available at the same time as the predictor scores. Predictive validity differs in that it is a prediction of the future, with the criterion data measured after the passage of time. Lastly, construct validity answers the question, “What accounts for the variance in the measure?” by attempting to understand the underlying constructs being measured and determining how well the test represents them (p. 319). In construct validity “both the theory and the instrument being used are considered” (p. 319). Methods used to assess construct validity include judgment, correlation of the proposed test with established ones, factor analysis, and multitrait-multimethod analysis (p. 319).
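The content validity ratio is mentioned above without a formula. One widely used formulation, usually attributed to Lawshe, is assumed in the sketch below; whether it is the exact form Cooper and Schindler intend is an assumption. In that formulation each panelist rates an item as essential or not, and the ratio compares the number of "essential" ratings with half the panel size. The function name and the panel numbers are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Content validity ratio for one item, in the Lawshe formulation (assumed here).

    n_essential: number of panelists who rated the item "essential"
    n_panelists: total number of panelists
    Returns a value from -1 (nobody rated it essential) to +1 (everybody did).
    """
    half = n_panelists / 2
    return (n_essential - half) / half

# Hypothetical panel of 10 judges rating three questionnaire items.
essential_votes = {"item_1": 9, "item_2": 5, "item_3": 2}

for item, votes in essential_votes.items():
    print(item, round(content_validity_ratio(votes, 10), 2))
```

A positive ratio means that more than half of the panel judged the item essential; the cut-off at which an item is retained depends on panel size and is not covered by this sketch.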

     The previous explanations show various ways in which validity is described by several research experts. They delineated internal and external validity before breaking validity into its more nuanced variants. Validity remains a central focus for research because, if an object of study is being measured, it is vitally important that the measure is accurate. An example is the weight of a young person. The scale the person stands on must use a common unit of measure, which in the United States is the pound. The scale is calibrated against other instruments that measure weight in pounds, through certification by an official state agency of weights and measures that uses a known reference weight to calibrate all official instruments in the state. This gives credibility to the result: a young man weighing 160 pounds would register the same as any other man or woman who also weighs 160 pounds, and two separate scales calibrated in the same way would be expected to give the same reading. Ensuring that a valid measure gives the same result each time is where reliability comes in.

Reliability

     Reliability is the extent to which an experiment, test, or measuring procedure yields the same results on repeated trials. It focuses on the data itself, whereas validity examines the measure of the data. Sekaran noted that reliability is “the extent to which it is without bias and hence ensures consistent measurement across time and across various items in the instrument” (Sekaran 2003b). Sekaran followed this general description by breaking reliability into several areas. Trochim described reliability as repeatability or consistency (Trochim 2006f). He noted that a measure is reliable if it gives the same result over and over again, provided that the item measured has not changed (Trochim 2006f). Trochim described four types of reliability (Trochim 2006h). Zikmund stated that reliability is the degree to which measures are free from error (Zikmund 2003a), and he also described several methods for assessing it. Lastly, Cooper and Schindler stated that reliability is a necessary contributor to validity but not a sufficient condition for validity (Cooper and Schindler 2006a). The following paragraphs present these researchers’ differing perspectives on reliability.

     Sekaran differentiated the “goodness of the data” into validity and reliability and then further divided reliability into stability and consistency. She noted that stability of measures is the extent to which the measure remains the same over time, and defined consistency as the degree to which items hang together. These are further divided into four specific types of reliability: 1) test-retest reliability, 2) parallel-form reliability, 3) inter-item consistency reliability, and 4) split-half reliability. Test-retest reliability is obtained by applying the same set of observations or conditions, such as a test, and then applying the same conditions again at a later time. The correlation between the two sets of scores is called the test-retest coefficient. The higher this coefficient, the higher the perceived reliability (p. 204). Sekaran described parallel-form reliability as the coefficient obtained from two tests with the questions in random sequence and some words interchanged with synonyms. The higher the correlation between the two tests, the higher the perceived reliability (p. 204-205). She defined inter-item consistency reliability as a coefficient reflecting the degree to which items that are measured independently of one another relate to each other. Split-half reliability reflects the correlations between two halves of an instrument (p. 205).
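The test-retest and split-half coefficients described above can both be computed as correlations. The following is a minimal sketch with invented scores: the Pearson coefficient is assumed as the correlation measure, the odd/even item split is only one of many possible ways to form the halves, and all variable names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Test-retest: total scores of the same six respondents on two occasions (invented).
scores_time_1 = [12, 18, 9, 15, 20, 11]
scores_time_2 = [13, 17, 10, 14, 19, 12]
test_retest_coefficient = pearson(scores_time_1, scores_time_2)

# Split-half: item-level answers, one row per respondent, four items (invented).
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 4, 3, 4],
]
# Sum the 1st and 3rd items for one half, the 2nd and 4th for the other.
half_a = [row[0] + row[2] for row in responses]
half_b = [row[1] + row[3] for row in responses]
split_half_coefficient = pearson(half_a, half_b)

print(f"test-retest = {test_retest_coefficient:.2f}")
print(f"split-half  = {split_half_coefficient:.2f}")
```

In both cases a coefficient closer to 1 corresponds to the "higher perceived reliability" the text describes. A parallel-form coefficient would be computed the same way, with the second list holding scores from the alternate form.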

     Trochim stated that reliability cannot be calculated exactly and must instead be estimated (Trochim 2006h). He noted four types of reliability: 1) inter-rater or inter-observer reliability, 2) test-retest reliability, 3) parallel-forms reliability, and 4) internal consistency reliability. Inter-rater or inter-observer reliability concerns independent observations of the same phenomenon or object. The more highly the observations correlate, the more reliable the information. Trochim echoed Sekaran when describing test-retest reliability and parallel-forms reliability. In describing internal consistency reliability, however, Trochim paralleled some of Sekaran’s description but defined it more precisely. He divided the internal consistency estimators into four areas, which included Sekaran’s inter-item and split-half measures and two others. Trochim noted:

In internal consistency reliability estimation we use our single measurement instrument administered to a group of people on one occasion to estimate reliability. In effect we judge the reliability of the instrument by estimating how well the items that reflect the same construct yield similar results. We are looking at how consistent the results are for different items for the same construct within the measure (Trochim 2006h).

Trochim explained that the average inter-item correlation uses all of the items on an instrument that are designed to measure the same construct (Trochim 2006h). He added the average item-total correlation, which also uses the inter-item correlations but adds a total score across the items as a further variable in the computation (Trochim 2006h). In Trochim’s explanation of split-half reliability, all items that purport to measure the same construct are randomly divided into two sets, the entire instrument is administered to a sample, and a total score is calculated for each half; the correlation between the two totals is the split-half estimate (Trochim 2006h). Trochim also described a variation on split-half reliability, Cronbach’s Alpha. Conceptually, one split-half reliability is computed, the items are randomly divided into another pair of halves, and the estimate is recomputed, repeatedly, until all possible split-half estimates have been obtained. Cronbach’s Alpha is mathematically equivalent to the average of all possible split-half estimates, although that is not how it is computed (Trochim 2006h).
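Since the text notes that Cronbach’s Alpha is not actually computed by averaging split halves, a sketch of one common way to compute it may help. The variance-based formula below is an assumption, as it is not given in the sources summarized here: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response data are invented and the function name is hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha from item-level data, using the variance-based formula.

    responses: one row per respondent, each row a list of k item scores
    for items intended to measure the same construct.
    """
    k = len(responses[0])
    item_columns = list(zip(*responses))               # transpose to one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Invented answers of six respondents to four items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 4, 3, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Items that move together make the variance of the total scores large relative to the sum of the item variances, which pushes alpha toward 1. Which threshold counts as acceptable is a convention that varies by field and is not addressed here.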

Zikmund’s description of reliability complements the preceding ones by noting that reliability is the degree to which similar results are obtained over space and time (Zikmund 2003a). Zikmund described the test-retest method similarly to the other researchers, noting that if a test given at two different times yields similar results, the measure should be considered “stable” (p. 300). Zikmund also described the split-half method clearly, in which the results from one half of the items are checked against the results from the other half. The more closely the two halves agree, the more reliable the results (p. 301). Zikmund also noted the equivalent-form method, using the example of a questionnaire with similar questions but altered wording and question order, to see whether this mitigates the halo effect of language on the test. The more highly the results correlate, the more reliable the information (p. 301).

Cooper and Schindler spend the least amount of time describing reliability but note that reliability plays a key role in contributing to validity, although it is not a sufficient condition for validity (Cooper and Schindler 2006b). They note that reliability is concerned with estimates of the degree to which a measurement is free of random or unstable error.

Conclusion

The difference between reliability and validity is that reliability concerns the consistency of a measurement, while validity concerns whether the right thing is being measured. Validity is the more important of the two in this perspective, because consistently measuring the wrong thing does not make for a worthwhile study. This paper reviewed multiple perspectives on what reliability and validity are and how they are assessed. Furthermore, several methods of obtaining validity evidence were discussed.

References

Cooper, Donald R., and Pamela S. Schindler. 2006a. “Reliability.” In Business Research Methods, edited by Brent Gordon, Scott Isenberg, and Cynthia Douglas, Ninth, 321. New York, New York: McGraw Hill Irwin. http://amzn.com/B0086HRH3C.

———. 2006b. “Validity.” In Business Research Methods, edited by Brent Gordon, Scott Isenberg, and Cynthia Douglas, Ninth, 318. Boston, MA: McGraw Hill Irwin. http://amzn.com/0073214876.

Sekaran, Uma. 2003a. “Item Analysis.” In Research Methods for Business: A Skill Building Approach, edited by Jeff Marshall and Patricia McFadden, Fourth, 203. New York, New York: John Wiley & Sons, Inc. http://amzn.com/0471203661.

———. 2003b. “Reliability.” In Research Methods for Business: A Skill Building Approach, edited by Jeff Marshall and Patricia McFadden, Fourth, 203. New York, New York: John Wiley & Sons, Inc. http://amzn.com/0471203661.

Trochim, William M.K. 2006a. “Conclusion Validity.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/concval.php.

———. 2006b. “Construct Validity.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/introval.php.

———. 2006c. “External Validity.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/external.php.

———. 2006d. “Internal Validity.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/intval.php.

———. 2006e. “Introduction to Validity.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/introval.php.

———. 2006f. “Theory of Reliability.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/reliablt.php.

———. 2006g. “Threats to Conclusion Validity.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/concthre.php.

———. 2006h. “Types of Reliability.” Research Methods Knowledge Base. http://www.socialresearchmethods.net/kb/reltypes.php.

Zikmund, William G. 2003a. “Reliability.” In Business Research Methods, edited by Jack Calhoun, Steve Hazelwood, Mary Draper, and Robert Dreas, Seventh, 300. Mason, Ohio: Thomson South-Western. http://amzn.com/0030350840.

———. 2003b. “Validity.” In Business Research Methods, edited by Jack Calhoun, Steve Hazelwood, Mary Draper, and Robert Dreas, Seventh, 303–5. Mason, Ohio: Thomson South-Western. http://amzn.com/0030350840.
