Reliability and validity in psychology research


Reliability
It is important that psychology research can easily be repeated and yield the same results each time. Reliability refers to the extent to which the measurement of a particular behaviour is consistent.

Assessing and improving reliability
For a test or research method to be classed as reliable, it must yield consistent results each time it is used. The exact same results will not be obtained every time, because participants and situations vary, but a strong positive correlation between the results of repeated uses of the same test indicates reliability.

Assessing and improving reliability of observers
An observational research study using more than one observer needs to be assessed for reliability. If different observers provide significantly different observations of the same behaviour, then the data provided by those observers is unreliable. Observer reliability can therefore be assessed by correlating the data provided by the observers.

Where observer scores do not significantly correlate, reliability can be improved by:
  • Training observers in the observation techniques being used and making sure all observers agree on how the techniques and behaviour categories should be applied.
  • Ensuring behaviour categories are clearly and objectively operationalised. This means each category is defined in terms of a specific, observable action, so that every observer would record the same behaviour in the same category. For example, “aggressive behaviour” is subjective and not operationalised, but “pushing” is objective and operationalised.

Assessing and improving reliability of psychology tests
Psychological tests, such as self-report questionnaires, need to be reliable. They can be assessed for reliability using the split-half or test-retest methods and, if a test is found to be unreliable, its questions can be revised until reliability is established.
  • The split-half method involves randomly choosing half the questions on the test and correlating participants’ scores on that half with their scores on the other half. If there is a significant positive correlation between the two halves, the test is reliable. Because both halves come from a single administration of the test, there is no need to wait for participants to ‘forget’ the questions, so it is a quick and easy way to establish reliability. However, it is only effective with long questionnaires in which all questions measure the behaviour being researched.
  • The test-retest method involves administering an entire test to a participant, waiting for them to ‘forget’ the questions (which could take several months), and then readministering the test. If the results from the two administrations significantly positively correlate, the test is reliable. The disadvantages of the test-retest method are that results take a long time to obtain and that, if too long an interval is used, the participant may have genuinely changed. In that case a test that is in fact reliable may be wrongly declared unreliable. The advantage is that every question is checked for reliability.



Validity
Validity refers to the extent to which a research technique actually measures the behaviour it is claimed to measure. For example, a relationship questionnaire is not a valid measure of aggression.

Assessing and improving internal validity
Internal validity asks, “Does this test accurately measure what it is supposed to?” If participants genuinely differ in the behaviour a test is meant to measure, but the test does not detect that difference, then the test lacks internal validity.

Assessing and improving external validity
External validity asks, “Can the results from this test be generalised to populations and situations beyond those studied?” There are two types of external validity, population validity and ecological validity:
  • Population validity refers to the extent to which the results can be generalised to groups of people other than the sample of participants used. Much psychological research uses university students as participants, e.g. Asch (1959), and it is difficult to say for sure that the results can be generalised to anyone other than university students.
  • Ecological validity refers to the extent to which the task used in a research study is representative of real life. Research into eyewitness testimony, for example, has generally lacked ecological validity as participants viewed incidents on video screens rather than in real life.


Assessing and improving validity of psychology tests
Psychological tests can be assessed for validity in a variety of ways, including face validity, content validity, concurrent validity and predictive validity:
  • Face validity is a subjective assessment of whether or not a test appears to measure the behaviour it claims to. Because it relies on judgement rather than measurement, it is not a particularly strong method with which to assess validity.
  • Content validity is an objective assessment of the items in a test to establish whether or not they all relate to and measure the behaviour in question.
  • Concurrent validity is a comparison between two tests of a particular behaviour: one already established as a valid measure of that behaviour, and a new one. If the results from the old and new tests significantly positively correlate, the new test is considered valid.
  • Predictive validity refers to how well a test predicts future behaviour. An example of this is a diagnostic test for a mental health problem such as depression. If the test is a valid measure of depression and accurately diagnoses depression, then there will be a significant positive correlation between the test scores and the outcome for the patient.


A Level exam tips
Answering exam questions (PSYA1 AQA A specification)
These tend to be 1 or 2 mark questions focused on showing you understand how to assess and improve reliability or validity. If the question states ‘Describe one method of assessing and improving the validity of (the test in the question stem)’, then it would be perfectly adequate to give the answer, “A valid test is one that measures the behaviour it is supposed to. It can be assessed using concurrent validity, in which the test is compared with an established one, and if there is a significant positive correlation then the new test is probably a valid one.”