The purpose of this study is to examine the reliability, practicality, and validity of the entrance exam for the teacher training program at the National Institute of Education (NIE) in Cambodia, and to gauge the students’ knowledge of the test they took in the 2010 examination. The study draws on interviews with two teachers and two students at NIE. Its findings contribute to the understanding of the practicality, reliability, and validity of the test items and provide suggestions for improving the design of the NIE entrance exam in the future.
The National Institute of Education (NIE) is a well-known institution in Cambodia, charged with responsibility for the training of Cambodia’s school teachers and school administrators. It comprises two departments: the Education Department, which trains lower and upper secondary school teachers in the sciences and social sciences, and the Planning and Management Department, which trains school principals, inspectors, supervisors and office administrators to plan and evaluate the quality of education throughout the country. It has an enrollment of about 900 students every year (MoEYS). This paper considers the reliability, practicality, and validity of a representative English language test that was part of the entrance examination for the National Institute of Education in 2010. To clarify the nature of the English test and the analysis, it first sets out the context in which the students or trainees took the entrance examination. All students or trainees must take the entrance examination to gain a place. If they pass, they have the chance to study there; if they fail, they must wait until the following year and take the exam again.
According to Clark (1975), reliability is in fact a prerequisite to validity in performance assessment, in the sense that the test must provide consistent, replicable information about candidates’ language performance. Jones (1979) added that no test can achieve its intended purpose if the test results are unreliable. Reliability in a performance test depends on two significant variables: (1) the simulation of the test tasks, and (2) the consistency of the ratings. Four types of reliability have drawn serious attention: (1) inter-examiner reliability, (2) intra-examiner reliability, (3) inter-rater reliability, and (4) intra-rater reliability.
Jones also mentioned that since the administration of performance tests may vary in different contexts at different times, it may result in inconsistent ratings for the same examinee on different performance tests. Therefore, attention should be devoted to inter-examiner and intra-examiner reliability, which concern consistency in eliciting test performance from the test takers (Jones, 1979).
In addition, performance tests require human or mechanical raters’ judgments. The reliability issue is generally more complicated when tests involve human raters because human judgments involve subjective interpretation on the part of the rater and may thus lead to disagreement (McNamara, 1996). Inter-rater and intra-rater reliability are the main considerations when investigating the issue of rater disagreement. Inter-rater reliability has to do with the consistency between two or more raters who evaluate the same test performance (Jones, 1979). For inter-rater reliability, the primary interest is whether the observations across raters are consistent, which may be estimated through the application of generalizability theory (Crocker & Algina, 1986). Intra-rater reliability concerns the consistency of one rater for the same test performance at different times (Jones, 1979). Both inter- and intra-rater reliability deserve close attention because test scores are likely to vary from rater to rater, or even for the same rater on different occasions (Clark, 1979).
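To make the idea of rater consistency concrete, the following is a minimal illustrative sketch of one common agreement statistic, Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. It is a simpler, swapped-in measure, not the generalizability analysis cited above, and it was not used in this study. The function name `cohen_kappa` and the band scores are hypothetical and serve only as an example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same set of scripts."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of scripts")
    n = len(rater_a)
    # Proportion of scripts on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's score distribution.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical score for every script.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical band scores (1-5) given by two raters to the same eight essays.
scores_rater_1 = [3, 4, 2, 5, 3, 4, 1, 2]
scores_rater_2 = [3, 4, 3, 5, 3, 3, 1, 2]

kappa = cohen_kappa(scores_rater_1, scores_rater_2)
print(round(kappa, 2))
```

The same function could be applied to intra-rater reliability by passing the scores that one rater gave to the same essays on two different occasions. Values close to 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.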
Practicality refers to the economy of time, effort and money in testing. In other words, a test should be easy to design, easy to administer, easy to mark, and easy to interpret (Bachman and Palmer, 1996). Moreover, according to Brown (2004), a practical test stays within financial limitations and appropriate time constraints, and is easy to administer, score, and interpret.
The term validity refers to whether or not a test measures what it intends to measure. On a test with high validity the items will be closely linked to the test’s intended focus. For many certification and licensure tests this means that the items will be highly related to a specific job or occupation. If a test has poor validity then it does not measure the job-related content and competencies it ought to (Bachman and Palmer, 1996).
Content validity refers to the connections between the test items and the subject-related tasks. The test should evaluate only the content related to the field of study in a manner sufficiently representative, relevant, and comprehensible (Bachman and Palmer, 1996).
Hughes (1989) said that a test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. Hughes (1989) also said that in order to judge whether or not a test has content validity, we need a specification of the skills or structures that it is meant to cover, which provides the basis for a principled selection of elements for inclusion in the test. The greater a test’s content validity, the more likely it is to be an accurate measure of what it is supposed to measure.
According to Bachman and Palmer (1996), construct validity implies using the construct (concepts, ideas, notions) correctly. Construct validity seeks agreement between a theoretical concept and a specific measuring device or procedure. Derewianka (1999) said that construct validity is concerned with the extent to which the assessment task reflects the theoretical assumptions underpinning its construction. Moreover, Brown (2004) described a construct as any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perception.
What is the reliability level of the entrance exam for the teacher training program at NIE?
What is the practicality level of the entrance exam for the teacher training program at NIE?
What is the validity level of the entrance exam for the teacher training program at NIE?
Two teachers who teach English at NIE and two students who study at NIE were selected to participate in one-on-one interviews. The teachers were invited because they have experience in designing and scoring the tests at NIE, and the two students were invited because they are trainees there and had taken the entrance exam. More importantly, all four were willing to share their experiences with the interviewer. The in-depth interviews with the four participants were intended to yield enough information about the practicality, reliability, and validity level of the entrance exam for the teacher training program at NIE.
The two teachers and two students were each invited to a one-on-one interview at a different time. I made each appointment at least two days before the interview took place because I wanted to give the participants more time to prepare for the interview process. During each interview, I explained the note-taking and recording process and told the participant the main objective of the research. After the interviews, the participants’ speech was transcribed into a written script because this made it easier to write the results and discussion at the end of this research. The results from the participants were then compared with the findings from the literature review to identify similarities and differences, because the purpose of this research is to further understand the practicality, reliability, and validity of the entrance exam for the teacher training program at the National Institute of Education (NIE).
The instruments were selected to collect the data needed for the research. They provided information about the specific points to be evaluated, such as preferences and the practicality, reliability, and validity level of the entrance exam for the teacher training program at NIE. At the end of data collection, the information gathered with these instruments was what was needed to address the main purpose of the research.
Scope and Limitation
The results of this research on the practicality, reliability, and validity level of the exam cannot be generalized further because the number of participants is small and the data from the interviewees are very limited. Therefore, this research aims only to confirm the previous research results presented in the literature review.
The following are the results of the interviews with the four people who took part in this research as participants. The results are presented in the paragraphs below.
The two participating teachers seem to share the same ideas about the test used for the entrance exam at NIE, although they differ on some points. Likewise, the two students gave broadly similar answers to the interview questions, with slight differences. I appreciate their ideas because they are good and useful for this research.
Reliability level of the entrance exam for teacher training program at NIE
The two teachers said that most of the test items, such as those in sections A, B, and C, have only one correct answer. The exception is section D, which tests writing skill and in which students give different answers based on their own views. Moreover, the first teacher said that he did not think the test designer was properly trained because the content looked copied and pasted, the introduction was not clear, and the format and layout were messy. The second teacher said that the test designer was well educated because he had earned a master’s degree in educational management from Japan. However, he thought that the test designer was not well trained in designing tests. Both teachers thus seem to hold the same view of the test designer.
In addition, when I asked them whether there were any subjective test items in the test, both replied “Yes” and said that there was only one, “section D writing skill”, in which students write different answers based on their own opinions. When I then asked about their criteria for scoring the students’ writing, their ideas for marking that subjective item differed slightly. The first teacher said that when marking the writing section he looked first at the format (introduction, body, and conclusion), then at grammar, punctuation, coherence (whether the ideas were well linked), and finally acceptability. The second teacher said that his criteria differed from those of other lecturers there: organization, content and ideas, grammar, sentence structure, and critical thinking. Comparing the two sets of criteria, some points are the same, such as format and organization, and grammar, while others differ, such as critical thinking, acceptability, and sentence structure.
Moreover, both of them said that there was only one rater marking each test, but there were two examiners in the classroom while students were taking the test because they did not want students to cheat during the exam.
Practicality level of the entrance exam for teacher training program at NIE
Both teachers said that the test was administered in classrooms at NIE to all the trainees who took it. They said that they did not know the exact cost of the exam paper, but they thought the test was probably cheap because there were three exam papers, including the answer sheet. They also said that the cost of copying the test was covered by NIE and the Ministry of Education, and that all the examinees were allowed only sixty minutes to complete the test.
Validity level of the entrance exam for teacher training program at NIE
The first teacher said that, in his view, the test was beyond the ability of some of the students. However, he said that the test could measure the students’ actual ability because it was an entrance exam and the school had to choose the best candidates. As an example, he said that if the school needed sixty students, it had to select the top sixty. The second teacher said that the students may have faced several difficulties with this test because most of the test items had been copied from a TOEFL book. However, he still thought that some points (grammar or vocabulary) were ones the students had studied at university, so he said that they could do well.
Moreover, the first student said that he had seen some of the grammar and vocabulary test items before, such as 2, 7, 8…, but had never seen the others. He also said that this test could measure his actual ability because he had completed every item on the exam paper and had passed the exam. He added that the test was suitable for students who had finished a bachelor’s degree. The second student said that there were some points, such as in grammar and vocabulary, that she had seen when she had been studying English, but she also said that she had difficulty with the reading section because there were some key words she could not understand. However, she said that she had completed all the test items. She then said that the test could measure her actual ability because she passed the exam and went on to study there.
The improvement of the test
The first teacher said that, in his opinion, the test designer should be carefully selected and, in particular, should be properly trained in designing tests. The second teacher said that the test should be clear and easy to read, especially the layout. He also said that the test designer should allow more time because the time was a little short: the test should last one hour and thirty minutes, because one hour is very short once it includes handing out the exam papers and having students sign the test and the attendance sheet.
Moreover, the first student said that the time was a little short for all the students and that the test designer should allow more time; he suggested extending the test from one hour to one hour and thirty minutes. The second student said that the test was long relative to the time allowed. She thought it would be better either to cut one reading passage from the test or to extend the time from 60 minutes to 90 minutes.
Overall, the findings on the reliability level of the entrance exam for the teacher training program at NIE show that most of the test items are easy to mark because they are objective, multiple-choice questions. The exception is the writing section, which is subjective and makes it hard to judge or evaluate the students’ writing. There is only one rater marking the test items, although there are examiners in the classroom while students are taking the exam. However, the test raters are well trained and experienced in marking writing and, in particular, have clear criteria for marking the writing section, so I think that the test is reasonably reliable. This finding is similar to those of Jones (1979), McNamara (1996), and Clark (1979).
Another finding is similar to Bachman and Palmer (1996) and Brown (2004): the results show that the test is cheap, especially since the cost of copying the exam paper is covered by NIE and the Ministry of Education. The test is also easy to mark, and it was administered in classrooms at NIE, all of which are comfortable, but the time allowed is a little short. However, I believe that the test is still practical because the two student participants finished the test on time and, since it is an entrance exam, the school must select the top students.
The last result shows that it is hard to judge whether or not the test has content validity because it is an entrance exam that focuses on general English. I think that most of the test items have construct validity because the students complete them according to the instructions in the test. Most of the test items are multiple choice, and the students would be familiar with this format because they encountered it at university and would know the theory behind it. In other words, most of the test items are similar to TOEFL questions. This finding is similar to Bachman and Palmer (1996) and Brown (2004), who mention that construct validity involves concepts, ideas, notions, theories, hypotheses, and models.
However, I think that some test items have content validity because, when I interviewed the students, they said that they had seen and done some of the grammar and vocabulary test items before, such as questions 2, 7, 8, 12…, because they had taken a TOEFL class at university. This is similar to Bachman and Palmer (1996) and Hughes (1989), who said that content validity refers to the connections between the test items and the subject-related tasks, and that the content should be a representative sample of the language skills and structures.
In conclusion, the results above show the practicality, reliability, and validity level of the entrance examination at NIE in 2010. The students and lecturers also offered a few suggestions about the test items because they would like the test to be more practical, reliable, and valid, both now and in the future.