2010-04-26

language test

~The language test paper is tomorrow and I haven't read anything yet.. arghh!! Right now I feel like writing something here.. so why don't I write about language test terms?? hehe.. For me, this paper would be easy if I could just find all the information on the internet. hehe.. OK, so let's start with RUBRICS..

What is a rubric?
A rubric is an explicit set of criteria used for assessing a particular type of work or performance. A rubric usually also includes levels of potential achievement for each criterion, and sometimes also includes work or performance samples that typify each of those levels.  Levels of achievement are often given numerical scores.  A summary score for the work being assessed may be produced by adding the scores for each criterion. The rubric may also include space for the judge to describe the reasons for each judgment or to make suggestions for the author. 
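Just to make the "summary score" idea concrete, here is a tiny sketch of my own (the criteria borrow the English-paper example below, and the 1-4 scale is made up):

# Hypothetical rubric for an English paper: three criteria, each scored 1-4.
rubric_scores = {
    "use of sources": 3,
    "quality of the academic argument": 4,
    "use of English": 3,
}
summary_score = sum(rubric_scores.values())
print(summary_score)  # 10 out of a possible 12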

Why use rubrics?
  • To produce assessments that are far more descriptive than a single, holistic grade or judgment can be. Instead of merely saying that this was a "B- paper," the rubric-based assessment describes the quality of work on one or more criteria. For example, an English paper might be assessed on its use of sources, the quality of the academic argument, and its use of English (among other criteria).  A department's strategic plan might be assessed using a rubric that included the clarity of its learning goals for students, the adequacy of staffing plans, the adequacy of plans for advising, and other criteria.
  • To let those who are producing work ("authors") know in advance what criteria the judge or judges will apply in assessing that work.
  • To provide a richer and more multidimensional description of the reasons for assigning a numerical score to a piece of work. 
  • To enable multiple judges to apply the same criteria to assessing work. For example, student work can be assessed by faculty, by other students and by working professionals in the discipline.  If a rubric is applied to program review, a panel of visiting experts could use the same rubric to assess the program's performance.
  • To enable authors to elicit formative feedback (e.g., peer critique) for drafts of their work before final submission.
  • To help authors understand more clearly and completely what judges had to say about their work.
  • To enable comparison of works across settings.  For example, imagine an academic department trying to develop skills A-G among their students.  One first year course focuses on teaching goals A, B, and D, while another first year course teaches A, C, and E.  One second year course is trying to deepen skill B while introducing skill E. And so on. If faculty use the same rubrics and then pool data (which can be done with Flashlight Online), the department can monitor student progress as they work toward graduation. It's a far more informative way to assess student progress and guide changes in the curriculum than to monitor student GPAs: faculty can see which skills are developing as hoped, and where there are systemic problems in teaching and learning. 


    Reliability and Validity: What's the Difference?

    Reliability

    Definition: Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same condition with the same subjects. In short, it is the repeatability of your measurement. A measure is considered reliable if a person's score on the same test given twice is similar. It is important to remember that reliability is not measured, it is estimated.
    There are two ways that reliability is usually estimated: test/retest and internal consistency.
    Test/Retest
    Test/retest is the more conservative method to estimate reliability. Simply put, the idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components to this method are as follows:
    1. implement your measurement instrument at two separate times for each subject;
    2. compute the correlation between the two separate measurements; and
    3. assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2.
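    To make this concrete, here is a minimal sketch of my own (all the scores are made up) that estimates test/retest reliability as the correlation between two administrations of the same test:

    # Hypothetical scores for the same five students on the same test, twice:
    test1 = [70, 85, 60, 90, 75]
    test2 = [72, 83, 58, 92, 78]

    def pearson(x, y):
        # Pearson correlation between two equal-length score lists.
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    print(round(pearson(test1, test2), 2))  # near 1.0 = scores repeat well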
    Internal Consistency
    Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. For example, you could write two sets of three questions that measure the same concept (say class participation) and after collecting the responses, run a correlation between those two groups of three questions to determine if your instrument is reliably measuring that concept.
    One common way of computing correlation values among the questions on your instruments is by using Cronbach's Alpha. In short, Cronbach's alpha splits all the questions on your instrument every possible way and computes correlation values for them all (we use a computer program for this part). In the end, your computer output generates one number for Cronbach's alpha - and just like a correlation coefficient, the closer it is to one, the higher the reliability estimate of your instrument. Cronbach's alpha is a less conservative estimate of reliability than test/retest.
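    And here is a rough sketch of my own of computing Cronbach's alpha by hand from the standard formula (the six questions and five students are hypothetical; in practice you would let a statistics package do this):

    def cronbach_alpha(scores):
        # scores: one row per respondent; each row holds that respondent's
        # answers to the k items on the instrument.
        k = len(scores[0])
        def variance(xs):
            m = sum(xs) / len(xs)
            return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        item_vars = [variance([row[i] for row in scores]) for i in range(k)]
        total_var = variance([sum(row) for row in scores])
        # Standard formula: alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
        return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

    # Two sets of three questions on the same concept (say class participation),
    # answered by five students on a 1-5 scale - all hypothetical numbers:
    responses = [
        [4, 5, 4, 3, 4, 4],
        [2, 2, 3, 2, 1, 2],
        [5, 4, 5, 5, 4, 5],
        [3, 3, 2, 3, 3, 3],
        [4, 4, 4, 5, 4, 4],
    ]
    print(round(cronbach_alpha(responses), 2))  # the closer to 1, the higher the reliability estimate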
    The primary difference between test/retest and internal consistency estimates of reliability is that test/retest involves two administrations of the measurement instrument, whereas the internal consistency method involves only one administration of that instrument.

    Validity

    Definition: Validity is the strength of our conclusions, inferences or propositions. More formally, Cook and Campbell (1979) define it as the "best available approximation to the truth or falsity of a given inference, proposition or conclusion." In short, were we right? Let's look at a simple example. Say we are studying the effect of strict attendance policies on class participation. In our case, we saw that class participation did increase after the policy was established. Each type of validity would highlight a different aspect of the relationship between our treatment (strict attendance policy) and our observed outcome (increased class participation).
    Types of Validity:
    There are four types of validity commonly examined in social research: conclusion validity, internal validity, construct validity, and external validity.
    Threats To Internal Validity
    There are three main types of threats to internal validity - single group, multiple group and social interaction threats.
    Single Group Threats apply when you are studying a single group receiving a program or treatment. All of these threats can be greatly reduced by adding to your study a control group that is comparable to your program group.
    A History Threat occurs when an historical event affects your program group such that it causes the outcome you observe (rather than your treatment being the cause). In our earlier example, this would mean that the stricter attendance policy did not cause an increase in class participation, but rather, the expulsion from school of several students due to low participation impacted your program group such that they increased their participation as a result.
    A Maturation Threat to internal validity occurs when standard events over the course of time cause your outcome. For example, if by chance, the students who participated in your study on class participation all "grew up" naturally and realized that class participation increased their learning (how likely is that?) - that could be the cause of your increased participation, not the stricter attendance policy.
    A Testing Threat to internal validity is simply when the act of taking a pre-test affects how that group does on the post-test. For example, if in your study of class participation, you measured class participation prior to implementing your new attendance policy, and students became forewarned that there was about to be an emphasis on participation, they may increase it simply as a result of involvement in the pretest measure - and thus, your outcome could be a result of a testing threat - not your treatment.
    An Instrumentation Threat to internal validity occurs when the increase in participation you observe is due to the way in which the pretest was implemented, rather than to your treatment.
    A Mortality Threat to internal validity occurs when subjects drop out of your study, and this leads to an inflated measure of your effect. For example, if as a result of a stricter attendance policy, most students drop out of a class, leaving only those more serious students in the class (those who would participate at a high level naturally) - this could mean your effect is overestimated and suffering from a mortality threat.
    The last single group threat to internal validity is a Regression Threat. This is the most intimidating of them all (just its name alone makes one panic). Don't panic. Simply put, a regression threat means that there is a tendency for extreme scorers in your sample (those students you study, for example) to move closer to the average (or mean) of the larger population from the pretest to the posttest. This is a common occurrence, and will happen between almost any two variables that you take two measures of. Because it is common, it is easily remedied through either the inclusion of a control group or through a carefully designed research plan.
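    If the regression threat still feels abstract, here is a small simulation of my own (all numbers made up): each observed score is a stable "true ability" plus random noise, and the students who scored at the extreme on the pretest drift back toward the population mean on the posttest with no treatment at all.

    import random

    random.seed(1)
    ability = [random.gauss(50, 10) for _ in range(1000)]   # each student's stable "true" level
    pretest = [a + random.gauss(0, 10) for a in ability]    # noisy measurement 1
    posttest = [a + random.gauss(0, 10) for a in ability]   # noisy measurement 2, no treatment

    top = [i for i, s in enumerate(pretest) if s > 70]      # extreme scorers on the pretest
    pre_mean = sum(pretest[i] for i in top) / len(top)
    post_mean = sum(posttest[i] for i in top) / len(top)
    print(round(pre_mean, 1), round(post_mean, 1))          # posttest mean sits closer to 50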
    In sum, these single group threats must be addressed in your research for it to remain credible. One primary way to accomplish this is to include a control group comparable to your program group. This, however, does not solve all our problems, as I'll now highlight with the multiple group threats to internal validity.
    Multiple Group Threats to internal validity involve the comparability of the two groups in your study, and whether or not any other factor other than your treatment causes the outcome. They also (conveniently) mirror the single group threats to internal validity.
    A Selection-History threat occurs when an event occurring between the pre and post test affects the two groups differently.
    A Selection-Maturation threat occurs when there are different rates of growth between the two groups between the pre and post test.
    A Selection-Testing threat occurs when taking the pretest affects the two groups differently.
    A Selection-Instrumentation threat occurs when the test implementation affects the groups differently between the pre and post test.
    A Selection-Mortality Threat occurs when there are different rates of dropout between the groups, which leads you to detect an effect that may not actually exist.
    Finally, a Selection-Regression threat occurs when the two groups regress towards the mean at different rates.
    Okay, so now that you have dragged yourself through these extensive lists of threats to validity, you're wondering how to make sense of it all. How do we minimize these threats without going insane in the process? The best advice I've been given is to use two groups when possible, and if you do, make sure they are as comparable as is humanly possible. Whether you conduct a randomized experiment or a non-random study --> YOUR GROUPS MUST BE AS EQUIVALENT AS POSSIBLE! This is the best way to strengthen the internal validity of your research.
    The last type of threat to discuss involves the social pressures in the research context that can impact your results. These are known as social interaction threats to internal validity.
    Diffusion or "Imitation of Treatment" occurs when the comparison group learns about the program group and imitates them, which will lead to an equalization of outcomes between the groups (you will not see an effect as easily).
    Compensatory Rivalry means that the comparison group develops a competitive attitude towards the program group, and this also makes it harder to detect an effect due to your treatment rather than the comparison group's reaction to the program group.
    Resentful Demoralization is a threat to internal validity that exaggerates the posttest differences between the two groups. This is because the comparison group (upon learning of the program group) gets discouraged and no longer tries to achieve on their own.
    Compensatory Equalization of Treatment is the only threat that is a result of the actions of the research staff - it occurs when the staff begins to compensate the comparison group to be "fair" in their opinion, and this leads to an equalization between the groups and makes it harder to detect an effect due to your program.
    Threats to Construct Validity
    I know, I know - you're thinking - no I just can't go on. Let's take a deep breath and I'll remind you what construct validity is, and then we'll look at the threats to it one at a time. OK? OK.
    Construct validity is the degree to which inferences we have made from our study can be generalized to the concepts underlying our program in the first place. For example, if we are measuring self-esteem as an outcome, can our definition (operationalization) of that term in our study be generalized to the rest of the world's concept of self-esteem?
    Ok, let's address the threats to construct validity slowly - don't be intimidated by their lengthy academic names - I'll provide an English translation.
    Inadequate Preoperational Explication of Constructs simply means we did not define our concepts very well before we measured them or implemented our treatment. The solution? Define your concepts well before proceeding to the measurement phase of your study.
    Mono-operation bias simply means we only used one version of our independent variable (our program or treatment) in our study, and hence, limit the breadth of our study's results. The solution? Try to implement multiple versions of your program to increase your study's utility.
    Mono-method bias, simply put, means that you only used one measure or observation of an important concept, which in the end reduces the evidence that your measure is a valid one. The solution? Implement multiple measures of key concepts and do pilot studies to try to demonstrate that your measures are valid.
    Interaction of Testing and Treatment occurs when the testing in combination with the treatment produces an effect. Thus you have inadequately defined your "treatment," as testing becomes part of it due to its influence on the outcome. The solution? Label your treatment accurately.
    Interaction of Different Treatments means that it was a combination of our treatment and other things that brought about the effect. For example, if you were studying the ability of Tylenol to reduce headaches and in actuality it was a combination of Tylenol and Advil or Tylenol and exercise that reduced headaches -- you would have an interaction of different treatments threatening your construct validity.
    Restricted Generalizability Across Constructs, simply put, means that there were some unanticipated effects from your program that may make it difficult to say your program was effective.
    Confounding Constructs occurs when you are unable to detect an effect from your program because you may have mislabeled your constructs or because the level of your treatment wasn't enough to cause an effect.
    As with internal validity, there are a few social threats to construct validity also. These include hypothesis guessing (participants guess what the study is about and adjust their behaviour), evaluation apprehension (participants perform differently simply because they know they are being assessed), and experimenter expectancies (the researcher's own hopes bias the results).
    See, that wasn't so bad. We broke things down and attacked them one at a time. You may be wondering why I haven't given you a long list of threats to conclusion and external validity - the simple answer is that the more critical threats involve internal and construct validity, and the means by which we improve conclusion and external validity will be highlighted in a later section.


    Summary
    The real difference between reliability and validity is mostly a matter of definition. Reliability estimates the consistency of your measurement, or more simply the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects. Validity, on the other hand, involves the degree to which you are measuring what you are supposed to; more simply, the accuracy of your measurement. It is my belief that validity is more important than reliability, because if an instrument does not accurately measure what it is supposed to, there is no reason to use it even if it measures consistently (reliably).
    Article on SCHOOL-BASED ASSESSMENT
    School-based assessment concept will be expanded, says Minister
    Published on: Saturday, July 26, 2003

    Johor: The school-based assessment concept, which stresses communication and skills, will be expanded to cover other subjects in stages, Education Minister Tan Sri Musa Mohamad said Friday.
    He said the concept was started this year for students in Year One till Form Five with Bahasa Melayu and English because the two subjects have oral elements.
    "Previously, the oral aspect was not emphasised although it is an important component in learning a language. If it's found to be feasible, we'll apply the system to other subjects," he told reporters after opening the Education Service Conference.
    The conference is being held for the first time in the country.
    Musa said the concept is aimed at moulding students to develop self-communication and creative skills and not memorise facts and figures in a subject merely to pass the examination.
    He said the concept would not be a burden to teachers because standard teaching aids would be supplied to all schools as well as procedures to gauge students' skills.
    Musa said the concept does not stress examinations; instead, the level of skills attained by a student would be measured and achievements recorded.
    "We want a two-way learning process in classrooms so that students can communicate well with one another. They can communicate through discussions, lectures and other ways," he said.
    Musa said although certain quarters had criticised the concept, saying some teachers might be biased in awarding marks, it would be implemented as planned because the concept had proven to be successful in developed countries.
    "If teachers are said to be biased or overloaded with work, the proposal will remain a proposal, while other countries will continue to implement it.
    "This is not only a good proposal; in fact, it will test our children in many fields," he said.
    Earlier, in his speech, Musa said several countries like South Korea and Finland, which do not have many natural resources, have become world economic powers because their students were taught to communicate and be creative.
    "Thirty years ago, South Korea was lagging behind us, but now they are exporting cars and electrical goods worldwide. Finland only has four million people but can produce Nokia phones which dominate the world market," he said.
    Musa said the yardstick to measure a student's ability is not only how many As he scores in the examination, but also his communication skills and creativity.
    He said Malaysian students performed better than American and British students in a science and mathematics competition recently, but it was not reflective of their actual achievements.
    "But in reality, the United States is No 1 in the technological era now and they produce about 100,000 patents a year. This is because their students are not only taught knowledge, but also to become creative," he said.
    Musa said Malaysia's education system is considered excellent because of its good infrastructure and about 300,000 teachers, but it could not produce competitive manpower due to insufficient emphasis on skills. - Bernama
