Note that Cohen's kappa measures agreement between two raters only. Fleiss' kappa (after Joseph L. Fleiss, 1981) is used as a similar measure of agreement in categorical ratings when there are more than two raters. Fleiss' kappa can only be used with binary or nominal ratings. No version is available for ordered categorical ratings.

The positive predictive value (PPV) indicates the probability that a disease is present when the test is positive. It is calculated as the percentage of people with a positive result who actually have the disease. Beyond simply comparing sensitivities and specificities, formal statistical comparisons can be made when raw data on test performance are available. The appropriate test depends on the setting. If the diagnostic tests were studied in two independent groups of patients, two-sample tests for binomial proportions, such as the chi-square test or Fisher's exact test, are appropriate. However, if both tests were performed on the same set of subjects, the results are paired, and methods that account for correlated binary outcomes are necessary, such as McNemar's test (a chi-square test for paired data).

In 1960, Jacob Cohen proposed the kappa statistic as a measure of agreement between raters on categorical variables. It is generally considered a more robust measure than simple percent agreement, because it takes into account the agreement expected by chance.
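For the case where both diagnostic tests are run on the same subjects, McNemar's test can be sketched in a few lines of Python. This is only an illustration: the function name mcnemar_test, the argument names b and c (the two counts of discordant pairs), the continuity correction and the example counts are assumptions made here, not taken from the text.

```python
import math


def mcnemar_test(b, c, continuity=True):
    """McNemar's chi-square test for paired binary results (illustrative sketch).

    b: subjects positive on test A and negative on test B
    c: subjects negative on test A and positive on test B
    Returns (statistic, p_value), referring the statistic to a
    chi-square distribution with 1 degree of freedom.
    """
    n = b + c
    if n == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    diff = abs(b - c)
    if continuity:
        # Continuity correction (an assumption of this sketch)
        diff = max(diff - 1, 0)
    statistic = diff ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 15 subjects positive only on test A, 5 only on test B
stat, p = mcnemar_test(b=15, c=5)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Only the discordant pairs enter the calculation. Subjects on whom both tests agree carry no information about which test is positive more often. With few discordant pairs, an exact binomial version of the test is usually preferred over this chi-square approximation.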

Kappa can be used to compare the ability of different raters to classify subjects into two or more categories. Kappa can also be used to assess agreement between alternative methods of categorical assessment when new techniques are under study. Landis and Koch (Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74) proposed the following limits for interpreting kappa values: below 0, poor agreement; 0.01 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; 0.81 to 1.00, almost perfect. However, these limits are arbitrary and not universally accepted. The number of categories and subjects affects the magnitude of the value. Kappa tends to be higher when there are fewer categories.

Sensitivity indicates the probability of a positive result when the disease is present. It is calculated as the percentage of people with the disease who are correctly classified as having it. In general, a test is considered sensitive if it is positive for most people with the disease.
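Sensitivity and the positive predictive value are both simple ratios taken from a 2x2 table of test result against true disease status. A minimal sketch follows, in which the function names, the argument names (tp, fn, fp for true positives, false negatives and false positives) and the counts are hypothetical and chosen only for illustration:

```python
def sensitivity(tp, fn):
    """Proportion of people with the disease who test positive."""
    if tp + fn == 0:
        raise ValueError("no subjects with the disease")
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Proportion of people with a positive result who have the disease."""
    if tp + fp == 0:
        raise ValueError("no positive test results")
    return tp / (tp + fp)


# Hypothetical counts: 100 diseased subjects, of whom 90 test positive,
# plus 30 disease-free subjects who also test positive.
tp, fn, fp = 90, 10, 30
print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"PPV = {positive_predictive_value(tp, fp):.0%}")
```

The two ratios share the same numerator but condition on different groups: sensitivity divides by everyone with the disease, the positive predictive value by everyone with a positive result.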

If the raters or the tests agree perfectly, then kappa = 1. If there is no agreement between the raters or the tests (other than what would be expected by chance), then kappa = 0. Cohen's kappa is relatively simple to calculate and can be worked out by hand.
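The hand calculation follows the usual definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies. The sketch below assumes two equal-length lists of labels. The function name cohen_kappa and the example ratings are made up for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who classified the same subjects."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)

    # Observed agreement: share of subjects given the same category by both
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[cat] * counts_b[cat] for cat in counts_a) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten subjects by two raters
rater_a = ["pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg"]
rater_b = ["pos", "pos", "neg", "neg", "neg", "pos", "pos", "neg", "pos", "neg"]
print(f"kappa = {cohen_kappa(rater_a, rater_b):.2f}")  # prints: kappa = 0.60
```

In this example the raters agree on 8 of 10 subjects (p_o = 0.8), and each rater assigns half of the subjects to each category (p_e = 0.5), so kappa = 0.3 / 0.5 = 0.6.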