Intro
On June 5, 2019, ISTC-CNR (the Institute of Cognitive Sciences and Technologies of the National Research Council) and Affidaty S.p.A. signed an agreement with an ambitious scientific and technological goal: to create a series of artificial neurons that can be trained on texts specific to a particular domain of expertise. This will make it possible to put one or more “in-scope” questions to a very large number of individuals, who can provide open-ended answers instead of picking from classic multiple-choice options. The resulting system of neurons will be able to discard responses unrelated to the documentation used for its training, and to assign every suitable response a score from 0 to 1 based on its relevance to the knowledge domain it was trained on. Other actors, including the cultural association science2mind and BUP, are also involved in the project.
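The system has not been described in technical detail. As a rough, hypothetical sketch of the behaviour above, one could embed each open-ended answer and compare it with the domain training texts, discarding anything below an “in-scope” similarity threshold and mapping the rest onto a 0-to-1 relevance score. The encoder model, the threshold, and the rescaling rule below are illustrative assumptions, not the project’s actual design.

```python
# Illustrative sketch only: the actual AISilomar model is not public.
# The encoder, the threshold, and the rescaling are all assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical choice of encoder

def relevance_score(answer, domain_corpus, in_scope_threshold=0.35):
    """Return a 0-to-1 relevance score, or None if the answer is out of scope."""
    corpus_emb = model.encode(domain_corpus, normalize_embeddings=True)
    answer_emb = model.encode([answer], normalize_embeddings=True)[0]
    # Cosine similarity (the embeddings are unit-normalized) to the
    # nearest passage of the training documentation.
    best = float(np.max(corpus_emb @ answer_emb))
    if best < in_scope_threshold:
        return None  # discarded: unrelated to the training documentation
    # Rescale the in-scope range onto [0, 1] as the relevance score.
    return (best - in_scope_threshold) / (1.0 - in_scope_threshold)
```

Because the score depends only on the answer and the fixed training corpus, identical answers always receive identical scores, a property the project returns to below.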
The importance of open-ended responses
Classic multiple-choice tests force the candidate to move within a circumscribed space: the domain of possible answers. Many questions, however, have answers that are not known a priori but reside in the preparation and competence of the individual. In such cases, classic techniques are “paralyzed”: the fixed answer set simply cannot capture the response.
For example:
Take a question like “What solution would you propose to reduce the CO2 emissions produced by global industrial activity?” and try to build a multiple-choice system that can collect realistic proposals from as many candidates as possible. Would a fixed list of answers even make sense?
The project, named AISilomar, aims to introduce a new method for evaluating and aggregating information in use cases where verifying millions of open-ended responses would otherwise require a large number of human verifiers.
But there is more.
The mechanism also proposes an evaluation system more objective, and therefore more effective, than a human one, because human judgment is known to be influenced by many factors.
Here are some examples:
If we asked an expert evaluator to put a question to 20 candidates and to score each response from 0 to 100, we know that the order in which the evaluator examines the responses can influence the scores: their evaluation criteria drift as they see how the candidates are answering, so their objectivity at the start of the task differs from their objectivity at the end.
Now, let’s increase the number of candidates from 20 to 100,000. At, say, two minutes per response, that is more than 3,300 hours of reading: the evaluation would stretch over many days, and the evaluator’s objectivity could oscillate far more widely than in the 20-candidate case.
If we then had the texts of the 20 candidates evaluated by a committee of 10 experts, it is reasonable to assume that the evaluators would give different judgments of the same response, especially on a scale from 0 to 100.
Finally, if we needed to evaluate millions of responses within a reasonable time (a few hours), a very large number of evaluators would have to be employed and the responses distributed so that each evaluator reads only a few. In that case we would no longer have the objectivity of a commission, only a probabilistic pairing of evaluators and responses.
In these circumstances it is difficult, if not impossible, to determine which response is the most relevant (the best, so to speak), because the quality of the evaluation degrades dramatically as the number of candidates and responses grows.
The AISilomar project aims to build an algorithmic objectivity, assisted by a small group of experts, that allows the neural network to read all responses within an acceptable time (minutes or hours) and classify them by relevance in a unique, reproducible way.
This will make it possible to aggregate responses that share a common logical sense, the so-called Logical Signature; to gauge the size of these logical groups, that is, how many candidates formulated similar responses; and, within each group, to rank the candidates’ responses by relevance. Moreover, changing the order in which responses are fed into the system, or removing or adding responses, never influences the evaluation: the system reproduces exactly the same score for each individual response. A sketch of both properties follows.
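As a minimal sketch of both claims, reusing the hypothetical encoder from the earlier snippet: one way to make grouping and ranking order-invariant by construction is to assign every answer to the nearest of a fixed set of expert-written reference answers (one per logical group) and to score each answer independently against that fixed set. The reference set and the assignment rule are assumptions; the project has not disclosed how the Logical Signature is actually computed.

```python
import random
from collections import defaultdict

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # same hypothetical encoder as above

def group_by_logical_signature(answers, reference_answers):
    """Group answers by nearest reference (their "Logical Signature").

    Each answer is scored independently against a fixed encoder and a
    fixed reference set, so the result cannot depend on arrival order.
    """
    ref_emb = model.encode(reference_answers, normalize_embeddings=True)
    groups = defaultdict(list)
    for ans in answers:
        emb = model.encode([ans], normalize_embeddings=True)[0]
        sims = ref_emb @ emb               # cosine similarity to each group
        signature = int(np.argmax(sims))   # index of the nearest logical group
        groups[signature].append((ans, float(sims[signature])))
    # Within each group, rank the answers best-first (ties broken by text,
    # so the ranking itself is also independent of input order).
    return {sig: sorted(members, key=lambda m: (-m[1], m[0]))
            for sig, members in groups.items()}

# Order invariance: shuffling the submissions changes nothing.
answers = ["Tax carbon at the source.",
           "Electrify heavy industry with renewables.",
           "Price emissions and reinvest in clean technology."]
references = ["carbon pricing policies", "industrial electrification"]
assert group_by_logical_signature(answers, references) == \
       group_by_logical_signature(random.sample(answers, len(answers)), references)
```

The size of each group indicates how many candidates gave similar responses, and the sort inside each group yields the ordered classification the text describes.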