Fake news detection - getting a grip on disinformation
How do fake news detection algorithms work?
It is often difficult to judge whether an article you read online is based on fact or not. Just think of current events: the corona pandemic, or reports from Russia about the war in Ukraine. Because it is impossible to manually verify every news report that is published, there has been a lot of research into automatic misinformation detection. But can we trust these algorithms? That is what Suzana Bašić, Marcio Fuckner, Pascal Wiggers and others investigated in their research project Explainable Fake News Detection.
First, they looked at the data used to train computer models for misinformation detection. Since complex algorithms require a lot of data, the datasets are also often created automatically. This can lead to certain biases in the data and thus in the predictive models. For example, they found that the models often learn which news sources can or cannot be trusted instead of learning whether an individual article contains incorrect information.
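One way to surface this kind of bias is to check how well the label can be predicted from the news source alone, with the article text deliberately ignored. Below is a minimal sketch of such a check in Python; the file name and the column names (source, text, label) are hypothetical placeholders, not the project's actual data or code.

```python
# Bias check sketch: can the label be predicted from the news source alone?
# The file name and the column names (source, label) are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

df = pd.read_csv("fake_news_dataset.csv")

X_train, X_test, y_train, y_test = train_test_split(
    df["source"], df["label"], test_size=0.2, random_state=42
)

# Encode only the source name; the article text is not used at all.
vectorizer = CountVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

accuracy = accuracy_score(y_test, clf.predict(vectorizer.transform(X_test)))
print(f"Accuracy from the source name alone: {accuracy:.2f}")
```

If such a source-only baseline comes close to a model trained on the full articles, the dataset most likely rewards recognising publishers rather than judging individual claims.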
When they removed this source of bias from the data, they found that a very simple algorithm produced results similar to those of a very sophisticated one. This is an interesting finding for several reasons. Simpler models consume fewer resources, making them more sustainable. They also require less data, allowing us to build smaller but higher-quality datasets. Finally, unlike more complex algorithms, they are often explainable by design: we can see why the model made a particular decision in each individual case.
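As an illustration of what "explainable by design" means here, the sketch below trains a simple classifier (TF-IDF features with logistic regression) and reads the explanation straight from the model's own coefficients. The training documents, labels and choice of model are placeholders, not the project's actual setup.

```python
# Sketch of a simple, intrinsically explainable classifier: TF-IDF + logistic regression.
# The two training documents and their labels are hypothetical placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "shocking claim about miracle cure goes viral",      # placeholder, labelled fake
    "government publishes official statistics report",   # placeholder, labelled real
]
labels = [1, 0]  # assumed encoding: 1 = fake, 0 = real

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# The explanation is the model itself: each coefficient shows how strongly
# a word pushes a prediction towards "fake" (positive) or "real" (negative).
words = vectorizer.get_feature_names_out()
for idx in np.argsort(clf.coef_[0]):
    print(f"{words[idx]:>12}  weight {clf.coef_[0][idx]:+.3f}")
```

Because these weights are global and fixed, the same word always contributes in the same direction, which is not guaranteed for post-hoc explanations of black-box models.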
Finally, they conducted experiments to examine the results of SHAP, a popular method for explaining "black box" algorithms: complex models, such as neural networks, for which it is not clear why they make certain decisions. They identified several problems with this method when it is applied to text.
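For context, this is roughly how SHAP is typically applied to a text classifier treated as a black box: wrap the model's prediction function and let SHAP attribute the predicted score to individual tokens. The pipeline, training data and example sentence below are hypothetical placeholders, not the project's actual models.

```python
# Sketch of the typical SHAP workflow on a text classifier treated as a black box.
# The pipeline, training data and example sentence are hypothetical placeholders.
import shap
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "shocking claim about miracle cure goes viral",
    "government publishes official statistics report",
]
labels = [1, 0]  # assumed encoding: 1 = fake, 0 = real
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

# SHAP only needs a function mapping texts to scores, plus a masker that
# defines how parts of the input are hidden (here: word-level tokens).
masker = shap.maskers.Text(r"\W+")
explainer = shap.Explainer(lambda docs: model.predict_proba(docs)[:, 1], masker)

shap_values = explainer(["shocking report about skating goes viral"])
for token, value in zip(shap_values[0].data, shap_values[0].values):
    print(f"{token!r:>12}  SHAP value {value:+.3f}")
```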
Some of the problems they identified are shown in Figure 1. The parts highlighted in red indicate fake news, while the parts highlighted in blue indicate the opposite. In the skating example, the explanation method is inconsistent: it assigns completely opposite weights to the same word, with "skating" marked both red and blue in the same text. In Figure 2, we see that many of the highlighted words carry little meaning, such as "and," "it," "is," "then," and "on."
When they used this method to explain a simpler model, many conjunctions and prepositions were also selected. But because the simpler model was explainable, they could see that those words were not that important for the predictions at all. This means that the SHAP method does not explain the models well enough. In future work, they plan to further analyze why these problems occur and how they can be avoided.
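One way to make this kind of mismatch visible, continuing the hypothetical TF-IDF and logistic regression sketch above, is to compare the attribution SHAP assigns to each token with the weight the simple model itself assigns to that word: sizeable SHAP attributions for words with near-zero coefficients would point to an explanation that does not reflect the model.

```python
# Sketch: cross-check SHAP attributions against the simple model's own weights.
# Continues the hypothetical TF-IDF + logistic regression sketch above.
import shap

coefficients = dict(zip(vectorizer.get_feature_names_out(), clf.coef_[0]))

def predict(docs):
    """Score documents with the simple model (probability of the 'fake' class)."""
    return clf.predict_proba(vectorizer.transform(docs))[:, 1]

explainer = shap.Explainer(predict, shap.maskers.Text(r"\W+"))
explanation = explainer(["shocking report about skating goes viral"])[0]

for token, attribution in zip(explanation.data, explanation.values):
    weight = coefficients.get(token.strip().lower(), 0.0)
    print(f"{token!r:>12}  SHAP {attribution:+.3f}   model weight {weight:+.3f}")
# Function words that receive a sizeable SHAP attribution while their model
# weight is (near) zero suggest the explanation does not reflect the model.
```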