AI-Powered Misinformation Detection

Welcome! This site explores how we’re using artificial intelligence to detect and combat misinformation. With the rapid spread of news through social media and online platforms, it has become increasingly difficult to determine which information is credible. Our project leverages both machine learning and generative AI to analyze news articles, rank their credibility, and provide explanations for how truthful (or misleading) they might be.

Introduction: What’s This All About?

Misinformation and disinformation have become major challenges in the digital age. Misinformation refers to false or misleading content that is spread unintentionally, while disinformation is deliberately crafted to deceive audiences. Both pose serious risks, from influencing public opinion and elections to spreading harmful health advice. The sheer volume of online information makes it nearly impossible for individuals to fact-check every claim they come across.

Our project seeks to automate the misinformation detection process using AI-driven techniques. Instead of relying on human fact-checkers alone, we use a hybrid approach—a combination of predictive models that analyze factuality based on structured features and generative AI that assesses context, bias, and consistency in language.

Methodology: How Does It Work?

Step 1: Scraping and Storing News Content

The first step in our misinformation detection process is data collection. We gather news articles, social media posts, and other online content using web scraping techniques. Since misinformation often spreads through specific sources, our system identifies and collects relevant articles from fact-checking websites, news platforms, and other digital sources.

Once collected, the articles are broken into smaller, manageable chunks—typically sentences or short paragraphs. This ensures that our models can analyze them in detail without losing context. Each chunk is then stored in a vector database, a specialized system that enables fast and efficient searches based on content similarity. This database helps in tracking how similar claims are repeated across different sources and identifying patterns in misinformation.
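To make this step concrete, here is a minimal sketch of chunking and indexing, assuming the sentence-transformers library for embeddings and FAISS as the vector store; the model name, chunk size, and example claim are illustrative, not our exact production settings:

```python
# Minimal sketch: chunk articles and index them in a vector store.
# Assumes sentence-transformers and faiss-cpu are installed; the model
# name and chunk size below are illustrative assumptions.
import faiss
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, max_words: int = 60) -> list[str]:
    """Split an article into short, roughly paragraph-sized chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

articles = ["...scraped article text...", "...another scraped article..."]
chunks = [c for article in articles for c in chunk_text(article)]

# Embed each chunk and build a similarity index.
embeddings = model.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product = cosine on normalized vectors
index.add(embeddings)

# Later: retrieve chunks similar to a new claim, to spot the same claim
# repeated across different sources.
query = model.encode(["Example claim to check."], normalize_embeddings=True)
scores, ids = index.search(query, k=5)
```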

Step 2: Predictive Model - Fact Checking with Data

Our predictive model acts as the first line of defense against misinformation. It is trained on labeled datasets, including the LIAR-PLUS dataset, which contains thousands of news statements rated for truthfulness. Using machine learning techniques, the model extracts features such as:

  • Sentiment analysis – Does the statement use emotionally charged language?
  • Readability scores – Is the language overly complex or vague?
  • Named Entity Recognition (NER) – Who or what is being discussed?
  • Historical patterns – Does this claim align with previously debunked misinformation?

Based on these factors, the predictive model generates a veracity score ranging from 0 (completely false) to 1 (completely true). However, some statements may fall into a gray area, where additional context is required to assess accuracy.
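A simplified sketch of the feature-extraction and scoring step is shown below, assuming TextBlob for sentiment, textstat for readability, and a scikit-learn logistic regression trained on LIAR-PLUS labels; the feature set here is deliberately minimal (NER and historical-pattern features are omitted for brevity), and the classifier is a placeholder rather than our trained model:

```python
# Minimal sketch: turn a statement into features and score its veracity.
# TextBlob (sentiment), textstat (readability), and scikit-learn are
# assumptions for illustration; the production feature set is richer
# (e.g., NER and historical-pattern features are omitted here).
import numpy as np
import textstat
from textblob import TextBlob
from sklearn.linear_model import LogisticRegression

def extract_features(statement: str) -> np.ndarray:
    blob = TextBlob(statement)
    return np.array([
        blob.sentiment.polarity,                 # emotionally charged language?
        blob.sentiment.subjectivity,             # opinionated vs. factual tone
        textstat.flesch_reading_ease(statement), # readability score
        len(statement.split()),                  # crude complexity proxy
    ])

# `clf` would be trained offline on LIAR-PLUS statements and labels, e.g.:
#   clf = LogisticRegression().fit(X_train, y_train)
clf = LogisticRegression()  # placeholder, untrained in this sketch

def veracity_score(statement: str, clf: LogisticRegression) -> float:
    """Probability that the statement is true, in [0, 1]."""
    features = extract_features(statement).reshape(1, -1)
    return float(clf.predict_proba(features)[0, 1])

# Usage (after training on LIAR-PLUS):
#   score = veracity_score("Some claim to check.", clf)
```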

Step 3: Generative AI - Deep Language Understanding

This is where our generative AI model comes into play. Unlike predictive models, which rely on predefined rules and historical data, generative AI can analyze the meaning, bias, and coherence of statements in real time. We use Google Gemini, a cutting-edge large language model, to evaluate misinformation based on nuanced contextual understanding. To further explore the capabilities of generative AI, we experimented with different prompting styles, including normal chain-of-thought (CoT) and fractal chain-of-thought (FCoT) prompting, both of which instruct the model to reason step by step.
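As an illustration, a chain-of-thought style query to Gemini might look like the sketch below, using Google's google-generativeai Python SDK; the model name and prompt wording are placeholders, not our exact production prompts:

```python
# Minimal sketch: ask Gemini to assess a claim with step-by-step reasoning.
# Assumes the google-generativeai SDK; the model name and prompt text are
# illustrative placeholders, not our exact production prompts.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # load from the environment in practice
model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative model choice

COT_PROMPT = """You are a fact-checking assistant. Think step by step:
1. Identify the factual claims in the statement.
2. Note any emotionally loaded or biased language.
3. Check whether the claims are internally consistent.
4. Conclude with a rating from 1 (misleading) to 6 (factual) and a short rationale.

Statement: {statement}
"""

def assess_claim(statement: str) -> str:
    response = model.generate_content(COT_PROMPT.format(statement=statement))
    return response.text
```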

Step 4: Bringing It All Together

By combining predictive and generative AI models, we create a multi-layered approach to misinformation detection. The predictive model assigns a factuality score, while the generative AI refines the analysis by evaluating bias, language manipulation, and missing context. Here is an example snapshot of our veracity engine combining the two models:

[Figure: product example, a snapshot of the veracity engine's combined output]
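One simple way to fuse the two signals is a weighted blend of the predictive score and a rescaled generative rating; the sketch below makes the idea concrete, where the 0.6/0.4 weights and the mapping from the 1-to-6 generative scale are illustrative assumptions rather than tuned values:

```python
# Minimal sketch: blend the predictive veracity score (0 to 1) with the
# generative model's rating (1 to 6, rescaled to 0 to 1). The 0.6/0.4
# weights are illustrative assumptions, not tuned values.
def combined_veracity(predictive_score: float, generative_rating: float,
                      w_pred: float = 0.6, w_gen: float = 0.4) -> float:
    gen_score = (generative_rating - 1) / 5  # map 1..6 onto 0..1
    return w_pred * predictive_score + w_gen * gen_score

# Example: predictive model says 0.72, Gemini rates the article 4.5 out of 6.
print(round(combined_veracity(0.72, 4.5), 3))  # 0.6*0.72 + 0.4*0.7 = 0.712
```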

Conclusion/Result: Why Does This Matter?

The spread of misinformation can have real-world consequences. False information has fueled public health crises, influenced elections, and even led to violence. Social media algorithms often amplify sensational content, making misinformation spread even faster than factual news.

To address this problem, we compared the two prompting styles in our experiments; the results are summarized below:

| Factuality Factors | Normal CoT (1 to 6) | FCoT (1 to 6) | Comparison |
|---|---|---|---|
| Biases | 4 | 4.5 | FCoT is more critical of missing counterarguments and detects subtle biases better. |
| Context Veracity | 4 | 3.75 | FCoT penalizes unverified claims more heavily and is stricter on context shifts. |
| Information Utility | 4.25 | 4.5 | FCoT values completeness, considering historical context like past recalls to improve article usefulness. |
| Average | 4.08 | 4.25 | The article selected is factual, and FCoT provided better precision in evaluating it. |

Our goal is to provide an AI-powered solution that helps combat misinformation at scale. By using a hybrid detection approach, we hope to create a system that is more reliable, adaptable, and transparent than traditional fact-checking methods.