Iran war media framing

How do US and international media outlets frame the conflict? How can we make media bias visible and navigable? This project maps how language, emphasis, and narrative structure diverge across 77 sources covering the 2026 Iran war.

The pipeline runs in four stages: collection, relevance filtering, LLM scoring, and clustering with network analysis. First, web scrapers and MediaCloud queries collected articles from the 77 sources. Second, a sentence-transformers model (all-MiniLM-L12-v2) filtered out unrelated articles based on the embedding similarity of their titles. Third, a locally hosted LLM (Gemma4, 4B, instruction-tuned) scored each article against structured prompts along five framing dimensions:

1. kinetic focus (military hardware and activity)
2. humanitarian focus (wellbeing of civilians and disadvantaged groups)
3. diplomatic focus (how differing parties interact)
4. economic focus (economic impacts of the war)
5. culpability bias (use of active language and strong verbs to assign moral blame)
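
As a rough illustration of the scoring step, the sketch below parses one model reply into a score per dimension and rejects malformed replies. The JSON reply shape, the short dimension keys, the 0 to 1 score range, and the function name `parse_scores` are all assumptions made for this example; the project's actual prompts and output format may differ.

```python
import json

# Short keys for the five framing dimensions listed above (key names are assumed).
DIMENSIONS = ("kinetic", "humanitarian", "diplomatic", "economic", "culpability")


def parse_scores(raw_reply: str, lo: float = 0.0, hi: float = 1.0) -> dict[str, float]:
    """Parse a JSON reply from the model and keep one score per dimension.

    The JSON shape and the [lo, hi] range are assumptions for illustration.
    """
    data = json.loads(raw_reply)
    scores = {}
    for dim in DIMENSIONS:
        if dim not in data:
            raise ValueError(f"missing dimension: {dim}")
        value = float(data[dim])
        if not lo <= value <= hi:
            raise ValueError(f"{dim} score {value} outside [{lo}, {hi}]")
        scores[dim] = value
    return scores


# A made-up reply, standing in for what a local LLM might return for one article.
reply = (
    '{"kinetic": 0.8, "humanitarian": 0.3, "diplomatic": 0.1, '
    '"economic": 0.2, "culpability": 0.6}'
)
scores = parse_scores(reply)
```

Validating every reply before aggregation keeps a single off-format answer from silently skewing an outlet's averages.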

Per-outlet aggregate framing scores were then clustered with BERTopic into five media groups. NetworkX built a similarity graph of outlets, in which each node carries an outlet's framing profile and each edge is weighted by the cosine similarity between two profiles. The graph makes structural patterns visible across the media landscape regardless of nominal political affiliation.
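
The aggregation and edge-weighting idea can be sketched in plain Python. This is a minimal illustration, not the project's code: the outlet names, the score values, the use of a simple mean, and the 0.9 edge threshold are all hypothetical, and the source does not say whether edges were thresholded at all.

```python
import math
from collections import defaultdict
from itertools import combinations


def outlet_profiles(article_scores):
    """Average per-article score vectors into one framing profile per outlet."""
    grouped = defaultdict(list)
    for outlet, vector in article_scores:
        grouped[outlet].append(vector)
    return {
        outlet: [sum(col) / len(col) for col in zip(*vectors)]
        for outlet, vectors in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_edges(profiles, threshold=0.9):
    """Return (outlet_u, outlet_v, weight) for pairs at or above the threshold."""
    edges = []
    for u, v in combinations(sorted(profiles), 2):
        weight = cosine(profiles[u], profiles[v])
        if weight >= threshold:
            edges.append((u, v, weight))
    return edges


# Made-up five-dimension scores for three hypothetical outlets.
article_scores = [
    ("outlet_a", [0.9, 0.1, 0.2, 0.1, 0.7]),
    ("outlet_a", [0.7, 0.3, 0.2, 0.1, 0.5]),
    ("outlet_b", [0.8, 0.2, 0.1, 0.2, 0.6]),
    ("outlet_c", [0.1, 0.9, 0.3, 0.2, 0.1]),
]
profiles = outlet_profiles(article_scores)
edges = similarity_edges(profiles, threshold=0.9)
```

The resulting list of weighted triples can be loaded into NetworkX with `Graph.add_weighted_edges_from(edges)`. In this toy data only the two kinetic-heavy outlets end up connected.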

Click on the Data and Methods tab in the app for more detail.

Stack: Python · sentence-transformers · Gemma4 (local LLM) · BERTopic · NetworkX · Streamlit · Altair · Plotly · BeautifulSoup

Role: data pipeline · LLM scoring framework · NLP · network analysis · data visualisation · app design

Columbia University QMSS, 2026. Group project with Maximilian Chelminski and Yixiao Liu.

Open live app → · View on GitHub →