Praise and complaint patterns by NYC neighbourhood — four-panel dashboard

NYC supermarket secrets

What can the language of 1,300+ Google Maps reviews tell us about how New Yorkers relate to where they shop? This project uses NLP to extract sentiment, vocabulary, and behavioural patterns from reviews across 480+ stores and 16 neighbourhoods — asking not just how people rate supermarkets, but how they talk about them, and what that language reveals about community, class, and care.

The pipeline begins with a custom data collection script that queries the Google Places API for 16 neighbourhoods across Manhattan, Brooklyn, and Queens, from Tribeca to Harlem to Astoria. Reviews are cleaned and parsed, then analysed along five dimensions: VADER sentiment scoring (compound, positive, negative, and neutral) per review, aggregated by chain and neighbourhood; TF-IDF extraction of the words most distinctive to each chain; keyword-based categorisation of complaints and praise (quality, service, price, checkout, availability); vocabulary complexity and diversity metrics; and a heuristic fake-review detector that scores each review on several factors (generic language patterns, extreme sentiment, review length, and absence of specific product mentions).
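As a rough illustration of the keyword-based categorisation and the multi-factor fake-review score, here is a minimal sketch using only the Python standard library. It is not the project's code: the keyword lists, generic phrases, product words, and thresholds are all hypothetical, and the `compound` argument is assumed to be a VADER compound score computed elsewhere.

```python
import re

# Hypothetical keyword lists; the project's real lists are not given in the text.
CATEGORY_KEYWORDS = {
    "quality": {"fresh", "rotten", "stale", "expired", "quality"},
    "service": {"staff", "rude", "friendly", "helpful", "service"},
    "price": {"price", "prices", "expensive", "cheap", "overpriced"},
    "checkout": {"checkout", "line", "lines", "cashier", "register"},
    "availability": {"stock", "empty", "shelves", "available", "selection"},
}
# Hypothetical signals for the fake-review heuristic.
GENERIC_PHRASES = ("great place", "love this store", "highly recommend", "best store")
PRODUCT_WORDS = {"produce", "bread", "milk", "meat", "cheese", "fruit", "vegetables"}


def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorise(text):
    """Return the set of categories whose keywords appear in the review."""
    tokens = set(tokenize(text))
    return {cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & words}


def fake_score(text, compound):
    """Count how many of four suspicious signals a review triggers (0 to 4).

    `compound` is assumed to be a VADER compound score in [-1, 1].
    The thresholds (0.9 and 8 words) are illustrative guesses.
    """
    tokens = tokenize(text)
    lowered = text.lower()
    signals = [
        any(phrase in lowered for phrase in GENERIC_PHRASES),  # generic language
        abs(compound) >= 0.9,                                   # extreme sentiment
        len(tokens) < 8,                                        # very short review
        not (set(tokens) & PRODUCT_WORDS),                      # no specific product
    ]
    return sum(signals)
```

A review would then be flagged when its score crosses some cut-off (for example 3 of 4 signals); that cut-off is likewise an assumption, since the text does not state how the factors are weighted or combined.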

Key findings: Astoria and Harlem have the highest complaint-to-praise ratios; in Long Island City, almost all complaints concern quality and service; Whole Foods reviewers use the most distinctive vocabulary (dashi, vegan, pickle), while Target reviewers mention the store itself most frequently; Food Bazaar and Trader Joe's have the highest share of positive sentiment of any chain; and 2.6% of reviews were flagged as potentially fake by the heuristic detector, below the industry average.
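For readers unfamiliar with the complaint-to-praise ratio, the sketch below shows one plausible way to compute it per neighbourhood. It assumes each review mention has already been labelled `"complaint"` or `"praise"`; the function name, the label strings, and the placeholder neighbourhoods are hypothetical, and the toy data is not drawn from the project's results.

```python
from collections import Counter


def complaint_praise_ratio(labelled):
    """Map each neighbourhood to complaints / praise (None when there is no praise).

    `labelled` is an iterable of (neighbourhood, label) pairs, where label is
    "complaint" or "praise"; any other label is ignored.
    """
    complaints, praise = Counter(), Counter()
    for hood, label in labelled:
        if label == "complaint":
            complaints[hood] += 1
        elif label == "praise":
            praise[hood] += 1
    hoods = set(complaints) | set(praise)
    return {h: (complaints[h] / praise[h] if praise[h] else None) for h in hoods}


# Toy input with placeholder neighbourhood names (not real data).
toy = [
    ("A", "complaint"), ("A", "complaint"), ("A", "praise"),
    ("B", "praise"), ("B", "praise"), ("B", "complaint"),
    ("C", "complaint"),
]
ratios = complaint_praise_ratio(toy)
```

A ratio above 1 means complaints outnumber praise in that neighbourhood. Returning `None` rather than dividing by zero keeps neighbourhoods with no praise visible without distorting a ranking.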

Top distinctive words across major chains (TF-IDF bar chart)

Complaint category distribution by neighbourhood (normalised heatmap)

Emotional composition by chain (stacked bar chart)

Stack: Python · Google Places API · NLTK · VADER · TF-IDF (scikit-learn) · pandas · matplotlib · seaborn

Role: data collection · NLP pipeline · sentiment analysis · data visualisation

Columbia University, 2025. Solo project.

github.com/halfabluebanana