Data Science Meets Happy Hour: Building a “Sommelier” for Dive Bars

You know the feeling. You’re looking for a spot with character—somewhere authentic, maybe a little rough around the edges, but with surprisingly good food. You open Google Maps, see a place with 4.2 stars, and think, “Eh, looks like every other place in this neighborhood.”

…but you just missed a hidden gem.

Google’s star ratings are a blunt instrument. They average everyone’s opinion into a single number, which destroys nuance. A 4.2-star rating doesn’t tell you why a place got that score. Is it because the bathroom is sketchy but the burgers are world-class? Or is it just… mediocre?

As a student in the UT Austin Machine Learning & AI program, I realized this was a data problem. I wanted a way to find places that are underrated, authentic, and have real character—specifically identifying the spots where “grit” is a feature, not a bug.

So, I built Dive Bar Detective.


The Hypothesis: Variance is Value

My original idea for this goes back years, long before I was formally studying ML: I used to look at the standard deviation of ratings rather than the average.

My theory was simple: High Variance = Polarization.

If a place has a 50/50 split of 1-star and 5-star reviews, it usually means the place has a strong personality. People either “get it” (and love the cheap drinks and loud music) or they “hate it” (and complain about the service). To me, high variance screamed “potential dive bar worth checking out.”

But statistical variance only tells you that people disagree, not why. To get to the truth, I needed to parse the actual text.
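That early heuristic is easy to sketch. Here's a minimal version (the function name and toy ratings are mine, not from the project's codebase):

```python
from statistics import mean, stdev

def polarization(ratings: list[int]) -> float:
    """Standard deviation of star ratings: high = polarizing, low = consensus."""
    return stdev(ratings) if len(ratings) > 1 else 0.0

# Two bars with the same 3.0-star average:
consensus_bar = [3, 3, 3, 3, 3, 3]    # everyone shrugs
polarizing_bar = [1, 5, 1, 5, 1, 5]   # people love it or hate it

assert mean(consensus_bar) == mean(polarizing_bar) == 3.0
print(polarization(consensus_bar))    # → 0.0 — mediocre
print(polarization(polarizing_bar))   # → 2.19… — strong personality
```

Both bars look identical on Google Maps; only the spread tells them apart.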


The Solution: Multi-Dimensional Scoring

I built an automated pipeline that analyzes 4,260 reviews across 251 Denver locations. Instead of a single star rating, the system breaks every establishment down into four distinct “Lenses”:

| Lens | The Question it Answers |
| --- | --- |
| Quality | Is it actually good? (Focuses on food, drinks, and value) |
| Character | Does it have soul? (Focuses on authenticity, history, and “divey-ness”) |
| Underrated | Is it better than Google says? (Identifies gaps between sentiment and star rating) |
| Blended | The Sweet Spot. (A weighted combination of all three) |

The result is a ranked list sorted by what you actually care about.
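As a rough sketch of how a Blended lens might combine the other three (the weights and function name here are illustrative, not the app's actual formula):

```python
def blended_score(quality: float, character: float, underrated: float,
                  weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted combination of the three lenses (all on a 0-10 scale).
    These weights are placeholders — the real app tunes its own blend."""
    wq, wc, wu = weights
    return wq * quality + wc * character + wu * underrated

# A high-quality, high-character, modestly underrated spot:
print(round(blended_score(8.5, 9.0, 7.0), 2))  # → 8.4
```

Shifting the weights shifts the ranking: weight Character heavily and the list surfaces true dives; weight Quality and it surfaces safe bets.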


The Tech Stack: Under the Hood

For the engineers in the room, here is how the sausage is made (or in this case, how the cheap beer is poured).

1. The “Sticky Floor” Paradox (NLP Analysis)

I couldn’t just use keyword searching. In a hospital review, the phrase “sticky floors” is a critical failure. In a dive bar review, “sticky floors” might actually be a positive signal of authenticity.

I utilized OpenAI’s GPT models (specifically gpt-5-nano for speed and cost) to analyze every single review. The model extracts 9 specific signals on a 0–1 scale, including:

  • divey_score: How gritty is the energy?
  • unpretentious: Is it at ease with itself, or trying too hard?
  • classic_institution: Is this a neighborhood staple?

(Fun fact: Pulling the text of reviews is easy. Pulling the granular numerical rating associated with every specific review from Google is surprisingly painful. I’m working on a fix for v2.)
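Here is a rough sketch of the extraction step, assuming a JSON-only prompt plus a small parser that defaults missing keys and clamps scores to the 0–1 scale. The prompt wording, helper names, and three-signal subset are my own illustrations; the real pipeline extracts all nine signals.

```python
import json

SIGNALS = ["divey_score", "unpretentious", "classic_institution"]  # subset of the 9

PROMPT_TEMPLATE = (
    "Score this bar review on each signal from 0 to 1. "
    "Reply with JSON only, keys: {keys}.\n\nReview: {review}"
)

def parse_signals(raw: str) -> dict[str, float]:
    """Parse the model's JSON reply, defaulting missing keys and clamping to [0, 1]."""
    data = json.loads(raw)
    return {k: min(1.0, max(0.0, float(data.get(k, 0.0)))) for k in SIGNALS}

# The actual model call would look roughly like this (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-5-nano",
#     messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(
#         keys=", ".join(SIGNALS),
#         review="Sticky floors, cheap beer, loved every minute.")}],
# )
# signals = parse_signals(resp.choices[0].message.content)

print(parse_signals('{"divey_score": 0.91, "unpretentious": 1.4}'))
```

Clamping matters: language models occasionally return scores outside the requested range, and a 1.4 on a 0–1 scale would quietly skew every downstream ranking.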

2. The Vibe Map (Dimensionality Reduction)

This is the feature I’m most proud of. I plotted every location on a scatter plot:

  • X-Axis: Quality
  • Y-Axis: Character

This visualizes “Vibe Space.”

  • Top-Right Quadrant: The Holy Grail. High Quality + High Character.
  • Top-Left: The “True Dives.” High Character, variable quality.

![Image: Screenshot of the Quality vs Character Vibe Map]
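The quadrant logic boils down to two comparisons. A minimal sketch (the 5.0 cutoff and the two lower-quadrant labels are my own placeholders, not names from the app):

```python
def quadrant(quality: float, character: float, cutoff: float = 5.0) -> str:
    """Place a bar in Vibe Space. Both axes run 0-10; the cutoff is illustrative."""
    if character >= cutoff:
        # Top half: high character. Quality decides Holy Grail vs True Dive.
        return "Holy Grail" if quality >= cutoff else "True Dive"
    # Bottom half: low character. These labels are invented for the sketch.
    return "Safe Bet" if quality >= cutoff else "Skip It"

print(quadrant(8.7, 9.1))  # → Holy Grail
print(quadrant(4.0, 8.5))  # → True Dive
```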

3. Unsupervised Learning Layers

To make the data richer, I ran a few traditional ML algorithms on top of the NLP output:

  • UMAP & K-Means: To cluster bars into “neighborhoods” based on vibe (e.g., “Late Night Rowdy” vs. “Quiet Old School”).
  • BERTopic: To auto-generate badges like “Live Music” or “Patio” based on topic dominance.
  • Isolation Forest: To detect anomalies—places that are statistically unique compared to the rest of the dataset.
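To show the clustering idea without pulling in UMAP or scikit-learn, here is a toy k-means over the two vibe axes. The deterministic init, toy data, and two-cluster setup are simplifications; the real pipeline clusters in the UMAP embedding space using library implementations.

```python
def kmeans(points: list[tuple[float, float]], k: int, iters: int = 20) -> list[int]:
    """Tiny k-means over (quality, character) points; returns a cluster id per bar."""
    centers = list(points[:k])  # deterministic init keeps the sketch reproducible
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each bar to its nearest "vibe neighborhood" center.
        labels = [min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                      + (p[1] - centers[c][1]) ** 2)
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

bars = [(2, 8), (3, 9), (2.5, 8.5),   # gritty, high-character spots
        (8, 3), (9, 2), (8.5, 2.5)]   # polished, low-character spots
labels = kmeans(bars, k=2)
# The first three bars land in one cluster, the last three in the other.
```

The same two coordinates feed all three layers: k-means names the neighborhoods, and an anomaly detector flags the bars that sit far from every cluster.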

Does it actually work?

The proof is in the pudding (or the nachos).

Case Study: Adrift Tiki Bar

  • Google Rating: 4.3 (Good, not great)
  • Dive Bar Detective Score: 8.7/10 (Underrated)
  • Why: The NLP picked up high signals for authenticity (0.88) and memorable (0.85). The model recognized that while some people didn’t like the noise, the people who loved it really loved it.

Case Study: Sancho’s Broken Arrow

  • Google Rating: 4.1
  • Character Score: 9.1/10
  • Why: This is a polarizing spot. The model saw high scores for divey_score (0.91) and unpretentious (0.93). If you want a sterile environment, you’ll hate it. If you want a classic Denver music venue vibe, the data says this is your spot.

Try it Yourself

This project is currently a Proof of Concept limited to Denver, CO. It is designed for desktop (because maps are hard on mobile), but it’s fully functional.

I’m currently looking for “evals”—feedback from locals to see if the algorithm is correctly identifying the legends versus the tourist traps.

👉 Launch Dive Bar Detective

If you're interested in the code: the backend is FastAPI (Python 3.14) with Supabase (PostgreSQL), hosted on Render. You can check out the repo on my GitHub.