When you write what you're looking for, we break it down into individual desires, figure out how important each one is, and check how well each desire matches specific things someone wrote about themselves.
When someone signs up, an LLM rewrites their about_me into a consistent structured format: simple "I am...", "I enjoy...", "I value..." sentences. It also infers implied traits from what the person wrote.
This matters for matching: the normalizer splits compound phrases ("honesty, respect and silliness" becomes three separate sentences) and infers implied traits ("straightforward" from "say honestly even if it hurts"). Each sentence gets its own embedding, so a desire like "honest" can match the specific sentence about honesty rather than competing with unrelated content about movies and board games.
An LLM reads your looking_for text and splits it into 5–10 individual desires, each with an importance weight. Every desire is rephrased into a standardized "a person who..." format, which does two things: it disambiguates vague words (like "fit" → "physically fit and active") and it gives every desire a consistent embedding structure.
Weights reflect how strongly the text expresses each desire:
| Weight | Meaning | Signal in text |
|---|---|---|
| 1.0 | Must-have / dealbreaker | "must", "only", "essential", "I need" |
| 0.7 | Strong preference | "really", "very important", repeated, clearly core |
| 0.4 | Preference (default) | mentioned once, no special emphasis |
| 0.2 | Nice-to-have | "ideally", "bonus", "a plus" |
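The extraction step above can be sketched as a small data structure. This is a minimal illustration of the shape of the LLM's output, not the real schema; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Desire:
    text: str      # standardized "a person who..." phrasing
    weight: float  # 1.0 must-have, 0.7 strong, 0.4 default, 0.2 nice-to-have

# Hypothetical extraction for "must be honest; ideally enjoys hiking":
desires = [
    Desire("a person who is honest", 1.0),      # "must" -> dealbreaker weight
    Desire("a person who enjoys hiking", 0.2),  # "ideally" -> nice-to-have
]
```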
For each desire, we compare its embedding against every sentence in the candidate's about_me. The best-matching sentence wins. This means a desire like "patient with people" finds the one sentence that actually talks about temperament, rather than getting diluted by 21 unrelated sentences about hobbies and routines.
Before comparing, we apply structural debiasing. Sentences that start with "I am..." or "I enjoy..." share a grammatical prefix that inflates cosine similarity regardless of meaning. For example, "I am average" and "I am athletic" both start with "I am", which gives them a baseline similarity to any desire about personality traits. We measure this baseline by computing the cosine similarity between the desire and the bare prefix itself (just "I am" or "I enjoy" with no content) and subtract it, so only the meaningful similarity above the structural noise counts.
Each desire's match score is multiplied by its weight, then averaged. Desires with higher weights matter more.
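Putting the last two steps together, per-desire best-sentence matching plus the weighted average looks roughly like this (a sketch with plain cosine similarity, before the structural debiasing described above):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_score(desires, sentence_vecs):
    """desires: list of (embedding, weight) pairs; sentence_vecs: one
    embedding per about_me sentence. Each desire keeps only its single
    best-matching sentence, then a weighted average combines desires."""
    total = sum(w * max(cosine(d, s) for s in sentence_vecs)
                for d, w in desires)
    return total / sum(w for _, w in desires)
```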
The "very open-minded" desire (weight 0.7) contributes the most, accounting for 49 of the 138 weighted points. "Pretty" scored 33% against "I have a sunny vibe" (not a real match for physical appearance, but the closest thing Sol wrote). "Smart but not identity-built-around-it" matched at 55% against "I value curiosity as a form of intelligence", where the qualifier found a genuine echo. The raw 44.5% is compressed to 61% for display using a power curve, which reduces the gap between good and great matches.
Very short profiles (under 15 words) get penalized because there isn't enough text to match against meaningfully. A 5-word profile might accidentally score well on one desire, but we can't be confident with so little information.
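One way to express that penalty is a simple length-based multiplier. The linear shape and the exact cutoff behavior here are assumptions; only the 15-word threshold comes from the text.

```python
def length_penalty(about_me: str, min_words: int = 15) -> float:
    """Hypothetical penalty: scale the score down linearly for profiles
    shorter than min_words, so a 5-word profile keeps only 5/15 of it."""
    n = len(about_me.split())
    return min(1.0, n / min_words)
```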
Most people describe one type of person they want, so all desires go into one group (AND). A candidate must match most of them.
But some people describe genuinely different types:
This gets decomposed into two groups. Each group is scored independently (weighted average within the group). The final score is the best group's score. A tennis coach who knows nothing about theoretical CS can still score perfectly.
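The group logic reduces to taking the maximum over independently computed group scores, something like:

```python
def final_score(group_scores):
    """AND within a group is the weighted average (computed upstream);
    OR across groups means a candidate only needs to fit one type."""
    return max(group_scores)

# A tennis coach: strong on the "sporty" group, weak on "theoretical CS".
# The sporty group's score becomes the final score.
final_score([0.92, 0.15])
```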
Embedding models encode grammatical patterns, not just meaning. Two sentences that share a prefix like "I enjoy..." will have inflated cosine similarity even if the activities are completely unrelated. Without correction, "I enjoy taxes" would get a non-zero score against a desire for "a person who enjoys exploring", not because taxes and exploring are related, but because the "I enjoy" prefix creates structural similarity in embedding space.
To fix this, for each desire we measure the structural floor: the cosine similarity between the desire embedding and the bare prefix embedding (just "I am" or "I enjoy" with no content word at all). This is the similarity that comes purely from shared grammatical structure, with zero semantic content.
If the desire "a person who is physically fit" has a cosine similarity of 0.25 against the bare prefix "I am", we know that 0.25 is noise. Any "I am..." candidate sentence needs to score above 0.25 for its match to mean anything. We subtract the floor and rescale:
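As a sketch, the subtraction-and-rescale step might look like this. Whether the real system renormalizes by `(1 - floor)` or uses a different ceiling is an assumption; this shows one plausible choice.

```python
def debias(sim: float, floor: float) -> float:
    """Subtract the structural floor (similarity against the bare prefix,
    e.g. 0.25 for "I am") and rescale the remainder back to [0, 1].
    Anything at or below the floor counts as zero."""
    return max(0.0, (sim - floor) / (1.0 - floor))
```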
Different sentence shapes get different floors. "I enjoy..." sentences have their own floor, "I am..." sentences have theirs. Sentences that don't match any known pattern (like "I rebuild vintage motorcycles") use the global minimum as their floor, since they don't have prefix inflation to correct for.
Raw desires like "fit" or "smart" are ambiguous. The word "fit" could mean physically fit, a good fit for something, or a clothing fit. When embedded as-is, the vector captures all meanings at once, producing weak matches with any of them.
By rephrasing every desire as "a person who [verb phrase]", the LLM uses context from your full looking_for text to pick the right meaning. "Fit" in "someone outdoorsy who is fit" becomes "a person who is physically fit and active", a much sharper embedding that strongly matches "I am athletic" (80%) where the bare word "fit" barely registered (35%).
This also standardizes the embedding structure. Every desire has similar grammar, so debiasing floors are more predictable, and the cosine similarity scale is consistent across desires.
Raw cosine similarity between short text embeddings typically ranges from 0.15 (unrelated) to 0.55 (very similar). That's a narrow band: the difference between a decent match and a perfect match is only about 0.20 in cosine space. We scale this to 0–100%, then apply a power curve to better reflect how humans perceive match quality.
First, we map the raw cosine range to 0–100:
With structural debiasing, the floor is subtracted before this scaling. The effective range shrinks per sentence shape, which means scores are harder to earn but more trustworthy.
Linear scaling exaggerates gaps. A cosine difference of 0.04 (e.g., 0.35 vs 0.39) produces a 10-point gap on the linear scale, but in practice both embeddings are capturing similar meaning. One just happens to share a few more tokens with the query. A "good match" at 50% and a "great match" at 70% are closer in actual quality than a 20-point gap suggests.
To correct for this, we compress the displayed score with a power curve:
| Cosine | Linear | Displayed | Meaning |
|---|---|---|---|
| ≤ 0.15 | 0% | 0% | Unrelated |
| 0.25 | 25% | 38% | Vaguely related |
| 0.35 | 50% | 66% | Moderate match |
| 0.45 | 75% | 81% | Strong match |
| ≥ 0.55 | 100% | 100% | Near-exact match |
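A power curve with an exponent below 1 produces this kind of compression. The exponent here is a guess inferred from the 44.5% → 61% example earlier in the text; the table's exact values suggest the real curve may differ slightly.

```python
def displayed(linear_pct: float, exponent: float = 0.61) -> float:
    """Compress the linear 0-100 score with a power curve, lifting the
    middle of the range so good and great matches sit closer together.
    The 0.61 exponent is an assumption, not a documented constant."""
    return 100.0 * (linear_pct / 100.0) ** exponent
```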
The previous algorithm compared one embedding of your entire looking_for against one embedding of their entire about_me. This caused two problems:
Imagine an about_me that talks about astronomy, board games, baking sourdough, distance running, and learning Japanese. The single embedding averages all of those topics together. A search for "astronomy" only gets a moderate match because the astronomy signal gets diluted by everything else in the profile.
Chunk matching fixes this. The "astronomy" desire finds the one sentence about stargazing and matches it directly, ignoring the unrelated sentences entirely.
If your looking_for mentions philosophy, fitness, cooking, programming, and hiking, the single embedding becomes a generic "has lots of interests" vector that moderately matches almost everyone. Everyone scores 70–90%.
Weighted desire matching fixes this. Each desire is scored independently. A candidate who matches philosophy but not fitness, cooking, programming, or hiking only gets credit for philosophy. The weighted average stays honest.
Negation blindness. Embeddings encode topic similarity, not logical meaning. "Does not smoke" and "I smoke heavily" both contain the word "smoke" and produce similar embeddings. The dealbreaker filters (smoking, drinking, etc.) handle this case separately from the match percentage.
Role confusion. "Dominant" and "submissive" are both power dynamics terms, so embeddings see them as similar. A submissive person can score well on a "dominant" desire. Kink role could be added as a dealbreaker filter.
Weight inference limits. The LLM infers importance weights from phrasing cues ("must", "ideally", emphasis). A flat list like "smart, funny, kind" gives everything the same weight (0.4) because there's no emphasis signal. Only the user truly knows which desires matter most.
Normalization artifacts. The LLM that normalizes profiles occasionally infers traits that aren't well-supported by the text. For example, "tech background" might produce "I am detail-oriented" even though that's a stretch. These over-inferred traits can create false matches if they happen to be semantically close to a desire.
If chunk data isn't available for a user (e.g., they signed up before the feature was deployed and haven't been backfilled), the system falls back to the old whole-profile cosine similarity scoring. This ensures no one sees broken or missing match percentages.
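The fallback amounts to a branch on whether per-sentence embeddings exist. The dict keys and function shape here are illustrative, not the real schema:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_candidate(desire_vec, candidate: dict) -> float:
    """Use chunk matching (best sentence) when per-sentence embeddings
    exist; otherwise fall back to legacy whole-profile cosine."""
    chunks = candidate.get("sentence_vecs")
    if chunks:  # new path: best-matching sentence wins
        return max(cosine(desire_vec, s) for s in chunks)
    return cosine(desire_vec, candidate["profile_vec"])  # legacy path
```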
❀