When you write what you're looking for, we break it down into individual desires, figure out how important each one is, and check how well each desire matches specific things someone wrote about themselves.
When someone signs up, an LLM rewrites their about_me into a consistent structured format: simple "I am...", "I enjoy...", "I value..." sentences. It also infers implied traits from what the person wrote.
This matters for matching: the normalizer splits compound phrases ("honesty, respect and silliness" becomes three separate sentences) and infers implied traits ("straightforward" from "say honestly even if it hurts"). Each sentence gets its own embedding, so a desire like "honest" can match the specific sentence about honesty rather than competing with unrelated content about movies and board games.
An LLM reads your looking_for text and splits it into 5–10 individual desires, each with an importance weight. Every desire is rephrased into a standardized "a person who..." format, which does two things: it disambiguates vague words (like "fit" → "physically fit and active") and it gives every desire a consistent embedding structure.
Weights reflect how strongly the text expresses each desire:
| Weight | Meaning | Signal in text |
|---|---|---|
| 1.0 | Must-have / dealbreaker | "must", "only", "essential", "I need" |
| 0.7 | Strong preference | "really", "very important", repeated, clearly core |
| 0.4 | Preference (default) | mentioned once, no special emphasis |
| 0.2 | Nice-to-have | "ideally", "bonus", "a plus" |
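The extraction step above can be sketched as a small data structure. This is a minimal illustration of the shape of the LLM's output, not the real schema; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Desire:
    text: str      # standardized "a person who..." phrasing
    weight: float  # 1.0 must-have, 0.7 strong, 0.4 default, 0.2 nice-to-have

# Hypothetical extraction for "must be honest; ideally enjoys hiking":
desires = [
    Desire("a person who is honest", 1.0),      # "must" -> dealbreaker weight
    Desire("a person who enjoys hiking", 0.2),  # "ideally" -> nice-to-have
]
```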
For each desire, we compare its embedding against every sentence in the candidate's about_me. The best-matching sentence wins. This means a desire like "patient with people" finds the one sentence that actually talks about temperament, rather than getting diluted by 21 unrelated sentences about hobbies and routines.
Before comparing, we apply structural debiasing. Sentences that start with "I am..." or "I enjoy..." share a grammatical prefix that inflates cosine similarity regardless of meaning. For example, "I am average" and "I am athletic" both start with "I am", which gives them a baseline similarity to any desire about personality traits. We measure this baseline by computing the cosine similarity between the desire and the bare prefix itself (just "I am" or "I enjoy" with no content) and subtract it, so only the meaningful similarity above the structural noise counts.
Each desire's match score is multiplied by its weight, then averaged. Desires with higher weights matter more.
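Putting the last two steps together, per-desire best-sentence matching plus the weighted average looks roughly like this (a sketch with plain cosine similarity, before the structural debiasing described above):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_score(desires, sentence_vecs):
    """desires: list of (embedding, weight) pairs; sentence_vecs: one
    embedding per about_me sentence. Each desire keeps only its single
    best-matching sentence, then a weighted average combines desires."""
    total = sum(w * max(cosine(d, s) for s in sentence_vecs)
                for d, w in desires)
    return total / sum(w for _, w in desires)
```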
The "very open-minded" desire (weight 0.7) contributes the most, accounting for 49 of the 138 weighted points. "Pretty" scored 33% against "I have a sunny vibe" (not a real match for physical appearance, but the closest thing Sol wrote). "Smart but not identity-built-around-it" matched at 55% against "I value curiosity as a form of intelligence", where the qualifier found a genuine echo. The raw 44.5% is compressed to 61% for display using a power curve, which reduces the gap between good and great matches.
Very short profiles (under 15 words) get penalized because there isn't enough text to match against meaningfully. A 5-word profile might accidentally score well on one desire, but we can't be confident with so little information.
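One way to express that penalty is a simple length-based multiplier. The linear shape and the exact cutoff behavior here are assumptions; only the 15-word threshold comes from the text.

```python
def length_penalty(about_me: str, min_words: int = 15) -> float:
    """Hypothetical penalty: scale the score down linearly for profiles
    shorter than min_words, so a 5-word profile keeps only 5/15 of it."""
    n = len(about_me.split())
    return min(1.0, n / min_words)
```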
Most people describe one type of person they want, so all desires go into one group (AND). A candidate must match most of them.
But some people describe genuinely different types:
This gets decomposed into two groups. Each group is scored independently (weighted average within the group). The final score is the best group's score. A tennis coach who knows nothing about theoretical CS can still score perfectly.
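The group logic reduces to taking the maximum over independently computed group scores, something like:

```python
def final_score(group_scores):
    """AND within a group is the weighted average (computed upstream);
    OR across groups means a candidate only needs to fit one type."""
    return max(group_scores)

# A tennis coach: strong on the "sporty" group, weak on "theoretical CS".
# The sporty group's score becomes the final score.
final_score([0.92, 0.15])
```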
Embedding models encode grammatical patterns, not just meaning. Two sentences that share a prefix like "I enjoy..." will have inflated cosine similarity even if the activities are completely unrelated. Without correction, "I enjoy taxes" would get a non-zero score against a desire for "a person who enjoys exploring", not because taxes and exploring are related, but because the "I enjoy" prefix creates structural similarity in embedding space.
To fix this, for each desire we measure the structural floor: the cosine similarity between the desire embedding and the bare prefix embedding (just "I am" or "I enjoy" with no content word at all). This is the similarity that comes purely from shared grammatical structure, with zero semantic content.
If the desire "a person who is physically fit" has a cosine similarity of 0.25 against the bare prefix "I am", we know that 0.25 is noise. Any "I am..." candidate sentence needs to score above 0.25 for its match to mean anything. We subtract the floor and rescale:
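As a sketch, the subtraction-and-rescale step might look like this. Whether the real system renormalizes by `(1 - floor)` or uses a different ceiling is an assumption; this shows one plausible choice.

```python
def debias(sim: float, floor: float) -> float:
    """Subtract the structural floor (similarity against the bare prefix,
    e.g. 0.25 for "I am") and rescale the remainder back to [0, 1].
    Anything at or below the floor counts as zero."""
    return max(0.0, (sim - floor) / (1.0 - floor))
```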
Different sentence shapes get different floors. "I enjoy..." sentences have their own floor, "I am..." sentences have theirs. Sentences that don't match any known pattern (like "I rebuild vintage motorcycles") use the global minimum as their floor, since they don't have prefix inflation to correct for.
Raw desires like "fit" or "smart" are ambiguous. The word "fit" could mean physically fit, a good fit for something, or a clothing fit. When embedded as-is, the vector captures all meanings at once, producing weak matches with any of them.
By rephrasing every desire as "a person who [verb phrase]", the LLM uses context from your full looking_for text to pick the right meaning. "Fit" in "someone outdoorsy who is fit" becomes "a person who is physically fit and active", a much sharper embedding that strongly matches "I am athletic" (80%) where the bare word "fit" barely registered (35%).
This also standardizes the embedding structure. Every desire has similar grammar, so debiasing floors are more predictable, and the cosine similarity scale is consistent across desires.
Raw cosine similarity between short text embeddings typically ranges from 0.15 (unrelated) to 0.55 (very similar). That's a narrow band: the difference between a decent match and a perfect match is only about 0.20 in cosine space. We scale this to 0–100%, then apply a power curve to better reflect how humans perceive match quality.
First, we map the raw cosine range to 0–100:
With structural debiasing, the floor is subtracted before this scaling. The effective range shrinks per sentence shape, which means scores are harder to earn but more trustworthy.
Linear scaling exaggerates gaps. A cosine difference of 0.04 (e.g., 0.35 vs 0.39) produces a 10-point gap on the linear scale, but in practice both embeddings are capturing similar meaning. One just happens to share a few more tokens with the query. A "good match" at 50% and a "great match" at 70% are closer in actual quality than a 20-point gap suggests.
To correct for this, we compress the displayed score with a power curve:
| Cosine | Linear | Displayed | Meaning |
|---|---|---|---|
| ≤ 0.15 | 0% | 0% | Unrelated |
| 0.25 | 25% | 38% | Vaguely related |
| 0.35 | 50% | 66% | Moderate match |
| 0.45 | 75% | 81% | Strong match |
| ≥ 0.55 | 100% | 100% | Near-exact match |
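A power curve with an exponent below 1 produces this kind of compression. The exponent here is a guess inferred from the 44.5% → 61% example earlier in the text; the table's exact values suggest the real curve may differ slightly.

```python
def displayed(linear_pct: float, exponent: float = 0.61) -> float:
    """Compress the linear 0-100 score with a power curve, lifting the
    middle of the range so good and great matches sit closer together.
    The 0.61 exponent is an assumption, not a documented constant."""
    return 100.0 * (linear_pct / 100.0) ** exponent
```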
The previous algorithm compared one embedding of your entire looking_for against one embedding of their entire about_me. This caused two problems:
Imagine an about_me that talks about astronomy, board games, baking sourdough, distance running, and learning Japanese. The single embedding averages all of those topics together. A search for "astronomy" only gets a moderate match because the astronomy signal gets diluted by everything else in the profile.
Chunk matching fixes this. The "astronomy" desire finds the one sentence about stargazing and matches it directly, ignoring the unrelated sentences entirely.
If your looking_for mentions philosophy, fitness, cooking, programming, and hiking, the single embedding becomes a generic "has lots of interests" vector that moderately matches almost everyone. Everyone scores 70–90%.
Weighted desire matching fixes this. Each desire is scored independently. A candidate who matches philosophy but not fitness, cooking, programming, or hiking only gets credit for philosophy. The weighted average stays honest.
Negation blindness. Embeddings encode topic similarity, not logical meaning. "Does not smoke" and "I smoke heavily" both contain the word "smoke" and produce similar embeddings. The dealbreaker filters (smoking, drinking, etc.) handle this case separately from the match percentage.
Role confusion. "Dominant" and "submissive" are both power dynamics terms, so embeddings see them as similar. A submissive person can score well on a "dominant" desire. Kink role could be added as a dealbreaker filter.
Weight inference limits. The LLM infers importance weights from phrasing cues ("must", "ideally", emphasis). A flat list like "smart, funny, kind" gives everything the same weight (0.4) because there's no emphasis signal. Only the user truly knows which desires matter most.
Normalization artifacts. The LLM that normalizes profiles occasionally infers traits that aren't well-supported by the text. For example, "tech background" might produce "I am detail-oriented" even though that's a stretch. These over-inferred traits can create false matches if they happen to be semantically close to a desire.
If chunk data isn't available for a user (e.g., they signed up before the feature was deployed and haven't been backfilled), the system falls back to the old whole-profile cosine similarity scoring. This ensures no one sees broken or missing match percentages.
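The fallback amounts to a branch on whether per-sentence embeddings exist. The dict keys and function shape here are illustrative, not the real schema:

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_candidate(desire_vec, candidate: dict) -> float:
    """Use chunk matching (best sentence) when per-sentence embeddings
    exist; otherwise fall back to legacy whole-profile cosine."""
    chunks = candidate.get("sentence_vecs")
    if chunks:  # new path: best-matching sentence wins
        return max(cosine(desire_vec, s) for s in chunks)
    return cosine(desire_vec, candidate["profile_vec"])  # legacy path
```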
❀