This is cool, but I can definitely see where you're running into some of the same AI tendencies that I've run into in my own (much less fun) projects.

There's some variety in here, but AI in general really struggles to vary its tone within a single output. I'll be interested to see whether the project can overcome that tendency.

The scoring: AI HATES to give things low scores. It's too nice. In my experience it does better if you ask for named outcomes (e.g. negative, neutral, positive) and then convert those to numbers. A more interesting solution might involve logprobs: ask "do you like this person, yes/no?" and then use the log-probabilities of the yes and no tokens to measure the AI's "uncertainty" about the match.
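A rough sketch of both ideas in Python, under some assumptions: the three-label set and its -1/0/+1 mapping are just an example, and `p_yes` assumes you've already pulled a dict of candidate first tokens to log-probabilities out of whatever API you're using (many expose something like this, but field names vary, so that part is up to you). All function names here are made up for illustration.

```python
import math

# Example label set (an assumption): ask the model for one of these words
# instead of a 1-10 number, then map the word to a score yourself.
LABEL_SCORES = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def label_to_score(reply: str) -> float:
    """Map a named outcome returned by the model to a numeric score."""
    label = reply.strip().strip(".!").lower()
    if label not in LABEL_SCORES:
        raise ValueError(f"unexpected label: {reply!r}")
    return LABEL_SCORES[label]


def p_yes(top_logprobs: dict[str, float]) -> float:
    """Probability of 'yes', renormalized over the yes/no answers only.

    `top_logprobs` is assumed to map candidate first tokens (e.g. " Yes", "no")
    to their log-probabilities. Tokens other than yes/no are ignored.
    """
    yes = [lp for tok, lp in top_logprobs.items() if tok.strip().lower() == "yes"]
    no = [lp for tok, lp in top_logprobs.items() if tok.strip().lower() == "no"]
    if not yes and not no:
        raise ValueError("neither 'yes' nor 'no' among the returned tokens")
    # Subtract the max logprob before exponentiating to avoid underflow,
    # then sum probability mass across spelling variants (" Yes", "yes", ...).
    m = max(yes + no)
    mass_yes = sum(math.exp(lp - m) for lp in yes)
    mass_no = sum(math.exp(lp - m) for lp in no)
    return mass_yes / (mass_yes + mass_no)


def uncertainty(p: float) -> float:
    """Binary entropy in bits: 0.0 = fully confident, 1.0 = a coin flip."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


if __name__ == "__main__":
    print(label_to_score("Negative."))  # -1.0
    lps = {" Yes": math.log(0.6), " No": math.log(0.3), " Maybe": math.log(0.1)}
    p = p_yes(lps)
    print(round(p, 3))  # 0.667
    print(round(uncertainty(p), 3))  # 0.918
```

The label route keeps the model from hedging everything into the 7-8 range; the logprob route gives you a continuous score (and an uncertainty number) without the model ever having to "say" a harsh number out loud.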

Yep, a lot of interesting challenges! Will keep you posted. I think cracking the "tone variety" challenge would be a big unlock.