liminal on Nostr
Yeah, pretty easy to explain actually. There are a ton of embedding models you can work with off the shelf if you'd rather not train one yourself (you can find them on Hugging Face, a repository of AI models).
Words go in -> numbers come out.
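For example, a minimal sketch using the sentence-transformers library (the model name here is just one popular off-the-shelf choice, not anything specific I'm recommending):

```python
# A minimal sketch: words in -> numbers out, with an off-the-shelf model.
# Assumes sentence-transformers is installed; "all-MiniLM-L6-v2" is just
# one common pre-trained embedding model from Hugging Face.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Words go in...
vec = model.encode("the quick brown fox")

# ...numbers come out: a fixed-length vector of floats.
print(vec.shape)  # (384,) for this particular model
```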
Your vocabulary of words/tokens gets assigned numeric values, and the training task is to predict some part of the group of words you've been given, e.g.:
1) a word or set of words has been masked out with a blank value; predict the masked words (sketched after this list).
2) predict the next n tokens
etc.
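Here's roughly what task (1), masked-word prediction, looks like with an off-the-shelf model, using the transformers fill-mask pipeline (bert-base-uncased is just one common pick, and the example sentence is made up):

```python
# A rough sketch of masked-word prediction with a pre-trained model.
# Assumes the transformers package is installed; [MASK] is BERT's mask token.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts what the blanked-out token probably was.
for guess in fill("Words go in, [MASK] come out."):
    print(guess["token_str"], round(guess["score"], 3))
```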
The performance depends on the task and the data the model was trained on, but the result is that you get a bunch of numbers you can compare with various distance metrics (cosine similarity, Euclidean distance, etc.). Grab a bunch of text with vectors assigned to it, ask "which text is closest to this text I care about?", and that's just a K-nearest-neighbors lookup.
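That lookup step, sketched as brute-force K-nearest neighbors over cosine similarity (same assumed model as above, toy corpus made up for illustration):

```python
# A minimal sketch of "which text is closest": brute-force KNN over
# cosine similarity. Corpus and query are made-up examples.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = ["notes on birds", "nostr relays explained", "embedding models 101"]
corpus_vecs = model.encode(corpus)                      # shape (n_texts, dim)
query_vec = model.encode("how do text embeddings work?")

# Cosine similarity = dot product of unit-normalized vectors.
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

sims = normalize(corpus_vecs) @ normalize(query_vec)    # shape (n_texts,)

# The k texts with the highest similarity are the "nearest neighbors".
k = 2
for i in np.argsort(-sims)[:k]:
    print(corpus[i], float(sims[i]))
```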
Published at 2025-04-24 01:57:24