Pavel Korytov :emacs:☮️ on Nostr
I've tried making a full-text #RSS feed for the websites of ScienceX, the parent org for Phys.org and Tech Xplore.
The webpages are very straightforward, so the bridge (for #rssbridge) took just about 200 LoC. But! #CloudFlare is super zealous there.
Even with the following parameters:
- 3 feeds
- fetch every hour
- cache webpages for 7 days (= fetch each webpage only once, for all intents and purposes)
I already got 429'ed. I'll try fetching every 4 hours, I guess...
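For illustration, the setup above (cache each page for 7 days, poll hourly, slow down after a 429) could be sketched roughly like this. This is Python rather than the PHP an actual RSS-Bridge bridge would use, and none of it is the bridge's real code: `CachedFetcher`, the `fetch` callable and its `(status, retry_after, body)` return shape, and the doubling delay are all assumptions made up for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SEVEN_DAYS = 7 * 24 * 3600

# Hypothetical fetcher shape: url -> (status code, Retry-After seconds or None, body)
Fetcher = Callable[[str], Tuple[int, Optional[float], str]]


class CachedFetcher:
    """Fetch each URL at most once per TTL and back off after HTTP 429."""

    def __init__(self, fetch: Fetcher, ttl: float = SEVEN_DAYS,
                 base_delay: float = 3600.0, max_delay: float = 24 * 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch
        self.ttl = ttl
        self.base_delay = base_delay    # first pause after a 429 (assumed: 1 hour)
        self.max_delay = max_delay      # upper bound on the pause (assumed: 1 day)
        self.clock = clock
        self.cache: Dict[str, Tuple[float, str]] = {}
        self.blocked_until = 0.0
        self.failures = 0

    def get(self, url: str) -> Optional[str]:
        now = self.clock()
        hit = self.cache.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]               # fresh cache entry, no request is made
        if now < self.blocked_until:
            return None                 # still pausing after an earlier 429
        status, retry_after, body = self.fetch(url)
        if status == 429:
            # Double the pause on every consecutive 429, up to max_delay,
            # and never wait less than the server's Retry-After hint.
            self.failures += 1
            delay = min(self.base_delay * 2 ** (self.failures - 1), self.max_delay)
            if retry_after is not None:
                delay = max(delay, retry_after)
            self.blocked_until = now + delay
            return None
        if status != 200:
            return None
        self.failures = 0
        self.cache[url] = (now, body)
        return body
```

With something like this, a 429 pauses all requests for an hour, then two hours, and so on, while pages already in the cache keep being served without touching the site.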
W-why such extreme measures to prevent parsing? I'm sure #AI corps or whoever needs their data will just hire a bunch of people to solve CloudFlare's CAPTCHAs, but everyone else will be left behind.
Just give me the damn full-text RSS; I'd even pay for it... if I could sign up. The signup form returns 503 for me.
Published at 2024-11-09 21:01:01
Event JSON
{
"id": "2717771f3234fe318c333e2efce5e085a9c07ff5edc522d4e8f3d980bf0050d3",
"pubkey": "a0beb2d46e7c459f153950cf3af3c6e29ba1d316d08d9e92c9317c0687c99b55",
"created_at": 1731186061,
"kind": 1,
"tags": [
[
"t",
"rss"
],
[
"t",
"rssbridge"
],
[
"t",
"cloudflare"
],
[
"t",
"ai"
],
[
"proxy",
"https://mastodon.bsd.cafe/users/sqrtminusone/statuses/113455009695893640",
"activitypub"
]
],
"content": "I've tried making a full-text #RSS feed for the websites of ScienceX, the parent org for Phys.org and Tech Xplore.\n\nThe webpages are very straightforward, so the bridge (for #rssbridge) took just about 200 LoC. But! #CloudFlare is super zealous there.\n\nEven with the following parameters:\n- 3 feeds\n- fetch every hour\n- cache webpages for 7 days (= fetch each webpage only once, for all intents and purposes)\nI already got 429'ed. I'll try fetching every 4 hours, I guess...\n\nW-why such extreme measures to prevent parsing? I'm sure #AI corps or whoever needs their data will just hire a bunch of people to solve CloudFlare's CAPTCHAs, but everyone else will be left behind.\n\nJust give me the damn full-text RSS, I'd even pay for it... if I could sign up, the signup form returns 503 for me.",
"sig": "f769bd17d85515a9c74688a3383d48c16e5d3fbcc35904c112fd75f5790ba3b2449fa1a9f8ac5f61140bb93e4f3a9e9f741f6d871df1c4d24b1abe7e9ee2987b"
}