"Thank you, Mr. Nuff."
Oct. 16th, 2005 02:15 am
- I'm in Tallahassee, yay :)
- In the afternoon, I hung around with the family, and in the evening, I went climbing at the rock gym with Garrett lomonthang (and my forearms and hands are now sore in a pleasant way -- man, climbing is hard), and then we had pizza (I've finally been to Decent Pizza! And it's quite good!) ... and watched Berry Gordy's The Last Dragon -- possibly the very best kung-fu film ever produced by Motown. Busta Rhymes looks exactly like the "Shogun of Harlem" character, and this was not lost on him; he parodies (well, maybe just "reproduces chunks verbatim, because it was so silly to begin with") this film in his excellent video for "Dangerous". Right-o: many thanks to the G-UNIT for a lovely evening :)
- KOMPRESSOR NOW HAS OKCUPID PROFILE AND RECENTLY ANNOUNCED THE COMING SINGULARITY/ESCHATON: kompressorpower.
- What if, instead of going out and spidering the web independently, search engines were run on data collected from consenting anonymized users? What better spider could there be than the set of users out there? The major issue I'm thinking about is that a page might match your search criteria really well by having the right words in it -- but does it answer the question you were thinking about? What if you could mod up (or down) the usefulness of a given page? Say you're looking for technical information about some computery thing, and all the hits you pull up are archived mailing list posts -- this happens to me all the time -- most of them are pretty useless, and you've got to do a lot of sifting. When you find that one post where that one alert mailinglist member says that exact thing you needed, you should be able to mark it in some way so other people are more likely to find it.
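A minimal sketch of the mod-up/mod-down idea above -- votes keyed on (query, page) pairs, then used to re-rank results. Everything here (the class name, the URLs, the query) is hypothetical, made up purely for illustration:

```python
from collections import defaultdict

class UsefulnessIndex:
    """Toy index: users mod pages up or down for a given query,
    and results for that query are re-ranked by net votes."""

    def __init__(self):
        # (query, url) -> net vote score
        self.votes = defaultdict(int)

    def mod(self, query, url, delta):
        """delta is +1 (useful) or -1 (not useful)."""
        self.votes[(query, url)] += delta

    def rerank(self, query, urls):
        """Order candidate URLs most-useful-first for this query;
        pages nobody has voted on keep a neutral score of 0."""
        return sorted(urls, key=lambda u: self.votes[(query, u)], reverse=True)

idx = UsefulnessIndex()
# That One Really Good Mailinglist Post gets marked up; a dud gets marked down.
idx.mod("gcc linker error", "https://example.org/list-post-17", +1)
idx.mod("gcc linker error", "https://example.org/list-post-02", -1)
print(idx.rerank("gcc linker error",
                 ["https://example.org/list-post-02",
                  "https://example.org/list-post-17"]))  # list-post-17 ranks first
```

Keying votes on the (query, page) pair rather than the page alone sidesteps part of the "useless to you, perfect for me" objection raised further down the thread.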
no subject
Date: 2005-10-16 06:37 am (UTC)

> What if you could mod up (or down) the usefulness of a given page?
and my infosec background sadly answers:
because unscrupulous people will rent a botnet and monkey with the ratings.
of course, if you're interested in applying your AI kung to the infosec domain, let me know and i'll make the appropriate introductions.
no subject
Date: 2005-10-16 06:15 pm (UTC)

Well, you could certainly check for pages getting suddenly and unusually popular -- have some metric for finding outliers. You could do a bit of processing on the pages and see if they're selling something. Maybe if a user finds himself diverted to a page that he thinks got botnet-modded-up, then it could be flagged and somebody could go have a look.
Or better than all of those, I think, would be having the user do an unobtrusive task every time they want to mod something, the way you have to when you're registering for an LJ or buying tickets on Ticketmaster or whatever. "Read this hard-to-read nonsense word" or something like that.
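For what it's worth, the "metric for finding outliers" mentioned above could be as simple as a z-score test on a page's own vote history -- flag a page whose latest day of mod-ups is wildly out of line with its past. A toy sketch (the function name, threshold, and numbers are all made up for illustration):

```python
import statistics

def looks_botted(daily_votes, threshold=3.0):
    """Flag a page whose most recent day of mod-ups is a statistical
    outlier relative to that page's own history (simple z-score test)."""
    history, latest = daily_votes[:-1], daily_votes[-1]
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # avoid div-by-zero on a flat history
    return (latest - mean) / stdev > threshold

print(looks_botted([3, 5, 4, 6, 4, 250]))  # sudden spike -> True
print(looks_botted([3, 5, 4, 6, 4, 5]))    # ordinary day -> False
```

As the next comment points out, this only catches floods: a botnet that paces itself to look like organic traffic slips right under a per-page threshold like this one.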
no subject
Date: 2005-10-16 07:44 pm (UTC)

well, a flood of traffic is easy to recognize, but what if a botnet has been programmed to randomly wait between mod-ups? (say, 1-3 days, about the same time it takes dynamic IPs to roll over...)
> Or better than all of those, I think, would be having the user do an unobtrusive task every time they want to mod something, the way you have to do when you're registering for an LJ or buying tickets on ticketmaster or whatever. "Read this hard to read nonsense word" or something like that.
ah, a "Completely Automated Public Turing test to Tell Computers and Humans Apart"? well, botnets have plenty of CPU power to break captchas algorithmically (http://www.cs.sfu.ca/~mori/research/gimpy/), or, hey, just pipe the captcha into the bot-infected guy's browser and have a real human pass the test.
now, all this mumbo-jumbo is hypothetical and probably wouldn't affect a small site like del.icio.us, but if google started doing it, there would definitely be abuse (http://money.cnn.com/2004/12/02/technology/google_fraud/?cnn=yes).
no subject
Date: 2005-10-17 05:06 am (UTC)

> ... or, hey, just pipe the captcha into the bot-infected guy's browser and have a real human pass the test.
Very, very good point. I'm trying to think up a good replacement for captchas, but that is a really difficult problem.
*considers* Anyway, this is what you and Tim
The deep, deep issue is that I want a way to mark That One Really Good Mailinglist Post, and spidering the web seems archaic and vaguely wrong when people are out there looking at stuff all day anyway. Maybe there's some other way to keep it honest. I'll have to consider...
no subject
Date: 2005-10-17 05:14 am (UTC)

no subject

Date: 2005-10-17 11:49 pm (UTC)

no subject

Date: 2005-10-18 03:54 am (UTC)

no subject

Date: 2005-10-17 05:03 am (UTC)

Gimmie a ring next time you hit up the rock gym or something equally fun. 321.277.3899
no subject
Date: 2005-10-17 05:10 am (UTC)

But I'm heading to Gainesville tomorrow, to visit that Lloyd kid! Maybe I'll be in town again soon... :-\
no subject
Date: 2005-10-17 01:42 pm (UTC)

You should be sneaky and develop a search site with user rankings. For your unobtrusive tasks, use the SAT-style questions you'd mentioned. Then start an open source site bent on defeating your SAT-style questions with AI... then benefit from the AI development that goes into ruining your site. :P

You could even go further and start an open source site bent on improving the SAT-esque question-building AI to beat the SAT-esque question-answering algorithms... what better way to speed AI development than starting a war!?
no subject
Date: 2005-10-17 05:37 am (UTC)

Namely, google. Simple reason: speed. People can't bounce through that many pages that fast. Also, how will these anonymized users find pages if someone doesn't spider them first?
What you're really looking for is a question of having users mod sites based on usefulness and interestingness, etc. Only issue there is that you might mod a site really low because it doesn't answer your question, but it will answer mine perfectly well. I vote for language/semantics understanding and search engines that allow meta-searching.
- Tim
no subject
Date: 2005-10-17 11:52 pm (UTC)

no subject

Date: 2005-10-18 03:55 am (UTC)

no subject

Date: 2005-10-18 06:15 pm (UTC)

no subject

Date: 2005-10-18 10:23 pm (UTC)

EXCEPT ONLY IF I CAN FRIEND YOU BACK!!
no subject
Date: 2005-10-18 11:13 pm (UTC)

no subject

Date: 2005-11-08 03:11 pm (UTC)

I'm probably taking you way too literally, but there's always StumbleUpon.