alexr_rwx: (jumping)
[personal profile] alexr_rwx
- I'm in Tallahassee, yay :)

- In the afternoon, I hung around with the family, and in the evening, I went climbing at the rock gym with Garrett [livejournal.com profile] lomonthang (and my forearms and hands are now sore in a pleasant way -- man, climbing is hard) and then we had pizza (I've finally been to Decent Pizza! And it's quite good!) ... and watched Berry Gordy's The Last Dragon -- possibly the very best kung-fu film ever produced by Motown. Busta Rhymes looks exactly like the "Shogun of Harlem" character, and this was not lost on him; he parodies (well, maybe just "reproduces chunks verbatim because it was so silly to begin with") this film in his excellent video for "Dangerous". Right-o: many thanks to the G-UNIT for a lovely evening :)

- KOMPRESSOR NOW HAS OKCUPID PROFILE AND RECENTLY ANNOUNCED THE COMING SINGULARITY/ESCHATON: [livejournal.com profile] kompressorpower.

- What if, instead of going out and spidering the web independently, search engines were run on data collected from consenting anonymized users? What better spider could there be than the set of users out there? The major issue I'm thinking about is that a page might match your search criteria really well by having the right words in it -- but does it answer the question you were thinking about? What if you could mod up (or down) the usefulness of a given page? Say you're looking for technical information about some computery thing, and all the hits you pull up are archived mailing list posts -- this happens to me all the time -- most of them are pretty useless, and you've got to do a lot of sifting. When you find that one post where that one alert mailinglist member says that exact thing you needed, you should be able to mark it in some way so other people are more likely to find it.
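One way the mod-up/mod-down idea could be sketched, purely as an illustration: keep the usual word-match score, but scale it by a smoothed fraction of "useful" votes. The `Hit` class, the `text_score` field, and the example URLs are all made up for this sketch; nothing here describes how any real search engine works.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    text_score: float  # hypothetical keyword-match score in [0, 1]
    ups: int = 0       # "this was useful" votes
    downs: int = 0     # "this was not useful" votes


def usefulness(hit: Hit) -> float:
    """Laplace-smoothed share of up-votes; 0.5 when nobody has voted yet."""
    return (hit.ups + 1) / (hit.ups + hit.downs + 2)


def rerank(hits):
    """Sort hits by text match scaled by voted usefulness, best first."""
    return sorted(hits, key=lambda h: h.text_score * usefulness(h), reverse=True)


hits = [
    Hit("list-archive/msg-1", text_score=0.9, ups=0, downs=8),   # right words, useless post
    Hit("list-archive/msg-2", text_score=0.7, ups=12, downs=1),  # That One Good Post
    Hit("list-archive/msg-3", text_score=0.8),                   # no votes yet
]
ranked = rerank(hits)
```

With these made-up numbers, the well-voted post outranks the one with the better word match, and an unvoted page sits in between rather than being buried.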

Date: 2005-10-16 06:37 am (UTC)
From: [identity profile] rusty42.livejournal.com
given your LISPy AI background, it's understandable that you'd ask

What if you could mod up (or down) the usefulness of a given page?

and my infosec background sadly answers:

because unscrupulous people will rent a botnet and monkey with the ratings.

of course, if you're interested in applying your AI kung fu to the infosec domain, let me know and i'll make the appropriate introductions.

Date: 2005-10-16 06:15 pm (UTC)
ext_110843: (removal of signs)
From: [identity profile] oniugnip.livejournal.com
That actually occurred to me, the Unscrupulous People Problem, although I hadn't thought of a whole botnet being used...

Well, you could certainly check for pages getting suddenly and unusually popular -- have some metric for finding outliers. You could do a bit of processing on the pages and see if they're selling something. Maybe if a user finds himself diverted to a page that he thinks got botnet-modded-up, then it could be flagged and somebody could go have a look.
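A rough sketch of what "some metric for finding outliers" might look like, assuming we keep a per-page list of daily mod-up counts: compare the latest day against the page's own history with a z-score. The function name, the threshold of 3, and the floor on the standard deviation are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def is_suspicious(daily_votes, threshold=3.0):
    """Flag the most recent day if it sits far above the page's own history.

    daily_votes: non-empty list of mod-up counts per day, oldest first.
    """
    *history, today = daily_votes
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    # Floor the spread at one vote so a perfectly flat history
    # does not cause a division by zero.
    sigma = max(pstdev(history), 1.0)
    return (today - mu) / sigma > threshold


spike = is_suspicious([3, 5, 4, 6, 4, 40])   # sudden jump on the last day
normal = is_suspicious([3, 5, 4, 6, 4, 5])   # ordinary day
```

A check like this only catches sudden bursts; votes spread out slowly over time would look like ordinary history to it.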

Or better than all of those, I think, would be having the user do an unobtrusive task every time they want to mod something, the way you have to do when you're registering for an LJ or buying tickets on ticketmaster or whatever. "Read this hard to read nonsense word" or something like that.

Date: 2005-10-16 07:44 pm (UTC)
From: [identity profile] rusty42.livejournal.com
Well, you could certainly check for pages getting suddenly and unusually popular -- have some metric for finding outliers. You could do a bit of processing on the pages and see if they're selling something. Maybe if a user finds himself diverted to a page that he thinks got botnet-modded-up, then it could be flagged and somebody could go have a look.

well, a flood of traffic is easy to recognize, but what if a botnet has been programmed to randomly wait between mod-ups? (say, 1-3 days, about the same time for dynamic IPs to roll over...)

Or better than all of those, I think, would be having the user do an unobtrusive task every time they want to mod something, the way you have to do when you're registering for an LJ or buying tickets on ticketmaster or whatever. "Read this hard to read nonsense word" or something like that.

ah, a "Completely Automated Public Turing test to Tell Computers and Humans Apart"? well, botnets have plenty of CPU power to break captchas algorithmically (http://www.cs.sfu.ca/~mori/research/gimpy/), or, hey, just pipe the captcha into the bot-infected guy's browser and have a real human pass the test.

now, all this mumbo-jumbo is hypothetical and probably wouldn't affect a small site like del.icio.us, but if google started doing it, there would definitely be abuse (http://money.cnn.com/2004/12/02/technology/google_fraud/?cnn=yes).

Date: 2005-10-17 05:06 am (UTC)
ext_110843: (juggling)
From: [identity profile] oniugnip.livejournal.com
Well said, sir.

... or, hey, just pipe the captcha into the bot-infected guy's browser and have a real human pass the test.

Very, very good point. I'm trying to think up a good replacement for captchas, but that is a really difficult problem.

*considers* Anyway, this is what you and Tim [livejournal.com profile] neuroticmonk are for, right?

The deep deep issue is that I want a way to mark That One Really Good Mailinglist Post, and spidering the web seems archaic and vaguely wrong when people are out there looking at stuff all day anyway. Maybe there's some other way to keep it honest. I'll have to consider...

Date: 2005-10-17 05:14 am (UTC)
ext_110843: (removal of signs)
From: [identity profile] oniugnip.livejournal.com
Zot! SAT-style analogy problems. Or anything that requires really heavy language understanding -- maybe something like a reading comprehension question that requires you understand synonyms.

Date: 2005-10-17 11:49 pm (UTC)
From: [identity profile] rusty42.livejournal.com
like, a multiple choice problem? one in which a random choice is correct 20 or 25% of the time?
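To put numbers on the guessing problem, here is a small sketch that assumes each question is answered independently and uniformly at random. The function name and parameters are made up for illustration.

```python
from fractions import Fraction


def pass_by_guessing(options: int, questions: int = 1) -> Fraction:
    """Chance that a bot picking uniformly at random gets every question right."""
    return Fraction(1, options) ** questions


one_of_four = pass_by_guessing(4)       # 1/4, i.e. 25%
one_of_five = pass_by_guessing(5)       # 1/5, i.e. 20%
three_in_a_row = pass_by_guessing(4, 3) # 1/64
```

Even three four-option questions in a row still let a blind guesser through about once in 64 tries, which is cheap for anything automated.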

Date: 2005-10-18 03:54 am (UTC)
ext_110843: (removal of signs)
From: [identity profile] oniugnip.livejournal.com
Like a multiple choice without the multiple choices, clearly. Yeah, making all the options available right there is pretty obviously not the right way to do it.

Date: 2005-10-17 05:03 am (UTC)
From: [identity profile] brainfaucet.livejournal.com
Overall, I think Rusty's right. Though sites like Slashdot seem to be okay at minimizing the damage with karma, mod points, unobtrusive tasks, etc. Your user bot-net flagging would probably work well too.

Gimmie a ring next time you hit up the rock gym or something equally fun. 321.277.3899

Date: 2005-10-17 05:10 am (UTC)
ext_110843: (cartoon me)
From: [identity profile] oniugnip.livejournal.com
I'll put your number in my phone :)

But I'm heading to Gainesville tomorrow, to visit that Lloyd kid! Maybe I'll be in town again soon... :-\

Date: 2005-10-17 01:42 pm (UTC)
From: [identity profile] brainfaucet.livejournal.com
Say "Hurro!" to Lloyd, Janice, Cydelle and the bunnies for me. :)

You should be sneaky and develop a search site with user rankings. For your unobtrusive tasks, use the SAT-style questions you'd mentioned. Then start an open source site bent on defeating your SAT-style questions with AI... then benefit from the AI development that's going into ruining your site. :P

You could even go further and start an open source site bent on improving the SAT-esque question-building AI to beat the SAT-esque question-answering algorithms... what better way to speed AI development than to start a war!?

Date: 2005-10-17 05:37 am (UTC)
From: (Anonymous)
What better spider could there be than the set of users out there?

Namely, Google. Simple reason: speed. People can't bounce through that many pages that fast. Also, how will these anonymized users find pages if someone doesn't spider them first?

What you're really looking for is a way to have users mod sites based on usefulness, interestingness, etc. The only issue there is that you might mod a site really low because it doesn't answer your question, but it will answer mine perfectly well. I vote for language/semantics understanding and search engines that allow meta-searching.

- Tim

Date: 2005-10-17 11:52 pm (UTC)
From: [identity profile] rusty42.livejournal.com
i think part of the problem is that people want an expert answer (i.e. one that can be achieved with a few search terms and skimming the top of the google results) without becoming an expert.

Date: 2005-10-18 03:55 am (UTC)
ext_110843: (condescending unix users)
From: [identity profile] oniugnip.livejournal.com
Not being an expert is no reason why your widget shouldn't work. You can't know everything. Or at least, your mom can't be expected to know everything.

Date: 2005-10-18 06:15 pm (UTC)
From: [identity profile] sydelleofcourse.livejournal.com
Might I have the honor of "friending" you?

Date: 2005-10-18 10:23 pm (UTC)
ext_110843: (juggling)
From: [identity profile] oniugnip.livejournal.com
By all means :)

EXCEPT ONLY IF I CAN FRIEND YOU BACK!!

Date: 2005-10-18 11:13 pm (UTC)
From: [identity profile] sydelleofcourse.livejournal.com
I'd be delighted.

Date: 2005-11-08 03:11 pm (UTC)
lindseykuper: Photo of me outside. (Default)
From: [personal profile] lindseykuper
When you find that one post where that one alert mailinglist member says that exact thing you needed, you should be able to mark it in some way so other people are more likely to find it.

I'm probably taking you way too literally, but there's always StumbleUpon.

Profile

alexr_rwx: (Default)
Alex R
