Alex R ([personal profile] alexr_rwx) wrote 2010-06-22 01:56 am

May I present: kompressr

kompressr: make text shorter harnessing the power of acronyms (MTSHTPOA)

Try it out, let me know what you think and if it breaks :) Should be useful for making long papers shorter, by automatically extracting acronyms and using them wherever possible.

Public beta!™

(running on App Engine, with NLTK!)
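
For readers curious how this kind of compression could work, here is a minimal sketch. It is not kompressr's actual code, which isn't shown in this post. It assumes whitespace tokenization (an NLTK tokenizer could be swapped in), fixed-length phrases of `n` words, and hypothetical helper names (`acronym`, `repeated_phrases`, `compress`). A phrase that repeats is spelled out once with its acronym in parentheses, then replaced by the acronym afterwards.

```python
from collections import Counter


def acronym(phrase):
    """Build an acronym from the first letter of each word."""
    return "".join(word[0].upper() for word in phrase)


def repeated_phrases(tokens, n, min_count=2):
    """Return the n-word phrases that occur at least min_count times."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {p for p, c in counts.items() if c >= min_count}


def compress(text, n=2):
    """Spell out each repeated phrase once, then use its acronym."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    phrases = repeated_phrases(tokens, n)
    seen = set()
    out = []
    i = 0
    while i < len(tokens):
        phrase = tuple(tokens[i:i + n])
        if phrase in phrases:
            if phrase in seen:
                # Later occurrence: the acronym alone.
                out.append(acronym(phrase))
            else:
                # First occurrence: full phrase plus "(ACRONYM)".
                seen.add(phrase)
                out.extend(phrase)
                out.append("(%s)" % acronym(phrase))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(compress("space exploration is hard but space exploration is fun"))
```

Punctuation stuck to a word ends up in the acronym here, so a real version would want proper tokenization first.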

[personal profile] chrisamaphone 2010-06-22 07:04 am (UTC)
cool! i tried it on my recent LJ posts to get an idea for what it did, and then learned a bunch about my own writing patterns. :) some of the acronyms were pretty awkward; at first i thought it was because they include common short words, but maybe it's more because they don't fall on natural phrase boundaries. stuff like "dinner at", "talking about", "and the", "i had a"...
Edited 2010-06-22 07:05 (UTC)
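
One cheap way to approximate the phrase-boundary point above is to reject any candidate that starts or ends with a common short word. This is only a sketch: the `STOPWORDS` set and the `on_phrase_boundary` name are made up for illustration, and a fuller list (NLTK ships a stopwords corpus) or a real phrase chunker would likely do better.

```python
# Hypothetical, hand-picked stopword list; a fuller one could be swapped in.
STOPWORDS = {"a", "an", "and", "at", "about", "the", "of", "i", "had", "to", "in"}


def on_phrase_boundary(phrase):
    """Keep a candidate only if it neither starts nor ends with a stopword."""
    first, last = phrase[0].lower(), phrase[-1].lower()
    return first not in STOPWORDS and last not in STOPWORDS


candidates = [
    ("dinner", "at"),
    ("talking", "about"),
    ("and", "the"),
    ("i", "had", "a"),
    ("App", "Engine"),
]
kept = [p for p in candidates if on_phrase_boundary(p)]
print(kept)
```

Stopwords in the middle of a phrase are still allowed, so something like "United States of America" would survive the filter.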

[identity profile] lyceum-arabica.livejournal.com 2010-06-22 03:21 pm (UTC)
(laughs) Very cool! ...with some pretty funny pre-tweaking results. I fed it the first couple paragraphs of the wikipedia article on space flight, and it decided that 'Space Exploration' (SE), 'during the' (DT), and 'of Space Exploration' (OSE) were all good picks :-) This isn't my area, but I'd think the problem of identifying important common phrases particular to a piece of work (like SE), while weeding out generally common phrases and things that are a concatenation of the two, might be paper-worthy.

Also, you might want to keep it from acronyming its acronyms:

"Various criticisms of SE (OSE) are sometimes made. SE has often been used as a proxy competition for geopolitical rivalries such as the (AT) (SAT) Cold War. The early era OSE was driven by a "Space Race" between the Soviet Union and the United States (US); the launch of the (OT) first (TF) (OTF) (TLOTF) man-made object to orbit the Earth..."
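
A possible way to stop acronyms from stacking up on the same words, as in the "(OT) first (TF) (OTF) (TLOTF)" run above, is to let each word belong to at most one phrase and give longer phrases first pick. The sketch below assumes the candidate phrases have already been found; `non_overlapping_spans` is a hypothetical helper, not something from kompressr.

```python
def non_overlapping_spans(tokens, candidates):
    """Pick (start, end) spans for candidate phrases, longest phrases first,
    never reusing a token that an earlier span already claimed."""
    claimed = [False] * len(tokens)
    spans = []
    for phrase in sorted(candidates, key=len, reverse=True):
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase and not any(claimed[i:i + n]):
                claimed[i:i + n] = [True] * n
                spans.append((i, i + n))
    return sorted(spans)


tokens = "the launch of the first man-made object".split()
candidates = [
    ("of", "the"),
    ("the", "first"),
    ("of", "the", "first"),
    ("the", "launch", "of", "the", "first"),
]
spans = non_overlapping_spans(tokens, candidates)
print(spans)
```

Longest-first is just one tie-breaking choice; preferring the most frequent phrase instead would be equally easy to try.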

[personal profile] lindseykuper 2010-06-22 05:20 pm (UTC)
*applauds!* I love you.

Bug report!: if I paste in text with newlines, it doesn't recognize previously acronym'd phrases that straddle the line breaks.
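
If the cause is that phrases are matched against the raw pasted string (a guess, since kompressr's tokenization isn't shown here), collapsing all whitespace before matching might be enough. A small sketch, with `normalize_whitespace` as a hypothetical helper name:

```python
import re


def normalize_whitespace(text):
    """Collapse every run of whitespace, newlines included, into one space."""
    return re.sub(r"\s+", " ", text).strip()


pasted = "the Cold\nWar began; the Cold War ended"
flat = normalize_whitespace(pasted)

# Before normalizing, the phrase split across the line break is missed.
print(pasted.count("Cold War"), flat.count("Cold War"))
```

This throws away paragraph breaks, so a real fix would probably match on tokens and keep the original line structure for output.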

".dll?BUSINESS_LOGIC=BUSINESS_LOGIC_SHORTEN.ACTION"

[personal profile] lindseykuper 2010-06-23 02:06 am (UTC)
I believe what the kids say is "o_O". Or "...".