Alex R ([personal profile] alexr_rwx) wrote 2010-06-22 01:56 am

May I present: kompressr

kompressr: make text shorter harnessing the power of acronyms (MTSHTPOA)

Try it out, and let me know what you think and whether it breaks :) It should be useful for making long papers shorter, by automatically extracting acronyms and using them wherever possible.
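To make the idea concrete, here is a minimal sketch of the acronym step. It is not kompressr's actual code: the names `make_acronym` and `kompress` are made up for illustration, and the phrase to shorten is passed in by hand rather than extracted automatically.

```python
def make_acronym(phrase):
    """Build an acronym from the first letter of each whitespace-separated word."""
    return "".join(word[0].upper() for word in phrase.split())


def kompress(text, phrase):
    """Keep the first occurrence of phrase (tagged with its acronym)
    and replace every later occurrence with the acronym alone.

    Assumption: the caller already knows which phrase repeats; the real
    tool is described as finding such phrases automatically.
    """
    acronym = make_acronym(phrase)
    first = text.find(phrase)
    if first == -1:
        return text  # phrase never appears, nothing to shorten
    end = first + len(phrase)
    head = text[:end] + " (" + acronym + ")"
    tail = text[end:].replace(phrase, acronym)
    return head + tail


print(make_acronym("make text shorter harnessing the power of acronyms"))
# → MTSHTPOA
```

Calling `kompress(text, phrase)` on a document where `phrase` appears several times leaves the first mention spelled out and shortens the rest.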

Public beta!™

(running on App Engine, with NLTK!)

[personal profile] lindseykuper 2010-06-22 05:20 pm (UTC)
*applauds!* I love you.

Bug report!: if I paste in text with newlines, it doesn't recognize previously acronym'd phrases that straddle the line breaks.

[identity profile] oniugnip.livejournal.com 2010-06-23 01:39 am (UTC)
<3!

Working on reproducing...

[personal profile] lindseykuper 2010-06-23 01:57 am (UTC)
Sorry, that was a bad explanation! You can repro with this, from my work notes from Monday:
When the analyzer starts, it displays the message "Starting
emufuzzer_analyser XMLRPC server on port 55555".  It then sits in a
tight loop, waiting for the emulator module to pass it some
information: a machine state and an instruction to execute.  [TODO:
Where in the code does this happen?]

EmuFuzzer starts by both write-protecting and read-protecting all
pages of memory on the real machine.  [TODO: Where in the code does
this happen?]  The emulator sends over instructions, one at a time, to
be run on the real machine.  When an instruction tries to read memory,
we intercept the access via the page fault and go and get the page
from the emulated environment (so we "lazily" grab pages from the
emulated environment as needed).  Then we have the page in the
physical environment, readable but still not writable.
"Where in the code does this happen?" appears twice, so there ought to be a "WITCDTH?" the second time. But because it appears across a line break the second time, it doesn't get fully acronym'd even once. (Somewhat excitingly, I know the answer to both questions now!)

[identity profile] oniugnip.livejournal.com 2010-06-23 02:35 am (UTC)
!!!

There seem to be carriage returns passed in from the form? I certainly didn't expect that.

[identity profile] oniugnip.livejournal.com 2010-06-23 02:44 am (UTC)
Should be fixed now! It was the carriage returns. (In this day and age, \r\n? I wonder where they get introduced!)
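
For the curious, a rough sketch of the kind of fix that handles this, assuming the phrase matching runs on plain strings. This is not the actual kompressr code, and `normalize_whitespace` and `count_phrase` are hypothetical names: the idea is just to collapse every run of whitespace, carriage returns included, into a single space before looking for repeated phrases.

```python
import re


def normalize_whitespace(text):
    """Collapse any run of whitespace (spaces, tabs, newlines,
    carriage returns) into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def count_phrase(text, phrase):
    """Count occurrences of phrase in text, ignoring how the
    whitespace between words happens to be written."""
    return normalize_whitespace(text).count(normalize_whitespace(phrase))


# Form input often arrives with \r\n line endings.
sample = (
    "[TODO:\r\nWhere in the code does this happen?]  EmuFuzzer starts.  "
    "[TODO: Where in the code does\r\nthis happen?]"
)
phrase = "Where in the code does this happen?"

print(sample.count(phrase))          # → 1 (second copy is split by \r\n)
print(count_phrase(sample, phrase))  # → 2
```

With the whitespace normalized first, a phrase that straddles a line break matches the same as one that sits on a single line.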