May I present: kompressr
kompressr: make text shorter harnessing the power of acronyms (MTSHTPOA)
Try it out, let me know what you think and if it breaks :) Should be useful for making long papers shorter, by automatically extracting acronyms and using them wherever possible.
Public beta!™
(running on App Engine, with NLTK!)
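For the curious, here is a rough sketch of the general idea (extract repeated phrases, define an acronym at first use, then substitute it afterwards). This is not kompressr's actual code: the function names `find_phrases`, `acronym`, and `compress`, the fixed phrase length, and the "occurs at least twice" rule are all assumptions made for illustration, and it uses only the Python standard library rather than NLTK.

```python
import re
from collections import Counter


def find_phrases(text, n=3, min_count=2):
    """Return n-word phrases (lowercased) that occur at least min_count times."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(p) for p, c in counts.items() if c >= min_count]


def acronym(phrase):
    """Build an acronym from the first letter of each word."""
    return "".join(w[0] for w in phrase.split()).upper()


def replace_phrase(text, phrase):
    """Define the acronym at the first occurrence, use it alone afterwards."""
    short = acronym(phrase)
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(w) for w in phrase.split()) + r"\b",
        re.IGNORECASE,
    )
    seen = [False]  # mutable flag shared with the callback below

    def repl(match):
        if not seen[0]:
            seen[0] = True
            return f"{match.group(0)} ({short})"
        return short

    return pattern.sub(repl, text)


def compress(text, n=3, min_count=2):
    """Naive pass: does not handle overlapping or nested phrases."""
    for phrase in find_phrases(text, n, min_count):
        text = replace_phrase(text, phrase)
    return text
```

Calling `compress("Natural language processing is fun. We use natural language processing daily.")` defines the acronym once and reuses it. A real tool would also need to avoid acronym collisions and phrases that cross sentence boundaries, which this sketch ignores.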
This isn't my area, but I'd think the problem of identifying important common phrases particular to a piece of work (like SE), while weeding out generally common phrases and things that are a concatenation of the two, might be paper-worthy.
It's a good intuition that that's an important problem, but it has very much been done. There's a whole literature on finding sequences of words that commonly appear together (say, across a set of documents, or characteristic of particular documents)... terms that NLP/corpus linguistics people use when discussing this include "collocation" (http://en.wikipedia.org/wiki/Collocation) and "TF/IDF" (http://en.wikipedia.org/wiki/Tf-idf), if you're interested.
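To make the TF/IDF idea concrete, here is a minimal sketch that scores two-word phrases by how often they occur in one document, discounted by how many documents contain them. The function names `bigrams` and `tfidf`, the choice of bigrams, and the plain `tf * log(N / df)` weighting are assumptions for illustration; real systems use various smoothed variants.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def tfidf(docs):
    """Return one {bigram: score} dict per document.

    score = (count in doc / bigrams in doc) * log(num docs / docs containing it)
    """
    per_doc = [Counter(bigrams(d)) for d in docs]
    # Document frequency: in how many documents does each bigram appear?
    df = Counter(bg for counts in per_doc for bg in counts)
    n = len(docs)
    scores = []
    for counts in per_doc:
        total = sum(counts.values())
        scores.append(
            {bg: (c / total) * math.log(n / df[bg]) for bg, c in counts.items()}
        )
    return scores
```

A phrase that shows up in every document scores zero, while a phrase repeated in only one document scores highest there, which is roughly the "particular to a piece of work" property the parent comment asks about.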