Scripted text processing

On my last project at work, the tasks were often poorly defined. I had to build web services that would accept complicated data structures, but I only had one example file to go by. The file got updated from time to time, but it never arrived as valid XML: some people can hardly be convinced that you can’t just write strings instead of numbers, or put a “-” anywhere in the file, and still expect it to be valid. So I had to process every file by hand. And by “hand” I of course mean regexes or Python, just as any geek would when faced with an arduous task.

Geeks vs Non-geeks doing repetitive tasks

I should perhaps note that I have an exceptionally low ‘boredom’ threshold, so I started scripting right away.

Since the mistakes were always the same, I wanted a tool, ideally Notepad++ itself, that would let me write a script that takes the currently opened file, processes it and spits it back out. Then I could just load the input file, run the script and be done with it. Convinced that someone must have needed this before me, I asked on SuperUser, without success.

After this particular project was finished, I found that, of course, there actually is a plugin that does exactly what I needed. It is, shockingly, called Python Script, and it provides a neat way to add new scripts and execute them directly from the menu.

Python Script menu

For a quick example, suppose you want to shuffle the letters inside words, like in this famous piece of text:

It dseno’t mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae.

To keep things simple, we assume the words are separated by single spaces, and we won’t cover the corner cases.

Click Python Script | New Script and call it JumbleLetters.py. A new file is created and saved in the plugin’s script folder, and is immediately available from the Scripts submenu.

from random import shuffle

def jumble(word):
  # words shorter than three letters have no inner letters to shuffle
  if len(word) < 3:
    return word
  w = list(word[1:-1])
  shuffle(w) # shuffle the inside of the word
  return word[0] + "".join(w) + word[-1]

# "editor" is provided by the Python Script plugin
words = editor.getText().split(' ') # get words from the editor
# join the words back together and write them to the editor
editor.setText(" ".join([jumble(word) for word in words]))

So now, when we start with something like

Fuzzy sheep are great companion animals

we can get to

Fzuzy sehep are geart comianopn amianls

in just one click. This simplified version would break on other whitespace characters, punctuation and more, but it proves the point.
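
If you wanted to get past those limitations, one possible approach (a sketch, not something I used on the project) is to let a regular expression pick out the runs of letters and leave everything else, such as punctuation, tabs and newlines, exactly where it was. The names jumble_word and jumble_text are made up for this illustration, and the function is plain Python so that it can be tried outside Notepad++. It only treats unaccented ASCII letters as word characters, so an apostrophe or an accented letter splits a word in two.

```python
import re
from random import shuffle

def jumble_word(word):
    # Words shorter than four letters have at most one inner letter,
    # so shuffling them would change nothing.
    if len(word) < 4:
        return word
    inner = list(word[1:-1])
    shuffle(inner)  # shuffle only the inside of the word
    return word[0] + "".join(inner) + word[-1]

def jumble_text(text):
    # Replace each run of ASCII letters with its jumbled version;
    # every other character (spaces, tabs, punctuation) is left untouched.
    return re.sub(r"[A-Za-z]+", lambda m: jumble_word(m.group(0)), text)

print(jumble_text("Fuzzy sheep, are great\tcompanion animals!"))
```

Inside the plugin, the last line would become editor.setText(jumble_text(editor.getText())), using the same two editor calls as the script above.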

Another great extension is PyNPP, which can run the currently opened Python script either directly or in interactive mode; the latter is handy for quick writing and debugging. Both of these plugins are available from the NPP Plugin Manager.

So, all in all, I now have a great toolset, which I’ll hopefully never need again. :)