
jmoiron.net

development

writing things in code

I try to write software occasionally. If I ever write about it, it usually means I'm comfortable about the way it has progressed, but hardly ever guarantees that it will be finished.

posts under 'development'

Howdy Y'all from Austin, TX

posted June 19th, 2008 @ 01:10:36

- tags: development , life , travel

- comments: 0

I suppose this post is long overdue in a lot of ways. I've ended my employment at Attila Technologies and started a position at Advance Internet. I've quit my job as a toolsmith/utility/architect and have become a python/django developer. I started my new job not in Journal Square (where the company is located), but in Austin, Texas.

I have some experience with Austin; my aunt Mel, herself fantastic, lived here with her wonderful friends before she passed away, and I've been down here a few times before to visit. A few months ago, really, after not having my contract formally renewed at Attila, and not getting any raise whatsoever, or any offer of any kind of compensation at all other than what I had been receiving, I decided to pursue other options. I decided on Advance, and upon that decision they told me that they were sending some developers down to Austin and wanted me to join.

The long and the short of it is that I wanted to join, too (even at the cost of a week's vacation), and given the flurry of activity and learning this experience has been, I'm glad I did it. We are visiting a contracting company called Optaros, who developed the application that a few other developers and I will apparently soon take over.

Optaros Austin is, to this point, the epitome of a laid back awesome "agile" development environment. The people on our project are cool, varied, professional (in a good way), intelligent, excited about their work, outgoing; pretty much everything good imaginable and with the relaxed, calmly upbeat tone Attila lacked. Part of it is probably Austin itself, and another part of it is probably them. I came into the situation here completely lost on both sides (having not worked a day in AI's office and having never met any Optaros guys), so it's been pretty interesting for me so far. I think that I've been able to make a fairly decent contribution all things considered, although I'm not really sure. The separation from the familiar has been very good for me, and seeing the architecture and the way in which they go about development has given me a lot of ideas on how I will want to work upon my return to New Jersey.

Good of Society?

posted May 10th, 2008 @ 18:47:37

- tags: development , python

- comments: 0

I've disabled comments because I have some neat captcha-less ideas for how to tell bots to fuck the hell off from my comments section but I don't feel like implementing them for django. The new system (mostly homegrown glue w/ selector, beaker, mako & werkzeug thrown in) has been in heavy development the past few weeks and commenting should be available by the summer. Sorry Mike; now you've no reason to come here.

For a while the bots were just hitting a few old posts about 5-10 times a day, so I'd kill the spam every once in a while, but then I started to get pretty disgusting stuff on brand new posts so now it's gone. It's one of the classic models of digital social interaction at work, I suppose.

Otherwise, I've been hard at work developing the new site. As part of creating the new site, I wanted to develop very specific things and then generalize them out quickly to create a sort of reusable toolkit. That toolkit (which is standalone, basically) is davenport, and it's been getting a lot of love recently. Some couch-specific modeling stuff and hopefully a davenport CRUD generator will be in the future, as I flesh out the backend of the Hot New Shit. I also have to get my act together and send cmlenz some patches for python-couchdb. Busy Busy Busy.

Mercurious

posted April 23rd, 2008 @ 22:49:26

- tags: development

- comments: 0

This past week/end, I set up Mercurial on my server and started a few projects there. One is called slipcover, which I talked about last time. The other project, which I've started more recently, isn't really called anything. It was started as a direct result of me using Hg and I must say that so far I am pretty happy.

That project is called Beaker and it already has a perfectly nice home. Beaker is a session management library for Python that allows a number of different storage backends: memcached, database (via sqlalchemy), memory, files, etc. My project is a branch that has a CouchDB backend extension for Beaker.

In Hg, the "checkout" operation actually creates a branch that is no different in terms of operation from upstream: you can checkout from it, checkin new revisions, etc. If you are interested in how Hg being distributed helps make this type of operation simple, and are coming from a centralized versioning system background (as I was), I highly recommend you read Armin Ronacher's "Mercurial for SVN Users", which explains the philosophy of Hg to people used to svn/cvs admirably, and better than I could without lots of effort.

Hg is, as of just last month, "1.0" software, and has already proven itself in major projects like Java. There is a fairly mature trac plugin and various rcs conversion tools. Although there is no WebDAV support, hgwebserve supports most of the same uses with respect to checking in/out via https, and comes with a revision browsing interface that includes colorized diffs and various changelog views. In the coming weeks, I'll be migrating my old svn repositories to Hg and migrating my trac instances to use TracMercurial.

Slipcover

posted April 19th, 2008 @ 19:42:50

- tags: development , python

- comments: 0

I've been doing a lot with CouchDB and WSGI the past few months, with positive and negative results. I'm finally getting the "hang" of Document Oriented Databases (DODB? Rubbish acronym), and starting to understand what is possible and what isn't, and how to do one-to-many and many-to-many relationships without having lots of queries. It struck me that although the per-query time in CouchDB is thus far a lot slower than in relational databases, the overall database time for individual webpage loads is far less, and there are far fewer queries and complex information joins going on.

There are a few reasons for this. The first is that Django is a general purpose framework, and its ORM is similarly meant to be general purpose. The result is that getting the comment counts for 10 posts on the front page takes 10 queries. This type of a query will be possible in one view in CouchDB when Reduce is implemented, but the only way to do this without modifying your "data schema" is to get all of the comments and count them manually.

Because data is so fluid, it's no problem to add a "comments_n" field to each of your commented-on documents. This kind of application specific update-on-write hack is probably going to become very commonplace as CouchDB gains mindshare and the "best practices" are discovered, not because there are inherent limitations with the view system (or at least, there won't be, once it's done), but because the lack of a schema makes it extremely easy, and it's always more efficient to write the application this way than to calculate it over and over.
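The update-on-write idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the documents are plain dicts standing in for CouchDB JSON documents, `add_comment` is a hypothetical helper, and nothing here talks to a real database or handles a failed save.

```python
# Plain dicts standing in for CouchDB documents (an assumption for
# illustration; no real database or client library is involved).
post = {"_id": "post-1", "type": "post", "title": "Slipcover", "comments_n": 0}
comments = []


def add_comment(post, comments, author, body):
    """Hypothetical helper: store a comment and bump the counter on its post."""
    comment = {
        "type": "comment",
        "post_id": post["_id"],
        "author": author,
        "body": body,
    }
    comments.append(comment)
    # Update-on-write: pay the cost once here instead of counting
    # comments again on every page load.
    post["comments_n"] = post.get("comments_n", 0) + 1
    return comment


add_comment(post, comments, "mike", "first")
add_comment(post, comments, "jeremy", "second")
print(post["comments_n"])
```

The front page can then read `comments_n` straight off each post document without touching the comments at all.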

In light of all this tinkering, I've done a few things. I've started to take a closer look at Erlang especially of late, as I've started to use CouchDB trunk instead of 0.7.2. I've also done a lot of python speed & feature related hacking with both the more or less official couchdb-python library and with my own re-implementation using pycurl instead of httplib2, which I've called curlcon. Originally, I had devised an httplib-style replacement object that used curl as its transport layer (hence Curl Connection) to be plugged into python-couchdb, but it evolved into a sort of experiment to see how fast I could get the http part of CouchDB to run.

The code is now available in my mercurial repository under the "slipcover" repository. As I start to add testing to some of my "web framework" for the next version of jmoiron.net, "slipcover" will become more of a full project whose purpose (besides to run my website) is to be an example of an idiomatic CouchDB/Python web application.

Happy Anniversary: An Introspective

posted February 22nd, 2008 @ 22:40:36

- tags: development , life , python , site news

- comments: 2

Happy Anniversary to me!

6 years ago I started my blog on the now defunct IRIX server attila.stevens-tech.edu. The first post in this blog was made with a bash script that basically used cat & sed to style a quasi-structured text file that I'd shell into the server and create. The server did not offer any CGI services (for fear that the students would screw up and bring down the server, which also handled email), so it would be a year or so until I was able to move my website to PHP.

From there, I wrote two versions of jmoiron.net using the classic LAMP stack. The second one used PEAR:DB for database safety and a thin home-grown templating system based in PHP (both heavily inspired by jeremy mikola).

In May, 2003, I started to learn the Python programming language (This was the beginning of my summer vacation for the Junior->Senior year of University). My very first mention of the language in these hallowed pages was, well, completely retarded: I was "gonna write an interpreter, or a C compiler, or something." It's funny ha-ha.

By late 2004/early 2005, I had ditched the PHP beginnings and had written some custom mod_python stuff to run the site. I ditched mysql at the same time. By late 2005/early 2006, I started writing my own plain-text -> HTML markup language based on MoinMoin syntax. I had almost my whole database converted (automatically) to this script when, in mid 2006, I traded manual mod_py for django and my markup language for markdown.

I quickly ditched the first iteration of the django site and ended up with what you see today; it's been here for over a year and a half! It was by far the harshest transition, because I also made a switch from storing my posts as HTML to storing them as markdown (which I am, of course, now unhappy with); and back then the automagic html->markup filters left a bit to be desired. I am working on yet another iteration with yet another set of technologies; the next iteration (probably set to finish around April) won't even involve SQL at all, and will be the first one without a real solid framework since I moved to Django.

This site is kind of a technical experiment of mine; It's where I express myself both through code, through design, and through words. Hopefully sometime soon, through pictures too. I hope that it will continue to be something that I can hang out there to dissuade future employers from hiring me!

On WSGI, CouchDB

posted January 30th, 2008 @ 01:25:41

- tags: development , python

- comments: 0

pythons on a couch

I've been thinking about WSGI and CouchDB recently, while on the subject of digital inflexibility. First, I want to clarify a few things about what I mean by flexibility with respect to an application, and how the current crop of frameworks approach this problem. If you want to follow this musing well, I highly suggest reading "What PHP Deployment Gets Right" by Ian Bicking; or just his entire blog, and most of the crosstalk on the web about REST, Web Services, and the evolution of the WWW.

How do modern frameworks (Rails, Django, Turbogears, or other "canonical" ones) deal with the problem of flexibility? They don't, for the most part. For one, flexibility is hard both programmatically and conceptually. Secondly, they replace flexibility with simplicity, which is almost always a tradeoff that results in quality. They achieve this simplicity by strictly dividing tasks and then conquering each task by building up a structure around how one is supposed to go about solving that task.

These are not negative qualities at all; one of the things that Rails has gotten right is that design ought to be opinionated. So your task when you go to develop in your now "classic" framework is to set up your REST API (your "Routes" or "urls.py") and your controllers, design your models, and set up your views, and you've got the whole MVC ready to go. The problem is rigidity, repetition, and BigDesignUpFront.

The solution is flexibility. Joel defends BigDesignUpFront, and when you are working with a team on some critical make-or-break software for your company, BDUF might be well worth it for its benefits in fleshing out potential problems, helping with schedule and cost estimation, etc. But for prototyping, exploring technology, or exploring a problem space, BDUF is deadly. For "agile" or TDD, popular buzzwords that are worth far less than their hype but still provide useful insight for all developers, this is potentially damaging. Coupled with rigidity (in the form of SQL) and repetition (even in DRY espousing frameworks like Django) this is tough to overcome, especially when looking at migrating lots of data to a new application framework.

The first way to overcome inflexibility I want to talk about is WSGI, whose design is inspired in part (or so I understand it) by Java Servlets. At its core, WSGI is a specification for how web servers and python applications communicate; but more interesting (and far more necessary in the statically typed world of Java) it also defines specifically how various python applications are called by the web server. This means that other python applications, given that they abide by the specs, are free to call other WSGI applications themselves with impunity and expect them to work.

The way it's implemented, you need only define the __call__ method to receive 2 passed arguments and return an iterable in order to qualify as a WSGI application. These are incredibly weak requirements on applications, and make many middlewares truly plug and play. What's more, the effort was originally to define a standard that the existing plethora of Python frameworks could all use so that their component pieces would be interoperable with each other. WSGI is still pretty new, and opinionated frameworks like Django are probably not eager to ditch their middleware integration layers for pure WSGI interfaces anytime soon (although Django does work w/ WSGI, I think that's more of an interface between a web server and a Django application taken as a whole), but the proposition of using, say, Django's caching middleware, for any python web application written to conform to WSGI is really exciting.
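To make the "two arguments in, iterable out" requirement concrete, here is a minimal sketch in modern Python 3 (the post predates it). `HelloApp` and `UpperCaseMiddleware` are made-up names for illustration; the only thing taken from the WSGI spec is the calling convention: a callable that receives `environ` and `start_response` and returns an iterable of bytes.

```python
class HelloApp:
    """A minimal WSGI application: a callable taking (environ, start_response)."""

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")
        body = ("Hello from %s\n" % path).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        # The return value only has to be an iterable of byte strings.
        return [body]


class UpperCaseMiddleware:
    """Hypothetical middleware: wraps any WSGI app and upper-cases its output."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # The middleware is itself a WSGI app, so it can wrap anything
        # that follows the same calling convention.
        return [chunk.upper() for chunk in self.app(environ, start_response)]


application = UpperCaseMiddleware(HelloApp())
```

To try it in a browser, the standard library's reference server should do: `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`.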

This gives you flexibility in designing your own "framework" built of hand chosen component pieces. Pylons is essentially a framework built upon PythonPaste that facilitates you in choosing these WSGI middleware components, but I've found some of the areas (particularly the URI routing) to be a little less flexible than I'd like (and, sadly, the documentation is a far cry from Django's). Accepting the dogma of one framework or another does come at a practical advantage; you avoid writing the necessary glue between components. But as the glue itself is agonized over, standardized and simplified, it becomes just another component.

It also gives you another interesting flexibility: the ability to attach applications written completely differently (even in different frameworks) to different URIs at the same site, all of them using the same middleware. This blog works as a Django application; why change it? But my Gallery might be better implemented using other technologies (and I discuss this below); with everyone on board using WSGI, it'd be trivial to attach a different application to handle the '/gallery/' URI space but keep both applications using the same caching, gzip, and authentication middleware. This idea is extremely powerful, because it allows one to select the proper tool for the job and align with whatever tool chain most closely reflects the problem at hand.
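A rough sketch of that arrangement, assuming two stand-in applications and one shared piece of middleware (all names here are hypothetical, and the stand-ins return fixed strings rather than real pages):

```python
def blog_app(environ, start_response):
    """Stand-in for the existing blog application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"blog"]


def gallery_app(environ, start_response):
    """Stand-in for a gallery written with entirely different tools."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"gallery"]


def dispatch_by_prefix(mounts, default):
    """Return a WSGI app that hands the request to the first matching prefix.

    A fuller dispatcher would also move the matched prefix from PATH_INFO
    to SCRIPT_NAME; this sketch leaves the environ untouched.
    """
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, mounted in mounts:
            if path.startswith(prefix):
                return mounted(environ, start_response)
        return default(environ, start_response)
    return app


def add_header(app, name, value):
    """Hypothetical shared middleware: append one header to every response."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            return start_response(status, list(headers) + [(name, value)], exc_info)
        return app(environ, patched_start)
    return wrapped


# Both applications sit behind the same middleware.
site = add_header(
    dispatch_by_prefix([("/gallery/", gallery_app)], blog_app),
    "X-Served-By", "shared-middleware",
)
```

Swapping `add_header` for caching, gzip, or authentication middleware would follow the same shape: wrap the dispatcher once and every mounted application gets it.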

What about flexibility at the genesis of the application? Web applications these days deal mostly with the storage and presentation of data. Certainly, the current crop of frameworks reinforce this idea; ditch Django's ORM or ActiveRecord and see what's left with respect to creation of a data driven website. This is where CouchDB, or what I perceive CouchDB to offer, enters the equation.

As a metaphor, let's look at programming languages and type binding as a method of describing and manipulating data. In a statically typed programming language, the structure of data is described explicitly and is enforced by the compiler. You go about defining what a widget is, and then create instances of widget. Methods that would manipulate widgets must receive a widget as their input.

Where statically typed languages provide subtypes, super types, and other ways to make the definition of what qualifies as a widget more malleable, databases struggle at this. You describe data (in the form of tables, relations, etc) beforehand as before, with each field being a strict type and each table describing some strictly typed record. To alter these definitions, you have to define new tables to make additions to the previously defined record types, and modifying the type of their existing data is not possible.

If you want to act on all widgets, you must be cognizant of other widget-like tables. Even if your new widget is exactly like the old one, grouping both is either manual or inane and always slow. So how do you do migration? You dump the database, add or massage the types of the new table columns you will be adding, and then re-import. Some frameworks provide tools around this process, but the necessity is fundamentally broken.

The document oriented approach CouchDB takes is much more like having a large, flat, "duck typed" table where you can store anything. You define views of your large data soup that pick out items based on specific characteristics of those items, not on their structure. Want all "things" published on some day? It isn't a problem; everything is a thing. A quick stab at structure is to add a type field that allows you to filter out "things" that match a type string. These things are guaranteed, upon delivery, only to have matched that type string and nothing more. This is a weak guarantee, but weak guarantees buy us flexibility.

In Python, often times functions are described as taking objects that allow certain actions on them; for instance, iterable. Requiring only that an object be iterable is a very weak requirement, far weaker than acting on "anything of type foo". In practice, many functions merely require that the objects they manipulate only contain certain methods or attributes, not necessarily that they satisfy some larger unused type structure. This is a trade off, to be sure, but it's a trade off towards both simplicity and flexibility.
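A tiny illustration of how weak the "iterable" requirement is; `first_matching` is a made-up helper, not something from the standard library:

```python
def first_matching(items, predicate):
    """Return the first item satisfying predicate, or None.

    The only thing asked of `items` is that it can be looped over.
    """
    for item in items:
        if predicate(item):
            return item
    return None


def is_even(n):
    return n % 2 == 0


print(first_matching([3, 5, 8, 9], is_even))           # a list
print(first_matching((n * n for n in range(1, 10)), is_even))  # a generator
print(first_matching({7: "a", 4: "b"}, is_even))       # a dict iterates its keys
```

None of these arguments share a type beyond being something a `for` loop accepts.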

As a concrete example of how this can be useful, let's take my ever languishing gallery application. The goal is to keep in the database my images as well as their EXIF tags such that I could easily perform a search like "Find me all images with this aperture" or "Find me all images taken with this camera." Because I have images taken from at least 3 different cameras (not to mention pictures my friends or family take that I might want to include), and camera makers all add their own types of tags in the "MakerNote" section, I can't have a single per-image "tags" table.

As it stands now, my proto-SQL database has 3 simplistic tables to handle this: a gallery_image consisting of id, title, description, etc; a gallery_image_tag, which is supposed to represent a single EXIF tag consisting of an id, title, desc, etc; and 'gallery_image_tags', which allows me to tie the two together so I can get "an image and all of its exif tags" in one query. This is straightforward (albeit painfully unoptimized) using the Django ORM, but it's a horribly rigid design that sees me making potentially dozens of database updates for each uploaded image.

In CouchDB, I could simply designate my images as having a type field of "image", and then dump in the tags as key/value pairs. It is as trivial to create views of the type described above of this database as it is to create a view returning all images; while the 'all images' view would map documents based on their satisfaction of (type == image), the more complex views are just as simple (camera_model_name == ...).
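The shape of those views can be imitated in plain Python. This is an emulation only: real CouchDB views are map functions stored in the database (JavaScript by default), and the documents, field names, and the `view` helper below are invented for illustration.

```python
# Hypothetical documents: images carry whatever tags their camera wrote,
# and a non-image document shares the same "table".
docs = [
    {"_id": "img-1", "type": "image", "title": "Dusk",
     "camera_model_name": "Camera A", "aperture": "f/2.8"},
    {"_id": "img-2", "type": "image", "title": "Porch",
     "camera_model_name": "Camera B", "maker_note_x": 3},
    {"_id": "post-1", "type": "post", "title": "Slipcover"},
]


def view(docs, map_fn):
    """Emulate a view: gather every (key, value) row the map function emits,
    sorted by key."""
    rows = []
    for doc in docs:
        rows.extend(map_fn(doc))
    return sorted(rows, key=lambda row: row[0])


def all_images(doc):
    # Selects on a characteristic of the document, not on its structure.
    if doc.get("type") == "image":
        yield doc["_id"], doc["title"]


def by_camera(doc):
    # The "more complex" view is the same few lines with a different key.
    if doc.get("type") == "image" and "camera_model_name" in doc:
        yield doc["camera_model_name"], doc["_id"]


print(view(docs, all_images))
print(view(docs, by_camera))
```

Note that `img-2` has a tag that `img-1` lacks and neither view cares.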

Looking to the future, it is also far easier to modify the CouchDB database to allow for new features. Let's look at some potentially interesting features: an algorithm that gauges the color temperature of a photo to group "like" photographs together, such that you can view a "gallery" of dusk pictures, dark pictures, or black and white pictures algorithmically rather than by manual tags; or facial recognition to determine whether or not a picture is a portrait. I could run these algorithms on my database images in batch mode and then simply update each document with its temperature score or its boolean portrait status without ever explicitly modifying any structure. As the temperature scores or portrait statuses are tabulated, they are added to each document and the "gallery" views incorporate them automatically.

Software developers have this kind of wish list view of the future, where writing a web gallery can quickly turn into pushing the forefronts of computer vision technology or spawn a perl to python compilation project. Sometimes these whims manifest themselves as something very interesting or inspiring, and wherever they aren't too critical they should be possible!

Digital Inflexibility

posted January 25th, 2008 @ 18:45:28

- tags: development

- comments: 1

I've been playing with a few ideas recently, stemming from the perceived inflexibility in my blog software and my inability to develop a gallery over the past, oh, year and a half.

I already know what level of flexibility I want; the wiki. No logging into an admin interface and filling out some convoluted form to create a flat page; you go to the URI you want to represent your text and you add it.

Wikis can afford this flexibility by sacrificing a lot of what makes a traditional website powerful: structure. When you add data to a wiki page, there is no structure to the text within that page relative to the system. When I add a blog post, there is a strict structure: there is some text that is a "body", "title", "timestamp", etc. This structure is very powerful, because then you can treat all posts the same and provide different views on them. This blog post is rendered by the exact same tiny template as any other blog post in any other url that you find on my site.

But in a traditional "backend" driven website, that structure exists in a relational database. Changing databases is difficult and annoying, and gets more so as the amount of data you store in them increases. More unfortunately, you have to create that database structure at the beginning of your website development, forcing you to think about the exact structure and interdependence of your data before you have anything off the ground. I've spent 12 months being paralyzed by the daunting task of coming up with all of the possible metadata I might want to save on an image to create photo gallery software.

Some of this is tool failure, some of it is personal failure, some of it might even be my very own ignorance. But there is a kernel of structural failure in the very way that the classic "web application" is created. As it turns out, the "new jersey method" of web content delivery, the completely unstructured wiki, is better for most things.

At its core, at its most deep philosophical level (ignoring community driven content, which is outside the scope of this discussion), a wiki is a way of linking a UID, represented by a URL, to a piece of text. The other step forward in wiki is the pervasive use of simple, human friendly markup.

What if you could apply these to more than just text? What if you could give structure itself an easy, wiki-like markup? A wiki's handling of rich content like images or movies is notoriously bad; what if you applied pythonic namespace theory to a wiki so that /!img/foo.png allowed you to "create" foo.png if it did not exist, or /!mov/bar.flv allowed you to "create" the flash video bar.flv at that "location"?

I've been mulling over this recently, and over my own dissatisfaction with django/turbogears/pylons and just SQL in general. I've had some interesting ideas for wiki extension (like pervasive use of DAV, using an RCS backend instead of a FS backend, etc), and I've been looking into "thin" web glue technologies like paste to assist in creating a framework suitable to my needs using mod_wsgi, django's URL routing, CouchDB as a backing store and whatever templating language I end up preferring (probably the fastest one).

Pythonicism

posted October 6th, 2007 @ 03:37:23

- tags: development , python

- comments: 0

I have been programming in Python for quite some time now, and I've been doing it professionally for over 2 years. Despite this, I am not nearly as proficient at the language as I could be, probably because I am using it professionally and have to devote time I could be using learning the language to solving problems.

I find myself struggling sometimes to figure out what the "pythonic" way of doing something is. Whenever I realize a new solution to a design problem or just a regular coding problem, I mull over whether or not it fits the language I am writing in first. When I do this in Python, when I use what the language gives me rather than trying to force it to provide the solution I originally thought of, the results are almost always clearer and faster.

The python daemon I wrote at work in the last few months deals a lot with system state. The Object Oriented Paradigm really shines here, because there is much to share via inheritance, and by extension much to gain. The states I deal with are usually just kept in lists, with objects built around the lists mostly to provide the necessary knowledge on how to create them and occasionally to provide convenient transformations or functionality. There are many places in my program where I want to filter some of these state objects based on an arbitrary parameter; say I want all network interfaces with a last-measured latency under 100ms, or a list of writable data partitions with over 100MB free.

The way I was taught to do this, quite frankly, was terrible. If you have some special list, you are taught to write all sorts of special crap for every transformation you want to allow on that list, and that is that. If you have an object that represents a collection of something, like BagOfFruit, under this school of thought, you'd create some methods like BagOfFruit.filterByColor, or even better, the Bag superclass will have implemented a 'functional' style filter that takes a comparison function.

The "right" answer, in Python anyway, is a lot simpler. You have at your fingertips one of the most delightfully malleable built in generic collection objects in the world of programming languages. Want to filter some items from a list? There's a bunch of easy, short, and agile ways to do that in python code: just grab that list from the object and go. There's 4 ways I can think of, off the top of my head, to filter items from a list in python, and most of them look and read better than adding methods everywhere:

The oldfashioned way:

    mylist = []
    for item in oldlist:
        if item.foo < threshold: mylist.append(item)

The functional way:

    mylist = filter(lambda item: item.foo < threshold, oldlist)

The itertools way:

    mylist = list(itertools.ifilter(lambda item: item.foo < threshold, oldlist))

And the new way:

    mylist = [item for item in oldlist if item.foo < threshold]

The old way is the way you'd think to solve this problem if you were a programmer who did not know python. You think about what you know you can do and what you need to return and you go about creating it. You need a list, so you make one, and then you add to it everything that meets your conditions. Even though this is a very manual way of doing this in python, it's still a very useful level of abstraction over C/C++: no manual iteration.

The functional way is so named because map, filter, reduce and their ilk were created historically for programmers used to that paradigm. Functional programmers deal mostly in data transformations (that's what functions do, since there's no state: everything's a transform), and as such they already had "patterns" on how to deal with many of these "I have a list and I want to do something with it" problems. Unfortunately, if you don't know python, and you aren't from a functional programming background (which is probably true), you'd have to reach for the documentation on 'filter' to know what was really going on in this code.

itertools is a python module bundled with the distribution that provides the same functionality as a lot of standard python functions, but returns an iterator instead. Although it's not the case in this example, if you were not planning on going through every item in the original list, and if that list was very large, using ifilter to filter your original list might be a very large time savings!
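The saving is easiest to see by counting how often the test actually runs. A sketch in Python 3, where `itertools.ifilter` no longer exists (the built-in `filter` is itself lazy there), so a generator expression plus `itertools.islice` stands in for it; `expensive_check` and the `calls` list are made up to make the laziness visible:

```python
import itertools


def expensive_check(n, calls):
    """Pretend this test is costly; record every value it is asked about."""
    calls.append(n)
    return n % 2 == 0


calls = []
numbers = range(1000000)

# Nothing is evaluated yet: this only describes the filtering.
lazy = (n for n in numbers if expensive_check(n, calls))

# Take just the first three matches and stop.
first_three = list(itertools.islice(lazy, 3))

print(first_three)
print(len(calls))
```

Only the handful of values needed to find three matches are ever tested; a list comprehension over the same range would have run the check a million times before returning.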

Now, on to the "answer." The new way is the best for lots of reasons. It's short, uses the list literal syntax in the creation of mylist, contains only semantics about the creation of the list (no book keeping or comparison function creation), and is more flexible than the filter, since you can store permutations on 'item' in the resultant list trivially (you can do this by composing filter & map, but if you do it in the straightforward way, then have fun iterating over the whole list twice).
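The point about storing a transformed `item` can be sketched like this, borrowing the network-interface example from earlier in the post. The `Interface` type and its field names are assumptions for illustration:

```python
from collections import namedtuple

# Hypothetical state objects; the real daemon's classes are not shown.
Interface = namedtuple("Interface", "name latency_ms")

interfaces = [
    Interface("eth0", 12),
    Interface("wlan0", 140),
    Interface("ppp0", 95),
]
threshold = 100

# Filter and transform in a single expression.
fast_names = [iface.name.upper() for iface in interfaces
              if iface.latency_ms < threshold]

# The same result by composing filter and map.
composed = list(map(lambda iface: iface.name.upper(),
                    filter(lambda iface: iface.latency_ms < threshold,
                           interfaces)))

print(fast_names)
```

Both produce the same list; the comprehension just says it in one place.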

But, back to my original question, which way is fastest? I wrote up a quick little test of dubious scientific quality, and here were the results:

running oldway() 100000 times ... 2.91310501099
running functional() 100000 times ... 3.14215993881
running itertools way (iter_) 100000 times ... 3.90754389763
running newway() 100000 times ... 2.10518980026

Not only is the list comprehension cleaner semantically than the other 3, but it is a lot faster. What if you wanted to iterate over the filtered list and do some more complex operations on the filtered set? This is presumably where itertools would be the Right Way (tm), but it looks like it's almost twice as slow as the comprehension. Indeed, when I added code to the comprehension to save the list, iterate over the whole saved list (but do nothing), and then return the filtered list, it still ran 100000 times in only 2.61785793304 seconds.
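The test script itself isn't shown, so the following is only a guess at its shape, written for Python 3 with the standard `timeit` module. The `Item` type, the list size, the threshold, and the reduced run count are all assumptions, and absolute timings on a modern interpreter will not match the numbers above.

```python
import timeit
from collections import namedtuple

# Assumed test data: the original list and threshold are not shown.
Item = namedtuple("Item", "foo")
oldlist = [Item(n) for n in range(100)]
threshold = 50
RUNS = 10000  # smaller than the post's 100000 to keep the sketch quick


def oldway():
    mylist = []
    for item in oldlist:
        if item.foo < threshold:
            mylist.append(item)
    return mylist


def functional():
    # In Python 3 the built-in filter is lazy, much like the old
    # itertools.ifilter, so it needs list() to match the others.
    return list(filter(lambda item: item.foo < threshold, oldlist))


def newway():
    return [item for item in oldlist if item.foo < threshold]


if __name__ == "__main__":
    for fn in (oldway, functional, newway):
        seconds = timeit.timeit(fn, number=RUNS)
        print("running %s() %d times ... %s" % (fn.__name__, RUNS, seconds))
```

Whatever the numbers come out to, it is worth checking first that all three functions return the same list, otherwise the comparison means nothing.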

The real importance of this all isn't that list comprehensions are the fastest, it's that their semantic purity is not a performance tradeoff. They really are the most pythonic way to approach this particular problem, and they happen to be the fastest. I ran into a similar problem with some timestamp printing code in a small logging library I wrote:

    tz_adjust = (time.gmtime()[3] - time.localtime()[3]) * 3600
    
    def default():
        t = time.time() - tz_adjust
        ms = ("%.2F" % t).split(".")[1]
        return time.strftime('%H:%M:%S', time.gmtime(t)) + '.' + ms

Since this function would format the timestamp for every logged message, you had better believe that I tested the crap out of it to make sure it was the fastest I could manage. I was dismayed that the time module didn't give me anything better to work with than time.time as far as getting microseconds; as you can see from the code, not only do I have to convert the value (which is seconds since the epoch w/ 6 decimal digit microseconds) in order to get the time, but I had to pre-calculate the timezone since time.time() apparently always returns its value in GMT/UTC.

Later on, I found out (writing something else where I needed to deal with dates) that the datetime module has a function/object called datetime.datetime.now(), which returns an object that has the current local time and microseconds in easily accessible attributes! I re-wrote my function as follows:

    import datetime

    def default2():
        dt = datetime.datetime.now()
        return "%02d:%02d:%02d.%s" % (dt.hour, dt.minute, dt.second, str(dt.microsecond)[:2])

It ran faster, it was less code, no more fooling around with timezones. What more could I ask? Correctness would be nice. This code has a bug in the way it displays hundredths of a second. While thinking it over, I remembered something I had noticed while goofing with the old code: in python, even if you think a numeric task is going to be complicated, it's almost always faster to stick with integer operations than to convert to a string. I only wanted 2 digits (hundredths), and the conversion would be as easy as dt.microsecond/10000, so I rewrote default2() to use integer division. Here are the times for default, default2, and default3 over 1000000 calls:

  • default: 14.675798892974854
  • default2: 10.18687105178833
  • default3: 9.7911970615386963

My code now basically looked like this: return "%02d:%02d:%02d.%02d" % (dt.hour, dt.minute, dt.second, dt.microsecond/10000); far cleaner than the original, actually correct unlike the second one, and the net speed increase was around 33%. Unfortunately, in both of these examples, the multitude of possibilities obscured the "right" solution; hopefully, with Python3k moving forward, and my python skills moving with it, some of the standard library can be merged so that a good mental coverage of it will be easier.
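For reference, here is a sketch of what the integer-division version looks like as a complete function. The split into format_timestamp() and default3() is my own arrangement for illustration, and it is written for Python 3, where integer division needs // (a plain / would produce a float). It also shows why slicing the string was wrong: a microsecond value of 5000 is 0.005 seconds, but its first two characters are "50".

```python
import datetime


def format_timestamp(dt):
    # microsecond // 10000 turns 0..999999 into hundredths (0..99)
    return "%02d:%02d:%02d.%02d" % (
        dt.hour, dt.minute, dt.second, dt.microsecond // 10000)


def default3():
    return format_timestamp(datetime.datetime.now())


def sliced_hundredths(dt):
    # The buggy approach from default2(): take the first two characters.
    return str(dt.microsecond)[:2]
```

Taking the datetime as a parameter in format_timestamp() is only there so the formatting can be checked against fixed values.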

The formality of formal software engineering

posted September 27th, 2007 @ 02:23:52

- tags: development

- comments: 0

I am taking a class in Quantitative Software Engineering. It's a discipline that likes to throw around all these numbers, but is mostly posturing as something more defined and understood than it really is. The professor is smart and encouraging and I like to write on the discussion boards.

I think it is interesting and worthwhile that the actual foundation of the whole discipline is in question, and that this is being discussed. Too often people believe everything they read, especially when it seems very complex. Here is something I wrote today in response to: "There is an active debate on the utility of Formal Methods for Software Engineering. How do you stand on the issue -- are Formal Methods worthwhile?"

I am a cautious skeptic who sees the necessity of formal methods but is unhappy with their current state.

My main complaint on a lot of the formal methods of software engineering is that they are not derived formally in the classical sense. There is no mention or even attempt at doing controlled experiments; at applying the scientific method. Instead, anecdotal evidence is offered, and in the utmost perversion, statistics and case studies that fit (or even don't fit) the models are offered as evidence of their veracity. UFP = 4I + 5O + 4E + 10L + 7F. Am I supposed to believe that?

Why? It might be a reasonable model, but I think the presentation of it wrongly postures as formality rather than embracing the rich history and ingenious intuition it is built upon. Brooks' treatment of the subject seems to fall in line with what I am referring to, and the quest for formalizing his insights has, I think, largely proven empty handed.

I also see no evidence that "great" software is designed by a conference of users and software experts in an iterative process. In fact, I don't see any evidence that anything great was ever designed in this way. One cohesive determined vision by a skilled and enthusiastic designer always seems to come along and steal the carefully committee designed software's lunch. As a software engineer, it's my goal to be involved in the creation of "great" software.

That having been said, there's no way that a chaotic process of "visionary design" can be repeatable by enough people to create the volume of software solutions that are necessary (or desired). Frank Lloyd Wright conceived of the Guggenheim after lots of personal effort and vision, and you won't see such a masterful structure coming from New York City's public housing commission; but more buildings are required than the masters can design, and they are built in all levels of competency. I think that the act of introspection into the software engineering process is very interesting and yields a lot of insight into how to make the process more smooth; not all software needs to be great, just like not all buildings have to be great. But, since most software is so lousy, it would behoove everyone to adopt and experiment with formal methods that can improve the bottom line.

The question then, alongside these assertions, is how do you allow for the creation of great software when you propose this top-down "anti-greatness" structure? Software engineering books seem to go to great lengths to move the goal posts and attempt to modify the meaning of "great". If that's the real consequence out of all of this work, I'm not sure it was worth it.

Even though a small bit of process is a great improvement on the cowboy coding you might think I have been describing fondly, I think it's a fallacy to assume that a much greater amount of control and formal process around software engineering would yield much greater results. This is all somewhat paraphrased from Alex Martelli, but there comes a point where the management process takes on a life of its own and absorbs boundless energy in a self-perpetuating mess of documenting insignificant minutiae.

I think in the end, like most people, I favor formal methods that provide me with the language and tools to describe processes I understand or use intuitively. Along with language and vocabulary comes understanding, more intuition, and more complex formulations. Any sufficiently complex area of study, whether it be biology or social sciences, develops its own precise language out of necessity; such a foundation is required in order to build more complex models of reality. It was only after a comprehensive language was laid down about computation that it really started to explode, even though it was many decades after the possibility was first postulated. I favor methods that take the vocabulary seriously and favor precision, not the bizarre collection of acronyms and abbreviations in the current literature that look more like the result of a marketing department.

Consistency

posted July 20th, 2007 @ 23:11:22

- tags: development , life

- comments: 0

I just came off a day at work, a week at work, that was legendary in frustration and triumph. I'm pretty sure that most of the details of what I struggled through, heroically, are confidential at worst and tangential to an NDA at best, so I'll gloss over things in generic but jargon laden terms.

As most of you probably know, when left to my own devices for a software project my first thought is usually to use the python programming language. I find that most of the software that isn't already written out there requires a higher order of thought, or a higher level of abstraction to write comprehensively and with some level of correctness. If your software is something like 7 years old, then maybe "dirtier" languages (and by that I mean languages that have side effects, force you to get in the trenches, and have libraries where you must initialize externs or globals, prepare inputs, strip outputs, call functions in a certain order beyond the normal init and shutdown, know idioms in order to accomplish simple tasks, etc.) are going to work just as well. But I try to pick the right tool for the job, and it just so happens that a lot of times my job fits Python pretty well.

My work environment is both colorful and diverse in personality and language of choice. Me, an Italian-American redneck in training (Greg), and two Frenchmen (Cheech and Nico) have been battling it out with editors, compilers, virtual machines, interpreters, kernels, packaging, and just plain old getting things done for weeks now. We started this endeavor with an idea, a proof of concept, and a 2 year history of knowing that both of them worked in practice. What we've done since April is plain and simple mercenary programming, pounding out a system built from the ground up to work with reliability, stability, and extensibility in mind, delicately balancing academic techno-mathematical purity with necessary engineering tradeoffs so as not to wind up with a beautiful piece of slow code 8 months too late. And this week, we finally landed on the moon.

Our 4-part architecture fit our 4-man team quite well. Greg (a Java guy) tackled the part where we knew Java had been used historically, and knew that Java was a good fit. I (a Python and generic 'systems' guy) tackled the tough integration and communication glue problem. Our pieces are designed to dance delicately in some geeky wet-dream of function and semantic decoupling, and by now they are pretty damned good at it. But we got communication and logistics problems out of the way a long time ago. I spent over a week working on a mysterious bug in a library we were using that caused segfaults (of all things) from python code from only one linux distribution (of 4 that I surveyed). But we had it pretty much down and functional about a month ago. But the past two weeks have been something else altogether.

We were tasked with porting old working code from the legacy "demo" days to the new API we had been developing. No more exceptions, cheap 3 line profiling, or even sane string types for us, oh no. The new task for this pair of mercenaries was untangling a twisted web of casts and somehow building something connection based on top of a broadcast/receive foundation that was prone to crumbling and still very much in flux. And for almost 3 weeks, armed only with our confusion, gdb, and our constant badgering of Cheech, we struggled (off and on, occasionally revisiting the sanity of our self-architectured products and cleaning them, perfecting them) to make this work. And today; today, our efforts were finally vindicated.

Greg, always colorful, had many epithets locked and loaded to describe the week. It was pretty exhausting mentally, but we're done. Kinda. We have the smallest, working system that enables us to move forward. Smallest there is actually quite important: there are two types of large software systems, those that evolve from small systems and those that don't work. We plan on making ours work.

裏切者 (traitor)

posted February 3rd, 2007 @ 17:40:42

- tags: development

- comments: 0

Unfortunately for my family my updates recently are not about me, but about things I'm doing. I've been pretty prolific as far as coding is concerned thus far this year, both writing programs at work and on my own. Because I spend so much time at my girlfriend's place, and she has a Windows machine, I've had to impose a few hopefully non-annoying programs on here, but I finally found the last missing piece: PieTTY. Now that unicode things work in my console, I'm able to develop on her computer and actually verify what I'm doing.

I've actually spent quite a lot of time in the past few weeks investigating Japanese character encodings and writing code that deals directly with the unicode spec and EUC_JP and the translation both of Kana to Romaji and of wide-format unix characters to normal format unix characters (for fuzzy string comparisons). Character encoding is actually pretty exciting when you "get it right" (one of my dreams is an operating system that correctly displays any textual information; it's somewhat impossible without giving it some hints, but at least having all unicode glyphs and a prioritized list of encoding guesses would be nice).
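As a rough illustration of the wide-to-normal conversion, here is a sketch that leans on Unicode's NFKC normalization from the standard unicodedata module rather than on hand-written tables. This is an assumption about one convenient way to do it, not the code I actually wrote; the function name is made up. Note that NFKC also folds other compatibility characters, which is usually fine for fuzzy comparison but is not a pure width conversion.

```python
import unicodedata


def to_normal_width(text):
    # NFKC maps fullwidth Latin letters and digits to ASCII and
    # halfwidth katakana to the regular katakana block.
    return unicodedata.normalize("NFKC", text)


def fuzzy_key(text):
    # A comparison key: width-normalized and case-folded.
    return to_normal_width(text).casefold()
```

Two strings can then be compared by their fuzzy_key() values before any distance metric is applied.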

Here's a rundown of some stuff I've been hacking on:

  • a google map mashup at work w/ sensor information and map overlay stuff (for all intents and purposes this is private, so no links)
  • a patch to python-romkan that makes it work w/ utf-8 instead of euc-jp
  • a ton of updates in a code sprint earlier today to pyexif
  • a lot of changes to the nds libraries including finishing ndstool compat on the header
  • created the efteep project page

uromkan

A few notes on 'uromkan' (utf8-romkan):

romkan is a seemingly popular perl module that can convert between romaji and kana (either entirely hiragana or entirely katakana), but coming from the perl (and ruby) world, it was targeted towards some non-unicode text encoding and was riddled with unnecessary regular expressions. Some of them were so thick my mind couldn't actually penetrate them, but it seemed to work.

In [5]: print uromkan.hirakata(uromkan.romkan('aisu kuri-mu'))
アイス クリーム

In [6]: print uromkan.romkan('uragirimono')
うらぎりもの

In [7]: print uromkan.kanrom('そうして')
soushite

In [8]: print uromkan.romkan('ficchi')
ふぃっち

(That last thing is nonsense, but I just wanted to show that small-vowels and small tsu consonant doubling is working)
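The hiragana-to-katakana step is the easy part once everything is unicode, because the two blocks are laid out in parallel. Here is a sketch of that idea; it is not the uromkan source, and the function name is hypothetical. It simply shifts each hiragana code point (U+3041 to U+3096) up by 0x60 and leaves everything else, including the long vowel mark and spaces, alone.

```python
HIRAGANA_START = 0x3041  # small a
HIRAGANA_END = 0x3096    # small ke
KANA_OFFSET = 0x60       # distance from a hiragana to its katakana twin


def hira_to_kata(text):
    out = []
    for ch in text:
        code = ord(ch)
        if HIRAGANA_START <= code <= HIRAGANA_END:
            out.append(chr(code + KANA_OFFSET))
        else:
            # Katakana, punctuation, spaces and the long vowel mark pass through.
            out.append(ch)
    return "".join(out)
```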

Fiddlesticks

posted January 22nd, 2007 @ 11:22:00

- tags: development

- comments: 0

You might have noticed that the word finished was in quotes in my previous post. This was on purpose, and anyone who is a hacker (or a perfectionist, with, say, deadlines) will know why and roughly the content of this post. Ninrename (which is a terrible name that is less and less descriptive of the program's capabilities) has been expanded upon quite a lot in the past week (or two). Features are being added left, right, and center, and the desire is there to make it actually very good and not just a one-off.

So far, I've added lzma (7zip) support and limited header reading support. Before I "release" again, there are some institutional things I need to prepare, and I need to tourniquet off some features that will take me a long time to get right. I have most of the stuff parsed out from the header (thanks to the source of ndstool), but I still need to figure out how all the internal checksums are working. I also want to start working towards merge support, à la goodtools, but by building the merge database programmatically rather than pushing out hundreds of versions. Software should have a life cycle that ends when it works, not when new information is no longer available.

During the course of writing the recent version, I put all of my extraction routines into a library. It turns out that rarfile's error suppression didn't work quite right when it was a bit down the import chain, so I went in and modified it to stop using the quasi-dangerous os.tmpnam(). Since it was using tmpnam to actually put a rar file on the filesystem and then use the unrar utility, I used the new (since 2.3) module tempfile to get around that vulnerability. If you are interested, the patch to rarlib.py is available.
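To show the general shape of that change (this is a sketch, not the actual patch, and the function name is made up): os.tmpnam() only hands back a name, which leaves a window for someone else to create the file first, while tempfile.mkstemp() creates and opens the file in one step.

```python
import os
import tempfile


def write_temp_archive(data, suffix=".rar"):
    # mkstemp() creates the file securely and returns an open descriptor,
    # so there is no gap between picking a name and creating the file.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
    except Exception:
        os.remove(path)
        raise
    return path
```

The caller hands the returned path to the external unrar utility and is responsible for removing the file afterwards.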

The work done on that script has made me want to do a lot more work with compression and crc, because it's an annoying manual process that I've wanted to make simple for a long time. File-roller is nice but its drag/drop is incompatible with Thunar. I might want to just get my hands dirty and fix that, but compiling gnome applications is extremely annoying and I'm not really all that thrilled about the File-roller user interface anyway. My own pyunarc archiver still seems comfortable but the UI of an archive utility is something that is no longer cut and dry to me. I think some studies are needed as to what the most common need is; to extract the whole archive, or to extract particular files? I centered around the former as the overwhelming majority of use cases when conceptualizing pyunarc/Archive, but I'm not so sure that's the case anymore.

I've also been working on another google maps mashup, again for work. This time it's going to be a bit more complicated, but thanks to the version 2 API all sorts of things that would have been ugly nasty hacks are now easy (like custom overlays and the polygonal API). This time around a lot of the data I'm playing with has to be private, so I have to figure out how to not include it in the mandatory public version of the map. Thanks to the hacking I did on my todo list, javascript is pretty natural and I feel comfortable writing and debugging it. I don't even remember the dark days of js development before I found firebug, and that's a good thing because it wasn't a pleasant experience.

ninrename

posted January 11th, 2007 @ 00:18:46

- tags: development , games , linux , python

- comments: 0

Just "finished" hacking together some code that started out as a small desire and has ended up an obsession of sorts. Initially, it was just going to be a smart, specialized file renamer. It has ballooned into a poorly written (but fairly solid) beast of a program with crc checking and unrar/zipfile support. When I say it's poorly written, I mean it's not beautiful like my xdccq module is, for instance. It isn't elegant in the least, does things in a way that is acknowledged as poor design decisions, and the main dispatch is a giant ugly conditional mess. But it works pretty well!

The script takes a ClrMamePro formatted rom release list and a local directory as arguments (with various options available), and can perform hash checks and smart renames of files to 'Official' names. You can find such formatted files at advanscene or pocketheaven. There is a special switch on the program that removes several hardware dumps from the pocketheaven list and outputs a 'clean' list which will match up more with the 'scene' numbering system.

There are quite a lot of programs like this available, but this one is command line, written in python, runs in linux, suited to logging, and written by me. You'll need the rarfile module written by Marko Kreen (thanks Cheeseshop!) if you want rar support (or if you want the program not to throw an exception when it doesn't find it). You'll also need cksum.py in the same directory because I am feeling quite lazy this time around. Even though there's nothing in there (even rarfile has a windows equivalent which wraps UnRAR.dll or something) that is overly platform specific, because 1. this was a one-off, 2. I run linux, and 3. I don't like Windows anyway, this won't run in Windows and I have ZERO impetus to change that.

Anyway, the help screen (options):

usage: gbaname.py [options] datfile [romdir]

options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -c, --crc             perform crc checking
  -r, --rename          perform rom renaming
  -p, --pretend         do not rename but display actions
  -m, --missing         show missing roms
  -g RANGE, --range=RANGE
                        perform only over range (#-#)
  -n, --nds-clean       remove MAX Media Launcher, reorder, etc
  -v, --verbose         ever present verbose mode

Features:

  • looks up files by release number and correlates them to your release info dat
  • performs actions over a range of release numbers (rather than all files)
  • renames files according to the dat
  • performs crc hash checks on files (.gba, .nds, .zip, and .rar supported)
    • if a file fails a hash check and renaming is active, (!) is recorded in the filename
  • clean pocketheaven NDS lists (this feature is pretty dangerous; it will mangle non-PH lists)
  • show a log of 'missing' roms (missing from the current list of known dumps)
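The hash check itself is not much code. The sketch below is not the cksum.py that ships with the script; it assumes the dat file lists plain CRC-32 values as hex strings and uses zlib.crc32 from the standard library. All function names are hypothetical.

```python
import zlib


def crc32_of_chunks(chunks):
    # Feed the running checksum back in so large files never sit in memory whole.
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def crc32_of_file(path, chunk_size=65536):
    with open(path, "rb") as handle:
        return crc32_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_dat(crc, expected_hex):
    # expected_hex is the CRC string as it would appear in the dat, e.g. "CBF43926".
    return crc == int(expected_hex, 16)
```

A file that fails matches_dat() is the case where (!) would be recorded in the filename.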

If you still want it (it is pretty to look at at least), you can download it. There are some useful recipes in there, even for what's basically an elaborate one-off... too useful not to throw up on the internet.

There are tons more features I can think of that I'm in no hurry to implement, including a feature that was initially planned which is doing fuzzy string comparison (using normalized levenshtein distances) to figure out what file is what rather than release numbers (which, after all, can be wrong). This would provide a great deal of safety to the whole thing (although if your archive's crc files match up it's pretty straightforward to modify this script and fix the names). I won't make any ridiculous patronizing remarks about how this is for your own backups etc and so forth, since most people who are looking for something like this do not own 2600 gba games and fret unduly over their private dump collection. I will add (interestingly) that, using my own linker (which dumps as well), I was able to dump a bad image of my broken Metal Slug Advanced cart and good (matching the 'release' crc!) images of Final Fantasy IV and Megaman Zero 2. So.. hey!
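Since the fuzzy matching never got written, here is only a sketch of what it could look like: a standard dynamic-programming Levenshtein distance, divided by the longer string's length so that 0.0 means identical and 1.0 means nothing in common. The best_match() helper and its lowercase comparison are my own assumptions about how it would be wired in.

```python
def levenshtein(a, b):
    # Keep only two rows of the edit-distance table at a time.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def best_match(filename, official_names):
    # Pick the official name closest to the local filename.
    return min(official_names,
               key=lambda name: normalized_distance(filename.lower(), name.lower()))
```

In practice you would also want a cutoff on the normalized distance, so a file that matches nothing well is reported instead of renamed.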

The Boo Language

posted November 30th, 2006 @ 00:36:00

- tags: development

- comments: 0

On Friday, November 17th, I started the first of what will be two 2-week vacations in 2 months. That night, sitting at my desk in the office at 7:00, I decided to fire up monodevelop and try out the Boo Language. I had been using Banshee since fixing the support for it in my xchat irc script, and was somewhat impressed with the startup speed of mono gtk applications when compared to Python. Although I had just finally gotten a pygtk app to run in both windows and linux (unmodified! it was kind of exciting) I decided to go ahead and play around with the CLR's library a little.

One of the things I noticed immediately upon learning Perl (and of course Python) is that I was able to write programs extremely quickly and with fairly few bugs even though my hand was not being held by a static type checker. As far as I know, there isn't any serious research being done as to why this is (others have experienced it as well), but a few people smarter than myself, notably Bruce Eckel, have made some educated guesses. Eckel, an experienced Java developer who had written books on Java, said that the dynamic typing enabled rapid prototyping by getting nagging details (like types) out of your way.

Boo has a decent type inference system that allows you to skip over most type and variable declarations. This might not save much time really, but it saves two things that are extremely important: mental real estate and spatial real estate. You don't have to think about what variables you need and what types they must be. While exploring an API (which is what you do when you learn a language) this is paramount; often all you want to do is store the result of a method call and print out some information about it. If the compiler is checking that you have created the right type and knows when there's an error, why can't it just determine the type itself?

The spatial real estate is also very important. Less code is simply better in almost any scenario, the exception being when the smaller code is obfuscated or orders of magnitude more complex for small spatial gain. This argument is somewhat ignored but it's important: when you're searching for a bug, the less you have to read the better. It's almost always faster to understand 1 line than 3, and it is always faster to understand what's going on in one section of code when everything that pertains to it is nearby.

Boo saves you these, and the semi-colons, and gives you compile time type checking and optional duck typing. The optional duck typing in boo lets you escape the type checking net, but I found myself not using it because the introspection in CLR is much less handy than the introspection in Python and since the compiler keeps track of types it's usually just as easy to insert a cast to the right type as it is to insert a cast to duck.

Without knowing .NET at all, I put down about 330 LOC in 2 days. There was some gui code in there, and there wasn't a lot of intellectual heavy lifting, but the equivalent in C++ or C# would have been far larger, and that really does matter ... really! One of the nice things about the CLR is that I'm free to use software that has been written in the radically more popular C# without really doing anything to it. The bad thing is that typically the libraries written in C# are either of low quality, restricting license, or charge a fee. The OSS Python ecosystem is actually much healthier, even for simple 25 year old networking protocols.

Evil lives in the skin

posted November 15th, 2006 @ 00:30:00

- tags: development , life

- comments: 0

A month ago while I was visiting home I put on my girlfriend's glasses and lo, I was able to see much better than normal. I had noticed slight deterioration of my vision about a year earlier, as signs that were far away were slightly fuzzy, but I saw far better with the glasses on than without. This was a bit alarming.

I still haven't sought a specialist to solve the problem. Those that know me really well know that I'm attracted to women with glasses. The real reason is that I'm attracted to people smarter than me, and people with glasses look smarter (It'd probably be interesting to investigate why, psychologically, this is the case, but most people probably agree with me here). I don't necessarily hate the idea of wearing glasses myself, but when I put them on and move my head around, the world swirls about in a focused fisheye that makes me dizzy as hell. The last few days have been considerably less stressful than the previous week and I actually am seeing quite a bit better: no more tunnel vision, focusing is coming easier, so I'm putting off a trip to the eye doctor for a bit.

While I haven't been studying my ass off or memorizing my ass off, I've been coding (my ass off). Although a lot of progress has been made on saudade (all of which can be seen), the stuff I've been doing recently has been mostly dealing with xchat scripts. Most of the hits on the devsite have been in search of "xchat", "xmms", or both. In response, pymp, a simple xchat mp3 announce & control script, now has support for 5 media players: audacious, xmms, beep, banshee, and juk.

The pymp code is actually pretty clean, but there is a lot of code that could probably be done away with. There are a few shortcuts and instances of macro-programming (macro has become somewhat of a dirty word but I mean it in the lisp meta-programming sense) that I am a little proud of. There is a LOT of passing of functions as data and a lot of dynamic code evaluation that saves a ton on LOC (which in general saves a ton on correctness).
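To give a flavor of "functions as data" without pasting the real script, here is a made-up sketch: each player is just a function that returns the current track, and a dictionary decides which one to call. None of these names come from pymp, and the track getters are fakes standing in for whatever each player's real interface is.

```python
def make_announcer(player_name, get_track):
    # Returns a new function closed over one player's track getter.
    def announce():
        return "%s is playing: %s" % (player_name, get_track())
    return announce


# Hypothetical stand-ins; real getters would query each media player.
def fake_xmms_track():
    return "Artist A - Song A"


def fake_banshee_track():
    return "Artist B - Song B"


GETTERS = {"xmms": fake_xmms_track, "banshee": fake_banshee_track}

# One announcer per player, built in a loop instead of written out by hand.
ANNOUNCERS = {name: make_announcer(name, getter) for name, getter in GETTERS.items()}


def announce(player_name):
    handler = ANNOUNCERS.get(player_name)
    if handler is None:
        return "unsupported player: %s" % player_name
    return handler()
```

Adding a sixth player in this arrangement means adding one entry to the dictionary rather than another branch in a conditional.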

I am really getting tired of the current theme, and since the saudade-based site has numerous enhancements (auto-summaries, improved rss feed, blog list navigation, and huge improvements in todo list performance), I want to get it out as soon as possible. I am a bit behind on 日本語の宿題 (Japanese homework), but I hope I can get the next version of jmoiron.net off the ground soon and start to work on the gallery. I'm shooting for November 20th.

MochiKit v. Scriptaculous, mod deflate

posted November 2nd, 2006 @ 22:52:00

- tags: development

- comments: 0

I spent the last few days' worth of free time tearing my hair out with respect to the todo list and its general lack of performance. The problem seems to be that nested sortables (Sortables that contain other Sortables) are extremely slow in Firefox on linux. In a quest to get out of the prototype/scriptaculous world and also fix my application, I spent 3 hours surveying various ajax toolkits.

The first toolkit I tried out was dojo. The project has the backing of some big boys like IBM and AOL, and it looks to be where a lot of non-rails guys are going. I downloaded the library from the site, but the documentation was severely lacking. Seeing that the basic dojo.js necessary include was a whopping 149k after javascript compression, I decided to check out something a little lighter.

Enter MochiKit, a slim javascript library aimed at making javascript "not suck." I liked what I saw right away; off the bat the most important thing the toolkit stressed is documentation. Not flashy features or some useless wiki, but documentation; a real rarity among javascript libraries.

I read on and watched the screencast. The author seems to be quite taken with Python, and many of the features that they have developed for javascript were in fact aimed at making it more Pythonic. But when I saw their visual effects API, I knew immediately I had found what I needed. MochiKit 1.4 (the soon-to-be-released version) has implemented the Scriptaculous API for Sortables.

After about 4 hours, I had learned the MochiKit AJAX api and had ported most of the todo list over to use it. Since prototype alters the prototype of DOM elements in order to provide its features, after I removed it a lot of code had to be changed. Since FORM.Serialize was also gone, I wrote this generic and naive serializing function:

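function serialize(form){
    // Build "id=value" pairs joined by '&' (naive: every element is included)
    var s = '';
    for(var i=0; i < form.elements.length; i++){
        s += escape(form.elements[i].id) + '=' + escape(form.elements[i].value) + '&';
    }
    // MochiKit's strip() returns a new string, so return its result to drop the trailing '&'
    return strip(s, '&');
}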

That was essentially all I needed to do to port the application over, but unfortunately, in the end I did not really see any significant speed boost. Worse yet, because I had to change the way I did certain things due to the changed DOM, IE and Opera are broken. The final problem is that MochiKit.Sortable is broken in IE and hides my css styles after dragging. So it looks like I'll be sticking with Scriptaculous for the first release of saudade.

Staring at MochiKit's unified JS being around the same size as the entrance point for Dojo's library, and realizing that my Scriptaculous library includes were amounting to around 100K themselves, I decided to make sure I was running mod deflate. Sure enough, I wasn't, and after I enabled it I started to do some testing on my own to see if javascript compression is really worth it.

My todo.js file is 13k, and running it through Dean Edwards' javascript compression algorithm flattened it to about 6k. This was really promising; over 50% taken right off the top of the file. But the way that javascript compression and obfuscation works also takes away a lot of the efficiency of compressing text. After gzipping the original file, I was left with a todo.js.gz that was 3.7k; almost 40% smaller than the obfuscated version. Gzipping todo-ob.js got me down to 2.8k, which is a savings of about 25% now instead of 50%.

So it really seems like, unless you are trying to stop people from using your code, that enabling mod deflate is the best way to go. For production code, if you are convinced that your javascript compression works flawlessly, then it doesn't hurt (but doesn't significantly help) to go through and flatten your library. It's probably more useful to be able to flatten your script includes into one and save on the amount of request overhead you need to undertake, but those tests are for another day.
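The arithmetic behind those percentages is easy to redo, and so is the experiment. Here is a small sketch: percent_saved() just restates the sizes quoted above, and gzipped_size() uses Python's gzip module as a stand-in for what mod deflate does on the wire (an approximation, since the server's settings may differ). Both function names are made up.

```python
import gzip


def percent_saved(before, after):
    # Size reduction as a percentage of the starting size.
    return round(100.0 * (1 - after / before), 1)


def gzipped_size(data):
    # Length in bytes after gzip compression at the default level.
    return len(gzip.compress(data))


# Sizes in kilobytes, as quoted in the post.
PLAIN, OBFUSCATED, PLAIN_GZ, OBFUSCATED_GZ = 13.0, 6.0, 3.7, 2.8
```

With those numbers, percent_saved(PLAIN, OBFUSCATED) is the obfuscator's gain on its own, percent_saved(OBFUSCATED, PLAIN_GZ) is how much better plain gzip does than obfuscation alone, and percent_saved(PLAIN_GZ, OBFUSCATED_GZ) is what obfuscation still buys once gzip is on.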

My Day (Friday 13th)

posted October 13th, 2006 @ 01:51:57

- tags: development

- comments: 0

Tonight, a night in which I wrote source code to achieve some purpose, felt good. It felt real good. There are a few things going down around the development-sphere that somewhat might indirectly involve me (or not), and here they are in no particular order:

  • python-markdown 1.6 was released tonight, which will probably fix the ampersand bugs I've been having (you might notice them, say, below this entry).
  • xdccq.py 0.3 was also released (by me). It's a simple local queueing script for the xchat irc client, and its focus is simplicity and not hammering people's bots; it works rather well.
  • I have discovered the cheeseshop, which seems very interesting. I feel like writing something now, in order to have it included in the cheeseshop.

I am starting to have ideas on things to program. Some of them are simple/silly/useless (read: 日本), and some of them are ridiculously farfetched and doomed to failure.

  • xchat scripts:
    • wb script (simple)
    • unify my audacious/bmp/banshee mp3 announce script
  • modify python-markdown to fix (in some clean way) word resolution, i.e., "_this should be italics_"
  • my own programming language (similar to boo but machine code compiled)
  • of course, continue on saudade
  • and some ideas I finally had at work for using the 770s we have

This can probably all commence tomorrow. Good times.

Django + jmoiron v. 1.1

posted October 1st, 2006 @ 16:50:13

- tags: development , site news

- comments: 2

くりすましょ~

1.0 of this site (what you might have been looking at for the past few months) was simply the migration from my own mod_python framework to django. 1.1 is the culmination of about 3 months of work, mostly on the todo list application, but also a great deal on reorganizing the backend to make redeployment easier. There is still a great deal more work to do in that area, and the gallery is still yet to come, but development pacing has started to get pretty fast now that I'm very comfortable with all of django's nuances and idioms.

The new features for this release:

  • pull entire site under one directory
  • reorganized /media/ to section off admin media
  • implement published bit
  • move urls.py to various apps
  • changed to RequestContext
  • login/logout
  • added edit shortcuts for logged in staff
  • todo app
    • reached 1.0
    • ajax based add/edit/removing of tasks and 'groups' (sublists)
    • color coded priority system
  • various improvements to blog app
    • changed date to auto_add
    • fixed date display

samurai sword (さ)

posted September 29th, 2006 @ 22:37:00

- tags: development , life

- comments: 2

Although it seemed like it might be complete and utter fallacy up until about August 30th, I actually did manage to not only sign up for 2 graduate courses but also Japanese levels 5 & 6, thus saving myself from being a total liar.

Automata and Formal Languages is essentially a theoretical computer science trial by fire, or maybe a theoretical computer science sink or swim test. Japanese 5&6 is a completely natural progression of levels 1-4, but that's probably mostly to do with the incomparable Minamoto-sensei. I handed in Cecilia's homework on Thursday, and when Reiko-san asked me if I was there for a textbook, I said: 'ああ、いいえ、ともだちのしゅくだいがあります。' ("Ah, no, I have my friend's homework.") almost instinctively. In class, we've finally covered "the existence of experience" form, which is heavily used (ことがあります), and also the plain form of past and past negative tenses. My kanji reading is up to around 80, which is something like 4% of what you need to be able to read a newspaper.

I've been djangoing ( v. to code using django ) quite a lot, and also quite studiously engaging in Web2.0 and Ajax development patterns and trying to figure out what was good and what was bad. Although work on the gallery section kinda hasn't really started, I feel much more comfortable with the framework and with "modern" web application development techniques, and I am starting to come up with better ideas on how to build such a system. A system in which every picture's exif tag gets its own row in a table would quickly result in a table with tens of thousands of rows, but this is kinda exactly what RDBMSs are supposed to be able to cope with. I'm going to ask around on freenode to see if any database experts can give me any better ideas, and it might be interesting to solve this problem in a way that bypasses Django's ORM functions.
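To make the "one row per exif tag" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Django's ORM. The table names, column names, helper functions, and sample photos are all hypothetical; only the general shape (a photo table plus a tag table keyed on photo and tag name) comes from the paragraph above.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photo (
        id   INTEGER PRIMARY KEY,
        path TEXT NOT NULL UNIQUE
    );
    CREATE TABLE exif_tag (
        photo_id INTEGER NOT NULL REFERENCES photo(id),
        name     TEXT NOT NULL,
        value    TEXT NOT NULL,
        PRIMARY KEY (photo_id, name)
    );
    -- Lets "all photos shot at ISO 400" use an index instead of a scan.
    CREATE INDEX exif_tag_name ON exif_tag (name, value);
""")


def add_photo(path, tags):
    """Store one photo and one exif_tag row per (name, value) pair."""
    cur = conn.execute("INSERT INTO photo (path) VALUES (?)", (path,))
    photo_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO exif_tag (photo_id, name, value) VALUES (?, ?, ?)",
        [(photo_id, name, str(value)) for name, value in tags.items()],
    )
    return photo_id


def photos_with(name, value):
    """Return paths of photos whose tag `name` equals `value`."""
    rows = conn.execute(
        "SELECT photo.path FROM photo "
        "JOIN exif_tag ON exif_tag.photo_id = photo.id "
        "WHERE exif_tag.name = ? AND exif_tag.value = ? "
        "ORDER BY photo.path",
        (name, str(value)),
    )
    return [row[0] for row in rows]


# Made-up sample data, just to exercise the schema.
add_photo("cairo/0001.jpg", {"ISOSpeedRatings": 400, "ExposureTime": "1/60"})
add_photo("tokyo/0042.jpg", {"ISOSpeedRatings": 100, "ExposureTime": "1/250"})
print(photos_with("ISOSpeedRatings", 400))
```

With a few dozen tags per picture the tag table grows quickly, but every lookup goes through either the primary key or the (name, value) index, which is the kind of workload a relational database is built for.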

It seems like forever ago that I made the New Year's resolution to not plan anything. I think it's been pretty successful, and considering what was going on during my New Years, I think it was a logical extension of best practices. We're already on the cusp of October, which sounds really late in the year, but there are still a few months of my favorite season (秋, pronounced 'aki', whose radicals mean 'rice plant' and 'fire', is a particularly beautiful/poetic way of looking at it) and a festive month to spend with family. There are 66 days left in the work-year, and being able to take off 23 of them feels pretty good.

RESTless programming

posted September 10th, 2006 @ 04:54:00

- tags: development , life

- comments: 0

I've been in some kind of groove recently. It isn't too productive of a groove, but my mind has been racing like I'm on speed. If only insane hyperactivity came with the same metabolism benefits as methamphetamine. Another drawback is that I haven't come down for the past week, and have been going to sleep on average 20 minutes later each night.

A lot of my time has been going into coding, but what people don't realize is the amount of perfectionism I suffer from and how much of my time it wastes. Say I write something pretty small; about 300 lines of python code. I test my application on average every 3 to 4 lines of code. This is, in my opinion, way too much, but I can't help it. I usually completely lack confidence that what I have written will do what I want it to, and I often end up unit testing any non-trivial aspect of a program.

So we have a small python module that has been tested at least once per line (that's a generous estimate; usually on a small project I will write about 1.5x the amount of code that I end up with, since I tend to refactor or polymorphise at the slightest hint of necessity.. this introduces more testing and more bugs). If I actually get it done (usually I don't but in the last year this has started to change), I tend to document verbosely and elaborately. It might not look like much, but getting everything perfect on that page probably took me a few hours. Even though I can rip through html & css like lightning, I had to run the app in question and decide what should go in the documentation. Those were easy.. I've spent 1 week+ on documentation on projects for work in the past. Documentation I knew nobody would ever read.

Now that it's documented and somewhere, my little module has to get blogged about. The code that I've been linking here went through 3 complete re-writes before I even had a satisfactorily working version. I probably spent about 2 weeks thinking about its architecture and about what it should or shouldn't do before banging out most of the code in a single weekend. It stands virtually alone in the collection of software I've written in that it actually makes my life easier, other people have actually used it, and I've actually found and fixed bugs a few months after its "completion."

Lots of time went into creating that small app, but writing it only took me about 8 hours. I think I've been getting better, due in part to a breakdown in my reluctance to use tools developed by others to help me accomplish my task. Still, I've seen fit to write something like this even though there are about five thousand apps out there that I could have used. When I sit down to write it, any one of a thousand distractions can impede my progress. For every 45 minute period where I am completely in the zone generating dozens of features or bugfixes I spend 2 hours watching tennis, watching EPL, reading news, or blogging about it.

If I'm lucky, I'll figure out what music fits my mood in under 5 minutes, only to tune it out when I get down to business. If I'm lucky, I'm not trying to multitask, or else going to get the laundry will lead into a trip to the grocery store, making dinner, etc.

Wei Alan Tsang, hacker, philosopher, meme-expert and father, advised me in Nike-like fashion that when I remember I have something to do I should "just do it". The biggest void in my life right now is having never created anything I can look back on and be really proud of. There's no Lucious Anatole Moiron. When this void is filled, the next largest regret will be having only created one thing I could look back on and be proud of, and I'll commence work on the second. That's what it feels like when you can't stop thinking about what you want to do and can't seem to do what you can't stop thinking about. You're just a progress bar in a world that's standing still.

Some EXIF.py changes

posted September 5th, 2006 @ 00:56:08

- tags: development , python , site news

- comments: 0

This one's a quickie. I promised to make available some changes I made to Gene Cash's EXIF.py library. I'm only providing them as a diff patch because ceache is working on improving the performance of the library extensively by using PIL's exif parsing code with EXIF.py's MakerNote deciphering code.

Image.open() runs way faster than EXIF.py's parse_file(), and EXIF.py was choking a bit too often (the last straw was failing to read EXIF 2.2.1 from ceache's canon 30D, whose standards document has seemingly gone missing).

He's hard at work banging out the changes and hopefully once he's done long awaited work on the gallery can start anew. There is already quite a lot going on in terms of development, but I still have to document a lot of changes and finalize some things. I'd say the todo app will be ready to launch in another week or two.

Doing nothing

posted August 28th, 2006 @ 13:57:31

- tags: development , life

- comments: 1

Feels like I've been doing less than nothing for the past 10 days.

I still haven't handed in my study plan and gotten signed up for classes. Today is the first day of the semester. I haven't called the Japan Society to register for 5&6. I haven't finished my todo list yet (big surprise).

I haven't made any headway on my gallery. I haven't made any headway on anything. I haven't done any research for any trips I'm planning (Brazil and/or Japan). I haven't gone to the city to get my laptop situation worked out, I haven't gotten my Egypt pictures off of my camera. I haven't charged my batteries for my mp3 player. I started writing a small post that was supposed to be a changelog of all of the changes I've made to the site in the development sandbox, but I haven't even added a single feature to the post yet.

I got terribly sick with bronchitis and didn't leave my bed for 3 days. On the 3rd day, I was feeling good enough to do something with that day, but didn't. And that's how it has been for the past week. Feeling good enough to do things, but not good about doing things. Gotta get out of this rut, it's seriously going to disrupt plans.

Knockout

posted August 19th, 2006 @ 05:06:29

- tags: development , general tech , music , site news

- comments: 1

I missed what seems like quite a few get-togethers of Jeremy's, and when people asked me why, the only excuse I could muster was that I don't like sushi. It's a pretty valid excuse, but I think if the company had been different I would have gone.

Zoro's hard drive bit the dust rather suddenly last week. I lost only a small bit of data, and it could have been better were it not for laziness. It could have been a hell of a lot worse, were it not for laziness as well: I had never transferred the movie files that I took off of my laptop, so a few movie files (mostly of skateboarders) that I had taken in Tokyo were lost, but I also still haven't even removed the pictures from Cairo from my camera, so those are all still safe on my CF card. Irony!

Palomar has a new album coming out someday soon, and although it won't be on the 30th of August, they will at least be playing live at the Mercury Lounge along with another great band, Supersystem, who will be releasing their new album A Million Microphones on the 22nd. The following day, Asobi Seksu, who just released their new album Citrus (which is quite good), will play at Maxwell's in Hoboken, approximately 5 blocks from where I work. I encourage everyone I know to come out and have some fun.

Shifting a music discussion back into a technical realm, I purchased some cheesy $5 Sennheiser earbuds when I bought my mp3 player, thinking that I'd just skimp on my commute headphones because they weren't important. But I've been regretting throwing that $5 away on cheap headphones ever since, and if my impulse $65 theft of HD280P's has taught me anything, it's that a good pair of headphones is money well spent. So I did a little bit of impulsing, while at J&R in the city handing over my laptop for a warranty makeover, and purchased a pair of Sennheiser CX 300's. I hadn't read any review of them, but I know that in-ear headphones tend to be quite good, and that Sennheiser headphones seem to be quite good.

For a few years yet, people in the US won't have usable internet on their cell phones. Until this isn't true anymore, what I did next will become more and more common. I called up a friend I knew would be by a computer and asked him to quickly look at some opinions on the headphones. He told me that they seemed pretty good, and that the price I got was also pretty fair (I consider anything within 15% of online to be fair after you add in instant gratification, and the headphones were $50), so I went ahead with it.

As I walked down the stairs at the WTC PATH train, I turned on my Zen Nano and fiddled around with the in-ear phones, trying to get a snug fit in my ear canal. Suddenly, the world around me was gone: the bustle of the financial district was just some unfortunate YouTube video that ran without permissions to /dev/dsp, gasping for a hold of the mixer only to have it audaciously yanked away. These headphones sound fantastic, with a full rich bass and enough isolation to lock out all that PATH train noise. I listened to Interpol's Hands Away (a slow, bassy, quiet song) on my way back to Hoboken, slackjawed and in awe. So, when you purchase headphones, purchase good ones. You might be one of those "CD's and MP3's sound the same to me" people, but anyone would hear the difference between these and iPod headphones.

Just to ensure that this entry has the least amount of flow and the maximum number of tags, I'll briefly mention some of my recent development work. I've been hammering out some administration side improvements and reorganizing the backend of this site on a development fork. There are the beginnings of some user-facing stuff, including some harmless tweaking of cookies (to remember comment names). I haven't found out a good way to do this yet, but I have a way to edit virtually any type of object from wherever it is displayed. Unfortunately it turns the page into a crap-shoot when I'm logged in, so I need to figure out a new interface for it.

Finally, due to some external pressure, I have been hacking away on the gallery section. I am still a little unsure how I want to store EXIF data or how I want to present the images for browsing, but I think the gallery is going to be a bit flashier and definitely a bit ajaxier than the main site. Most of my hacking so far has gone into EXIF.py, since PIL seems somewhat unable to garner any MakerNote data. Today I successfully hacked in Casio type 2 MakerNote tag handling, and added the PrintIM and other various tags from the 2.2 standard. I think eventually I'll decide on a decent subset of data to keep stored away in the database, and viewing details can strain the server a bit to introspect the actual file.

I'll write more on the hacking and make my changes available after I add in some other cameras. If anyone wants to help me out by submitting some EXIF loaded jpegs from their digital cameras to me, please do so.

The mario challenge!

posted August 11th, 2006 @ 17:21:00

- tags: development , python

- comments: 1

mario!

It started out as an insertcredit forum post, but the promise of extreme difficulty coupled with my complete amnesia about completing Super Mario Bros. 2 made me very interested. To throw some fun into the mix, and as a little academic exercise, I wrote the mario challenge! website for me and some friends to track our progress and win fabulous prizes.

The two linear mario games are very straightforward, and vary wildly in difficulty, which makes them really good for this kind of challenge. Their systems and controls are so tight that you hardly ever feel cheated.. in fact, more often, you feel you got away with glancing off of that koopa-troopa and flying up into the wild blue. Mario 2 in particular has somewhat of a mythical quality about its difficulty that really makes it quite a pleasure to grind off of. Damn you 2-2, indeed.

Creating the website took a bit longer than 20 minutes, as you might have heard elsewhere. Most anything serious does. As a little test for myself, I ported a simple mp3 announce script to use banshee instead of audacious and timed it. Essentially, the real meat was porting a single, simple "mp3" object from one information-finding backend to another, and even with the copious code pasting, I was still only able to put down the 95 line script after 30 minutes of frenetic toil.

So, the mario site took from 5 - 7 hours, including creating the html, css, forms, content, designing the database models, and writing the views and urls from scratch. Some features:

  • administration backend that can edit all site content and users
  • user authentication
  • milestone system and user modification of personal milestones
  • info pages for most objects created
  • cute pictures of mario

In all, I made 3 models, 3 custom template tags, 2 forms, and about 7 views and templates. With that project I think I have really entered the 'intermediate' level of django development. I'm comfortable enough with the framework to do most anything with it, but I still do not know it well enough to really start saving me a lot of time (developers with deadlines!). For instance, I could have saved about 2 hours and implemented a few more features if I had known about generic views and manipulators. I could have saved another hour or two if I had known about the user authentication system beforehand, generic form creation, and the various scaffolding scripts that automatically generate form template code from your models.

Eventually, I'd like to build up enough familiarity where most of my time is spent figuring out how to organize the app and not how to write it, but the framework looks very promising towards actually approaching this. Thankfully, even the 5 - 7 hours I spent on such a simple site was still a wild improvement from other projects I've done: doing the site in PHP with the user authentication and database code would have been a nightmarish 20+ hour ordeal that would still be rickety and SQL-injection prone.

Coding's shifting complexity

posted August 7th, 2006 @ 00:38:00

- tags: development , python , site news

- comments: 0

I spent most of today investigating different ways to read EXIF data in Python for the up and coming gallery. A few days ago, as noted elsewhere, I spent a day investigating Mono and writing trivially simple GUI applications in boo. These were kind of leisurely activities, but their usage was immediately relevant to anyone.. even a non-coder. When my images are immediately available to everyone along with the ISO setting, shutter speed, and timestamp, normal people will understand. When I show my parents GUI apps that I have written, they understand their uses.

The complexity of programming is shifting from technical complexity to a more user-centric complexity. I spent more than half of today looking at different existing image gallery software and image gallery presentations. Some CSS-only galleries are actually quite impressive. After quite some time, I finally got something working and grabbing exif data from photos taken with various cameras. Figuring out how exactly to present this (What should I cache, and how? What will I do as far as thumbnailing, fixing orientation, etc?) is going to take quite some time yet; at least some number of weeks.

When I spoke to Amit about this tonight, he came up (independently) with the same idea that Ceache and I had been kicking around ever since he got his SLR: combine GPS data w/ pictures and integrate a Google Maps mashup into your gallery. As more and more complex and useful APIs are made public and available, the hard part is increasingly knowing what is out there and what you need to leverage them, and not really writing the program.
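One small piece of that GPS idea can be sketched without any mapping API at all. EXIF stores a position as degrees, minutes, and seconds plus a reference letter (N/S for latitude, E/W for longitude), while map services generally want signed decimal degrees. The function name `dms_to_decimal` and the sample coordinates below are made up for illustration, and how the values get read out of the file is left out entirely.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to decimal degrees.

    `dms` is three numbers (EXIF stores them as rationals); `ref` is the
    matching reference value: 'N', 'S', 'E' or 'W'.
    """
    degrees, minutes, seconds = (Fraction(part) for part in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in the signed decimal convention.
    if ref in ("S", "W"):
        decimal = -decimal
    return float(decimal)


# Illustrative values only, not taken from a real photo.
latitude = dms_to_decimal((40, 44, 24), "N")
longitude = dms_to_decimal((74, 1, 48), "W")
print(latitude, longitude)
```

A (latitude, longitude) pair in this form is what a map marker would be built from; everything after that depends on whichever maps API ends up being used.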

In the end, PIL did what I wanted but was missing a few vital pieces of information from the vendor specific tags (notably, the ShutterSpeed and ISO number are MakerNote tags on Canon cameras), so I settled for using (and modifying slightly) EXIF.py. I was a bit disappointed not to find anything simple for using ImageMagick from python (I could have sworn I had seen it), but PIL looks like it does most of the things I will need for image manipulation.

When will you be happy? (Sometime!)

posted May 23rd, 2006 @ 05:47:35

- tags: development , music , site news

- comments: 6

Palomar played tonight at Mercury Lounge with Hockey Night and some 2 piece that I didn't stick around to see. The playfulness and joy that Palomar show when they're playing is totally infectious; Randy told me that they were going to be playing in New Brunswick next week and I am really looking forward to bumming a ride from someone and catching that show. Their new album is coming out "Sometime..."

Hockey Night also impressed with their 70's feel. When they really got into a groove (3 of their 8 song set), they were really enjoyable. I hadn't heard them previously, but they seemed to be at their best when they were incorporating some disco into their stuff. Although it wasn't quite the same feeling as Palomar before them, these guys had fun while playing, and they played tight.

One thing that I've noticed about Japanese bands is that they never seem to be having fun on stage. The ones that live here (and sing in Japanese) probably have trouble getting record deals and keeping down jobs that are flexible enough to make the band work out without friction. Those that are visiting must feel somewhat alienated or intimidated, especially when their English is poor. Note that the Pillows were an exception to this, as were the unfortunately named Doodoos. The reason I bring this up is that I missed out on a chance to see the most solemn of them all, MONO, this past Tuesday.

I've been somewhat delaying the announcement publicly, partly because there were some people I still needed to tell in person and also because I seem to cry wolf on this kinda thing quite often, but I'll be moving from my current apartment to one in Hoboken on June 1st. Also on June 1st, I start levels 3 & 4 of Japanese (with Minamoto-sensei again) at the Japan Society. If I can read it by next year reasonably well I will be extremely proud of myself. There's actually something else happening at the beginning of June that I'm really excited about, but I don't know if it's a secret or not so I won't mention it aloud for now! So now, if anyone (who knows me, or is cute) wants to get together for some drinks sometime in Hoboken, I'm all over it. Part of the reason for the move is so that I can do what I want (hang out and meet people) without constantly wondering how I'm going to get home.

Every few months I claim that I am writing some brand new system to host this site. In my mind it's going to do about two hundred thousand different things each more awesome than the last, everyone will love it, and my backend software will thrust me to the forefront of the literati. In reality, I never get past ugly proofs of concept rife with glaring bugs. I am not sure what this means; am I a horrible programmer, a horrible system architect, do I not have the time or the patience or even the mental stamina to work on a project and not abandon it?

Well, this iteration of the cycle, I decided to learn something that seems like it's really starting to hit critical mass: Django. Django is a python web framework "for perfectionists with deadlines", and somewhat follows the model-view-controller method of development. Of course, every framework these days is loaded with buzzwords that all say "I'll let you separate your data from your logic and your logic from your presentation" (even the ones that I've attempted to construct myself), but the loose coupling always feels like it gets torn around the edges.

Django not only does a good job at maintaining the loose coupling of the system, but also (somewhat more importantly) comes with tools to do the things that are boring and repetitive quickly and elegantly. The documentation that exists is quite good, especially their 4 part tutorial, but once you start to leave this scope you are pretty much on your own. #django@irc.freenode.net is a decent source of information to answer quick questions, and the people there (even the Django developers) are very open to ideas on how some things might be done differently.

One last note. Due to massive comment spam (2500!) I've silently disabled the comments. They'll be re-enabled when the next version of the site (HEH) launches sometime early June. Sometime!

Software that gets you laid

posted March 17th, 2006 @ 20:32:58

- tags: development

- comments: 0

Jwz, who wrote or contributed heavily to XEmacs, Xscreensaver and Netscape, said in one of his many insightful blog entries that if you are trying to write social software, your main focus should be "How will this software get my users laid?" I haven't been so concerned with that recently (although it's an area that deserves some attention), but I've come to the decision after 2 or 3 years that I need something to keep track of me and tell me "You will feel bad if you do not accomplish this." I think others have come to this decision, too.

The things that I want to do that I don't do, but still feel like I have the opportunity to do (that's an important clause in all this) mainly have to do with expanding my horizons technically and keeping around these exploits for reference later. Talking to some other people, I think that many of us (CS Grads) go through case after case of aborted personal software projects. I decided that if I had something that was keeping track of some loose, easily manageable goals, I would actually write code.

As you might realize, I've been working on redoing this site using some of the techniques I described last month. This is all up somewhere; I was able to blaze through a custom templating system, a new database abstraction object, new feature abstraction objects, and write a wiki-like parser. Most of them are rough and need a little bit of work, but I even have a system in place to assist with migration. A comment cleaner and a new comment system are probably next; I plan to have most of the new site completed in its current position before "installing" it here.

This work has kind of stalled. It has stalled because I keep literally forgetting about it. I have so many hobbies that I often forget about those, too. "Oh shit, I play guitar!" has been a recurring thought recently. But something that software could actually fix is my dismal record at finishing software projects, no matter how insignificant. As it so happens, most of the stuff that I write at work never really needs to be finished either, but only needs to reach a state of usability, so these things provide no refuge from a perception that my entire history is filled with failure.

I'm not a big software engineer guy, and I don't really want to waste time writing things that I don't think are useful, but I really needed something to manage myself with and to check up on my progress. Throwing some code into an SVN repository on this server was a great first start, and I found that rather than going about things at a fever pitch the last month I've been gradually updating small things and letting the bigger picture come together slowly. But the biggest help has been Trac.

Trac is probably my favorite web based software, because it does one job very consistently, intuitively, and simply, and doesn't try to reinvent the wheel. It's an integrating glue framework, not some new thing that will try to take over your existing habits. It simply provides a convenient portal to oversee desired features, bugfixes, and code commits on an existing svn repository. It uses ClearSilver for HTML templating, a modified moin for its wiki parsing engine, and enscript for syntax highlighting. Trac lets you browse your svn repository, set goals and milestones in a simple roadmap, and provides a simple ticketing system