Tuesday, July 28, 2009

GSoC: Hackystat July 20-27

This week:

Client authentication is FINALLY working! I cannot describe the satisfaction. Currently, the PUT authentication is significantly more meaningful than the GET authentication. I am still waffling on how I should address retrieving information from the database. In some cases, my apps need to access it freely (which is no problem and is mercifully working beautifully), but other users may need to have permissions set up so that they can only access nodes in the database that are connected to their own nodes. This will be more difficult. On the other hand, I'm not sure that it's the best solution. It really does depend a lot on who is going to be using it. It might be good to set up levels of access. The default, for a user, is to be able to access all of the information that is related to their user node. Then, permissions can be added for them to access the information (read only, naturally) of other users. I don't really know how to go about doing that--I suspect I can just change the user schema to include a permissions list. Have I mentioned that I really like JAXB?
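A rough sketch of how those two access levels might look in Java. Everything here is hypothetical (the `UserNode` and `AccessCheck` classes, the `readableUsers` field); the real user schema is not shown in this post, so treat it as one possible shape rather than the actual design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical user record carrying its own permissions list. */
class UserNode {
    final String name;
    // Names of other users whose information this user may read (read only).
    final List<String> readableUsers = new ArrayList<>();

    UserNode(String name) {
        this.name = name;
    }
}

/** Hypothetical access checks for the default level and the granted level. */
class AccessCheck {
    /** Own data is always readable; another user's data only if it is listed. */
    static boolean canRead(UserNode requester, String ownerName) {
        return requester.name.equals(ownerName)
            || requester.readableUsers.contains(ownerName);
    }

    /** Granted permissions are read only, so writes stay with the owner. */
    static boolean canWrite(UserNode requester, String ownerName) {
        return requester.name.equals(ownerName);
    }
}
```

Granting access is then just `bob.readableUsers.add("alice")`, after which `AccessCheck.canRead(bob, "alice")` is true while `canWrite` stays false. If the user schema is generated with JAXB, the list would presumably become one more repeated element on the user type, but that mapping is an assumption.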

Continuous Integration:

I surprised myself by being much further along on this than I had initially thought. All of that pain a couple of weeks ago with Ivy was well worth the time and trouble! I do have plenty of FindBugs, PMD, and Checkstyle errors, but not nearly as many as I was anticipating (somewhere in the range of 100). I suspect this comes from having cannibalized portions of the code from the SensorBase. I'm considering setting the project up as an Eclipse project again, just so I can use the Checkstyle plugin, but Eclipse and I get into a fist fight basically every time I try to use it. (I've been using NetBeans, which is a balm to my soul.) Hopefully this week I'll get that under control. I am also considering breaking the project out into separate projects once the Hackystat app is up this week, so that the clients have separate build files, etc.

Hackystat App

Is going slowly. Man, has it ever been a long time since I've done GUI work. The list consensus on which Telemetry to store... wasn't, although it did bring up an interesting research question: what exactly do the telemetry streams a user singles out as important say about that user? This is such an interesting thing to me that I am going to store some standard telemetry streams, and then also store a list of which streams the user thinks are important.

Issues I'm still having:

JUnit tests. Somehow, my tests are not independent and this is causing them to fail. It has something to do with the initialization of the database.

This week!

Getting down to the wire, somewhat. I'm going to start focusing on some visualizations of the data. Basic things, like just showing the network. I don't know how to make it easy to see the relationship data (telemetry streams, for instance, are attributes of the relationship between a Hackystat user and a project). So that's going to be a hurdle. Okay, truthfully, displaying it at all is going to be a hurdle. There are a couple of network graphics libraries out there that are open source, so I may look there. There's also Improvise, which I know can display stuff like this...though it's more oriented towards building the visualizations by hand as opposed to programmatically, but since it can do it, it's possible that I can use their visualization engine. I <3 open source software.

Going to split the project up this week, as I feel that that's more in keeping with the way Hackystat is built, and may be easier to maintain. The CI stuff goes hand in hand with that--I want to have that all wrapped up by the end of the week.

I'm aiming to have the Hackystat app ready to go live tomorrow evening or Wednesday around noon my time.


I have decided where I want to go with this as a tool set. I really like the idea of having an analysis tool for this that is, as Philip says, something of a hypothesis generator for groups of coders, such as within a class or company.

Unfortunately, in my experience data mining algorithms need to be tweaked halfway to hell before they work, so I'm guessing the initial version of this will not be terribly awesome. I already have a number of standard mining algorithms implemented in a library that Zack and I have been intending to open source for a while, once it was cleaned up and documented. Unfortunately, the part that works best, the spectral clustering, is based on a very fast, very ancient Fortran library (I don't write in Fortran) that has been segfaulting mysteriously for about five months now. I have had neither the time nor the skill to repair this.

However, the non-spectral clusterings work splendidly, so in theory, those could be plugged in. The graph-specific stuff is likely to be more useful, like the SRPTs and the other thing that I have since completely forgotten the name of.


There are some things that I have just come to REALLY love this summer. Most of them are exceptionally nifty tools to which I had never been exposed during previous experiences. I LOVE the properties files. How clever is that? It just tickles me to death. It makes me want to find more things to hide in them, though I suspect after a while it drives your user crazy.
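For anyone who hasn't met them, here is a minimal sketch of reading a properties file with `java.util.Properties`. The key names (`example.server.port`, `example.db.dir`) and the `SettingsSketch` class are made up for illustration and are not the project's real settings; the text is parsed from a string so the example stands alone, where a real app would read a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper showing the key=value lookup pattern. */
class SettingsSketch {
    /** Parse properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Look up the (made-up) port key, using the fallback when it is absent. */
    static int port(Properties props, int fallback) {
        String raw = props.getProperty("example.server.port", String.valueOf(fallback));
        return Integer.parseInt(raw.trim());
    }
}
```

The nice part is the second argument to `getProperty`: anything the user doesn't set simply falls back to a default, so the file only has to mention what they want to change.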

JAXB. There is nothing nicer than having your code write code for you. <3

Ivy. I am so glad that I tried to build the system before the Ivy integration was finished. I feel that I have a much greater appreciation for how truly awesome Ivy is.