Sunday, July 12, 2009

GSoC: Hackystat July 6-13

This Week:

Server and Database
The server and the database are up and open for business! (Well, in that you can run them. They are not open for business in the public server sense.) The build is Ivy-integrated and mostly lovely to behold. The server does not currently support the full API listed on the project wiki, but we're getting there. It implements all the calls that clients need to store information in the graph.

The server does not implement client authentication yet, but it's on the roster of things to do this week. It is also possible that there are one too many layers of abstraction between the resources and the graph, but that can be addressed in future revisions.

Problems encountered:

JAXB ugliness.

I had a really lovely interface created for my XMLRelationship objects that included two instances of a named complex type, XMLNode startNode and XMLNode endNode. However, when I tried to marshal it, the marshaller threw an exception because the node was missing an @XmlRootElement annotation. Adding that annotation by hand works just fine, but apparently it wasn't being generated. Google provided this answer: http://weblogs.java.net/blog/kohsuke/archive/2006/03/why_does_jaxb_p.html

Apparently the JAXB compiler won't add @XmlRootElement unless it can prove that the type isn't going to be used by anything outside of that file.

Neither of the fixes proposed in the above blog worked for me, so I had to refactor the XMLNode to be an anonymous type. This makes the interfaces a lot uglier: instead of having two separately named instances of XMLNode, you have a list of two XMLNodes, called XMLNode. And instead of being able to call

relationship.setStartNode(beginNode);

I have to do

List<XMLNode> nodes = relationship.getXMLNode();
nodes.add(beginNode);

The bad plurals hurt my soul. I've been looking into how to configure JAXB to generate custom types in hopes that I can clean it up later that way, but it's not a super high priority. Just something that would make my soul hurt a little less.

Not all of the REST API is supported by the server yet. The server code is also a little shy on the comments.

Time Traveling Exceptions.

So, in testing one of my get methods, I attempted to retrieve the user "Eliza Doolittle" from the database, using this uri

"{host}/nodes/IS_USER/Eliza Doolittle"

If you look closely you can probably guess what the problem was. Yes, folks, HTTP calls do not like them some spaces, not at all. So the name was transferred as "Eliza+Doolittle", and the server asked the database for the node of that name, which didn't exist. Normally, this would have just thrown a NodeNotFoundException and that would have been handled appropriately, but in this case it threw the exception and, through a series of unfortunate events, completely obscured what was happening. I spent about four hours trying to figure out how something could be not null in the method passing it and then null when it is received... But, fortunately, there was no break in the space-time continuum and I eventually got it worked out. Moral of the story is that spaces are a no-no for URIs until I implement + removal in the server.
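For the curious, here is a minimal sketch of what that "+ removal" could look like, using only the JDK's URLEncoder and URLDecoder. The class name NodeNameCodec and its two methods are made up for illustration; this is not the actual server code, and it assumes the client form-encodes the name before putting it in the URI.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of the real server: converts node names
// to and from the form they take inside a URI.
final class NodeNameCodec {

    // Client side: "Eliza Doolittle" becomes "Eliza+Doolittle".
    static String encode(String nodeName) {
        try {
            return URLEncoder.encode(nodeName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Server side: turn the received segment back into the stored name
    // before asking the database for the node. Handles both "+" and "%20".
    static String decode(String rawSegment) {
        try {
            return URLDecoder.decode(rawSegment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = encode("Eliza Doolittle");
        System.out.println(raw);
        System.out.println(decode(raw));
    }
}
```

One caveat with this approach: a node name that contains a literal plus sign only survives the round trip if the client encodes it first, so both sides have to agree on the convention.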

Neo4J Ivy Integration

Neo was a poor choice for my first attempt at Ivy integration. Neo4j has no consistent naming convention for the directories and libs in its releases, so Ivy kept trying to download a file that didn't exist. I couldn't for the life of me figure out where Ivy was getting that name from... Still haven't, actually. I solved the problem by instructing Ivy to rename Neo to the name it wanted before it started looking for it in the cache. The rest of the Ivy integration was MUCH smoother. However, I would really like to know how one generates the XML files from XSLT files.

Twitter App

The Twitter app is also up! It sleeps for 15 minutes if there's an unidentified exception, and an hour between each round of polling Twitter for changes and sending the information to the database. I am particularly proud of the caching. I was initially very frustrated because getting the followers of the Twitter Client account and getting the followers of a particular user are highly parallel but not quite identical. I arrived at a solution that was efficient in its code reuse and runs in linear time instead of quadratic time. I was pleased.
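The sleep schedule boils down to something like the sketch below. The names (TwitterPoller, runOnce, runForever) are invented for illustration, the Runnable stands in for one full round of "poll Twitter, send to the database", and the real app's structure may well differ.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the polling schedule, not the actual app code.
final class TwitterPoller {
    static final long POLL_INTERVAL_MS = TimeUnit.HOURS.toMillis(1);
    static final long ERROR_BACKOFF_MS = TimeUnit.MINUTES.toMillis(15);

    // Runs one polling round and returns how long to sleep afterwards:
    // an hour after a normal round, 15 minutes after an unexpected failure.
    static long runOnce(Runnable round) {
        try {
            round.run();
            return POLL_INTERVAL_MS;
        } catch (RuntimeException e) {
            System.err.println("Polling round failed: " + e);
            return ERROR_BACKOFF_MS;
        }
    }

    // Keeps polling until the thread is interrupted.
    static void runForever(Runnable round) throws InterruptedException {
        while (true) {
            Thread.sleep(runOnce(round));
        }
    }
}
```

Keeping the "how long do I sleep" decision in its own method means it can be checked without actually waiting an hour.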

Documentation

Gasp! There is actual documentation up at the socnet Google project page! There are directions for installing and building from sources! There are directions for how to start sending your Twitter data to the server! It's full of awesome and wow.

This next week:

Client authentication in the server. This is necessary so that people can't DoS my server, or fill it with a million instances of RickAstley objects.
Cleaning up server documentation
Hackystat data grabber
Getting Ohloh API key

I'm putting Facebook on the back burner for the time being. Currently, they are being sued about their developer data access policy. Not that this is likely to be resolved soon enough to help me, but I think I can do a fair amount with the Hackystat and Ohloh stuff.


Check out the shinier project page: http://code.google.com/p/hackystat-analysis-socnet/

