Tuesday, July 28, 2009

GSoC: Hackystat July 20-27

This week:

Client authentication is FINALLY working! I cannot describe the satisfaction. Currently, the PUT authentication is significantly more meaningful than the GET authentication. I am still waffling on how I should address retrieving information from the database. In some cases, my apps need to access it freely (which is no problem and is mercifully working beautifully), but other users may need to have permissions set up so that they can only access nodes in the database that are connected to their own nodes. This will be more difficult. On the other hand, I'm not sure that it's the best solution. It really does depend a lot on who is going to be using it. It might be good to set up levels of access. The default, for a user, is to be able to access all of the information that is related to their user node. Then, permissions can be added for them to access the information (read only, naturally) of other users. I don't really know how to go about doing that--I suspect I can just change the user schema to include a permissions list. Have I mentioned that I really like JAXB?
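The "levels of access" idea can be pictured as a per-user permissions list. The sketch below is only an illustration in plain Java: the `AccessPolicy` class and its `grantRead`, `canRead`, and `canWrite` methods are hypothetical names, not part of the SocNet server, and it assumes every piece of data has a single owning user.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: default access to your own data, plus read-only grants.
class AccessPolicy {
    // For each reader, the owners whose data that reader has been allowed to see.
    private final Map<String, Set<String>> readGrants = new HashMap<>();

    // Allow 'reader' to read (never write) the data that belongs to 'owner'.
    void grantRead(String reader, String owner) {
        readGrants.computeIfAbsent(reader, k -> new HashSet<>()).add(owner);
    }

    // A user can always read their own data; anything else needs a grant.
    boolean canRead(String reader, String owner) {
        if (reader.equals(owner)) {
            return true;
        }
        return readGrants.getOrDefault(reader, Collections.<String>emptySet()).contains(owner);
    }

    // Grants are read only, so only the owner may write.
    boolean canWrite(String writer, String owner) {
        return writer.equals(owner);
    }
}
```

In a schema-driven setup, the `readGrants` entry for a user would correspond to the permissions list stored on that user's record.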

Continuous Integration:

I surprised myself by being much further along on this than I had initially thought. All of that pain a couple of weeks ago with Ivy was well worth the time and trouble! I do have plenty of findbugs, pmd, and checkstyle errors, but not nearly as many as I was anticipating (somewhere in the range of 100). I suspect this comes from having cannibalized portions of the code from the sensorbase. I'm considering setting the project up as an Eclipse project again, just so I can use the checkstyle plugin, but Eclipse and I get into a fist fight basically every time I try to use it. (I've been using NetBeans, which is a balm to my soul.) Hopefully this week I'll get that under control. I am also considering breaking the project out into separate projects once the hackystat app is up this week, so that the clients have separate build files, etc.

Hackystat App

Is going slowly. Man, has it ever been a long time since I've done GUI work. The list consensus on which Telemetry to store... wasn't, although it did bring up an interesting research question: what exactly do the telemetry streams a user singles out as important say about that user? This is such an interesting thing to me that I am going to store some standard telemetry streams, and then also store a list of which streams the user thinks are important.

Issues I'm still having:

JUnit tests. Somehow, my tests are not independent and this is causing them to fail. It has something to do with the initialization of the database.
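The actual cause is not known here, but a common way to keep database tests from leaking state into each other is to give every test its own freshly created database directory and remove it afterwards. The sketch below shows that idea with the standard library only; `FreshDatabaseFixture` and its method names are hypothetical, and in JUnit 4 the two methods would typically be called from `@Before` and `@After` methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical fixture: every test gets its own empty database directory.
class FreshDatabaseFixture {
    // Creates a new, empty temporary directory to hold one test's database files.
    static Path newDatabaseDir() {
        try {
            return Files.createTempDirectory("socnet-test-db");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the directory and everything in it, children before parents.
    static void deleteRecursively(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```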

This week!

Getting down to the wire, somewhat. I'm going to start focusing on some visualizations of the data. Basic things, like just showing the network. I don't know how to make it easy to see the relationship data (telemetry streams, for instance, are attributes of the relationship between a hackystat user and a project). So that's going to be a hurdle. Okay, truthfully, displaying it at all is going to be a hurdle. There are a couple of network graphics libraries out there that are open source, so I may look there. There's also Improvise, which I know can display stuff like this...though it's more oriented towards building the visualizations by hand as opposed to programmatically, but since it can do it, it's possible that I can use their visualization engine. I <3 open source software.

Going to split the project up this week, as I feel that that's more in keeping with the way hackystat is built, and may be easier to upkeep. The CI stuff goes hand in hand with that--I want to have that all wrapped up by the end of the week.

I'm aiming at having the Hackystat app ready to go live tomorrow evening or Wednesday around noon my time.

Vision:

I have decided where I want to go with this as a tool set. I really like the idea of having an analysis tool for this that is, as Philip says, something of a hypothesis generator for groups of coders, such as within a class or company.

Unfortunately, in my experience data mining algorithms require being tweaked halfway to hell before they work, so I'm guessing the initial version of this will not be terribly awesome. I already have a number of standard mining algorithms implemented in a library that Zack and I have been intending to open source for a while, once it was cleaned up and documented. Unfortunately the part that works best, the spectral clustering, is based on a very fast, very ancient Fortran library (I don't write in Fortran) that has been seg faulting mysteriously for about five months now. I have had neither the time nor the skill to repair this.

However, the non-spectral clusterings work splendidly, so in theory, that could be plugged in. The graph specific stuff is likely to be more useful, like the SRPTs and the other thing that I have since completely forgotten the name of.

Miscellaneous:

There are some things that I have just come to REALLY love this summer. Most of them are exceptionally nifty tools to which I had never been exposed during previous experiences. I LOVE the properties files. How clever is that? It just tickles me to death. It makes me want to find more things to hide in them, though I suspect after a while it drives your user crazy.

JAXB. There is nothing nicer than having your code write code for you. <3

Ivy. I am so glad that I tried to build the system before the ivy integration was finished. I feel that I have a much greater appreciation for how truly awesome ivy is.

Tuesday, July 21, 2009

GSoC: Hackystat July 13-20

Client Authentication is Made of Lose and Fail.

Okay, so it's maybe not that bad, but I'm definitely having much more difficulty with it than I had anticipated. In some ways I think it would be significantly easier if I had put it in from the beginning, as in having to go back through and make my client and resource code work with the authentication, I managed to break things pretty badly. The biggest difficulty that I have conquered so far was the Mailer. I'm using Gmail as my SMTP server, and I could not for the life of me get it to authenticate properly. As far as I can tell, the original Mailer code for the server does NO authentication. How is that even possible? Anyway, I tried a couple of different ways to add in the authentication for the mailer, but it was a variation of the following code from GaryM at the VelocityReviews forum thread on Gmail as an SMTP server that finally got it to work.

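import java.security.Security;
import java.util.Properties;

import javax.mail.Message;
import javax.mail.MessagingException;
import javax.mail.PasswordAuthentication;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class GoogleTest {

    private static final String SMTP_HOST_NAME = "smtp.gmail.com";
    private static final String SMTP_PORT = "465";
    private static final String emailMsgTxt = "Test Message Contents";
    private static final String emailSubjectTxt = "A test from gmail";
    private static final String emailFromAddress = "";
    private static final String SSL_FACTORY = "javax.net.ssl.SSLSocketFactory";
    private static final String[] sendTo = { "" };

    public static void main(String[] args) throws Exception {
        // Register Sun's JSSE provider explicitly, as in the forum post.
        Security.addProvider(new com.sun.net.ssl.internal.ssl.Provider());
        new GoogleTest().sendSSLMessage(sendTo, emailSubjectTxt,
                emailMsgTxt, emailFromAddress);
        System.out.println("Successfully sent mail to all users");
    }

    public void sendSSLMessage(String[] recipients, String subject,
            String message, String from) throws MessagingException {
        boolean debug = true;

        Properties props = new Properties();
        props.put("mail.smtp.host", SMTP_HOST_NAME);
        props.put("mail.smtp.auth", "true");
        props.put("mail.debug", "true");
        props.put("mail.smtp.port", SMTP_PORT);
        props.put("mail.smtp.socketFactory.port", SMTP_PORT);
        props.put("mail.smtp.socketFactory.class", SSL_FACTORY);
        props.put("mail.smtp.socketFactory.fallback", "false");

        // The Authenticator supplies the account credentials when the server asks for them.
        Session session = Session.getDefaultInstance(props,
                new javax.mail.Authenticator() {
                    @Override
                    protected PasswordAuthentication getPasswordAuthentication() {
                        return new PasswordAuthentication("xxxxxx", "xxxxxx");
                    }
                });
        session.setDebug(debug);

        // Not shown in the snippet as quoted: a standard JavaMail send,
        // included here as an assumption so that the class compiles.
        Message msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress(from));
        InternetAddress[] addressTo = new InternetAddress[recipients.length];
        for (int i = 0; i < recipients.length; i++) {
            addressTo[i] = new InternetAddress(recipients[i]);
        }
        msg.setRecipients(Message.RecipientType.TO, addressTo);
        msg.setSubject(subject);
        msg.setContent(message, "text/plain");
        Transport.send(msg);
    }
}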

The code highlighted in red I initially left out, because I didn't know what it did. Leaving it out causes the whole thing to fail, so obviously it's important. The code highlighted in green is what is missing from the SensorBase Mailer that makes me think that the SensorBase mailer doesn't authenticate before sending emails.

So, the mailer was a somewhat frustrating goose chase. Now, the goose I am chasing is an "Undefined user null" error. Not an exception, mind you. I think it's a status message that's being set somewhere in code that I don't have direct access to (my best guess is somewhere in the restlet router or guard stuff.)

So, currently, the client authentication has stalled at the retrieving data part. You can register new users all the livelong day, and it will send them emails with their login credentials. However, if you try to get information as an authenticated user, it finds you in the database, sees that you're properly authenticated, returns that you are a user and are okay to access the data... and then fails.

Hackystat Client

The hackystat client will allow the users to specify which project(s) and time frames they want to allow SocNet to access, through a simple little GUI. (Only contiguous time periods can be selected for a project, so not, like, two weeks here and then another two weeks later.) The thing left to decide is... which telemetry data to use? What are the most common/most useful analyses?


This week:

Finishing the client authentication is the biggest deal, followed by finishing the hackystat client. I'll be sending an email to the list to get input on which telemetry analyses are the most useful, as well.

Additionally, I'm going to take a crack at the continuous integration stuff Philip has encouraged us to work on, so that will be a couple of new and exciting toolsets. I'm particularly excited about checkstyle.

Sunday, July 12, 2009

GSoC: Hackystat July 6-13

This Week:

Server and Database
The server and the database are up and open for business! (Well, in that you can run them. They are not open for business in the public server sense). It is Ivy integrated and mostly lovely to behold. It does not currently support the full API listed on the project wiki, but we're getting there. It implements all those necessary for clients to store information in the graph.

The server does not implement client authentication yet, but it's on the roster of things to do this week. It is also possible that there are one too many layers of abstraction between the resources and the graph, but that can be addressed in future revisions.

Problems encountered:

JAXB ugliness.

I had a really lovely interface created for my XMLRelationship objects that included two instances of a named complex type, XMLNode startNode and XMLNode endNode. However, when I tried to marshall it, the marshaller threw an exception because the node was missing an @XmlRootElement annotation. Adding that annotation works just fine, but apparently it wasn't being generated. Google provided this answer: http://weblogs.java.net/blog/kohsuke/archive/2006/03/why_does_jaxb_p.html

Apparently the JAXB compiler won't add the @XmlRootElement annotation unless it can prove that the type isn't going to be used by anything outside of that file.

Neither of the fixes proposed in the above blog worked for me, so I had to refactor the XMLNode to be an anonymous type. This makes the interfaces a lot more ugly, as instead of having two separately named instances of XMLNode, you have a list of two XMLNodes, called XMLNode. And instead of being able to call

relationship.setStartNode(beginNode);

I have to do

List<XMLNode> nodes = relationship.getXMLNode();
nodes.add(beginNode);

The bad plurals hurt my soul. I've been looking into how to configure JAXB to generate custom types in hopes that I can clean it up later that way, but it's not a super high priority. Just something that would make my soul hurt a little less.

Not all of the REST API is supported by the server yet. The server code is also a little shy on the comments.

Time Traveling Exceptions.

So, in testing one of my get methods, I attempted to retrieve the user "Eliza Doolittle" from the database, using this uri

"{host}/nodes/IS_USER/Eliza Doolittle"

If you look closely you can probably guess what the problem was. Yes, folks, HTTP calls do not like them some spaces, not at all. So it transferred as "Eliza+Doolittle", and asked the database for the node of that name, which didn't exist. Normally, this would have just thrown a NodeNotFoundException and that would have been handled appropriately, but in this case it threw the exception and, through a series of unfortunate events, completely obscured what was happening. I spent about four hours trying to figure out how something could be not null in the method passing it and then null when it is received... But, fortunately, there was no break in the space-time continuum and I eventually got it worked out. Moral of the story is that spaces are a no-no for URIs until I implement + removal in the server.
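For reference, here is a minimal sketch of the two halves of the fix using only the standard library. The class `NodeUris` and its method names are hypothetical. On the client side, the multi-argument `java.net.URI` constructor percent-encodes characters that are illegal in a path; on the server side, `URLDecoder` turns a form-encoded `+` (or `%20`) back into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

// Hypothetical helper for building and reading node URIs.
class NodeUris {
    // Client side: builds http://{host}/nodes/{nodeType}/{nodeName}.
    // The URI constructor quotes illegal path characters, so a space becomes %20.
    static URI nodeUri(String host, String nodeType, String nodeName) {
        try {
            return new URI("http", host, "/nodes/" + nodeType + "/" + nodeName, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Server side: decodes a name that arrived as "Eliza+Doolittle" or
    // "Eliza%20Doolittle" back to "Eliza Doolittle" (form-style decoding).
    static String decodeName(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Note that form-style decoding treats every `+` as a space, so a node name that contains a literal plus sign would need to be sent as `%2B`.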

Neo4J Ivy Integration

Neo was a poor choice for my first attempt at Ivy Integration. Neo4j has no consistent naming convention for their directories and libs in their releases, so Ivy kept trying to download a file that didn't exist. I couldn't for the life of me figure out where Ivy was getting that name from... Still haven't, actually. I solved the problem by instructing Ivy to rename Neo to the name it wanted before it started looking for it in the cache. The rest of the ivy integration was MUCH smoother. However, I would really like to know how one generates the xml files from xslt files.

Twitter App

The twitter app is also up! It sleeps for 15 minutes if there's an unidentified exception, and an hour between each round of polling twitter for changes and sending the information to the database. I am particularly proud of the caching. I was initially very frustrated because it's highly parallel but not quite identical between getting the followers of the Twitter Client account and getting the followers of a particular user. I arrived at a solution that was efficient in its code reuse and linear time instead of quadratic time. I was pleased.
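The sleep schedule described above (an hour between rounds, fifteen minutes after an unexpected exception) can be sketched as a small loop. This is not the Twitter app's actual code: `PollingLoop`, `pollOnce`, and `sleeper` are hypothetical names, and the sleeping is passed in as a function so the loop can be exercised without really waiting.

```java
import java.util.function.LongConsumer;

// Hypothetical sketch of the polling schedule.
class PollingLoop {
    static final long POLL_INTERVAL_MINUTES = 60;  // pause after a normal round
    static final long ERROR_BACKOFF_MINUTES = 15;  // pause after an unexpected exception

    // Runs 'rounds' polling rounds; 'sleeper' receives the number of minutes to wait.
    static void run(Runnable pollOnce, LongConsumer sleeper, int rounds) {
        for (int i = 0; i < rounds; i++) {
            long delay = POLL_INTERVAL_MINUTES;
            try {
                pollOnce.run();
            } catch (RuntimeException e) {
                // Something unexpected went wrong: retry sooner than the normal interval.
                delay = ERROR_BACKOFF_MINUTES;
            }
            sleeper.accept(delay);
        }
    }
}
```

A real sleeper would wrap `Thread.sleep` on the converted number of milliseconds and handle `InterruptedException`; a long-running client would loop until shut down rather than for a fixed number of rounds.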

Documentation

Gasp! There is actual documentation up at the socnet google project page! There are directions for installing and building from sources! There are directions for how to start sending your Twitter data to the server! It's full of awesome and wow.

This next week:

Client Authentication in the server. This is necessary so that people can't DoS my server, or fill it with a million instances of RickAstley objects.
Cleaning up server documentation
Hackystat data grabber
Getting Ohloh API key

I'm putting Facebook on the back burner for the time being. Currently, they are being sued about their developer data access policy. Not that this is likely to be resolved soon enough to help me, but I think I can do a fair amount with the hackystat and ohloh stuff.


Check out the shinier project page: http://code.google.com/p/hackystat-analysis-socnet/


Friday, July 10, 2009

GSoC: Hackystat -- The SocNet Server Goes PUT

PUT works!

And on the first run, too.

GSoC: Hackystat SocNet Server and the Ivy Integration

Today, I tried to build just the server. However, because of the new directory structure, it wouldn't build without also building the twitter sensors and the social media graph. I tried to exclude both from the build script, but for some reason the includes/excludes did not work properly in any way, shape, or form. After beating my head against that for a while, I decided to bite the bullet and integrate with Ivy.

ZOMG, what an ordeal. It was somewhat easier because there were already examples of the stuff that Philip had integrated. However, one of the libraries that I needed (neo4j) has all of the naming consistency of a teeter-totter in a hurricane. So in order to get that to download and install properly, there was an exceptional amount of bother. As it is, the neo4j xml files have significantly more hardcoding in them than I am comfortable with. It took FOUR HOURS to get the bloody thing working. The second library was much easier, in part because they followed a consistent naming scheme, and in part because by that time I had some clue what I was doing. The XML was beginning to develop meaning, the mysteries of ivy were becoming somewhat clear...

Just trying to figure out what needed to be in the XML files was difficult, even with the examples from IvyRoundup. I notice that most of those were generated files, which appear to somehow have been generated with an XSLT file. I would very much like to know how that works, as I suspect that would make the job much easier.

For the time being, the ivy modules are stored on my google project page. I would eventually like to move them to IvyRoundup, but as I will have to get permission to do so first, I figured this was a good interim solution.

Thursday, July 9, 2009

GSoC: Hackystat and The Twitter App

Twitter FTW
The Twitter App is up!

Well, mostly, anyway. What it is missing is a more specialized wrapper for the generic wrapper for the REST API to the SocNet server. Right now I have the stubs of such a thing written, and it should be done in the early afternoon tomorrow.

Let's talk for a moment about the wrappers.

The generic wrapper for the REST API is going to throw exceptions if anything goes wrong (a different exception for each kind of thing that could go wrong). You know, it's a very thin veneer on the HTTP calls. The more specialized wrapper will do some nice things, such as check to see if an object is already in the database before trying to add it. So that will add some latency and complexity, but it makes using it a little safer and more user friendly.
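The layering of the two wrappers might look roughly like this. All names here (`GenericSocNetClient`, `SafeSocNetClient`, `InMemoryClient`) are made up for illustration, and an in-memory fake stands in for the HTTP calls so the sketch runs without a server.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the generic wrapper: one method per REST call,
// throwing an exception whenever the call fails.
interface GenericSocNetClient {
    boolean nodeExists(String name); // e.g. a GET on the node
    void putNode(String name);       // e.g. a PUT; throws if the server rejects it
}

// Hypothetical specialized wrapper: checks before it adds.
class SafeSocNetClient {
    private final GenericSocNetClient generic;

    SafeSocNetClient(GenericSocNetClient generic) {
        this.generic = generic;
    }

    // Adds the node only if it is absent. Returns true if a PUT was sent.
    // Costs one extra round trip, which is the added latency mentioned above.
    boolean addNodeIfAbsent(String name) {
        if (generic.nodeExists(name)) {
            return false;
        }
        generic.putNode(name);
        return true;
    }
}

// In-memory fake of the generic wrapper, for trying the sketch out.
class InMemoryClient implements GenericSocNetClient {
    private final Set<String> nodes = new HashSet<>();

    public boolean nodeExists(String name) {
        return nodes.contains(name);
    }

    public void putNode(String name) {
        if (!nodes.add(name)) {
            throw new IllegalStateException("node already exists: " + name);
        }
    }
}
```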

Things I have learned from the Twitter App:
Caching is non-trivial. In fact, it's bloody difficult.
There are a million things that can go wrong at any point when communicating over a network.

Here's how the app works, on a high level.

Upon initialization, the twitter client asks the SocNet server for the twitter accounts already listed in the database. All of these twitter accounts should be following the Twitter client Twitter account (HackystatSocNet, I think). Then, it asks the server for the followers and friends of each of the twitter accounts in the database. It uses these lists to initialize the cache. The first time around, all these lists will be empty.

Then, it gets the list of the users following the HackystatSocNet account from Twitter, compares the list received from Twitter to the cache, and adds or deletes users from the database and the cache as necessary.

Once it's done that for all of the users, it sleeps for an hour.

The caching ended up being much more painful to implement than I had anticipated. It took me a lot of rewriting to find an implementation that I was really happy with, but I like what I've got now.
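The compare-and-update step can be sketched with hash sets, which keeps the comparison at expected linear time in the size of the two lists. This is an illustration rather than the app's real implementation; `FollowerCache` and `FollowerDiff` are hypothetical names. One cache instance per account would cover both the client account's followers and each user's followers with the same code.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Result of one comparison: who appeared and who disappeared.
class FollowerDiff {
    final Set<String> added;
    final Set<String> removed;

    FollowerDiff(Set<String> added, Set<String> removed) {
        this.added = added;
        this.removed = removed;
    }
}

// Hypothetical cache of the follower names last seen for one account.
class FollowerCache {
    private Set<String> cached = new HashSet<>();

    // Compares the fresh list from Twitter against the cache, then replaces the cache.
    FollowerDiff sync(Collection<String> fresh) {
        Set<String> freshSet = new HashSet<>(fresh);

        Set<String> added = new HashSet<>(freshSet);
        added.removeAll(cached);       // in the fresh list but not in the cache

        Set<String> removed = new HashSet<>(cached);
        removed.removeAll(freshSet);   // in the cache but no longer in the fresh list

        cached = freshSet;
        return new FollowerDiff(added, removed);
    }
}
```

The caller would send the `added` names to the server as new nodes or relationships and delete the `removed` ones.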

It is, however, somewhat minimal in the "Catching Things If Exceptions Hit The Fan" department.

Next: REST Wrappers, and Server Work

I do not think I will have client authentication up on the server until next week. This week's server will probably be somewhat bare bones--handling of the REST calls but not as many of the niceties as the Sensorbase code has at this moment.

Tuesday, July 7, 2009

Reading up on JUnit and listening to Hank Williams.

Monday, July 6, 2009

GSoC: Hackystat, June 30-July 6

Last Week:

The database ended up in a much different (and significantly pared down) form than I had anticipated. Instead of the multitude of node wrappers that I had last week (one for each kind of object in the database), now there is only one kind, and nodes are distinguished by their relationships. For instance, all "People" nodes are connected to the "People" subreference node. For clarity, here is a picture:



Subreference nodes are in blue, the reference node (database entry point) is green. Relationships are labeled in all caps. This is mostly an implementation issue, but I thought I would touch on it. Having lots of different node types (Plain Old Java Objects wrapping the underlying nodes) is only useful if they are each storing lots of different information. Because as planned, mine weren't (I brought most of the properties out to make them into nodes), it made more sense to kill the proliferation of wrapper nodes. It certainly cleaned up the implementation a lot.
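To make the subreference idea concrete without the picture, here is a toy model in plain Java. It deliberately does not use the Neo4j API; `MiniGraph`, the `SUBREFERENCE` relationship, and the `IS_PERSON` relationship name are all assumptions made for the example. A node's "type" is simply the subreference node it hangs off.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy graph: one kind of node (a name), typed only by its relationships.
class MiniGraph {
    static final String REFERENCE = "REFERENCE"; // database entry point

    // from node -> relationship type -> target nodes
    private final Map<String, Map<String, Set<String>>> edges = new HashMap<>();

    // Adds a directed, labeled relationship between two nodes.
    void relate(String from, String type, String to) {
        edges.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(type, k -> new LinkedHashSet<>())
             .add(to);
    }

    // Returns the nodes reachable from 'from' over relationships of the given type.
    Set<String> related(String from, String type) {
        return edges.getOrDefault(from, Collections.<String, Set<String>>emptyMap())
                    .getOrDefault(type, Collections.<String>emptySet());
    }

    // Hangs a subreference node (e.g. "People") off the reference node.
    void addSubreference(String subreference) {
        relate(REFERENCE, "SUBREFERENCE", subreference);
    }
}
```

With this shape, "all People nodes" is just `related("People", "IS_PERSON")`, and no per-type wrapper classes are needed.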

All right, so that is up at http://code.google.com/p/hackystat-analysis-socnet/
It's shiny and commented, too.

The Server

Also this week, the Server work began. I watched the screencast on the design of the sensorbase, and the one on building it from sources...

I built the Sensorbase from source this week. This was actually very easy and pleasant. (I was pleasantly surprised!) Yay, Ivy integration! There was an issue with JavaMail and JAXB, but that was ironed out very easily. JavaMail only comes with Java 6 Enterprise Edition, or something like that.

I edited the SensorBase code so that when you ping it, it responds with a Hello World. This did not require much change, actually--I just removed a lot of code from the Server class, added a HelloPing resource, and borrowed code from the Client class to test and make sure it was working. I will put the changed code up on my google project page, though it's not terribly exciting.

Now, Philip says that developing an API really informs the design of a server, so I have the first draft of that. I based it somewhat off the Sensorbase API. I would very much appreciate feedback on the API. I think I hit on most of what was needed, but could totally have left things out. This will go in my google project's wiki as soon as I learn wiki markup for tables. Currently, it's in the downloads section.

GET {host}/nodetypes Returns a list of all of the node types in the graph
GET {host}/nodetypes/{nodetypename} Returns a representation of the named node type
PUT {host}/nodetypes/{nodetypename} Create a representation of the named node type (admin)
DELETE {host}/nodetypes/{nodetypename} Delete the named node type (admin)
GET {host}/relationshiptypes Returns a list of all the relationship types in the graph
GET {host}/relationshiptypes/{relationshiptypename} Returns a representation of the named relationship type
PUT {host}/relationshiptypes/{relationshiptypename} Create a representation of the named relationship type (admin)
DELETE {host}/relationshiptypes/{relationshiptypename} Delete the named relationship type (admin)
GET {host}/clients Returns a list of all of the clients using the server
PUT {host}/clients/{client} Returns a representation of the named client
POST {host}/clients/{client} Updates the representation of the named client
DELETE {host}/clients/{client} Deletes the named client
GET {host}/nodes Returns a list of all the nodes in the graph
GET {host}/nodes/{nodetype} Returns a list of all the nodes of that type in the graph
GET {host}/nodes/{node}/ Returns a list of all of the nodes connected to the named node
GET {host}/nodes/{node}/{relationshiptype}/{relationshipdirection} Returns a list of all of the nodes connected to that node by the specified relationship and relationship direction
GET {host}/nodes/{node}/{nodetype} Returns a list of all of the nodes of the specified type that are connected to the named node
GET {host}/nodes/{node}/{relationshiptype}/{relationshipdirection}/nodes?startTime={tstamp}&endTime={tstamp} Returns a list of all nodes that were connected to the named node by the specified relationship and relationship direction during the specified time period
GET {host}/nodes/{node}/{nodetype}/nodes?startTime={tstamp}&endTime={tstamp} Returns a list of all of the nodes of the specified type that were connected to the named node by a relationship between the start time and the end time
GET {host}/nodes/{node}/relationships?startTime={tstamp}&endTime={tstamp} Returns a list of all of the relationships that were connected to the named node during the specified time interval
PUT {host}/nodes/{node} Creates a representation of the named node
POST {host}/nodes/{node} Updates the representation of the named node
DELETE {host}/nodes/{node} Delete the named node
GET {host}/people Returns a list of all of the people nodes in the graph
GET {host}/people/users Returns a list of all of the people nodes in the graph who are users of the system (i.e., those who have added the facebook/twitter apps and are submitting data)
GET {host}/people/nonusers Returns a list of all of the people nodes in the graph who are not users of the system (i.e., those that are, for instance, friends of users, but are not users themselves)
GET {host}/people/users/{user} Returns a representation of the named user
PUT {host}/people/users/{user} Creates a representation of the named user (admin)
POST {host}/people/users/{user} Updates the representation of the named user
DELETE {host}/people/users/{user} Deletes the named user
GET {host}/people/nonusers/{nonuser} Returns a representation of the named nonuser
PUT {host}/people/nonusers/{nonuser} Creates a representation of the named nonuser (admin)
POST {host}/people/nonusers/{nonuser} Updates the representation of the named nonuser
DELETE {host}/people/nonusers/{nonuser} Deletes the named nonuser
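As an example of how a client might build one of the time-windowed calls from the table, here is a small sketch for the relationships query. The helper class `SocNetRoutes` is hypothetical, and the timestamp format used in the usage note is an assumption, since the draft API does not specify one. Query values are passed through `URLEncoder` so that characters such as `:` arrive intact.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles request URIs for the draft API.
class SocNetRoutes {
    // Builds: {host}/nodes/{node}/relationships?startTime={tstamp}&endTime={tstamp}
    static String relationshipsInWindow(String host, String node,
            String startTime, String endTime) {
        return host + "/nodes/" + node + "/relationships"
                + "?startTime=" + encode(startTime)
                + "&endTime=" + encode(endTime);
    }

    // Form-encodes a single query value.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

For instance, `relationshipsInWindow("http://localhost", "eliza", "2009-07-01T00:00:00", "2009-07-08T00:00:00")` yields a URI whose timestamps have each `:` encoded as `%3A`.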



This Coming Week:

The SocNet Server is top priority. I will be modeling it after the Sensorbase, with a couple of changes. I intend for resources to be loaded from a config file, for one.

Authentication is somewhat of an issue: each client sending data will have a username and password, but what about getting data? There is no simple way for me to ensure that a user who has added the facebook app is the same one who is requesting data. So that is currently an issue that I am not sure how to resolve.

To get a better idea of what I am talking about, here is another picture:



The issue is that there is (currently) no overarching user registration plan. Perhaps such a thing should be added.

Sunday, July 5, 2009

Am writing a REST API specification for the socnet server.

Saturday, July 4, 2009

GSOC 2009: Hackystat and the Hello World Server Ping, and other adventures in Servers

Today I made the Sensorbase server code respond to a ping with "Hello, World!"

Basically, this just required adding another resource, in this case, HelloPingResource. I modeled it fairly closely after the PingResource, but I didn't use the client to check to make sure that it had worked. I swiped code from the SensorBaseClient isHost() method to both perform the ping and receive the response, but put it in a separate test class for the HelloPing. I also wrote a JUnit test (my first!), which ran (I think) when I ran ant -f junit.build.xml, but there was so much junk in that output that I couldn't find my little message in all of it.

I've also got the beginnings of an API framework for this server critter that I am building.

Friday, July 3, 2009

Built the sensorbase from source on three different computers today. The only difficulty I ran into is that in some versions of Ubuntu, the installed Java is not Java Enterprise Edition, and therefore doesn't include javax or JavaMail. However, that was not difficult to remedy. I have also found the part of the server that responds to a ping!