Sunday, August 23, 2009

GSoC: Hackystat August 17-23 and A Summer in Review

Whew. Well. Where to start?


The Last Week

This week was something of a trial. Every time I fixed a problem, the fix seemed to cause a cascade of new ones. But I FINALLY have the Hackystat sensor up and running. It's available as a featured download on the project page.

The biggest issues I ran into were a result of the solution I applied to last week's XML problem. I had my own version of the Telemetry XML objects, identical to the Hackystat Telemetry objects but in a different package. Because of that, getting data from the TelemetryClient into a form I could send to my database was a real struggle. It produced a lot of angst and some decidedly unattractive code. However, that nasty code could be ditched permanently if the Hackystat schemas were given namespaces and prefixes. That would be awesome!

The other problem I had with the TelemetryClient was really strange: the getChart method that doesn't take extra parameters didn't work. So I had to dig up the default parameters for all of the charts I needed and use the getChart method that does take extra parameters. Thanks to Shaoxuan for all his help debugging that problem!

I've saved all the default parameters for the charts I'm using as String constants, so they can be accessed easily.

Another note: it was not at all obvious to me that the Telemetry service runs on a different port on the server than the SensorBase, and that you have to specify that port separately. I'm sure it's in the documentation somewhere, but where?
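Because the two services listen on different ports, one way to avoid hard-coding either URL is to keep both in a Java Properties file. Below is a minimal sketch of that approach. The property keys `socnet.sensorbase.host` and `socnet.telemetry.host` are hypothetical. The default URLs follow the usual Hackystat defaults as I understand them: port 9876 for the SensorBase and 9878 for Telemetry. Check both against your own server's configuration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Holds the base URLs of the two services; the property keys are hypothetical. */
final class ServiceConfig {
  static final String SENSORBASE_KEY = "socnet.sensorbase.host";
  static final String TELEMETRY_KEY = "socnet.telemetry.host";

  // Assumed defaults: SensorBase and Telemetry listen on different ports.
  static final String DEFAULT_SENSORBASE = "http://localhost:9876/sensorbase/";
  static final String DEFAULT_TELEMETRY = "http://localhost:9878/telemetry/";

  final String sensorBaseHost;
  final String telemetryHost;

  private ServiceConfig(String sensorBaseHost, String telemetryHost) {
    this.sensorBaseHost = sensorBaseHost;
    this.telemetryHost = telemetryHost;
  }

  /** Reads both URLs, falling back to the defaults when a key is missing. */
  static ServiceConfig load(Reader source) throws IOException {
    Properties props = new Properties();
    props.load(source);
    return new ServiceConfig(
        props.getProperty(SENSORBASE_KEY, DEFAULT_SENSORBASE),
        props.getProperty(TELEMETRY_KEY, DEFAULT_TELEMETRY));
  }

  public static void main(String[] args) throws IOException {
    // Only the telemetry host is overridden; the SensorBase URL keeps its default.
    String file = "socnet.telemetry.host=http://example.org:9878/telemetry/\n";
    ServiceConfig config = load(new StringReader(file));
    System.out.println(config.sensorBaseHost);
    System.out.println(config.telemetryHost);
  }
}
```

Keeping the two URLs as separate keys means a server that moves Telemetry to another port or host needs only a one-line change in the properties file.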

Finally, I realized at some point yesterday that I hadn't covered a couple of cases for the Hackystat client, such as computer crashes, so I had to add handling for those. In other nasty surprises, as I suspected, the lack of FindBugs and PMD errors was too good to be true. So I'm not quite ready to be pulled under the Hackystat server umbrella.
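A common way to survive a crash partway through a transfer is to checkpoint progress. The client records the timestamp of the last batch it sent successfully and resumes from there on the next run. The sketch below illustrates that idea; it is not the sensor's actual code, and the `CheckpointStore` class, the file name and the `lastSent` key are all hypothetical. The checkpoint is written to a temporary file and then renamed, so a crash mid-write should leave the previous checkpoint intact.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

/** Hypothetical helper that remembers the last successfully sent timestamp. */
final class CheckpointStore {
  private static final String KEY = "lastSent";
  private final Path file;

  CheckpointStore(Path file) {
    this.file = file;
  }

  /** Returns the saved timestamp (epoch millis), or 0 if nothing was ever sent. */
  long lastSent() throws IOException {
    if (!Files.exists(file)) {
      return 0L;
    }
    Properties props = new Properties();
    try (Reader in = Files.newBufferedReader(file)) {
      props.load(in);
    }
    return Long.parseLong(props.getProperty(KEY, "0"));
  }

  /** Writes to a temp file, then renames it over the old checkpoint. */
  void save(long timestamp) throws IOException {
    Properties props = new Properties();
    props.setProperty(KEY, Long.toString(timestamp));
    Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
    try (Writer out = Files.newBufferedWriter(tmp)) {
      props.store(out, "last successful upload");
    }
    // Same directory, so the rename stays on one file system.
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

On startup, the client would call `lastSent()` and request only data newer than that value. After each successful upload, it would call `save()`.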

What I had really hoped to accomplish this week, but didn't, was the visualization tool. It's still next on the roster, but I suppose it won't be under the GSoC hat.


The Summer as a Whole

This project became so much more gargantuan than I had intended! Looking back at the timeline I wrote at the beginning of this whole thing makes me laugh. Knowing what I know now, I might have come up with something considerably less insane!

It ended up being sort of a trial by fire! I certainly didn't expect to be standing at the end of the summer, having put together my own server! (The sensorbase was a good template to work off of, definitely!) The whole thing just ended up being so much larger and more complicated than I anticipated.

That, maybe more than anything, is what I have learned from this experience. Actual use by real people in the real world complicates even simple tasks. Really, until this summer I had been working on mostly toy projects: nothing huge, and nothing really meant to be used by people. When you make something that is supposed to be used, there are so many more things to consider than I am accustomed to considering. The end result is that even simple things (like pulling data from one server and sending it to another server) can spiral endlessly into complication.

Stopping that complication from overwhelming the project (and the coder!) is a really important thing, I think. It's somewhat difficult when one is sort of programming in an echo chamber of one's own mind. Because the GSoCers were all working on individual projects, it was somewhat more difficult for us to bounce ideas off one another. I feel that the current release of my project would be of significantly higher quality if I had spent more time communicating my ideas and having the designs vetted as I went. I say this because I was describing a bug to an unrelated third party, and he was like, "WTF were you thinking, designing that this way?" Coming from a place of such inexperience with real software development, it's hard to know if you're making a good design decision. How do you choose the right way when you don't know the wrong way? So I know now to take better advantage of my available resources. Perhaps--gasp, the horror!--request a code review? Spend more time talking design with my mentor? All of these things!

Beyond the new understanding of how much time things actually take, and how important collaboration is to the end product, I also learned a metric boatload of new skills and tools. JAXB, for instance, is still, to my mind, the coolest thing since the internet. Java Property files are also really awesome. Not to mention suddenly understanding how the internet really works. I mean, sure, I've done some very low-level networking stuff; I've implemented Go-Back-N, etc. But one day, after getting the SocNet server up and running, I abruptly realized that when you request a web page, it's a GET request! Some server somewhere returns a representation of that page, in HTML, to you! That was a very exciting moment for me.
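That moment can be reproduced in a few lines of plain Java. The sketch below starts a tiny local HTTP server, issues a GET against it and reads back the representation the server returns. It uses only the JDK (`com.sun.net.httpserver` and `HttpURLConnection`). The `/page` path and the HTML body are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class GetDemo {
  /** Starts a throwaway server, GETs /page from it, and returns the response body. */
  static String fetchPage() throws IOException {
    // Port 0 asks the OS for any free port.
    HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
    byte[] html = "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8);
    server.createContext("/page", exchange -> {
      // The server answers the GET with a representation of the resource: here, HTML.
      exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
      exchange.sendResponseHeaders(200, html.length);
      try (OutputStream body = exchange.getResponseBody()) {
        body.write(html);
      }
    });
    server.start();
    try {
      int port = server.getAddress().getPort();
      HttpURLConnection conn = (HttpURLConnection)
          URI.create("http://127.0.0.1:" + port + "/page").toURL().openConnection();
      conn.setRequestMethod("GET"); // the same kind of request a browser sends for a page
      try (InputStream in = conn.getInputStream()) {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      } finally {
        conn.disconnect();
      }
    } finally {
      server.stop(0);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(fetchPage());
  }
}
```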

What I Want to Do Now

I would like to take a little break from the SocNet project now that GSoC is over. But not a break from Hackystat! I would like to get a NetBeans sensor up and running. I know one was tried in the past, but that was several years ago, and I would like to give it another shot. Eclipse and I parted ways mid-summer, so I've been missing out on some of the Hackystat data that I would really like to have had. I think I might have figured out a way to ghetto-rig such a sensor, at least, using the hints capability.

After that, back to SocNet! The next two big chunks of it that need doing are the Ohloh sensor and the visualizations/analysis tool. I may do the Ohloh sensor first. The Hackystat sensor taught me that anything run from a user's computer (i.e., one that I do not have direct control over) is much more difficult and painful. After this week, I would gladly accept a slightly less painful task!

I would also like to start work on the sensor discussed over the list, which crawls a repository to determine how familiar a coder is with a particular concept. It was suggested as a Master's thesis--perhaps I can make it mine?


A Word of Thanks

Now that the summer has drawn to an end, I would like to thank all of the Hackystat Hackers who helped me through my first GSoC! Special thanks to Aaron, Philip, and Shaoxuan, without whom I may never have survived.

Tuesday, August 18, 2009

GSoC: Hackystat August 10-August 17

This week

Hackystat app, now and forevermore.

REST API support for the Hackystat App is up and running.

Had a lot of trouble with XML. I wanted to write a complex type that contained TelemetryStreams as elements. Something along these lines:

<xs:element name="XMLContributesToRelationship">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Type" minOccurs="1" maxOccurs="1"/>
      <xs:element ref="ID" minOccurs="1" maxOccurs="1"/>
      <xs:element ref="StartTime" minOccurs="1" maxOccurs="1"/>
      <xs:element ref="EndTime" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="XMLNode" minOccurs="2" maxOccurs="2"/>
      <xs:element ref="TelemetryStream" minOccurs="9" maxOccurs="9"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

But I couldn't figure out how to include the telemetry definitions in the file. There were a lot of namespace problems. The full rundown is available on the dev list, but I shall repeat the solution I arrived at.

My final solution:

1. Give telemetry.resource a namespace and a prefix. Add the prefix where necessary to element and type references in telemetry.resource.
2. Give my schema a namespace and a prefix. Add the prefix where necessary to element and type references in my schema.
3. Import telemetry.resource.
4. Drop the element declaration of TelemetryStream from my schema.
I'm not sure which of these are necessary and which are superstitious fluff, but at least I got it working. I know you can import things without a namespace, but I couldn't make that fly with this.

However, this solution to my server-side problem caused a client-side issue that I am still working through. Now I have to choose whether to use my telemetry schema (the same as Hackystat's, but with namespaces and prefixes) or Hackystat's for the client-side code. I was coding merrily along until I realized I had imported half of the telemetry classes from the Hackystat library and half from my JAXB folder.

However, once I clear that up and do some testing, then we have Hackystat Application LAUNCH! Exciting, n'est-ce pas?

Then: visualizations and analysis tool hardcore.

Also: test coverage, documentation, continuous integration

In other words, it's going to be a busy week. I'll keep you posted.

Wednesday, August 12, 2009

GSoC: Hackystat August 3-August 10

Sorry for the delay in posting. I landed myself a nice bronchial infection and have spent most of the last week barking like a sea lion. It's awesome! However, it will probably also contribute to my brevity today, which I imagine many of you in the audience will appreciate.

This week:

Hackystat app. (Still. Possibly forever more.)

Working on the hackystat app makes me feel like I might be the only person who has ever tried to access Telemetry data who was not intimately familiar with the workings of the system. Much time has gone towards trying to find a constant or list or SOMETHING that includes the names of all of the Telemetry charts. The test cases for the Telemetry chart stuff don't seem to use them--they just use hard-coded strings, which makes me suspect that there is no such set of constants. For those with commit access--man, would that be handy! Judging by the test cases and the list of telemetry stuff in the project browser, I decided that the names must be the same as the list in the project browser.

I will be storing these charts in SocNet:
Build
Churn
CodeIssue
Commit
Coverage
CyclomaticComplexity
DevTime
Issue
UnitTest
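In the absence of such a list in the Hackystat libraries, an enum is one way to keep the chart names in a single place. The sketch below is a minimal example. The `TelemetryChart` enum is hypothetical. Its string names mirror the list above, which was itself inferred from the project browser, so treat them as unverified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical enum of the Telemetry charts stored in SocNet. */
enum TelemetryChart {
  BUILD("Build"),
  CHURN("Churn"),
  CODE_ISSUE("CodeIssue"),
  COMMIT("Commit"),
  COVERAGE("Coverage"),
  CYCLOMATIC_COMPLEXITY("CyclomaticComplexity"),
  DEV_TIME("DevTime"),
  ISSUE("Issue"),
  UNIT_TEST("UnitTest");

  /** Name as it appears in the project browser (assumed to match what the server expects). */
  private final String chartName;

  TelemetryChart(String chartName) {
    this.chartName = chartName;
  }

  String chartName() {
    return chartName;
  }

  /** All chart names in declaration order, e.g. for fetching every stored chart in a loop. */
  static List<String> allNames() {
    return Arrays.stream(values()).map(TelemetryChart::chartName).collect(Collectors.toList());
  }
}
```

Looping over `TelemetryChart.values()` replaces scattered hard-coded strings, so a typo becomes a compile error rather than a failed request. A second field could hold each chart's default parameter string in the same way.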

If anyone has a favorite chart they would like to see stored, speak now (or soon) or forever hold your peace. (Just kidding. But do speak up, because knowing would be good.)

The app is mostly finished (if my assumptions about the names were correct)--now I'm implementing its REST API support.


Visualizations:

TouchGraph is out, because it has virtually no documentation, and the code is a relatively old version. (They don't know when they will be releasing the new one.) I haven't been able to figure out how to use it, so I have moved on to other options.

Jung, which I mentioned last week, has better documentation than TouchGraph by a long shot. However, I am working most seriously with Giny (http://csbi.sourceforge.net/). Giny is a LOT easier to work with than Jung, and implements a bunch of handy graphing algorithms that will make rudimentary analysis that much easier.
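As an example of the "rudimentary analysis" a graph library makes easy, here is a plain-Java sketch. It counts each node's degree, meaning its number of distinct neighbors, in an undirected contributes-to network. This is not Giny's API. The `ContributionGraph` class and the node names are hypothetical, and a real library would provide this alongside many other algorithms.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical undirected graph linking developers to the projects they contribute to. */
final class ContributionGraph {
  private final Map<String, Set<String>> adjacency = new HashMap<>();

  /** Adds an undirected edge; repeated edges are ignored because neighbors form a set. */
  void addEdge(String a, String b) {
    adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
    adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
  }

  /** Degree = number of distinct neighbors; 0 for nodes not in the graph. */
  int degree(String node) {
    return adjacency.getOrDefault(node, Set.of()).size();
  }

  /** The node with the most neighbors (ties broken arbitrarily), a crude "most connected" measure. */
  String mostConnected() {
    String best = null;
    for (Map.Entry<String, Set<String>> e : adjacency.entrySet()) {
      if (best == null || e.getValue().size() > degree(best)) {
        best = e.getKey();
      }
    }
    return best;
  }
}
```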

What I can't decide is how to host the visualizations. It would be easiest, from my perspective, to run them on the user's computer. However, it would probably be better to do something in the style of the project browser, with visualization via web browser. My concern is that I will not be able to manage that in two weeks.


Library problems:

I am running into trouble using the hackystat client libraries. For instance, with the Telemetry client most recently added to my system, I pulled the ivy retrieve target from the telemetry system build file. Somewhat lazy, I know, but why duplicate work? The problem is that the target only works if you've compiled and built from source hackystat-utilities, hackystat-sensorbase-uh, hackystat-sensor-shell, and hackystat-dailyprojectdata. Which is fine if you've done it, but not great if you haven't. I think this is because the individual projects don't have modules in ivy-roundup or in my module repository.

Since I don't want individual users to have to compile the entire hackystat system from sources just to be able to use my stuff, it would be awesome if the sensorshell jar and the telemetry jar were added to ivy roundup. If they can't be added to ivy roundup, is it cool if I add them to my module repository?


Next week:

My plan from last week was sort of a general overview for the next two weeks, so it still stands. So I'll be gluing the hackystat sensor to the socnet server and working on visualizations with Giny.

Something I'd like to do that MIGHT not be such a big deal would be to start one-way hashing the passwords.
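For the password hashing, the usual recommendation is a salted, deliberately slow key-derivation function rather than a single pass of a plain hash. The sketch below uses the JDK's built-in PBKDF2 (`PBKDF2WithHmacSHA256`). The `PasswordHasher` class, the iteration count and the `salt:hash` storage format are assumptions for illustration, not what SocNet currently does.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical one-way password hashing with PBKDF2; only the salt and hash are stored. */
final class PasswordHasher {
  private static final int ITERATIONS = 100_000; // assumed work factor; tune for your hardware
  private static final int KEY_BITS = 256;
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Returns "salt:hash", both Base64-encoded; a fresh random salt is used every time. */
  static String hash(char[] password) throws GeneralSecurityException {
    byte[] salt = new byte[16];
    RANDOM.nextBytes(salt);
    return encode(salt) + ":" + encode(derive(password, salt));
  }

  /** Recomputes the hash with the stored salt and compares in constant time. */
  static boolean verify(char[] password, String stored) throws GeneralSecurityException {
    String[] parts = stored.split(":");
    byte[] salt = Base64.getDecoder().decode(parts[0]);
    byte[] expected = Base64.getDecoder().decode(parts[1]);
    return MessageDigest.isEqual(expected, derive(password, salt));
  }

  private static byte[] derive(char[] password, byte[] salt) throws GeneralSecurityException {
    PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
    try {
      return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
    } finally {
      spec.clearPassword(); // wipe the spec's internal copy of the password
    }
  }

  private static String encode(byte[] bytes) {
    return Base64.getEncoder().encodeToString(bytes);
  }
}
```

Because the salt is random, hashing the same password twice gives different stored strings. Verification works by re-deriving the hash with the stored salt.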

Tuesday, August 4, 2009

GSoC: Hackystat July 27-August 3

This week:

Has not been as productive as I needed it to be. Mostly, it has been consumed with Ivy frustrations. I will be trucking along and realize that I need another library, and have to stop and futz with the Ivy stuff until it works. This week, that also involved updating Ant, since apparently the version of Ant running on this release of Ubuntu is two years old. In a couple of the cases I was having difficulty figuring out which jar I would need. For instance, I needed the SensorBaseClient class, but I didn't feel like it was a good idea to have the hackystat client dependent on the whole sensorbase. So, I looked at how the eclipse sensor did it, and saw that the eclipse sensor pulls the sensorshell jar, and assumed that it was all wrapped up in the sensorshell jar. I copied that little bit from the sensorshell build file and put it in the build file for my project, ran it.... Break. Fail. Lose. This did not make sense to me. So I downloaded the sensorshell and tried to build it. Fail, but because of Ant.

In the end, I downloaded the new releases and built everything from the source, so all of the necessary jars would be in my cache. I didn't experience any problems building the system from source, though I do have a question about the ant -f jar.build.xml publish-all command. Does it only build the projects that are immediately dependent upon the project you are building? Or does it cascade? I mean, does it follow the chain rule?

Like, if project x depends on project y, which depends on project z, and you invoke publish-all on project z, does it only build project y, or does it also build project x? My working understanding is that it only builds project y. It would be super nice if it also built project x.
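The two readings of publish-all differ the way a project's direct dependents differ from its transitive dependents. The sketch below shows that distinction only; it says nothing about what the Hackystat build files actually do. The project names x, y and z mirror the example above, and the `DependentsDemo` class is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: given "A depends on B" edges, find who must be rebuilt after B changes. */
final class DependentsDemo {
  // Maps a project to the projects that depend on it directly (reverse dependency edges).
  private final Map<String, Set<String>> dependents = new HashMap<>();

  void dependsOn(String project, String dependency) {
    dependents.computeIfAbsent(dependency, k -> new LinkedHashSet<>()).add(project);
  }

  /** Only the projects that list this one as a dependency. */
  Set<String> direct(String project) {
    return dependents.getOrDefault(project, Set.of());
  }

  /** Everything downstream ("the chain rule"), found by breadth-first search over reverse edges. */
  Set<String> transitive(String project) {
    Set<String> seen = new LinkedHashSet<>();
    Deque<String> queue = new ArrayDeque<>(List.of(project));
    while (!queue.isEmpty()) {
      for (String next : direct(queue.poll())) {
        if (seen.add(next)) { // add() returns false for already-visited projects
          queue.add(next);
        }
      }
    }
    return seen;
  }
}
```

With x depending on y and y on z, `direct("z")` is {y} while `transitive("z")` is {y, x}. A build that cascades would rebuild the whole transitive set, with y built before x.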

So, while I love Ivy a lot for installation purposes, and for downloading and organizing purposes, writing something to use Ivy as you go is an enormous pain, particularly if something is not already in the roundup.

Other library related concerns. The sensorshell jar seems to contain pretty much the entire sensorbase. Why? It seems like I tried to avoid a sensorbase dependency and ended up with one anyway.

Anyway, here's a preview of the app. Where the list currently says "Item 1...", etc., it will actually show the user's projects in the SensorBase. This is just a preview of the GUI.



Also with tooltip texts.





Not that this will run right now. Have two more libraries to ivy-ify.


Visualizations:

These are the network visualization libraries I'm experimenting with.

http://sourceforge.net/projects/touchgraph/

http://jung.sourceforge.net/

I am really excited about TouchGraph.


Splitting the Project:

This is going to have to wait until I've finished writing the code, more or less. It's enough trouble to update one set of build files. I don't look forward to having to update 4 or so.


Sprinting to the finish!

The firm pencils-down date is just two weeks away. Things that need to get done before then:

1. meaningful registration process, by which a user can link the email address they used to register with the sensorbase to all of their various socnet stuff, so that I can limit who accesses the data in an appropriate way.

2. useful visualizations and initial analysis tools
Other than displaying the network and allowing the user to navigate through it in a sort of physical manipulate-y way (touchgraph!), this will probably also include some summarization of the other information in the network. I don't, however, want to simply mirror what the dailyprojectdata and telemetry services already do. Still thinking on what that will be.

Then, once that is done, splitting the project into smaller, manageable chunks and making sure my documentation is nice, etc.

Also, on a personal note, I graduated from college this week.