Monday, June 1, 2009

GSoC: Hackystat, May 25-June 1

This week, by and large, has focused on Hackystat, Developer Style! This has involved a lot of reading and planning. I have been anxious to start pounding out code, but after the reading I have done this week, anything beyond simple system testing seemed a little premature.

This week, I have:
1. Watched developer screencasts, including the new hackystat-developer-example.

I sometimes feel that Hackystat is likely to spoil us with its user-friendliness from an open source development standpoint. I feel confident that most open source projects don't have handy little video tutorials. However, I am not at all complaining! They are immensely helpful.

I really like the Ivy integration. I didn't have an iota of trouble with the developer example.

2. Contemplated issues of hosting, data retrieval, and storage.

The bulk of my mental energy went to this problem this week. There are several issues involved here. The first: what data am I allowed to mine, and is there a way I can get it that isn't on a per-user/per-project basis? Aaron says there is a way to do this without having to know a user's name and password. (I was contemplating an approach in which users interested in including their data in the mining project would download my app, which would use the info in their sensorshell.properties file to access their data, but this has the distinct disadvantage of requiring effort from the user, which may significantly limit the amount of data I have access to.)

So that's the data retrieval issue. Now for the hosting question.

The Facebook application has to be hosted. Joyent is offering free hosting, but only for a year, and only for applications that have more than 50 users. So that might be pushing it for our purposes. I would potentially like to host it on the Hackystat server. Aaron has asked for a more detailed specification of what the application will do, at which point we will be discussing it with The Powers that Be.

Storage is related to hosting.

It occurred to me while I was trying to pin down a design that I would definitely need to be storing all of this information somewhere. Aaron suggested creating a generic SocialMedia SensorDataType to do this.

3. Read boatloads of Facebook developer documentation.

Which, to my disappointment, is not quite as easy to work with as the Hackystat documentation, or as pleasant and well-organized as the Java API. (Like I said, I'm getting spoiled.)

I am still deciding what language to use for the Facebook app. FB seems to lean towards PHP. I've had some experience with Ruby on Rails, so I'm thinking about using it instead. The initial Facebook application is not a terribly complex creature, in my head. Basically, it asks your permission to access your FB profile information, friends, etc., and then it takes that information, bundles it into the SDT, and sends that to Hackystat. I suppose it's really not much different from other development sensors, except that the events are related to friends and interests.

4. (Somewhat unrelated) Solved the environment variable issue that was causing me such grief.

See this entry.

5. Started thinking about a potential SensorDataType for social networking data.

I'm having some difficulty coming up with something generic enough to cover Ohloh, Twitter, and Facebook relationships and attributes. Not to mention that Aaron suggested adding SVN or mailing list relationships (an idea that I freaking LOVE, but am not sure how to implement). The SDT needs to be generic enough to be easily expanded on. I can come up with many different additions to this idea, so I want this to be really easy to extend. For me, the most natural way to represent this data is two objects and a relationship. Initially, I thought that didn't exactly jibe with the key/value setup of the SDT. However, now, I think it could work reasonably well. Something along the lines of:

SDT: Social Media

Key             Example Values
Object1         “Person”, “File”, “Bug”, “Project”, etc.
Object1ID       Some unique integer id
Object2         “Person”, “File”, “Bug”, “Project”, etc.
Object2ID       Some unique integer id
Relationship    “Friend”, “Follows”, “Contributes to”, “Edited”, etc.
RelationshipID  Some unique integer id
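
To convince myself the two-objects-and-a-relationship idea really does flatten into key/value pairs, here is a rough sketch in Python (purely for illustration; I haven't settled on a language). Everything in it is an assumption on my part: the class name SocialMediaRecord, the helper friend_records, and the way RelationshipID is handed out as a simple counter are all hypothetical, and none of it touches the actual Hackystat or Facebook APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SocialMediaRecord:
    """One relationship between two objects (hypothetical structure)."""
    object1: str          # type of the first object, e.g. "Person"
    object1_id: int       # unique integer id of the first object
    object2: str          # type of the second object, e.g. "Person"
    object2_id: int       # unique integer id of the second object
    relationship: str     # e.g. "Friend", "Follows", "Edited"
    relationship_id: int  # unique integer id of this relationship

    def to_properties(self) -> Dict[str, str]:
        """Flatten the record into string key/value pairs using the keys above."""
        return {
            "Object1": self.object1,
            "Object1ID": str(self.object1_id),
            "Object2": self.object2,
            "Object2ID": str(self.object2_id),
            "Relationship": self.relationship,
            "RelationshipID": str(self.relationship_id),
        }


def friend_records(user_id: int, friend_ids: List[int],
                   first_relationship_id: int = 1) -> List[SocialMediaRecord]:
    """Build one "Friend" record per friend id.

    Assumption: relationship ids are assigned sequentially, starting from
    first_relationship_id. How they should really be assigned is still open.
    """
    return [
        SocialMediaRecord("Person", user_id, "Person", friend_id,
                          "Friend", first_relationship_id + offset)
        for offset, friend_id in enumerate(friend_ids)
    ]


if __name__ == "__main__":
    # Made-up ids, only to show the shape of the output.
    for record in friend_records(42, [7, 9]):
        print(record.to_properties())
```

Adding a new kind of relationship (say, "Contributes to" between a "Person" and a "Project") would only mean new values, not new keys, which is the extensibility I'm after.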


I was informed that I am working on a vision document, which was news to me, particularly after being told that I already had such a thing? I suppose that might be the original proposal.

What I had not included in my original proposal (in which I was mostly excited about the immediate data-mining prospects), but that I now definitely see as worthwhile, is making sure that the setup I'm working on is (1) easily extensible and (2) easy to query for future projects as they arise. Aaron's idea of making the Social Media SDT fits in nicely with that.

Questions that I have:

What kinds of questions should be directed to the dev list?

Direction for the coming week:
  • Start a Google Code project, even if I don't actually have any code yet.
  • Run a couple of toy Facebook applications on Joyent hosting to get a better sense of the requirements of my Facebook app, and what hosting it will require.
  • Develop Social Media SDT
  • Read Twitter API
  • Play some more with Hackystat as a user (Hackystat, User style!)
  • Get Hackystat services to run locally (Hackystat, Admin style!)
You know, they say that minutes of planning save hours of coding. Hoping that turns out to be right.

Hours spent this week: 17


aaron said...

re: What kinds of questions should be directed to the development list?

I'd direct any type of Hackystat system questions there: how to implement SDTs, how to get data, how Hackystat projects work, etc. That way Philip (and the rest of the Hackystat community) can see what people are struggling with. That's my opinion.

Your SDT seems okay. It's hard for me to judge because we've never done anything like this. Hm.. thinking about it more:
* i'm a little unsure what your resource field would be. a resource field is something like "file://home/johnson/svn/TestData/build.xml" this is important because using this field we can determine whether a SensorData instance is associated with a Project. Without that there is no way to share Project data.
* In some ways the unique identifiers could use the REST resource URL for example, Object1ID could be "" and the Object1 can contain the type "Person".
* i don't really know what RelationshipID contains.

anyway, i'll think about it more. hopefully philip will chime in too.