what to do with blog posts: another test

With my confirmation seminar next week (Tuesday to be precise, more details on that later), I’ve been thinking about what I’m trying to get out of this research project, which bits of the data to study, and how these might be represented within my thesis (and any other outcomes). Because I am quite possibly insane, over the last two days I’ve manually grabbed the full text of each blog post made on Pineapple Party Time – a blog hosted on Crikey and run by Mark Bahnisch of Larvatus Prodeo, William Bowe of the Poll Bludger, and Possum (Scott Steel) from Pollytics. I chose this blog mainly because it had a brief, and complete, lifespan: it ran for a month, from its launch on Tuesday 24 February 2009, when the Queensland state election was called, until Monday 23 March (two days after the election itself, enough time for a few final analyses). Of course, a short lifespan didn’t mean only a few posts: there were around 130 in total (each one copied and pasted into its own document), of which Bahnisch contributed the most.

So, I now have all the posts in raw(ish) text format (except for the election day liveblog – see below), and I’m not worrying about links or comments just yet (I didn’t save comments, but I’ll probably get some graphs happening comparing number of posts per day and comments per day, both for the whole blog history and per author). What should be done with this data? Well, textual/content analysis of some description, but something quick would be preferable for the moment. I’m going to run everything through Leximancer a bit later, but earlier in the week ManyEyes (featured here previously) added a new data visualisation tool to its range of options: phrase net. This method allows you to upload a body of text and find the most common pairs of words linked by a pattern along the lines of ‘x is y’, ‘x’s y’, ‘x of the y’, ‘x and y’, and so on. So, in the name of research, I’ve been testing it out. Here’s the visualisation (currently of the ‘x is y’ format) for posts from the entire blog:

ppt [ManyEyes]

Given the general themes of the election coverage – Premier Anna Bligh calling it early, the LNP looking to gain a big swing of voters away from the ALP, polls being seen as giving the LNP a slim victory or making the contest too close to call – some of the combinations showing up are unsurprising (‘Labor is worried/scared/vulnerable’ for example).
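For anyone curious about what a pattern like ‘x is y’ involves, here is a rough sketch in Python of how such pairs could be pulled out of text and counted. This is not how ManyEyes does it (I don’t know the details of their method); the function name `phrase_pairs` and the sample sentence are made up for illustration, and the simple regular expression assumes plain lowercase words with straight apostrophes.

```python
import re
from collections import Counter


def phrase_pairs(text, connector="is"):
    """Count (x, y) word pairs joined by `connector`, e.g. 'labor is worried'.

    Hypothetical helper for illustration only: a word is taken to be a run
    of letters and straight apostrophes, and matching ignores case.
    """
    pattern = re.compile(
        r"\b([a-z']+)\s+" + re.escape(connector) + r"\s+([a-z']+)\b"
    )
    # findall returns one (x, y) tuple per non-overlapping match
    return Counter(pattern.findall(text.lower()))


# Made-up sample text, not taken from the blog posts
sample = (
    "Labor is worried. The LNP is confident, "
    "but Labor is worried about the swing."
)

pairs = phrase_pairs(sample, connector="is")
for (x, y), count in pairs.most_common():
    print(f"{x} is {y}: {count}")
```

Keeping the counts alongside each pair makes it easy to see which combinations turn up repeatedly and which only appear once.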

Going on an author by author basis, though, this changes a little, given Possum and Bowe’s focus on, for example, poll analysis and electoral data.

possum [ManyEyes]

bowe [ManyEyes]

bahnisch [ManyEyes]

Of course, this is just testing things out, and proper analysis would be more rigorous (and take longer) – many of the phrases shown above only appear once, so some investigation of the methods used in visualising the text will also need to take place (along with whether this is a particularly valid approach to analysing this particular dataset).

One thing to note, though, and an important aspect for studies of blogs during elections, was the use of liveblogging on election night by the Pineapple Party team (following on from Crikey doing a similar thing for the US presidential election). Given that the liveblog used CoverItLive embedded within a post for its coverage, grabbing the data from that is a bit trickier than the rest of the post text, and sorting out authors from the various contributions may also be hard. Given my time constraints, I didn’t try it, and as I’m not looking at blogs during election periods I may not have to deal with this during the project, but it will probably still need to be covered at some point.

One final thing: the post text is up on ManyEyes, so if you want to play around with it, go ahead – let me know if you find anything interesting!