A while ago, Sky asked me for suggestions for mapping/visualisation tools for one of her chapters, and she’s since been testing out IssueCrawler, which she discusses here. While writing a quick list of possible tools, I came across a couple of new visualisation programs that I hadn’t tested, so this morning is all about seeing what the various software and online tools can do.
For today’s experiments, I’m not using either IssueCrawler or ManyEyes. I’ve discussed both previously, and IssueCrawler is not actually useful in this context; I’ll hopefully write a second entry about crawls, scraping, and visualisation (the likes of IssueCrawler and VOSON) later this week. For this post, though, I’m going to take data acquired by hand and put it into a two-column spreadsheet in Excel (I know, I’m a terrible person for not doing it in Calc, but this will be relevant a little later). Rather than crawling the internet looking for connections, I’m using the spreadsheet I created manually from the blogroll links of the Wikio.fr Top 100 French Political blogs in May 2008. ManyEyes will be used as a reference, but as I’ve already visualised this data there, I’m not going to redo that process today. I’m also not going to go through what the visualisations show about the data; instead (however shallow this may be), I’m focussing on the aesthetics: what the maps look like and how they can be customised, exported, and embedded.
For the purposes of comparison, here is the original ManyEyes visualisation of the blogroll links between blogs on the list of the top 100 French Political blogs (May 2008):
Creating the visualisation above from the data was straightforward: a simple matter of selecting the relevant cells in the two columns, copy-pasting, and letting ManyEyes do the work. However, customising the visualisation is an issue – the layout can be recomputed and the diagram embedded in other sites online, but any other changes are limited. So, in the interests of comparing tools, and given the likelihood of working with other data types later on in my research, I looked for other resources.
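As a rough illustration of what that two-column data looks like to a program: cells copied out of a spreadsheet usually arrive as tab-separated text, one link per row. The sketch below is only an assumption about that layout (source blog in the first column, linked blog in the second); the function name and the sample blog names are made up for the example and are not part of any of the tools discussed here.

```python
import csv
import io


def parse_edge_list(text):
    """Turn tab-separated two-column text into a list of (source, target) links."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    edges = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        # Skip blank rows and rows with a missing source or target.
        if len(cells) >= 2 and cells[0] and cells[1]:
            edges.append((cells[0], cells[1]))
    return edges


# Hypothetical sample: three blogroll links, followed by a blank row.
SAMPLE = "blog-a\tblog-b\nblog-b\tblog-c\nblog-c\tblog-a\n\n"
edges = parse_edge_list(SAMPLE)
```

Every later step (matrices, subgraphs, export to other formats) can start from a plain list of pairs like this one.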
There is an add-on for Excel (2007 only, though) called .NetMap, which allows users to generate network maps from their data (the standard Excel chart options don’t do this, and neither do those in Calc). After a bit of playing around with options and updates to get everything working, I generated the above visualisation. The display options are heavily customisable, from vertex colour and shape to edge colour and opacity, but, for some reason, as you can see in the screenshot, the vertex labels did not show up. This is fine when using .NetMap itself, as the diagram sits next to the spreadsheet, and selecting a vertex shows the edges connecting it to other vertices and highlights the relevant cells in the spreadsheet. Beyond that context, though, such as when I use the screenshot elsewhere, important information is missing (admittedly, my brief tests may have just overlooked some settings, as is possible with any of the programs discussed here). [Edit: Indeed, after a helpful email and a bit more playing around, I’ve managed to display the labels alongside the vertices. This is what you get from not thoroughly exploring all settings…] A more useful aspect of .NetMap is the ability to generate subgraph images: basically, each vertex’s individual map, ignoring all the vertices it is not connected to. However, as .NetMap currently only works with Excel 2007, and my computer is destined to take on a Linux flavour around Christmas time, .NetMap is not an ideal long-term option for my personal visualisation needs. Nevertheless, it will still be useful for my research, and it’ll still be running on my work computer.
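To make the subgraph idea a little more concrete, here is a minimal sketch of one way to pull out a single vertex’s individual map from a list of links. It is not how .NetMap does it internally (I don’t know that); it simply assumes the subgraph means the chosen blog, the blogs it is directly linked to or from, and any links among them. The function name and the sample data are hypothetical.

```python
def ego_subgraph(edges, vertex):
    """Return the links among `vertex` and its direct neighbours."""
    keep = {vertex}
    for source, target in edges:
        # A neighbour is any blog that links to, or is linked from, the vertex.
        if source == vertex:
            keep.add(target)
        elif target == vertex:
            keep.add(source)
    # Keep only links whose two ends are both inside the neighbourhood.
    return [(s, t) for s, t in edges if s in keep and t in keep]


# Hypothetical sample network with a separate two-blog island (e, f).
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("e", "f")]
around_a = ego_subgraph(sample_edges, "a")
around_d = ego_subgraph(sample_edges, "d")
```

Here `around_a` keeps the triangle between a, b, and c but drops d and the island, while `around_d` contains only the single link from c to d.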
The above visualisation was created using Cytoscape, which has so far worked ideally – again, I haven’t tested it thoroughly, but it also allows display customisation and a range of layout algorithms. Importantly, it also allows direct import of data from an Excel spreadsheet. In the program itself I haven’t quite worked out how to get more information displayed, but the resulting visualisation is very pleasing and clear. I will be using Cytoscape more often, I think.
One of the reasons I chose to use the reduced blogroll list is the focussed nature of its edges and vertices – the first spreadsheet, of nearly all blogroll links, has many vertices that are only connected to one blog, which created rather large, messy maps. In addition, these maps are easy to compare because of the small sample size and the presence of the tiny ‘island’ of five blogs not connected to the main network. After the Cytoscape test, I moved on to the ‘big two’ programs for social network analysis, UCINET (Netdraw) and Pajek. These two programs will be used for larger-scale analysis, using data from the crawling and scraping processes, which will be in different formats. Excel spreadsheets, of course, are not the preferred format for either of these programs, so a bit of conversion had to take place. Luckily, this was not as problematic as trying to get an xml file from an Excel 2007 spreadsheet. Indeed, UCINET itself allows data to be imported from spreadsheets and saved as a matrix that Netdraw can read. The above map, then, is the resulting Netdraw visualisation, using the Spring embedding option in Graph-Theoretic layout. Again, there are options for customising display and layout, and plenty of analytical tools that I haven’t tested yet (going for the visualisation angle first). A bit of refreshing the layout was required, though, to stop the vertices of the island lying on top of each other, which left only three, rather than five, blogs visible (of course, you can also manually alter the position of vertices).
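For anyone curious what the spreadsheet-to-matrix step and the ‘island’ amount to, the following is a small sketch under my own assumptions rather than a description of UCINET’s actual file format or algorithms. It treats each row of the spreadsheet as a directed link (the blog in the first column links to the blog in the second), builds a 0/1 adjacency matrix, and then groups blogs into connected components while ignoring link direction. The function names and the sample data are invented for the example.

```python
def to_adjacency_matrix(edges):
    """Build a 0/1 matrix: row = linking blog, column = linked blog."""
    labels = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return labels, matrix


def connected_components(edges):
    """Group blogs into components, ignoring the direction of links."""
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    seen = set()
    components = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk outwards from an unvisited blog.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(sorted(component))
    # Largest component first, so any islands come last.
    return sorted(components, key=len, reverse=True)


# Hypothetical sample: a three-blog main network and a two-blog island.
sample_edges = [("a", "b"), ("b", "c"), ("d", "e")]
labels, matrix = to_adjacency_matrix(sample_edges)
components = connected_components(sample_edges)
```

With the sample data, `components` holds the main network first and the island second, which is the same split that shows up visually in the maps.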
From Netdraw, the data could be converted into a Pajek-friendly format, although there is the risk that the layout used by Netdraw can influence the one created by Pajek. A bit of playing around and recomputing different layouts negated that, though. Pajek also has the ability to draw the network in 3D, which is a nice option, especially when dealing with the implied three dimensions of the ‘blogosphere’. The customisation options are similar to those of the other programs, although from an aesthetic perspective there’s something rather pleasing about the thin lines and stark colours of the small version of the map. Again, as with UCINET, I’m more likely to use Pajek for larger-scale projects than for small maps like this, where I’d probably use a quicker route from spreadsheet to visualisation (such as Cytoscape or ManyEyes), but the 3D aspect is handy (especially once I master the export options).
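I let Netdraw do the conversion, but the basic Pajek network file is plain text and simple enough to sketch by hand: a `*Vertices` line with the number of vertices, one numbered and quoted label per line, then an `*Arcs` section listing directed links as pairs of vertex numbers. The snippet below is a minimal sketch of that layout only; it leaves out coordinates, weights, and everything else Pajek files can carry, and the function name and sample blog names are hypothetical.

```python
def to_pajek(edges):
    """Write a list of (source, target) links as minimal Pajek .net text."""
    labels = []
    index = {}
    for edge in edges:
        for name in edge:
            if name not in index:
                # Pajek numbers vertices from 1, in order of first appearance here.
                index[name] = len(labels) + 1
                labels.append(name)
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{number} "{name}"' for number, name in enumerate(labels, start=1)]
    # *Arcs marks directed links; an undirected network would use *Edges.
    lines.append("*Arcs")
    lines += [f"{index[source]} {index[target]}" for source, target in edges]
    return "\n".join(lines) + "\n"


# Hypothetical sample: two blogroll links between three blogs.
pajek_text = to_pajek([("blog-a", "blog-b"), ("blog-b", "blog-c")])
```

Saving `pajek_text` to a file with a .net extension should give something Pajek can open, although I have only sketched the format here rather than tested it against every version of the program.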
Finally, an accidental visualisation. I was testing some of UCINET’s export settings, and ended up somehow revisualising the network in Mage, which, as Pajek can, uses a 3D layout. I have hardly gone through the options with this visualisation, but after generating all those maps, I was rather taken with how easy it is to rotate the network, with various degrees of shading further emphasising the position of vertices in the 3D layout. The screenshot doesn’t really do it justice, but again I still need to go through the export options.
All of the tools I tested generated usable maps, with various degrees of customisation. All except ManyEyes work offline, and all except UCINET (which has a free trial version) are freely available for download (however, .NetMap does require the rather not-free Excel 2007 for most of its stuff, although I think there is a standalone version too…). I imagine there are many other visualisation options available, too, although having more than five or so working options is possibly overkill. Nevertheless, the amount of data and the format used will dictate which visualisation program I use for my work. The ease of going from a basic two-column spreadsheet to the above maps is very pleasing, though, and even with my non-existent background in networks, informatics, stats, and other mathematical areas, the ability to generate these will help my research.
[I also wanted to test out Gephi, but even after adding extensions to the version of Excel running on here, the xml file exported from Excel with the blog links has not yet been imported successfully by Gephi. Still, it’s another program that I will keep an eye on and try to get working later.]