A network of Ice and Fire (eng)
One thing that's remarkable about George R. R. Martin's "A Song of Ice and Fire" is the complex cast of characters involved in its various storylines. The fan-made wiki A Wiki of Ice and Fire lists an ever-growing count of 2,293 characters (22 of them named Jon), while a fan-made spreadsheet lists a total of 7,288 characters: 3,078 named, 4,087 unnamed and 123 nicknamed. As I've recently started learning Gephi, that's the software I'm going to use to build a graph of the characters' relations.
Gathering the data
My first, more ambitious idea, after learning how much data Gephi can process, was to map every single concept in ASOIAF on a graph by crawling all 22,259 pages of the wiki. I never managed it, though, because my methodology wasn't suited to such a large amount of data. I then decided to focus only on the characters and their interactions with each other. Although such a project has already been done several times, I thought it would be a fun and interesting challenge to do it myself, and hopefully get better at building networks through something I'm passionate about.
This GitHub repository from a similar project created a dataset by connecting two characters whenever they appear within 15 words of each other, which made it possible to weight the edges by the number of interactions. I want to mention this specific dataset because I found it more interesting at a smaller scale.
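For illustration, the 15-word window idea can be sketched in Python as below. This is my own hypothetical reconstruction, not the code from that repository: it assumes the text is already split into words and that each character is matched by a single token (full names like "Jon Snow" would need multi-word matching).

```python
from collections import Counter


def cooccurrence_edges(tokens, characters, window=15):
    """Count how often two characters appear within `window` tokens of each other.

    tokens: the text split into words; characters: set of single-token names.
    Returns a Counter mapping (name_a, name_b), sorted alphabetically, to a weight.
    """
    # Positions of every token that is a known character name, in text order.
    hits = [(i, tok) for i, tok in enumerate(tokens) if tok in characters]
    weights = Counter()
    for a, (i, name_a) in enumerate(hits):
        for j, name_b in hits[a + 1:]:
            if j - i > window:
                break  # hits are in text order, so every later hit is even farther away
            if name_a != name_b:
                weights[tuple(sorted((name_a, name_b)))] += 1
    return weights


tokens = "Jon looked at Arya and then Sansa spoke".split()
print(cooccurrence_edges(tokens, {"Jon", "Arya", "Sansa"}, window=3))
```

With `window=3`, Jon and Sansa (six words apart) are not linked, while both are linked to Arya. Each extra co-occurrence adds 1 to the pair's weight, which is what makes weighted edges possible.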
My approach was to crawl every character page of the wiki to a 1-click depth, i.e. following only the links that appear directly on each character page. I found the wiki precise enough on its own to list every character interaction accurately. On top of that, every character page also has a family tree linking the character to their family, which gets picked up at the same 1-click depth. This method does have its limits, however, as I'll explain later.
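To make the 1-click idea concrete, here is a minimal sketch of how one saved character page could be turned into edges with Python's standard library. It is not how Hyphe works internally: the `/index.php/` link prefix is assumed from the wiki URL used below, and `LinkCollector` and `edges_from_page` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

WIKI_PREFIX = "/index.php/"  # assumed from the wiki's URL scheme


class LinkCollector(HTMLParser):
    """Collect the titles of wiki pages linked from one HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        path = urlparse(href).path  # drops any "#section" fragment
        if path.startswith(WIKI_PREFIX):
            # "Tyrion_Lannister" -> "Tyrion Lannister", decoding %xx escapes first.
            title = unquote(path[len(WIKI_PREFIX):]).replace("_", " ")
            if title:
                self.targets.add(title)


def edges_from_page(source, html, characters):
    """Directed edges source -> target for every linked page that is a known character."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return {(source, t) for t in parser.targets if t in characters and t != source}
```

Links to non-character pages (castles, houses, events) are discarded by checking against a known set of character names, which is also where a 1-click crawl can leak locations into the graph if that set is not clean.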
I used Sciences Po's Hyphe (the demo version) to crawl the data. The wiki blocks this kind of crawling activity, but I managed to get around the issue by downloading only the character pages with wget, using `wget -r https://awoiaf.westeros.org/index.php/List_of_characters`, and then having Hyphe crawl my own copy of the wiki. This process is quite lengthy, however, and I had to let Hyphe run overnight before the crawl was done. After that, I was able to export my data, either as CSV or as a standalone .gexf file for Gephi, which is what I used. You can access my data here if you want to play with it.
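If you'd rather analyse the exported .gexf outside Gephi, a GEXF file is plain XML, so Python's `xml.etree.ElementTree` can read it. A minimal sketch, assuming the usual `<node id="…" label="…">` and `<edge source="…" target="…">` elements; `parse_gexf` and `load_gexf` are hypothetical helper names, and node/edge attributes are simply ignored:

```python
import xml.etree.ElementTree as ET


def _local_name(tag):
    # "{http://www.gexf.net/1.2draft}node" -> "node"; also works without a namespace.
    return tag.rsplit("}", 1)[-1]


def parse_gexf(root):
    """Return (labels, edges): a dict node id -> label and a list of (source, target) ids."""
    labels, edges = {}, []
    for el in root.iter():
        kind = _local_name(el.tag)
        if kind == "node":
            labels[el.get("id")] = el.get("label", el.get("id"))
        elif kind == "edge":
            edges.append((el.get("source"), el.get("target")))
    return labels, edges


def load_gexf(path):
    return parse_gexf(ET.parse(path).getroot())


# Usage (the file name is hypothetical):
# labels, edges = load_gexf("asoiaf.gexf")
# named_edges = [(labels[s], labels[t]) for s, t in edges]
```

Stripping the namespace instead of hard-coding it keeps the sketch working whichever GEXF version the export declares.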
Working with Gephi
Once the data is in Gephi, let's first filter out the least relevant nodes with a filter on their indegree. In a directed graph like this one, a node's indegree is the number of edges pointing at it. In simpler terms, characters with a high indegree are linked to from many other character pages, whereas characters with a low indegree are linked to from only a few. In my data, the character with the highest indegree is Jaime Lannister, while Garrison Prester, for example, has a very low one. I excluded every character with an indegree lower than 5, which leaves me with 1,044 characters. The average degree is 14.281, and the most connected character has 348 connections. The average path length is 3.263, meaning that the shortest path between two characters is, on average, about 3.3 edges long.
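The filter and the statistics above can be approximated with a short stdlib-only sketch. It assumes the graph is a list of directed `(source, target)` edges; the function names are hypothetical, and Gephi's exact definitions (for instance, which node pairs it averages the path length over, or how it reports average degree for a directed graph) may differ, so the numbers would not necessarily match Gephi's to the last decimal.

```python
from collections import defaultdict, deque


def filter_by_indegree(edges, min_indegree=5):
    """Drop nodes whose indegree is below min_indegree, with every edge touching them."""
    indegree = defaultdict(int)
    for _, target in edges:
        indegree[target] += 1
    keep = {n for n, d in indegree.items() if d >= min_indegree}
    return [(s, t) for s, t in edges if s in keep and t in keep]


def average_degree(edges):
    """Mean in-degree of a directed graph, i.e. edges / nodes (nodes seen in at least one edge)."""
    nodes = {n for edge in edges for n in edge}
    return len(edges) / len(nodes) if nodes else 0.0


def average_path_length(edges):
    """Mean shortest-path length, in hops, over ordered pairs (u, v) with v reachable from u."""
    adjacency = defaultdict(list)
    nodes = set()
    for s, t in edges:
        adjacency[s].append(t)
        nodes.update((s, t))
    total, pairs = 0, 0
    for start in nodes:
        dist = {start: 0}  # breadth-first search from `start`
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # the start node itself is not a pair
    return total / pairs if pairs else 0.0
```

Because the edges are unweighted, a breadth-first search from every node gives exact shortest paths; on a graph of about a thousand nodes that is cheap enough to run in plain Python.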
Next, let's see whether the links and indegrees in the data carry any meaning by running a modularity algorithm. In a network, modularity measures how well the network decomposes into communities: groups of nodes that are densely connected to each other but only sparsely connected to the rest of the network. In other words, it shows how my network is compartmentalized into sub-networks. With a resolution of 5, the algorithm found 21 communities in the network.
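For reference, the modularity of a partition of an undirected graph with m edges is usually written Q = Σ_c [L_c / m − γ (d_c / 2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes and γ is a resolution parameter. The sketch below only scores a given partition; searching for a good partition is what Gephi's algorithm (based on the Louvain method) does. It ignores edge direction, and as far as I know Gephi's "Resolution" setting uses its own convention (higher values give fewer, larger communities), so a resolution of 5 in Gephi does not correspond to γ = 5 here.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Modularity Q of an undirected, unweighted graph for a given partition.

    edges: list of (u, v) pairs; community: dict node -> community id;
    resolution: gamma in Q = sum_c [L_c/m - gamma * (d_c / 2m)^2].
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single edge, split into the two obvious communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
```

Putting every node in a single community always gives Q = 0 at γ = 1, so a clearly positive score means the communities are denser than a random graph with the same degrees would predict.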
To my surprise, these communities match the storylines of the books remarkably well. They mostly split along families, geography, or major events.
| # | Community | Notable characters |
|---|---|---|
| 1 | Tyrell Family | Margaery Tyrell, Mace Tyrell, Loras Tyrell |
| 2 | Dragonstone | Stannis Baratheon, Davos Seaworth, Selyse Florent |
| 3 | Whispering Wood and aftermath | Jaime Lannister, Robb Stark, Rickard Karstark |
| 4 | Stark Family (2nd generation) | Eddard Stark, Catelyn Stark, Benjen Stark, Brandon Stark |
| 5 | Essos | Daenerys Targaryen, Barristan Selmy, Jorah Mormont |
| 6 | Targaryen Kings | Aegon V Targaryen, Baelor I Targaryen, Viserys II Targaryen |
| 7 | Riverlands | Arya Stark, Brienne of Tarth, Amory Lorch, Beric Dondarrion |
| 8 | Martell Family | Oberyn Martell, Arianne Martell, Quentyn Martell |
| 9 | King's Landing | Petyr Baelish, Varys, Qyburn, Pycelle |
| 10 | North | Ramsay Snow, Jeyne Poole, Wyman Manderly, Hodor |
| 11 | Red Wedding | Edmure Tully, Roslin Frey, Jeyne Westerling |
| 12 | Greyjoy Family | Euron Greyjoy, Balon Greyjoy, Asha Greyjoy |
| 13 | Frey Family | Lothar Frey, Morya Frey, Dickon Frey |
| 14 | Frey Family & Red Wedding involvement | Walder Frey, Roose Bolton, Merrett Frey |
| 15 | Dunk & Egg side characters | Eustace Osgrey, Addam Osgrey, Alysanne Osgrey |
| 16 | Lannister Family | Tyrion Lannister, Cersei Lannister, Tywin Lannister |
| 17 | Frey Family | Jared Frey, Luceon Frey, Alys Frey |
| 18 | At the Wall | Jon Snow, Samwell Tarly, Mance Rayder |
| 20 | Vale | Lysa Arryn, Robert Arryn, Yohn Royce |
After tweaking the graph a bit with layout algorithms, we can give all the nodes of each modularity community the same colour and get this:
Below, I isolated some of the communities to give a better overview of how well they are separated:
I'm mostly happy with the result, even though I only noticed some issues after finishing the whole thing:
- Ramsay appears twice, both as Ramsay Snow and as Ramsay Bolton (due to the redirect on the wiki page); the same goes for Brienne.
- Some location names that I forgot to remove when editing the dataset are still here and there (e.g. Tor, Greenblood...).
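Both problems could be handled with a cleaning pass over the edge list before importing it into Gephi. A minimal sketch, assuming you build by hand a mapping from redirect names to one canonical name and a set of non-character names to drop; which Ramsay name to keep, and the `clean_edges` name itself, are arbitrary choices for illustration:

```python
def clean_edges(edges, aliases, non_characters):
    """Merge alias nodes into one canonical node and drop nodes that are not characters.

    aliases: dict alias -> canonical name (e.g. a wiki redirect and its target page);
    non_characters: set of names to remove, such as locations.
    """
    cleaned = set()  # a set, so edges that become duplicates after merging collapse
    for s, t in edges:
        s, t = aliases.get(s, s), aliases.get(t, t)
        if s in non_characters or t in non_characters:
            continue  # skip edges touching a location or other non-character page
        if s == t:
            continue  # skip self-loops, e.g. Ramsay Snow -> Ramsay Bolton once merged
        cleaned.add((s, t))
    return cleaned


edges = [
    ("Ramsay Snow", "Theon Greyjoy"),
    ("Ramsay Bolton", "Theon Greyjoy"),
    ("Ramsay Snow", "Ramsay Bolton"),
    ("Tor", "Theon Greyjoy"),
]
print(clean_edges(edges, {"Ramsay Snow": "Ramsay Bolton"}, {"Tor"}))
```

Doing this before computing degrees matters: otherwise each Ramsay node only gets part of the real indegree, which can push one of them under the indegree-5 cut-off.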
Keep in mind that this is my first time working with either Hyphe or Gephi, and I haven't managed to weight the edges: in the current iteration, every link between two characters has the same weight. In reality, though, Cersei has a much more intimate relationship with Jaime than with, say, Osmund Kettleblack (or Moon Boy, for all I know). I probably should have looked at how many times a character is mentioned on a page and worked from there.
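A rough sketch of that mention-count idea: weight each edge source → target by how often the target's name appears in the source's page text. It assumes plain-text page content and exact name matches, so a character referred to only as "Jaime" on a page when the list says "Jaime Lannister" would be missed; `mention_weights` is a hypothetical helper.

```python
import re
from collections import Counter


def mention_weights(source, text, characters):
    """Weight source -> target by how many times target's name occurs in source's page text."""
    weights = Counter()
    for name in characters:
        if name == source:
            continue
        # \b stops "Jon" from matching inside "Jonos"; re.escape protects names with dots etc.
        count = len(re.findall(r"\b" + re.escape(name) + r"\b", text))
        if count:
            weights[(source, name)] = count
    return weights


text = "Jaime returned. Cersei met Jaime and Osmund Kettleblack. Jonos Bracken waited."
print(mention_weights("Cersei", text, {"Jaime", "Osmund Kettleblack", "Jon"}))
```

The resulting counts could be written as an edge weight column before import, so that Gephi's layouts and modularity take the strength of each relationship into account.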
There are obviously some issues with how I handled the modularity, or with how the modularity algorithm handles itself. Although I found it mostly very precise, some choices are questionable: why are the Freys scattered across several communities, and why is King Robert included in the "North" cluster?
This was my first time working with such a large dataset, and I'm very happy with how it turned out.