Jean David

A Network of Ice and Fire

Note to French-speaking readers

Thank you for your interest in my work. This article was originally written in English, and I have no plans to translate it for now. Note that the rest of my site is generally written in French.

Introduction

My first, more ambitious idea after learning of Gephi's processing power was to map every single concept in ASOIAF onto a graph by crawling all 22,259 pages of the wiki. I never managed to do this, because my methodology wasn't suited to working with such a large amount of data, so I decided to focus only on the characters and their interactions with each other. Though such a project has already been done multiple times, I thought it would be a fun and interesting challenge to do it myself, and hopefully get better at building networks through something I'm passionate about.

Gathering the data

This GitHub repository, from a similar project, created a dataset by connecting any two characters whose names appear within 15 words of each other, which makes it possible to weight each edge by the number of interactions. I want to mention this specific dataset because I found it more interesting at a smaller scale.
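To make the "within 15 words" idea concrete, here is a minimal sketch of windowed co-occurrence counting. It is not that repository's code: the function name `cooccurrence_edges`, the plain whitespace tokenization and the assumption that every character name is a single token are all simplifications for illustration.

```python
from collections import Counter


def cooccurrence_edges(tokens, characters, window=15):
    """Count how often two characters appear within `window` words of each other.

    Assumption: `tokens` is the text already split into words, and each name in
    `characters` is a single token (real names often span several words).
    """
    # Positions of every character mention, in reading order.
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in characters]
    weights = Counter()
    for a, (i, name_a) in enumerate(mentions):
        # Only look ahead at later mentions that are still inside the window.
        for j, name_b in mentions[a + 1:]:
            if j - i > window:
                break
            if name_a != name_b:
                # Undirected edge: store the pair in a canonical (sorted) order.
                weights[tuple(sorted((name_a, name_b)))] += 1
    return weights
```

Each pair's counter becomes the edge weight, so two characters who keep showing up near each other end up with a heavier link than two who cross paths once.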

The way I did it was to crawl every character page of the wiki to a depth of one click, so that a link from one character's page to another counts as an interaction. I found the wiki precise enough on its own to list every character interaction accurately. On top of that, every character page also includes a family tree, which links each character to their family within the same one-click depth. This method, however, has its limits, as I'll explain later.
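A one-click-depth crawl boils down to: for each character page, collect the links it contains and keep only those that point at another character page. Hyphe did this for me, so the following is only a minimal stdlib sketch of the idea. It assumes article links look like `/index.php/Page_Title` (as in the wiki's URLs) and that the pages are already available as HTML strings; `LinkCollector` and `depth_one_edges` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

# Assumption: article links on the wiki have the form /index.php/Page_Title.
WIKI_PREFIX = "/index.php/"


class LinkCollector(HTMLParser):
    """Collect the titles of the wiki articles linked from one page."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(WIKI_PREFIX):
            # Drop any #fragment and decode %xx escapes to get the page title.
            title = unquote(href[len(WIKI_PREFIX):].split("#")[0])
            self.targets.add(title)


def depth_one_edges(pages, characters):
    """Build directed (source, target) edges between character pages.

    `pages` maps a character's page title to that page's HTML. Only links that
    point at another known character page are kept (1-click depth).
    """
    edges = set()
    for source, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for target in collector.targets:
            if target in characters and target != source:
                edges.add((source, target))
    return edges
```

Links to places, houses or events (such as Casterly Rock) are simply dropped because they are not in the `characters` set.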

I used Sciences Po's Hyphe demo version to crawl the data. Since the wiki blocked this kind of crawling activity, I circumvented the issue by downloading only the character pages with wget, using wget -r https://awoiaf.westeros.org/index.php/List_of_characters, and then having Hyphe crawl my own local copy of the wiki. This process is quite lengthy, however, and I had to let Hyphe work overnight before the crawl was done. After that, I was able to export my data, either as CSV or as a standalone .gexf file for Gephi, which is what I used. You can access my data here if you want to play with it.
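Hyphe produced the .gexf file for me, but if you build an edge list some other way, writing a file Gephi can open is not much work. Below is a minimal sketch of a directed GEXF 1.2 document built with the standard library; `build_gexf` is a hypothetical helper, and it only writes node labels and unweighted edges, none of the extra attributes Hyphe exports.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def build_gexf(nodes, edges):
    """Return a minimal directed GEXF 1.2 document as an ElementTree.

    `nodes` is an iterable of labels; `edges` is an iterable of
    (source, target) label pairs, all of which must appear in `nodes`.
    Save it with: tree.write("characters.gexf", encoding="utf-8", xml_declaration=True)
    """
    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})
    node_el = ET.SubElement(graph, "nodes")
    ids = {}
    for i, label in enumerate(nodes):
        # GEXF refers to nodes by id, so give each label a numeric id.
        ids[label] = str(i)
        ET.SubElement(node_el, "node", {"id": str(i), "label": label})
    edge_el = ET.SubElement(graph, "edges")
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edge_el, "edge", {"id": str(i), "source": ids[source], "target": ids[target]})
    return ET.ElementTree(root)
```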

Working with Gephi

Once the data is in Gephi, let's first filter out the least relevant nodes based on their indegree. In a directed graph like this one, a node's indegree is the number of edges pointing at it; in simpler terms, characters with a large indegree are referenced on many other characters' pages, whereas characters with a low indegree are connected to few people. In my data, the character with the largest indegree is Jaime Lannister, while a character with a very low indegree is, for example, Garrison Prester. I've excluded every character with an indegree lower than 5, leaving me with 1,044 characters to deal with. On average, a character has a degree of 14.281, and the most connected character has 348 connections. The average path length is 3.263, meaning that the shortest chain of connections between two characters is, on average, a little over three links long.
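Gephi computes all of this for you, but the two measures are simple enough to sketch by hand. The following is a minimal illustration, not Gephi's implementation: the filter is a single pass on the original indegrees, and for the average path length the edges are treated as undirected (an assumption for simplicity), averaging the breadth-first-search distance over every ordered pair of nodes that are connected at all.

```python
from collections import deque


def indegrees(nodes, edges):
    """Number of edges pointing at each node, for a directed edge list."""
    counts = {n: 0 for n in nodes}
    for _, target in edges:
        counts[target] += 1
    return counts


def filter_by_indegree(nodes, edges, minimum=5):
    """Keep nodes whose indegree is at least `minimum`, and the edges between them."""
    deg = indegrees(nodes, edges)
    kept = {n for n in nodes if deg[n] >= minimum}
    return kept, [(s, t) for s, t in edges if s in kept and t in kept]


def average_path_length(nodes, edges):
    """Mean shortest-path length (in edges) over all connected ordered pairs.

    Simplifying assumption: edges are treated as undirected.
    """
    adj = {n: set() for n in nodes}
    for s, t in edges:
        adj[s].add(t)
        adj[t].add(s)
    total, pairs = 0, 0
    for start in nodes:
        # Breadth-first search gives shortest distances in an unweighted graph.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in adj[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # every reachable node except the start itself
    return total / pairs if pairs else 0.0
```

On the real data you would call `filter_by_indegree(nodes, edges, minimum=5)` first, then compute the path length on the nodes and edges that survive.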

Modularity

Now, let's find out whether the links in the data carry any meaning by running a modularity algorithm. In a network, modularity measures how well the network decomposes into communities: groups of nodes that are densely connected to each other and only sparsely connected to the rest. In other words, it tells us how the network is compartmentalized into sub-networks. With a resolution of 5, the algorithm found 21 communities in the network.
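To see what the modularity score actually rewards, here is a minimal sketch of the standard Newman-Girvan formula for an undirected graph: Q is the sum, over communities c, of L_c/m minus (d_c/2m)², where m is the number of edges, L_c the number of edges inside c and d_c the total degree of c's nodes. This only scores a partition you already have. Gephi's tool searches for a good partition and also applies a resolution parameter, which this sketch ignores. It also assumes a simple edge list without duplicate edges.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected graph for a given partition.

    Q = sum over communities c of L_c / m - (d_c / (2m)) ** 2, where m is the
    number of edges, L_c the edges inside c and d_c the summed degree of c.
    `community` maps each node to its community label.
    """
    m = len(edges)
    inside = defaultdict(int)      # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For example, two triangles joined by a single edge score about 0.357 when each triangle is its own community, and exactly 0 when everything is lumped into one community: a partition only scores well if it has more internal edges than a random graph with the same degrees would.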

To my surprise, those communities turned out to be remarkably accurate when compared to the storylines of the books. They're mainly divided by family, by geography, or around major events.

Note: In the original blog post I wrote a few years ago, I visually isolated each community, but it seems these files have been lost to the ether. Oh well.

# | Description | Notable members
1 | Tyrell Family | Margaery Tyrell, Mace Tyrell, Loras Tyrell
2 | Dragonstone | Stannis Baratheon, Davos Seaworth, Selyse Florent
3 | Whispering Wood and aftermath | Jaime Lannister, Robb Stark, Rickard Karstark
4 | Stark Family (2nd generation) | Eddard Stark, Catelyn Stark, Benjen Stark, Brandon Stark
5 | Essos | Daenerys Targaryen, Barristan Selmy, Jorah Mormont
6 | Targaryen Kings | Aegon V Targaryen, Baelor I Targaryen, Viserys II Targaryen
7 | Riverlands | Arya Stark, Brienne of Tarth, Amory Lorch, Beric Dondarrion
8 | Martell Family | Oberyn Martell, Arianne Martell, Quentyn Martell
9 | King's Landing | Petyr Baelish, Varys, Qyburn, Pycelle
10 | North | Ramsay Snow, Jeyne Poole, Wyman Manderly, Hodor
11 | Red Wedding | Edmure Tully, Roslin Frey, Jeyne Westerling
12 | Greyjoy Family | Euron Greyjoy, Balon Greyjoy, Asha Greyjoy
13 | Frey Family [1] | Lothar Frey, Morya Frey, Dickon Frey
14 | Frey Family & Red Wedding involvement | Walder Frey, Roose Bolton, Merrett Frey
15 | Dunk & Egg side characters | Eustace Osgrey, Addam Osgrey, Alysanne Osgrey
16 | Lannister Family | Tyrion Lannister, Cersei Lannister, Tywin Lannister
17 | Frey Family [2] | Jared Frey, Luceon Frey, Alys Frey
18 | At the Wall | Jon Snow, Samwell Tarly, Mance Rayder
19 | TBD |
20 | Vale | Lysa Arryn, Robert Arryn, Yohn Royce
21 | TBD |

After tweaking the graph a bit with layout algorithms, we can give every node in the same modularity community the same colour to get the final graph.
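Gephi handles the colouring itself through its appearance settings, so this is only needed outside of it. A minimal sketch of one simple approach, assuming you already have a node-to-community mapping: spread the communities evenly around the hue wheel so that each one gets a distinct hex colour. `community_colours` is a hypothetical helper, and the fixed saturation and brightness values are arbitrary choices.

```python
import colorsys


def community_colours(community):
    """Map each node to a hex colour shared by every node of its community.

    `community` maps node -> community label (labels must be sortable).
    Hues are spread evenly; saturation 0.65 and value 0.9 are arbitrary.
    """
    labels = sorted(set(community.values()))
    palette = {}
    for i, label in enumerate(labels):
        r, g, b = colorsys.hsv_to_rgb(i / len(labels), 0.65, 0.9)
        palette[label] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return {node: palette[c] for node, c in community.items()}
```

With 21 communities, evenly spaced hues end up fairly close to one another, which is one reason to hand-tweak a few colours afterwards.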

Conclusion

I am mostly happy with the result, even though there are still a few issues that I only noticed after finishing the whole thing:

  • Ramsay appears twice, as both Ramsay Snow and Ramsay Bolton (due to the redirect on the wiki page); the same goes for Brienne.
  • There are some location names here and there that I forgot to remove when editing the dataset (e.g. Tor, Greenblood...).
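Both issues can be fixed by cleaning the edge list before loading it into Gephi. Here is a minimal sketch, assuming you write the alias map and the list of non-character pages by hand; `clean_edges` is a hypothetical helper, and picking "Ramsay Bolton" as the canonical name is just an example.

```python
def clean_edges(edges, aliases, non_characters):
    """Merge alias nodes and drop non-character pages from a directed edge list.

    `aliases` maps an alternative name to its canonical one
    (e.g. "Ramsay Snow" -> "Ramsay Bolton"); `non_characters` is a set of
    page titles to discard, such as locations.
    """
    cleaned = set()
    for source, target in edges:
        # Map every alias to one canonical name.
        source = aliases.get(source, source)
        target = aliases.get(target, target)
        # Skip location pages and self-loops created by the merge.
        if source in non_characters or target in non_characters or source == target:
            continue
        # Using a set also removes duplicate edges created by the merge.
        cleaned.add((source, target))
    return cleaned
```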

Keep in mind that this is my first time working with either Hyphe or Gephi, and I haven't managed to weight the edges. In my current iteration, every link between characters weighs the same. But in reality, Cersei has a much more intimate relationship with Jaime than with, let's say, Osmund Kettleblack (or Moon Boy, for all I know). I probably should have counted how many times each character is mentioned on a page, and worked my way from there.
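Counting mentions per page could be as simple as the following sketch, which weights each (source, target) edge by how many times the target's full name appears in the text of the source's page. It assumes you have the plain text of each page; `mention_weight` and `weighted_edges` are hypothetical names. One known limitation: first-name-only mentions ("Jaime") are not counted when the node is named "Jaime Lannister".

```python
import re


def mention_weight(page_text, name):
    """Count whole-word, case-sensitive mentions of `name` in a page's text."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return len(re.findall(pattern, page_text))


def weighted_edges(pages, edges):
    """Weight each (source, target) edge by how often target is named on source's page.

    `pages` maps a character name to the plain text of that character's page.
    """
    return {(s, t): mention_weight(pages[s], t) for s, t in edges}
```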

Obviously, there are also some issues with how I handled the modularity, or with how the modularity algorithm handled the data. Though I found it mostly very precise, some of its choices are questionable: why are the Freys spread across so many communities, and why is King Robert included in the "Northern" cluster?

This was my first time working with such a large dataset, and I'm very happy with how it turned out.