Gephi Guidelines for Cuneiform Archives. Part 2: Cleaning your Dataset

 

This is Part 2 in a series of blog posts that describe how to use Gephi for social network analysis (SNA) when using cuneiform sources. This blog post describes how to clean up a dataset acquired from Prosobab (https://prosobab.leidenuniv.nl), but it can also be helpful for other datasets.

Part 1: Acquiring a Dataset via Prosobab
Part 2: Cleaning your Dataset
Part 3: Data Preparation for Gephi
Part 4: Data Import for a 2-mode Network
Part 5: Transforming a 2-mode Network into a 1-mode Network

See also the theoretical guide that describes how Assyriologists can apply SNA to cuneiform sources — Social Network Analysis and Cuneiform Archives.

Introduction

By following the steps described below, you will end up with a clean, standard dataset that forms the basis for social network analysis. If you work with a program other than Gephi (e.g. visone or UCINET), this might be all the cleaning and preparation your data needs before import. If you choose to use Gephi, you need the additional data preparation described in Part 3: Data Preparation for Gephi.

Tips for cleaning up Prosobab data in Excel:
1. Filter your data

A filter keeps the cells of each row together when you sort a column, so sorting will not scramble your data.

There are two ways to filter your data:

a) Select all columns and rows (Ctrl+A) and select the tab Home > Sort & Filter > Filter.

b) From the menu bar of the program, select the tab Data > Autofilter.

Filtered data columns get little buttons with arrows next to their heading and they will keep the information in one row together when any of the columns is reordered.

2. Delete the duplicates of the documents

Your data might contain documents which are written on two or more cuneiform tablets (i.e. copies). If you downloaded the data from Prosobab, then the duplicates are found in those rows which contain no name attestations and therefore no individuals, and are marked with a “-” for Personal name, Patronym and Family name. Such duplicate tablets without name attestations should automatically be the first rows of your data unless you have reordered it. It is better to delete them now as they will not add to your network but might complicate the results you will see.
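If you prefer a script to Excel, the same step can be sketched in Python. This is only a minimal illustration: the sample rows and the "Document" column are made up, and the name column headings are assumed to match the ones mentioned above, so check them against your own export.

```python
# Hypothetical sample rows; column headings are assumptions, adjust to your export.
rows = [
    {"Document": "Tablet A (copy)", "Personal name": "-", "Patronym": "-", "Family name": "-"},
    {"Document": "Tablet A", "Personal name": "Bēl-rēmanni",
     "Patronym": "Mušebši-Marduk", "Family name": "Šangû-Šamaš"},
]

NAME_COLUMNS = ("Personal name", "Patronym", "Family name")


def has_name_attestation(row):
    """Return True unless all three name columns hold the "-" placeholder."""
    return any(row[column].strip() != "-" for column in NAME_COLUMNS)


# Keep only rows that attest at least one name; duplicate tablets drop out.
cleaned = [row for row in rows if has_name_attestation(row)]
print(len(rows) - len(cleaned), "row(s) removed")
```

A row is removed only when all three name columns contain "-", so individuals who merely lack a patronym or family name are kept.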

3. Clean up the Type and objects column

In the search screen of Prosobab, the Type and objects are combined. That is why the word “Subtype” appears directly before the document subtype, e.g. “subtypepromissory note” or “subtypeoath”. If you were to display the subtype of the document in Gephi as an edge, the lengthy descriptions can make the graph crowded. You can skip this step if that does not concern you.

1) Select the whole column of Type and objects.

2) Click on the tab Edit > Find > Replace

3) Type in “Subtype” (without quotation marks) and replace it with nothing by using the “Replace All” button.
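The same replacement can be sketched in Python. The sample values below are the two examples quoted above; real cells may contain more text around them. The match ignores upper and lower case, which is how Excel's Replace All behaves unless "Match case" is ticked.

```python
import re

# Sample values taken from the examples in this post.
values = ["subtypepromissory note", "subtypeoath"]


def strip_subtype(value):
    """Remove every occurrence of the word "subtype", ignoring case."""
    return re.sub("subtype", "", value, flags=re.IGNORECASE).strip()


cleaned_values = [strip_subtype(value) for value in values]
print(cleaned_values)  # ['promissory note', 'oath']
```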

4. Delete the Probability column

This column will be exported from Prosobab automatically with the PID column, but you will not need it.

5. Delete the kings

These are not individuals who actually interact within the document — they are merely mentioned as part of the date formulae or in the oaths.

1) Sort the Role column alphabetically.

2) Delete all rows that contain the roles “king in date” or “king in oath”.
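As a scripted alternative, the rows can be filtered by role without sorting first. In this minimal sketch the sample rows, the PID values and the roles "debtor" and "witness" are made up for illustration; only the two king roles come from the step above.

```python
# The two roles to remove, as named in the step above.
KING_ROLES = {"king in date", "king in oath"}

# Hypothetical sample rows.
rows = [
    {"PID": "101", "Role": "debtor"},
    {"PID": "7", "Role": "king in date"},
    {"PID": "102", "Role": "witness"},
    {"PID": "7", "Role": "king in oath"},
]

# Keep every row whose role is not one of the king roles.
without_kings = [
    row for row in rows
    if row["Role"].strip().lower() not in KING_ROLES
]
print([row["Role"] for row in without_kings])  # ['debtor', 'witness']
```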

6. Check the PID numbers

In principle, the public data in Prosobab has all the name attestations identified as individuals. However, you should check if all your name attestations have an entry (i.e. a number) in the PID column.

1) Sort the PID column and note the highest number used.

2) Give an even higher, unique number to each individual who does not have a PID number.
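The numbering logic can be sketched in Python as follows. The sample rows and the helper name fill_missing_pids are hypothetical. Note one limitation: the sketch gives every row without a PID its own new number, so if the same individual appears in several such rows you still have to give those rows a shared number by hand.

```python
# Hypothetical sample rows; an empty string stands for a missing PID.
rows = [
    {"Personal name": "A", "PID": "15"},
    {"Personal name": "B", "PID": ""},
    {"Personal name": "C", "PID": "1042"},
    {"Personal name": "D", "PID": ""},
]


def fill_missing_pids(rows):
    """Give rows without a numeric PID new numbers above the current maximum."""
    existing = [int(row["PID"]) for row in rows if row["PID"].strip().isdigit()]
    next_pid = max(existing, default=0) + 1
    for row in rows:
        if not row["PID"].strip().isdigit():
            row["PID"] = str(next_pid)
            next_pid += 1
    return rows


fill_missing_pids(rows)
print([row["PID"] for row in rows])  # ['15', '1043', '1042', '1044']
```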

7. Merge the columns with names

If you downloaded your data from Prosobab, you will have the names of the individuals in three columns (Name, Patronym, Family name). These should be merged into one. According to Assyriological convention, a person’s full name is e.g. written as Bēl-rēmanni/Mušebši-Marduk//Šangû-Šamaš, which is read as Bēl-rēmanni, son of Mušebši-Marduk, descendant of Šangû-Šamaš (family).

Excel can combine the data from several columns with a formula. To do this, follow the steps below that describe how to use the ampersand symbol (&) for merging cells: 

1) Create another column as the first column of the sheet.

2) Use the formula to merge cells.

> Select the cell A2 (the first data row, directly below the heading) in the empty column you created.

> In this cell, type = and select personal name from this row.

> Type &"/"& and select patronym from this row.

> Type &"//"& and select family name from this row.

Your full formula should look like this: =B2&"/"&C2&"//"&D2. Use straight quotation marks, as Excel does not accept curly ones in formulas.

> Press “enter”. Now the three columns should be combined in the column in which you entered the formula. 

 

3) Apply the formula to the rest of the column. Click on the cell which now holds a successful combination of names, and drag its fill handle (the small square in the bottom-right corner of the cell) down to fill the whole column with merged values.

4) Copy the column as values, not formulas. The merged column still relies on formulas that refer to the original name columns, so it will break once those columns are deleted. Create another empty column to copy this data into. Select all the cells containing formulas, copy them (Ctrl+C), and click on the first cell of the newly created empty column. Then go to the main menu and select Edit > Paste Special… and a new window will open. From there, choose the option Values. This pastes the values of the merged column, not the formulas, into the new column. You can now delete both the column with the formulas and the original columns which you used to merge your data.

5) Select all data (Ctrl + A) and put a Filter on it again!
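For comparison, here is a minimal Python sketch of the same merge. It mirrors the Excel formula exactly, so a missing patronym or family name marked with "-" stays in the result. The sample row, the "Role" value and the new column heading "Full name" are made up for illustration, and the name column headings are assumptions to check against your own sheet.

```python
NAME_COLUMNS = ("Personal name", "Patronym", "Family name")


def full_name(personal, patronym, family):
    """Join the three name parts as Name/Patronym//Family name."""
    return f"{personal}/{patronym}//{family}"


# Hypothetical sample row.
rows = [
    {"Personal name": "Bēl-rēmanni", "Patronym": "Mušebši-Marduk",
     "Family name": "Šangû-Šamaš", "Role": "debtor"},
]

merged = []
for row in rows:
    # Put the merged name first, then keep every column except the three originals.
    new_row = {"Full name": full_name(*(row[column] for column in NAME_COLUMNS))}
    new_row.update({key: value for key, value in row.items() if key not in NAME_COLUMNS})
    merged.append(new_row)

print(merged[0]["Full name"])  # Bēl-rēmanni/Mušebši-Marduk//Šangû-Šamaš
```

Because the merged name is stored as plain text, there is no equivalent of the Paste Special step here.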

Now that the initial cleaning of your data is done, you can continue with data preparation for Gephi in Part 3.

If you encounter any problems or find information that needs to be updated, you can let me know via email: m.seire[at]hum.leidenuniv.nl.

Author: Maarja Seire
Published on 28 January 2020