I am trying to learn network analysis, so I am using Hillary Clinton’s emails, which are available online, to see who emailed whom.
My data is in a dictionary called hrc_dict. Each key is a (sender, receiver) tuple and each value is the number of emails for that pair. This is part of the dictionary:
{('Hillary Clinton', 'Cheryl Mills'): 354, ('Hillary Clinton', 'l'): 1, ('Linda Dewan', 'Hillary Clinton'): 1, ('Hillary Clinton', 'Capricia Marshall'): 9, ('Phillip Crowley', 'Hillary Clinton'): 2, ('Cheryl Mills', 'Anne-Marie Slaughter'): 1}
I am using Networkx in Jupyter to create a graph. My code is below:
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_nodes_from(hrc_dict)
for s, r in hrc_dict:
    G.add_edge((s,r), hrc_dict[(s,r)])
G.add_edge((s,r), hrc_dict[(s,r)])
When I call nx.Graph(), nothing prints out, and when I call G.nodes(), not all of the nodes show up. I have pasted some of the output here:
[1, 2, 3, 4, 5, 6, 7, 8, 'Mark Penn', 10, ('Todd Stern', 'Hillary Clinton'), 12,]
When I call G.edges(), I get the output below, which seems right:
[(1, ('Hillary Clinton', 'l')), (1, ('Linda Dewan', 'Hillary Clinton')), (1, ('Hillary Clinton', 'Thomas Shannon')), (1, ('Cheryl Mills', 'Anne-Marie Slaughter')), (1, ('Christopher Butzgy', 'Hillary Clinton'))]
Does anyone know how I can add nodes correctly to my graph? I assume that each person needs to be a node, so how do I break up the tuple and add the names separately? Are the edges showing correctly, or do I need to enter them differently?
To add each person as a node, you also need to change the use of add_nodes_from. Something like this:
srcs, dests = zip(* [(fr, to) for (fr, to) in hrc_dict.keys()])
G.add_nodes_from(srcs+dests)
This means that the list of nodes from G.nodes() will now be:
['Cheryl Mills',
'Capricia Marshall',
'Anne-Marie Slaughter',
'Phillip Crowley',
'Hillary Clinton',
'l',
'Linda Dewan']
(You don't get any duplicates because networkx stores the nodes of a graph as dictionary keys, so adding the same name twice has no further effect.)
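As a quick check of the de-duplication point, here is a minimal sketch. The three pairs are a subset of the sample data from the question, and the variable names `pairs` and `names` are just illustrative.

```python
import networkx as nx

# A few (sender, receiver) keys in the same shape as hrc_dict
pairs = [('Hillary Clinton', 'Cheryl Mills'),
         ('Linda Dewan', 'Hillary Clinton'),
         ('Cheryl Mills', 'Anne-Marie Slaughter')]

srcs, dests = zip(*pairs)   # one tuple of senders, one tuple of receivers
names = srcs + dests        # tuple concatenation, so names still has repeats

G = nx.Graph()
G.add_nodes_from(names)     # repeated names are added only once

print(len(names))             # number of entries, repeats included
print(G.number_of_nodes())    # number of distinct people
```

Here `names` has six entries but the graph ends up with four nodes, one per distinct person.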
Note: if you use the method below for adding the edges, there is no need to add the nodes first. However, if you might have nodes with no neighbours (or there is some other reason why the node set matters on its own), this code will do it.
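To illustrate that note, the sketch below shows add_edge creating its endpoints and add_node adding a person with no emails at all. Treating 'Linda Dewan' as isolated is an assumption made up for this example, not something taken from the real data.

```python
import networkx as nx

G = nx.Graph()

# add_edge creates both endpoint nodes if they don't exist yet
G.add_edge('Hillary Clinton', 'Cheryl Mills', weight=354)

# A node with no neighbours has to be added explicitly
# (hypothetical: here we pretend this person has no edges)
G.add_node('Linda Dewan')

print(sorted(G.nodes()))
print(G.degree('Linda Dewan'))   # degree of the isolated node
print(list(nx.isolates(G)))      # nodes that have no edges
```

If every person in your data appears in at least one (sender, receiver) pair, the add_nodes_from step can be dropped.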
Then add the edges basically as per Joel's answer; but also note the use of the attribute "weight", so the layout can make use of information directly.
import networkx as nx
import matplotlib.pyplot as plt
hrc_dict = {('Hillary Clinton', 'Cheryl Mills'): 354, ('Hillary Clinton', 'l'): 1, ('Linda Dewan', 'Hillary Clinton'): 1, ('Hillary Clinton', 'Capricia Marshall'): 9, ('Phillip Crowley', 'Hillary Clinton'): 2, ('Cheryl Mills', 'Anne-Marie Slaughter'): 1}
G = nx.Graph()
# To add a node for each of the email parties:
srcs, dests = zip(* [(fr, to) for (fr, to) in hrc_dict.keys()])
G.add_nodes_from(srcs + dests)
# (but it isn't needed IF the following method is used
# to add the edges, since add_edge also creates the nodes if
# they don't yet exist)
# note the use of the attribute "weight" here
for (s, r), count in hrc_dict.items():
    G.add_edge(s, r, weight=count)
# produce info to draw:
# a) if weight was used above, spring_layout takes
# into account the edge strengths
pos = nx.spring_layout(G)
# b) specify edge labels explicitly
# method from https://groups.google.com/forum/#!topic/networkx-discuss/hw3OVBF8orc
edge_labels = dict([((u, v), d['weight'])
                    for u, v, d in G.edges(data=True)])
# draw it
plt.figure(1)
nx.draw_networkx(G, pos, with_labels=True)
nx.draw_networkx_edge_labels(G,pos,edge_labels=edge_labels)
plt.axis('equal') # spring weighting makes more sense this way
plt.show()
And this is what we might see: [image of the drawn graph, with each edge labelled by its weight, not available]
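Separately from the plot, one caveat about the edge loop above: nx.Graph is undirected, so if the dictionary contains both (A, B) and (B, A), the second add_edge call updates the same edge and replaces its weight. The sketch below shows that effect and two possible ways around it. The reverse pair and its count of 100 are invented for illustration, and whether you want directed edges or summed totals depends on your analysis.

```python
import networkx as nx

# Hypothetical counts: the reverse pair is made up for this example
counts = {('Hillary Clinton', 'Cheryl Mills'): 354,
          ('Cheryl Mills', 'Hillary Clinton'): 100}

# Undirected graph: the second pair overwrites the first weight
U = nx.Graph()
for (s, r), n in counts.items():
    U.add_edge(s, r, weight=n)
print(U['Hillary Clinton']['Cheryl Mills']['weight'])

# Option 1: a directed graph keeps one edge per direction
D = nx.DiGraph()
for (s, r), n in counts.items():
    D.add_edge(s, r, weight=n)
print(D.number_of_edges())

# Option 2: stay undirected but sum the counts of both directions
T = nx.Graph()
for (s, r), n in counts.items():
    if T.has_edge(s, r):
        T[s][r]['weight'] += n
    else:
        T.add_edge(s, r, weight=n)
print(T['Hillary Clinton']['Cheryl Mills']['weight'])
```

With the directed version, the drawing and layout code from the answer should still work, since draw_networkx and spring_layout accept a DiGraph as well.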