Tuesday 15 January 2013

python - networkx - read edgelist in chunks (pandas)


I have a very large network (around 500 million lines) that I want to read and analyze in NetworkX. It is stored as a weighted edgelist (node1 node2 weight). So far I read it like this:

  import gzip
  import networkx as nx

  with gzip.open(network, 'rb') as fh:
      # read the weighted edge list into a directed graph
      G = nx.read_weighted_edgelist(fh, create_using=nx.DiGraph())

But because the file is so large, I run into memory problems. Is there a way to read the file in chunks of a set length, the way pandas does? Thanks for your help.
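The gzip-based read above can be checked end to end with a minimal sketch: it writes a few rows of the sample shown below to a temporary gzip file (the file name `sample.edgelist.gz` is hypothetical) and reads them back. Passing `nodetype=int` stores node IDs as Python ints instead of strings, which is typically more compact; `read_weighted_edgelist` parses the weights as floats. This still builds the whole graph in memory, so on its own it does not solve the memory problem.

```python
import gzip
import os
import tempfile

import networkx as nx

# A few rows from the sample in the question (node1 node2 weight).
SAMPLE = "30879005 5242 11\n44608582 2295986 4\n24935102 737450 1\n"


def read_gz_edgelist(path):
    """Read a gzip-compressed weighted edge list into a DiGraph with int node IDs."""
    with gzip.open(path, "rb") as fh:
        # read_weighted_edgelist decodes the byte lines and parses weights as float
        return nx.read_weighted_edgelist(fh, create_using=nx.DiGraph(), nodetype=int)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.edgelist.gz")  # hypothetical file name
    with gzip.open(path, "wt") as out:
        out.write(SAMPLE)
    G = read_gz_edgelist(path)

print(G.number_of_edges())          # 3
print(G[30879005][5242]["weight"])  # 11.0
```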

Edit:

This is a small extract of my edgelist file (node1 node2 weight):

  30879005 5242 11
  44608582 2295986 4
  24935102 737450 1
  42230925 1801294 1
  20926179 2332390 1
  40959246 1100438 1
  3291058 3226104 1
  23192021 5818064 1
  16328715 7695005 1
  11561383 2102983 1
  1886716 1378893 2
  23192021 5818065 1
  2060097 2060091 1
  7176482 3222203 2
  46586813 1599030 1
  35151866 35151866 1
  12420680 1364416 5
  612044 92878 1
  16260783 3373725 1
  26475759 85310 1
  21149725 17011789 1
  1312990 105320 1
  23898296 1633222 3
  3635610 2103011 1
  12737940 4114680 1
  18210502 10816500 1
  45999903 45999903 1
  8689446 1977413 1
  5998987 3453478 3

Answer:

I would read the data as CSV into a pandas DataFrame:
  import pandas as pd

  df = pd.read_csv(path_to_edge_list, sep=r'\s+', header=None, names=['node1', 'node2', 'weight'])

Now create an nx graph and add the edges from a list of (node1, node2, weight) tuples built from the data:
  In [150]: import networkx as nx
       ...: G = nx.DiGraph()
       ...: G.add_weighted_edges_from([tuple(x) for x in df.values])
       ...: G.edges()
  Out[150]:
  [(16328715, 7695005), (42230925, 1801294), (40959246, 1100438),
   (12737940, 4114680), (3635610, 2103011), (16260783, 3373725),
   (45999903, 45999903), (7176482, 3222203), (8689446, 1977413),
   (11561383, 2102983), (21149725, 17011789), (18210502, 10816500),
   (3291058, 3226104), (23898296, 1633222), (46586813, 1599030),
   (2060097, 2060091), (5998987, 3453478), (44608582, 2295986),
   (12420680, 1364416), (612044, 92878), (30879005, 5242),
   (23192021, 5818064), (23192021, 5818065), (1312990, 105320),
   (20926179, 2332390), (26475759, 85310), (24935102, 737450),
   (35151866, 35151866), (1886716, 1378893)]
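A self-contained sketch of the same DataFrame-to-graph step, using an in-memory `io.StringIO` in place of `path_to_edge_list` and three rows of the sample. It swaps in `itertuples(index=False, name=None)`, which yields one plain tuple per row, instead of `df.values`, which first converts the whole frame to a single NumPy array. `nx.from_pandas_edgelist` (available in NetworkX 2.0 and later) is shown as an equivalent alternative; the column names `node1`, `node2`, `weight` are the ones used above.

```python
import io

import networkx as nx
import pandas as pd

# Three rows from the sample (node1 node2 weight), whitespace-separated.
SAMPLE = io.StringIO(
    "30879005 5242 11\n"
    "44608582 2295986 4\n"
    "35151866 35151866 1\n"
)

df = pd.read_csv(SAMPLE, sep=r"\s+", header=None, names=["node1", "node2", "weight"])

# Option 1: feed (node1, node2, weight) tuples to add_weighted_edges_from.
G = nx.DiGraph()
G.add_weighted_edges_from(df.itertuples(index=False, name=None))

# Option 2: let NetworkX read the DataFrame columns directly.
H = nx.from_pandas_edgelist(
    df, source="node1", target="node2", edge_attr="weight", create_using=nx.DiGraph()
)

print(G.number_of_edges(), H.number_of_edges())  # 3 3
print(G.get_edge_data(30879005, 5242))           # {'weight': 11}
```

The third row is a self-loop, which a `DiGraph` keeps as an ordinary edge, so both graphs report three edges.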

Each edge stores its weight in the 'weight' attribute:

  In [153]: G.get_edge_data(30879005, 5242)
  Out[153]: {'weight': 11}
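A small sketch of other ways to read the stored weights, on a toy two-edge graph built here for illustration. `G[u][v]['weight']` raises `KeyError` for a missing edge, while `get_edge_data` returns its `default` argument (`None` unless given). Because the graph is a `DiGraph`, the reversed pair is a different, absent edge.

```python
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([(30879005, 5242, 11), (44608582, 2295986, 4)])

print(G[30879005][5242]["weight"])                  # 11
print(G.get_edge_data(5242, 30879005))              # None: direction matters in a DiGraph
print(G.get_edge_data(5242, 30879005, default={}))  # {}

# Iterate over all edges together with their weights as (u, v, weight) triples.
for u, v, w in G.edges(data="weight"):
    print(u, v, w)
```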

To read the edge list in chunks, pass the chunksize param to read_csv and, for each chunk, add the edges and weights using the code above.

Edit:

To read the file in chunks, you can do:

  import networkx as nx
  import pandas as pd

  G = nx.DiGraph()
  for chunk in pd.read_csv(path_to_edge_list, sep=r'\s+', header=None,
                           names=['node1', 'node2', 'weight'], chunksize=10000):
      # add this chunk's (node1, node2, weight) rows as weighted edges
      G.add_weighted_edges_from([tuple(x) for x in chunk.values])
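A runnable sketch that combines the chunked read with the gzip-compressed input from the question. It assumes a hypothetical file name and a tiny `chunksize` so the loop runs several times on sample data; for the real file a value like the 10000 above, or larger, would be used. `compression='gzip'` makes pandas decompress while reading (by default it is also inferred from a `.gz` extension), and the `dtype` mapping assumes all IDs and weights fit in 64-bit integers. The `with` form of the reader needs pandas 1.2 or later. Chunking bounds the memory used while parsing; the finished `DiGraph` still holds every edge in memory.

```python
import gzip
import os
import tempfile

import networkx as nx
import pandas as pd

COLUMNS = ["node1", "node2", "weight"]


def graph_from_chunks(path, chunksize=10000):
    """Build a DiGraph from a gzip-compressed, whitespace-separated edge list, chunk by chunk."""
    G = nx.DiGraph()
    # Assumption: every column holds integers that fit in int64.
    dtypes = {"node1": "int64", "node2": "int64", "weight": "int64"}
    with pd.read_csv(
        path,
        sep=r"\s+",
        header=None,
        names=COLUMNS,
        dtype=dtypes,
        compression="gzip",
        chunksize=chunksize,
    ) as reader:
        for chunk in reader:
            # Each chunk is a DataFrame with at most `chunksize` rows.
            G.add_weighted_edges_from(chunk.itertuples(index=False, name=None))
    return G


# Demo on five sample rows written to a temporary gzip file (hypothetical name).
rows = (
    "30879005 5242 11\n44608582 2295986 4\n24935102 737450 1\n"
    "42230925 1801294 1\n20926179 2332390 1\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.txt.gz")
    with gzip.open(path, "wt") as out:
        out.write(rows)
    G = graph_from_chunks(path, chunksize=2)  # 5 rows read as chunks of 2, 2, 1

print(G.number_of_edges())          # 5
print(G[30879005][5242]["weight"])  # 11
```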
