**Purpose**
The main purpose is to be able to compute the share of resources used by node `i` in relation to its neighbors:

`r_i / sum_j^i{r_j}`

where `r_i` is node i's resources and `sum_j^i{r_j}` is the sum of i's neighbors' resources.
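As a toy illustration of the target quantity (hypothetical node names and resource values), the share can be computed directly from an adjacency list:

```python
# Toy example of r_i / sum over neighbors j of r_j.
# Resource values and the adjacency list are made up for illustration.
resources = {"i": 4, "j": 2, "k": 6}
neighbors = {"i": ["j", "k"], "j": ["i"], "k": ["i"]}

def share(node):
    # Sum the resources of the node's neighbors only.
    denom = sum(resources[n] for n in neighbors[node])
    return resources[node] / denom

print(share("i"))  # 4 / (2 + 6) = 0.5
```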

I am open to any R, Python, or even Stata solution that can achieve this task, on which I am almost giving up… See below for snippets of my previous attempts.

To achieve this goal, I am trying to perform a search of this type:

node | col1 | col2 | col3 |
---|---|---|---|
i | [A] | [list] | [list] |
j | [A, B, i] | | |

Search for i in col1; where it is found (here in j's list), update i's col1:

node | col1 | col2 | col3 |
---|---|---|---|
i | [A, j] | [list] | [list] |
j | [A, B, i] | | |
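A minimal pure-Python sketch of this search-and-update step (the node names and the single-column dict are illustrative; the real data has three list columns):

```python
# If node i appears in node j's col1 list, append j to i's col1 list,
# i.e. symmetrize the (node -> col1) relation in one pass.
col1 = {
    "i": ["A"],
    "j": ["A", "B", "i"],
}

for node, members in list(col1.items()):
    for member in members:
        # Only update entries that are themselves nodes in the table.
        if member in col1 and node not in col1[member]:
            col1[member].append(node)

print(col1["i"])  # ['A', 'j']
```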

**Data**
The dataframe has about 700k rows, and lists contain at most 20 elements. Lists in col1–col3 may be empty. Entries look like '1579301860' and are stored as strings.

The first 10 entries of the df:

```
df[["ID","s22_12","s22_09","s22_04"]].head(10)

,ID,s22_12,s22_09,s22_04
0,547232925,[],[],[]
1,1195452119,[],[],[]
2,543827523,[],[],[]
3,1195453927,[],[],[]
4,1195456863,[],[],[]
5,403735824,[],[],[]
6,403985344,[],[],[]
7,1522725190,"['547232925', '1561895862', '1195453927', '1473969746', '1576299336', '1614620375', '1526127302', '1523072827', '398988727', '1393784634', '1628271142', '1562369345', '1615273511', '1465706815', '1546795725']","['1550103038', '547232925', '1614620375', '1500554025', '1526127302', '1523072827', '1554793443', '1393784634', '1603417699', '1560658585', '1533511207', '1439071476', '1527861165', '1539382728', '1545880720']","['1529732185', '1241865116', '1524579382', '1523072827', '1526127302', '1560851415', '1535455909', '1457280850', '1577015775', '1600877852', '1549989930', '1528007558', '1533511207', '1527861165', '1591602766']"
8,789656124,[],[],[]
9,662539468,[1195453927],[],[]
```
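Since the list columns are stored as strings, in Python they can be parsed with `ast.literal_eval` and then exploded into a long edge list with pandas. A sketch on a hypothetical two-row frame (only one list column shown):

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "ID": ["662539468", "1522725190"],
    "s22_12": ["['1195453927']", "['547232925', '1561895862']"],
})

# Parse the string representation into real Python lists
# (empty lists arrive as the string "[]").
df["s22_12"] = df["s22_12"].apply(ast.literal_eval)

# Explode into a long edge list: one (from, to) pair per row.
# Rows whose list was empty explode to NaN and are dropped.
edges = (df.rename(columns={"ID": "from"})
           .explode("s22_12")
           .rename(columns={"s22_12": "to"})
           .dropna(subset=["to"]))
print(edges)
```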

**What I tried: R Attempts**
I exploded the lists and put the data in long format.
Then I tried two main approaches in R:

- loading the long data into `igraph`, applying `neighbors()` to the graph's nodes, saving the results into lists, and using plyr to build a neighbor_df (works, but 2 nodes take 67 seconds)

```r
# Initialize the result data frame
result <- data.frame(Node = nodes)
#result <- as.data.frame(matrix(NA, nrow = n_nodes, ncol = 0))

neighbor_lists <- lapply(nodes, function(x) {
  nbrs <- names(neighbors(graph, x))
  if (length(nbrs) == 0) {
    nbrs <- NA
  }
  return(nbrs)
})

neighbor_df <- plyr::ldply(neighbor_lists, rbind)
names(neighbor_df) <- paste0("Neighbor", 1:ncol(neighbor_df))
result <- cbind(result, neighbor_df)
```

- reading the long format with `data.table`, splitting, and lapply-ing dcast over the splits (→ memory overload)

```r
result_long <- edges[, .(to = to, Node = from)][, rn := .I][
  , .(Node, Neighbor = to, Number = rn)][order(Number), ]
result_long[, cast_cat := findInterval(Number, seq(100000, 6000000, 100000))]

# reshape to wide
result_wide <- dcast(result_long, Node ~ Number, value.var = "Neighbor", fill = "")
# Only tested on sample data; the target data is 19 mln rows, so dcast has to
# be split -- but then it consumes 200 GB of RAM
result_wide[, (2:ncol(result_wide)) := lapply(.SD, function(x) ifelse(x == "", NA, x)),
            .SDcols = 2:ncol(result_wide)]
result_wide <- na_move(result_wide, cols = names(result_wide[, !1]))
result_wide <- Filter(function(x) !all(is.na(x)), result_wide)
```

I posted the code as per Andy's request, though I think it clutters the question.

## Answer

Thanks to the comment of @Stefano Barbi:

```r
# extract the vertex attribute holding each node's resources
r <- vertex_attr(g, "rcount", index = V(g))

# create a dgC sparse matrix from the graph
m <- get.adjacency(g)

# premultiply the adjacency matrix to find the sum of the neighbors' resources
sum_of_rj <- r %*% m

# add the node's own resources
sum_of_r <- sum_of_rj + r

# find the vector of shares
share <- r / sum_of_r@x

sh_tab <- data.table(i = sum_of_r@Dimnames[[2]], sh = share)
sh_tab
```
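For comparison, the same sparse-matrix trick can be sketched in Python with `scipy.sparse` (the 3-node graph and resource vector are made up; note that, like the R answer, this adds the node's own resources to the denominator):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical undirected graph: node 0 is connected to nodes 1 and 2.
A = csr_matrix(np.array([[0, 1, 1],
                         [1, 0, 0],
                         [1, 0, 0]]))
r = np.array([4.0, 2.0, 6.0])

# Multiply by the adjacency matrix: sum of each node's neighbors' resources
# (A is symmetric, so A @ r equals r @ A).
sum_of_rj = A @ r

# Add the node's own resources, as in the accepted answer.
sum_of_r = sum_of_rj + r

share = r / sum_of_r
print(share)  # node 0: 4 / (2 + 6 + 4) = 1/3
```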