I taught a course on complex networks this fall. One component of the course is a hands-on session where students use the SNAP C++ and Python libraries for graph analysis and Gephi for visualization. One of the available datasets is DBLP, a large publication database for computer science, which actually covers a lot of electrical engineering as well.
In a small experiment, I filtered DBLP for papers with both “massive” and “MIMO” in the title and analyzed the resulting co-author graph. There are 17200 papers and some 6000 authors. There is one large connected component, plus over 400 much smaller connected components!
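The filtering and graph construction can be sketched in a few lines of plain Python. This is not the script behind the post: it assumes the DBLP records have already been parsed into hypothetical (title, authors) pairs, matches the two words case-insensitively anywhere in the title, and counts connected components with a breadth-first search rather than with SNAP.

```python
import re
from collections import deque
from itertools import combinations

# Hypothetical, already-parsed DBLP records: (title, list of author names).
RECORDS = [
    ("Massive MIMO for next generation wireless systems", ["A", "B", "C"]),
    ("Pilot contamination in massive MIMO", ["B", "D"]),
    ("Energy efficiency of Massive MIMO", ["E", "F"]),
    ("Channel coding basics", ["A", "G"]),  # lacks both words: filtered out
]


def matches(title):
    """True if the title contains the words 'massive' and 'MIMO' (any case)."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return {"massive", "mimo"} <= words


def coauthor_graph(records):
    """Adjacency sets of the co-author graph built from the matching papers."""
    adj = {}
    for title, authors in records:
        if not matches(title):
            continue
        for a in authors:
            adj.setdefault(a, set())  # single-author papers still add a node
        for a, b in combinations(sorted(set(authors)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Connected components (sets of authors) found by BFS, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            for nb in adj[queue.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


papers = [t for t, _ in RECORDS if matches(t)]
adj = coauthor_graph(RECORDS)
comps = connected_components(adj)
print(len(papers), "papers,", len(adj), "authors,", len(comps), "components")
# prints: 3 papers, 6 authors, 2 components
```

On the real data, the same counts would give the number of papers, authors, and components (one giant component plus many small ones) quoted above.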
Then I looked more closely at the authors who have written at least 20 of these papers. In the figure, each node is an author, its size is proportional to his/her number of “massive MIMO papers”, and its color represents the identified communities. Edge thickness represents the number of co-authored papers. Some long-standing collaborators, former students, and other friends stand out.
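The selection and weighting described above can be sketched as follows. This runs on made-up toy data and is not the code behind the figure: the author lists, the lowered threshold MIN_PAPERS = 3 (the post uses 20), and the helper gephi_tables are hypothetical. The sketch counts papers per author, keeps the prolific ones, weights each edge by the number of papers the pair co-authored, and emits node and edge tables as CSV, using column names (Id, Label, Source, Target, Type, Weight) that Gephi's spreadsheet import should recognize.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical author lists of papers that already passed the title filter.
PAPERS = [["A", "B"], ["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "D"]]
MIN_PAPERS = 3  # the post uses 20; lowered here for the toy data

# Node size: number of matching papers per author.
paper_count = Counter(a for authors in PAPERS for a in set(authors))
keep = {a for a, n in paper_count.items() if n >= MIN_PAPERS}

# Edge weight: number of papers co-authored by a pair of kept authors.
edge_weight = Counter()
for authors in PAPERS:
    for a, b in combinations(sorted(set(authors) & keep), 2):
        edge_weight[(a, b)] += 1


def gephi_tables(paper_count, keep, edge_weight):
    """Return (nodes_csv, edges_csv) strings for a spreadsheet import."""
    nodes, edges = io.StringIO(), io.StringIO()
    nw = csv.writer(nodes)
    nw.writerow(["Id", "Label", "Papers"])
    for a in sorted(keep):
        nw.writerow([a, a, paper_count[a]])
    ew = csv.writer(edges)
    ew.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), w in sorted(edge_weight.items()):
        ew.writerow([a, b, "Undirected", w])
    return nodes.getvalue(), edges.getvalue()


nodes_csv, edges_csv = gephi_tables(paper_count, keep, edge_weight)
print(nodes_csv)
print(edges_csv)
```

Saved to files, the two tables give a weighted graph whose Papers column can be used to scale node size; the post does not say which algorithm produced the communities.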
To state the obvious, prolificacy is not the same as impact, even though the two are often correlated. Also, the study is not entirely rigorous. First, it trusts that DBLP properly distinguishes authors with the same name (consider, e.g., “Li Li”), and I do not know how well it really does that. Second, in a random inspection, all the papers selected by my filter dealt with “massive MIMO” as we know it. In principle, however, the search criterion would also catch papers on, say, MIMO control theory for a massive power plant. On the other hand, the filter misses some papers written before the term “massive MIMO” was established, perhaps most importantly Thomas Marzetta’s seminal paper on “unlimited antennas”. Third, the analysis is limited to publications covered by DBLP, which also means, conversely, that there is no specific quality threshold for the publication venues. If you are interested in learning more, drop me a note.
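The trade-off in the second caveat can be made concrete. The sketch below is hypothetical and not part of the original analysis: loose_match mirrors the two-word criterion described above, while phrase_match is a stricter alternative that requires “massive” to be immediately followed by “MIMO”. The second title is a made-up example of the false positive mentioned above; the third is the title of Marzetta’s 2010 paper, which neither criterion catches.

```python
import re


def loose_match(title):
    """Criterion from the post: both words appear anywhere in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return {"massive", "mimo"} <= words


def phrase_match(title):
    """Hypothetical stricter criterion: 'massive' directly followed by 'MIMO'."""
    return re.search(r"\bmassive[\s-]+mimo\b", title, flags=re.IGNORECASE) is not None


titles = [
    "Massive MIMO for Next Generation Wireless Systems",
    "MIMO Control Theory for a Massive Power Plant",  # made-up false positive
    "Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas",
]
for t in titles:
    print(loose_match(t), phrase_match(t), t)
```

The stricter filter removes the power-plant false positive, but it would also drop legitimate titles such as “massive multiuser MIMO”, and no title-based filter recovers papers written before the term existed.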
Hey Erik,
This is incredibly interesting, thank you so much!
Could you do one graph where each node is still an author but its size is proportional to the number of citations in the massive MIMO papers?
Citations should reflect impact better, I believe.
Thanks a lot –
yes, size in proportion to citations might be better, but I don’t have the data for that. In principle, one could harvest the information from Google Scholar, ISI Web of Science, etc., but I do not know whether their user agreements allow that…
Thanks for the quick reply! For completeness’ sake, I will echo it here: the problem seems to be that the DBLP database does not contain citation information.
This information can be harvested from, e.g., Google Scholar – I checked that their user agreement allows it.
One way of doing this is to use a Python module called scholarly, released a couple of days ago. I will update this comment thread if I ever do it, but it won’t happen in the next two months. Please post an update here if you decide to do it, because it really should be an interesting and insightful graph.
Seems like a fun hacking project… drop me a line if you want to work on it!