Guide to Using Power BI and R for Social Network Analysis: Custom Visuals

The use of social network analysis is rapidly growing across various professional fields. It can provide valuable information for achieving business objectives, including targeted marketing, and help identify potential security or reputational risks. Moreover, social network analysis can assist companies in achieving internal goals by providing insights into employee behavior and the relationships between different departments.

Numerous software solutions are available for organizations to perform social network analysis, each with its own strengths, limitations, and suitability for specific purposes. This article concentrates on Microsoft Power BI, a widely used data visualization tool. Although Power BI offers several social network add-ons, this article will explore the use of custom visuals created in R to achieve more visually appealing and adaptable results.

This tutorial presumes familiarity with basic graph theory, specifically directed graphs. Additionally, subsequent steps are best suited for Power BI Desktop, which is only compatible with Windows. Mac OS or Linux users can utilize the Power BI browser, but it lacks support for certain features, such as importing Excel workbooks.

Organizing Data for Visualization

The creation of social networks begins with gathering connection (edge) data. This data consists of two main fields: the source node and the target node representing the endpoints of the edge. To obtain more comprehensive visual insights, we can collect additional data, usually depicted as node or edge properties:

1) Node properties

  • Shape or color: Represents the user type, such as their location/country.
  • Size: Indicates the user’s importance within the network, like their follower count.
  • Image: Serves as a unique identifier, such as the user’s avatar.

2) Edge properties

  • Color, stroke, or arrowhead connection: Signifies the connection type, such as the sentiment of the connecting post or tweet.
  • Width: Depicts the connection strength, such as the number of mentions or retweets between two users within a specific timeframe.

Let’s examine a sample social network visual to understand the function of these properties:

A graph of circles connected by lines of varying widths appears with three distinct sections. The left of the graph has six green shapes of various sizes labeled 1, 2, 3, 4, 5, and 6 in a hexagon. Numbers 1-5 are circles, while 6 is a diamond. They are interconnected by green arrows of varying widths and directions, and some arrowheads are filled green while others are not filled. To the right of the green shapes is the next section: three dark blue shapes arranged in a triangle that are labeled 7, 8, and 9, and are interconnected by blue arrows of varying widths and directions (with some arrowheads filled blue). Nodes 7 and 9 are connected to nodes 3 and 4 with gray arrows of varying widths and directions (with some arrowheads filled gray). In the middle of the graph, below the first two shape groups, is a single light blue diamond labeled 10. It is connected to nodes 5, 4, and 9 by dotted gray arrows of varying widths and directions (with some arrowheads filled gray).
Green, light blue, and dark blue nodes and varying circle or diamond shapes demonstrate different node types. Numbers with transparent backgrounds act as the node image identifiers, and larger nodes (such as Node 4) are more important in the network. Different edge types are indicated by color (green, blue, or gray), stroke (solid or dotted), and arrowheads (empty or filled); edge width shows strength (for example, the connection from Node 8 to Node 9 is strong).

Hover text can also be utilized to complement or substitute the aforementioned parameters, as it accommodates supplementary information that might not be easily conveyed through node or edge properties.

Evaluating Power BI’s Social Network Extensions

Having defined the various data characteristics of a social network, let’s assess the advantages and disadvantages of four commonly used tools for network visualization in Power BI.

ExtensionSocial Network Graph by Arthur GrausNetwork NavigatorAdvanced Networks by ZoomCharts (Light Edition)Custom Visualizations Using R
Dynamic node sizeYesYesYesYes
Dynamic edge sizeNoYesNoYes
Node color customizationYesYesNoYes
Complex social network processingNoYesYesYes
Profile images for nodesYesNoNoYes
Adjustable zoomNoYesYesYes
Top N connections filteringNoNoNoYes
Custom information on hoverNoNoNoYes
Edge color customizationNoNoNoYes
Other advanced featuresNoNoNoYes

Social Network Graph by Arthur Graus, Network Navigator, and Advanced Networks by ZoomCharts (Light Edition) are all suitable extensions for creating basic social networks and beginning your social network analysis journey.

Many dark blue, light blue, and orange circles (50+ circles) are connected by thin gray lines on a white background. The circles have a solid color border and are filled with small images of various Pokémon that have a white background, and the circles block the view of most of the gray lines. They form a circular shape overall.
An example visualization made using the Social Network Graph by Arthur Graus extension.
Many blue, purple, and gray circles (50+ circles) are connected by thin gray lines on a white background. The circles are solid and filled, and block the view of some of the gray lines. They form a circular arrangement overall.
An example visualization made using the Network Navigator extension.
Many large teal and small orange circles (50+ circles) are connected by thin gray lines on a white background. The circles are solid and filled, and most of the gray lines are visible. They form a horizontal wedge shape overall, with more densely populated circles appearing on the right side. On the bottom left of the chart, there are a few widget icons and two labeled circles: a teal circle labeled "From User," and an orange circle labeled "To User." On the bottom right of the chart is a logo reading "zoomcharts."
An example visualization made using the Advanced Networks by ZoomCharts (Light Edition) extension.

However, to bring your data to life and uncover groundbreaking insights with captivating visuals, or if your social network is particularly intricate, developing custom visuals in R is recommended.

Many green, blue, and purple circles (50+ circles) are connected by thin lines of varying colors (green, gray, and red) on a white background. The circles are solid and filled with a Pokémon image at their center, and most of the thin lines are visible. They form a spread-out circular shape overall, with the green circles frequently branching out toward smaller blue or purple circles. The top right corner of the chart has the text "Social Network," and below the chart is a legend of lines and circles with related text: a green line with the text "Positive," a gray line with the text "Neutral," a red line with the text "Negative," a blue circle with the text "Mention," and a purple circle with the text "Retweet."
An example visualization made using custom visuals in R.

This customized visualization represents the final outcome of this tutorial’s social network extension in R, showcasing the extensive range of features and node/edge properties offered by R.

Constructing a Social Network Extension for Power BI Using R

Creating an extension for visualizing social networks in Power BI using R involves five distinct steps. However, before constructing our social network extension, we need to import our data into Power BI.

Prerequisite: Gathering and Preparing Data for Power BI

You can follow this tutorial using a provided test dataset](https://docs.google.com/spreadsheets/d/1_ws7hiEo8_Nnmbg95GBMnGLziTuxeDf4/edit?usp=sharing&ouid=118320780115282327017&rtpof=true&sd=true) based on Twitter and Facebook data or utilize your own social network data. Our data has been randomized; you may [download real Twitter data if desired. After you collect the required data, add it into Power BI (for example, by importing an Excel workbook or adding data manually). Your outcome should resemble the following table:

A table with thirteen alternating gray and white rows appears. It has a title---"Social Network"---with headers below it. The first column is labeled "From User" and has eight "1" texts followed by five "2" texts. The second column is labeled "To User" and reads (from top to bottom): 2, 3, 5, 6, 7, 8, 9, 10, 7, 8, 11, 13, 14. The third column is labeled "Number of Connections" and reads (from top to bottom): 12, 46, 29, 79, 49, 11, 90, 100, 66, 29, 62, 13, 45. The fourth column is labeled "Type" and reads (from top to bottom): Retweet, Mention, Mention, Retweet, Retweet, Retweet, Mention, Mention, Retweet, Retweet, Retweet, Retweet, Mention. The fifth column is labeled "Sentiment" and reads (from top to bottom): Positive, Positive, Negative, Neutral, Positive, Negative, Positive, Neutral, Neutral, Negative, Negative, Negative, Negative. The sixth column is labeled "From User Name" and has eight "Aaliyah" texts followed by five "Aaron" texts. The seventh column reads "To User Name" and reads (from top to bottom): Aaron, Abel, Abraham, Ace, Adalyn, Adalynn with two Ns, Adam, Addison, Adalyn, Adalynn with two Ns, Adeline, Adriel, Aidan. The eighth column reads "From Avatar" and has eight "https://raychemmedica.com/SampleImages/Pokemon/1.png" texts followed by five "https://raychemmedica.com/SampleImages/Pokemon/2.png" texts. The ninth column reads "To Avatar" and each entry has the text "https://raychemmedica.com/SampleImages/Pok" and appears to be cut off.

Once your dataset is prepared, you are ready to create a custom visualization.

Step 1: Establishing the Visualization Template

Developing a Power BI visualization is intricate—even rudimentary visuals necessitate thousands of files. Thankfully, Microsoft provides a library called pbiviz, which offers the necessary infrastructure-supporting files with minimal code. The pbiviz library also bundles all our final files into a .pbiviz file, which can be directly imported into Power BI as a visualization.

The most straightforward way to install pbiviz is with Node.js. After pbiviz is installed, we need to initialize our custom R visual using our computer’s command-line interface:

1
2
3
4
pbiviz new toptalSocialNetworkByBharatGarg -t rhtml
cd toptalSocialNetworkByBharatGarg
npm install 
pbiviz package

Remember to replace toptalSocialNetworkByBharatGarg with your desired visualization name. The -t rhtml argument instructs the pbiviz package to generate a template for developing R-based HTML visualizations. Errors will appear because we haven’t specified details like the author’s name and email in our package, but we will address these later. If the pbiviz script fails to execute in PowerShell, you might need to enable scripts using Set-ExecutionPolicy RemoteSigned.

Upon successful code execution, you will see a folder with the following structure:

A File Explorer listing containing eight subfolders (.tmp, .vscode, assets, dist, node_modules, r_files, src, and style) and eight files (capabilities.json, dependencies.json, package.json, package-lock.json, pbiviz.json, script.r, tsconfig.json, and tslint.json). All of the files are 1 KB, except for capabilities.json (2 KB) and package-lock.json (23 KB).

With the folder structure in place, we can proceed to write the R code for our custom visualization.

Step 2: Writing the Visualization Code in R

The directory created in the previous step contains a file named script.r, which has default code. (This code generates a simple Power BI extension using the iris sample database in R to plot a histogram of Petal.Length by Petal.Species.) We will modify this code while retaining its default structure, including the commented sections.

Our project utilizes three R libraries:

Let’s replace the code in the Library Declarations section of script.r to reflect our library usage:

1
2
3
libraryRequireInstall("DiagrammeR")
libraryRequireInstall("visNetwork")
libraryRequireInstall("data.table")

Next, we’ll replace the code in the Actual code section with our own R code. Before creating our visualization, we need to read and process our data. We’ll take two inputs from Power BI:

  • num_records: A numeric input N used to select the top N connections from our network (to limit displayed connections).
  • dataset: Our social network’s nodes and edges.

To determine the N connections for plotting, we need to aggregate the num_records value because Power BI provides a vector by default instead of a single numeric value. We can achieve this using an aggregation function like max:

1
limit_connection <- max(num_records)

Now, we’ll read dataset as a data.table object with custom columns. We’ll sort the dataset by value in descending order to prioritize displaying the most frequent connections. This ensures that we select the most relevant records to plot when limiting connections with num_records:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
dataset <- data.table(from = dataset[[1]]
                      ,to = dataset[[2]]
                      ,value = dataset[[3]]
                      ,col_sentiment = dataset[[4]]
                      ,col_type = dataset[[5]]
                      ,from_name = dataset[[6]]
                      ,to_name = dataset[[7]]
                      ,from_avatar = dataset[[8]]
                      ,to_avatar = dataset[[9]])[
order(-value)][
seq(1, min(nrow(dataset), limit_connection))]

Next, we need to prepare our user information. This involves creating and assigning unique user IDs (uid) to each user, storing them in a new table. We’ll also calculate the total number of users and store it in a variable called num_nodes:

1
2
3
4
user_ids <- data.table(id = unique(c(dataset$from, 
                                     dataset$to)))[, uid := 1:.N]

num_nodes <- nrow(user_ids) 

Let’s enrich our user information with additional properties, such as:

  • The number of followers (represented by node size).
  • The number of records.
  • The user type (indicated by color codes).
  • Links to avatars.

We will employ R’s merge function to update the table:

1
2
3
4
5
6
7
8
9
user_ids <- merge(user_ids, dataset[, .(num_follower = uniqueN(to)), from], by.x = 'id', by.y = 'from', all.x = T)[is.na(num_follower), num_follower := 0][, size := num_follower][num_follower > 0, size := size + 50][, size := size + 10]

user_ids <- merge(user_ids, dataset[, .(sum_val = sum(value)), .(to, col_type)][order(-sum_val)][, id := 1:.N, to][id == 1, .(to, col_type)], by.x = 'id', by.y = 'to', all.x = T)

user_ids[id %in% dataset$from, col_type := '#42f548']

user_ids <- merge(user_ids, unique(rbind(dataset[, .('id' = from, 'Name' = from_name, 'avatar' = from_avatar)],
      dataset[, .('id' = to, 'Name' = to_name, 'avatar' = to_avatar)])),
      by = 'id')

We’ll also add our generated uid to the original dataset to retrieve the from and to user IDs later in the code:

1
2
3
4
5
6
7
dataset <- merge(dataset, user_ids[, .(id, uid)],
                                by.x = "from", by.y = "id")

dataset <- merge(dataset, user_ids[, .(id, uid_retweet = uid)],
                                by.x = "to", by.y = "id")

user_ids <- user_ids[order(uid)]

Next, we’ll create node and edge data frames for visualization. We’ll choose the style and shape of our nodes (filled circles) and select the appropriate columns from our user_ids table to populate our nodes’ color, data, value, and image attributes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
nodes <- create_node_df(n = num_nodes, 
                        type = "lower",
                        style = "filled",
                        color = user_ids$col_type, 
                        shape = 'circularImage',
                        data = user_ids$uid,
                        value = user_ids$size,
                        image = user_ids$avatar,
                        title = paste0("<p>Name: <b>", user_ids$Name,"</b><br>",
                                       "Super UID <b>", user_ids$id, "</b><br>",
                                       "# followers <b>", user_ids$num_follower, "</b><br>",
                                       "</p>")
                        )

Similarly, we’ll choose the columns from the dataset table that correspond to our edges’ from, to, and color attributes:

1
2
3
4
edges <- create_edge_df(from = dataset$uid,
                        to = dataset$uid_retweet,
                        arrows = "to",
                        color = dataset$col_sentiment)

Finally, with our node and edge data frames ready, let’s generate our visualization using the visNetwork library and store it in a variable called p, which the default code will use later:

1
2
3
p <- visNetwork(nodes, edges) %>%
  visOptions(highlightNearest = list(enabled = TRUE, degree = 1, hover = T)) %>%
  visPhysics(stabilization = list(enabled = FALSE, iterations = 10), adaptiveTimestep = TRUE, barnesHut = list(avoidOverlap = 0.2, damping = 0.15, gravitationalConstant = -5000)) 

Here, we’ve customized a few network visualization settings in visOptions and visPhysics. You are encouraged to explore the documentation and modify these options as per your requirements. Our Actual code section is now complete, and we should update the Create and save widget section by removing the line p = ggplotly(g); as we’ve coded our own visualization variable, p.

Step 3: Preparing the Visualization for Power BI

With the R coding complete, we need to make some adjustments to our supporting JSON files to prepare the visualization for use in Power BI.

Let’s begin with the capabilities.json file. This file contains most of the information displayed in the Visualizations tab for a visual, such as our extension’s data sources and other settings. Firstly, we need to modify dataRoles and replace the existing value with new data roles for our dataset and num_records inputs:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# ...
  "dataRoles": [
    {
      "displayName": "dataset",
      "description": "Connection Details - From, To, # of Connections, Sentiment Color, To Node Type Color",
      "kind": "GroupingOrMeasure",
      "name": "dataset"
    },
    {
      "displayName": "num_records",
      "description": "number of records to keep",
      "kind": "Measure",
      "name": "num_records"
    }
  ],
# ...

Within the capabilities.json file, let’s also update the dataViewMappings section. We’ll add conditions that our inputs must meet, and update the scriptResult to match our new data roles and their conditions. Refer to the conditions section, along with the select section under scriptResult, for changes:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# ...
 "dataViewMappings": [
    {
       "conditions": [
        {
          "dataset": {
            "max": 20
          },
          "num_records": {
            "max": 1
          }
        }
      ],
      "scriptResult": {
        "dataInput": {
          "table": {
            "rows": {
              "select": [
                {
                  "for": {
                    "in": "dataset"
                  }
                },
                {
                  "for": {
                    "in": "num_records"
                  }
                }
              ],
              "dataReductionAlgorithm": {
                "top": {}
              }
            }
          }
        },
# ...

Moving on to our dependencies.json file, we’ll add three additional packages under cranPackages. This enables Power BI to identify and install the necessary libraries:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
{
    "name": "data.table",
      "displayName": "data.table",
      "url": "https://cran.r-project.org/web/packages/data.table/index.html"
},
{
    "name": "DiagrammeR",
      "displayName": "DiagrammeR",
      "url": "https://cran.r-project.org/web/packages/DiagrammeR/index.html"
},
{
    "name": "visNetwork",
      "displayName": "visNetwork",
      "url": "https://cran.r-project.org/web/packages/visNetwork/index.html"
},

Note: Power BI should automatically install these libraries. However, if you encounter library errors, try running the following command:

1
install.packages(c("DiagrammeR", "htmlwidgets", "visNetwork", "data.table", "xml2"))

Finally, let’s add relevant information about our visual to the pbiviz.json file. Updating the following fields is recommended:

  • The visual’s description field.
  • The visual’s support URL.
  • The visual’s GitHub URL.
  • The author’s name.
  • The author’s email.

Now that our files are updated, we need to repackage the visualization from the command line:

1
pbiviz package

Upon successful code execution, a .pbiviz file should be generated in the dist directory. The complete code covered in this tutorial is available on GitHub.

Step 4: Importing the Visualization into Power BI

To import your newly created visualization into Power BI, open your Power BI report (either an existing one or one created during our Prerequisite step with test data) and navigate to the Visualizations tab. Click the [more options] button and select Import a visual from a file. Note: You may need to first select Edit in a browser for the Visualizations tab to appear.

A pane appears with the title "Visualizations" and two ">" arrows to its right. Below, the text "Build visual" with two images below it: two yellow rectangles and a line on the left, and a paper and paintbrush on the right. The two yellow rectangles image is selected and below it has a panel of more than 30 various graph icons. The last icon is an ellipsis, which has the hover text "Get more visuals." Below the icons panel, the text "Values" with a line of text below that reads: "Add data fields here." Below that, the text "Drill through," followed by "Cross-report" with an "Off" radio button selected next to it.

Navigate to the dist directory within your visualization folder and select the .pbiviz file to seamlessly import your visual into Power BI.

Step 5: Creating the Visualization in Power BI

Your imported visualization is now available in the visualizations pane. Click on the visualization icon to add it to your report, and then add the relevant columns to the dataset and num_records inputs:

A pane appears with a selected tools icon that has the hover text "toptalSocialNetworkByBharatGarg." Below the icon, the text "dataset" appears with various drop-down rectangles below it (from top to bottom): From User, To User, Number of Connections, color, color, User Name, User Name, Avatar, Avatar.

You can enhance your visualization with additional text, filters, and features based on your project needs. It’s recommended that you review the detailed documentation for the three R libraries used to further customize your visualizations, as this example project cannot encompass all possible use cases for the available functions.

Enhancing Your Next Social Network Analysis

Our final result demonstrates the power and effectiveness of R in creating custom Power BI visualizations. Consider utilizing social network analysis with custom visuals in R for your next dataset to make more informed decisions based on comprehensive data insights.

The Toptal Engineering Blog would like to thank Leandro Roser for reviewing the code samples presented in this article.

From top to bottom, the words "Gold" (colored gold), "Microsoft," and "Partner" (both in black) appear followed by the Microsoft logo.
Licensed under CC BY-NC-SA 4.0