Network Science

Unraveling Sweet Connections through Network Science

Cyril Benedict Lugod
12 min read · Aug 6, 2023
Photo by Jonathan Borba on Unsplash

The clock has just struck 5 in the afternoon. The sun is quickly setting, and the evening crowd is starting to gather.

Meet Aya.

Photo by Brett Wharton on Unsplash

Aya has finally reached her dream of owning and running her own dessert cafe, Sweet Escape. As she started preparing her third batch of buko pandan for the evening, she realized she was all out of condensed milk! With no one else to look after the cafe, she regretfully had to turn away a customer who was eager for buko pandan.

As she closed for the night, this minor inconvenience got her thinking. She has always considered exemplary customer service her mantra in operating her cafe, and she found this minor hiccup intolerable. Her years of studying and working as a data scientist have taught her many ways to deal with problems like this.

Network science studies connections in a system, revealing hidden insights to help drive business decisions.

So Aya got to work. She wants to study the relationships between all the ingredients needed for all the desserts on her menu. She will use every insight she uncovers from this network analysis to transform Sweet Escape into a haven of success full of happy, sweet-toothed patrons.

Let us join her on this amazing journey!

About the Dataset

Screenshot of the Panlasang Pinoy website

The data for this interesting use case was manually scraped from the Desserts section of Panlasang Pinoy [1].

Looking at the Bipartite Network

Aya knows that modeling the dynamics between the desserts on her menu and their respective ingredients entails the use of a bipartite network.

Dessert nodes (pink) linked to their respective ingredient nodes (blue)

A bipartite network reveals how desserts and their ingredients are interconnected. From the graph, we can easily see which desserts share common ingredients and, likewise, which ingredients are present in multiple desserts. It is almost like deciphering a secret code that has information about how the different desserts in Sweet Escape are created and what essential ingredients they have in common.

It is worth noting that in a bipartite graph, only direct connections between nodes from the left set (desserts) and nodes from the right set (ingredients) are allowed. No dessert is allowed to directly connect to another dessert, nor is any ingredient allowed to directly link to another ingredient.
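The bipartite rule can be sketched in a few lines of plain Python. The three recipes below are made-up toy data for illustration, not the dataset scraped for this article, and `is_bipartite` is a hypothetical helper name.

```python
# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "bananacue": {"saba banana", "brown sugar", "cooking oil"},
    "kamotecue": {"sweet potato", "brown sugar", "cooking oil"},
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
}

# Each edge joins one dessert to one of its ingredients.
edges = [(dessert, item)
         for dessert, items in recipes.items()
         for item in sorted(items)]

desserts = set(recipes)
ingredients = set().union(*recipes.values())


def is_bipartite(edges, left, right):
    """True if the two sets are disjoint and every edge crosses between them."""
    return left.isdisjoint(right) and all(
        (u in left and v in right) or (u in right and v in left)
        for u, v in edges
    )


print(len(desserts), len(ingredients), len(edges))  # 3 7 9
print(is_bipartite(edges, desserts, ingredients))   # True
# A dessert-to-dessert link breaks the rule.
print(is_bipartite(edges + [("bananacue", "kamotecue")], desserts, ingredients))  # False
```

Storing each recipe as a set of ingredients keeps the two node sets separate by construction, which is why the check only fails once a same-set edge is added by hand.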

Examining the Projections

While nodes from the same set cannot connect directly to one another, that does not mean Aya cannot study the relationships among the desserts on her menu or, similarly, among the ingredients in her kitchen at Sweet Escape.

In order to see the relationships between either set, Aya has to look at their individual projections.

Consider this subset of the original bipartite network

A projection links two nodes from the same set whenever they share a neighbor in the other set, and then leaves that in-between node out of the picture.

Projections involve ignoring the other set of nodes

From this figure, getting the ingredient projection involves ignoring all the dessert nodes along the way. Hence, you can connect brown sugar directly with cooking oil even if, in actuality, both of them are only directly connected to bananacue. Aya will have to do this across all the ingredient nodes to get the ingredient projection and also across all the dessert nodes to get the dessert projection.

Edges in the projections are weighted based on how many ways two nodes can be linked

Because more than one dessert can contain both brown sugar and cooking oil, the link between those two can exceed the default count of 1 (let’s call this count the weight moving forward). In this case, the two ingredients are also indirectly connected through kamotecue, so their connection has a weight of 2: one for bananacue and one for kamotecue. The weight of a connection (which we shall call an edge from here on) indicates the strength or importance of that edge.
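The weighting rule can be sketched as follows. This is a plain-Python illustration on made-up toy recipes, not the notebook's code, and the function names `project_ingredients` and `project_desserts` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "bananacue": {"saba banana", "brown sugar", "cooking oil"},
    "kamotecue": {"sweet potato", "brown sugar", "cooking oil"},
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
}


def project_ingredients(recipes):
    """Weight of (a, b) = number of desserts that use both a and b."""
    weights = Counter()
    for items in recipes.values():
        for a, b in combinations(sorted(items), 2):
            weights[(a, b)] += 1
    return weights


def project_desserts(recipes):
    """Weight of (d1, d2) = number of ingredients the two desserts share."""
    weights = Counter()
    for d1, d2 in combinations(sorted(recipes), 2):
        shared = len(recipes[d1] & recipes[d2])
        if shared:
            weights[(d1, d2)] = shared
    return weights


ingredient_weights = project_ingredients(recipes)
dessert_weights = project_desserts(recipes)

print(ingredient_weights[("brown sugar", "cooking oil")])  # 2
print(ingredient_weights[("brown sugar", "saba banana")])  # 1
print(dict(dessert_weights))  # {('bananacue', 'kamotecue'): 2}
```

In this toy menu, buko pandan shares nothing with the other two desserts, so it has no edges in the dessert projection.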

Dessert projection
Ingredient Projection

Essentially, projections are just views of the same bipartite network from earlier but from two different perspectives: through the lens of the desserts and that of the ingredients.

Analyzing the Projections

Aya is initially overwhelmed with all the analysis she can perform on the network to uncover potential insights. However, she realizes that the first step is always the one most visible on the projections. She needs to check the connectivities of the nodes and figure out which nodes have more edges going out to other nodes and which do not.

Degree Distribution

Aya wants to investigate the degrees of her nodes for either projection. The degree of a node is simply the number of edges connected to it. It measures how connected a node is to the other nodes in the network.

In the bipartite graph from earlier, the degree of a dessert is simply the number of ingredients it has. Meanwhile, the degree of an ingredient is just a count of how many desserts use that ingredient.

However, what Aya is more interested in is how the degrees among the dessert nodes and ingredient nodes vary in the dessert projection and ingredient projection, respectively.

The degree of an ingredient node measures how many other ingredients are often used with it in different dessert recipes, while also considering the weights of the edges due to some ingredient pairs appearing more than once in the dessert recipes.
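Both kinds of degree can be counted with a short plain-Python sketch. The recipes are made-up toy data, and the variable names are chosen here for illustration only.

```python
from collections import Counter
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "bananacue": {"saba banana", "brown sugar", "cooking oil"},
    "kamotecue": {"sweet potato", "brown sugar", "cooking oil"},
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
}

# Bipartite degrees: ingredients per dessert, and desserts per ingredient.
dessert_degree = {dessert: len(items) for dessert, items in recipes.items()}
ingredient_degree = Counter(item for items in recipes.values() for item in items)

# Weighted degree in the ingredient projection: every co-occurrence of two
# ingredients in a recipe adds 1 to the degree of both.
weighted_degree = Counter()
for items in recipes.values():
    for a, b in combinations(sorted(items), 2):
        weighted_degree[a] += 1
        weighted_degree[b] += 1

# Degree distribution: how many ingredients have each weighted degree.
distribution = Counter(weighted_degree.values())

print(dessert_degree["bananacue"])       # 3
print(ingredient_degree["brown sugar"])  # 2
print(weighted_degree["brown sugar"])    # 4
print(sorted(distribution.items()))      # [(2, 5), (4, 2)]
```

Even in this tiny menu the distribution is lopsided: two shared ingredients carry the high degrees while the other five sit at the minimum.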

The degree distribution of the ingredient projection appears to follow a power-law distribution. Not surprisingly, the ingredients with very high degrees at around 70–80 are the basics for any dessert: sugar, water, and salt. Meanwhile, most ingredients appear sparingly on the menu.

On the other hand, the degree of a dessert node measures how many other desserts share at least one ingredient with it. As with the ingredients, edge weights count here, so every shared ingredient adds to the total.

There is a different distribution observed with the dessert projection. This may be due to only having 38 desserts on the menu. However, it is worth noting that there are two desserts with degrees at around 70: egg pie and buko pie. This can be attributed to both of them sharing most of their numerous ingredients.

Centrality Measures

Another aspect Aya wanted to examine was how “central” the nodes were in her projections.

Now, there are multiple ways to define how central a node is. Aya is considering degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality.

Degree Centrality

This measures how central a node is based on how many connections it has to other nodes.

Ingredients with the highest (left) and lowest (right) degree centralities

An ingredient node with a high degree centrality, such as water and condensed milk, is an ingredient that is paired with many other ingredients. Ingredients with low degree centralities, like whipped cream, are specialty ingredients only used in select desserts.

Likewise, a dessert node with a high degree centrality is a dessert whose ingredients are shared with a lot of other desserts on the menu. Those with lower degree centralities are desserts that use uncommon ingredients rarely shared with other desserts.
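As a rough sketch, degree centrality is commonly computed as a node's number of neighbors divided by the number of other nodes. The example below uses an unweighted ingredient projection built from made-up toy recipes, and `ingredient_adjacency` is a hypothetical helper.

```python
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
    "leche flan": {"egg yolk", "vanilla", "condensed milk"},
    "fruit salad": {"fruit cocktail", "cream", "condensed milk"},
}


def ingredient_adjacency(recipes):
    """Unweighted ingredient projection as a dict of neighbor sets."""
    adj = {}
    for items in recipes.values():
        for a, b in combinations(sorted(items), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


adj = ingredient_adjacency(recipes)
n = len(adj)

# Fraction of the other n - 1 nodes that each node is directly linked to.
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

for node, score in sorted(degree_centrality.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{node}: {score:.3f}")
```

In this toy menu, condensed milk is paired with every other ingredient and scores 1.0, while each remaining ingredient reaches only two of the six others.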

Betweenness Centrality

This centrality measure is indicative of how much this node serves as a critical connector between other nodes. More specifically, it is a measure of how many of the shortest paths between pairs of nodes pass through this node.

Ingredients with the highest (left) and lowest (right) betweenness centralities

Ingredients with a high betweenness centrality, like condensed milk, are basically kitchen essentials. In our case, condensed milk is necessary to create a variety of desserts, ranging from your no-bake salads to your sweet kakanin goodies. Low betweenness centrality implies that an ingredient is not shared across as many varieties of dessert.

On the other hand, dessert nodes with a high betweenness centrality are those with a complex selection of ingredients. It does not necessarily mean many ingredients, but simply put, it has a very diverse mix of ingredients ranging from fruits and nuts to all other possible types of ingredients present in the kitchen. Low betweenness centrality for dessert nodes indicates desserts that have fairly simple ingredients.
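To make the shortest-path idea concrete, here is a compact sketch of Brandes' algorithm for unweighted graphs, written in plain Python. It is one standard way to compute betweenness and is not necessarily what the notebook uses. The recipes are made-up toy data, and the scores are left unnormalized.

```python
from collections import deque
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
    "leche flan": {"egg yolk", "vanilla", "condensed milk"},
    "fruit salad": {"fruit cocktail", "cream", "condensed milk"},
}

# Unweighted ingredient projection as a dict of neighbor sets.
adj = {}
for items in recipes.values():
    for a, b in combinations(sorted(items), 2):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search, counting shortest paths (sigma) to each node.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


bc = betweenness(adj)
print(bc["condensed milk"])  # 12.0
print(bc["vanilla"])         # 0.0
```

Condensed milk is the only ingredient shared between the three toy desserts, so all 12 pairs of ingredients from different desserts must pass through it.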

Closeness Centrality

This is a measure of how close a node is to all other nodes in the network in terms of how many edges it has to cross.

Ingredients with the highest (left) and lowest (right) closeness centralities

High closeness centrality for ingredients highlights the versatility of the ingredients. They can partner up with a lot of other ingredients to whip up different varieties of desserts.

For the desserts, high closeness centralities imply that they share at least one ingredient with nearly every other dessert on the menu.
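A minimal sketch of closeness centrality, assuming the usual definition: the number of reachable nodes divided by the sum of shortest-path distances to them. The recipes are made-up toy data and `closeness` is a hypothetical helper. On a connected graph like this one, the result equals (n − 1) divided by the total distance.

```python
from collections import deque
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
    "leche flan": {"egg yolk", "vanilla", "condensed milk"},
    "fruit salad": {"fruit cocktail", "cream", "condensed milk"},
}

# Unweighted ingredient projection as a dict of neighbor sets.
adj = {}
for items in recipes.values():
    for a, b in combinations(sorted(items), 2):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)


def closeness(adj, source):
    """Reachable-node count divided by the total number of hops to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


cc = {node: closeness(adj, node) for node in adj}
print(cc["condensed milk"])  # 1.0
print(cc["young coconut"])   # 0.6
```

Condensed milk sits one hop from every ingredient, while young coconut needs two hops to reach the four ingredients outside its own recipe.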

Eigenvector Centrality

This measures the connectedness of a node based on the connectedness of the nodes it is connected to (let’s call them neighbors).

Ingredients with the highest (left) and lowest (right) eigenvector centralities

Ingredients such as sugar and salt have high eigenvector centralities since they are typically the foundation ingredients of pretty much any dessert. Hence, they appear in almost every dessert on the menu.

While there are fewer implications in eigenvector centralities for dessert nodes, it might be worth noting that having desserts with these high eigenvector centralities means that they will be sharing ingredients with other desserts that share a lot of ingredients as well.
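The "connected to well-connected neighbors" idea can be sketched with power iteration, a common way to approximate eigenvector centrality. This is a plain-Python illustration on made-up toy recipes. The function name, iteration cap, and tolerance are assumptions for this example, and the graph is assumed to have at least one edge.

```python
import math
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
    "leche flan": {"egg yolk", "vanilla", "condensed milk"},
    "fruit salad": {"fruit cocktail", "cream", "condensed milk"},
}

# Unweighted ingredient projection as a dict of neighbor sets.
adj = {}
for items in recipes.values():
    for a, b in combinations(sorted(items), 2):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration: each node's score becomes the sum of its neighbors' scores."""
    score = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: sum(score[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        if max(abs(new[v] - score[v]) for v in adj) < tol:
            return new
        score = new
    return score


ec = eigenvector_centrality(adj)
for node, value in sorted(ec.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{node}: {value:.3f}")
```

For this symmetric toy graph, condensed milk ends up with twice the score of every other ingredient, because it is the one neighbor that all of them have in common.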

Clustering Coefficients

Clustering coefficients tell you how friendly your own neighbors are with one another. The clustering coefficient of a node measures how connected its neighbors are with each other.

Clustering coefficient of green node drops when neighbors are less connected (Photo from GeeksforGeeks) [2]

Aya knows that for bipartite networks, clustering coefficients are automatically zero since nodes from the same set cannot directly link with each other. Desserts can only be directly connected to ingredients, and these ingredients cannot be directly linked with each other. Thus, there are zero connections between any two ingredient nodes surrounding our dessert node. Hence, clustering coefficients must be analyzed separately for each projection.

Ingredients with the lowest clustering coefficients

In the ingredient projection, having low clustering coefficients implies that the neighboring ingredients of a specific ingredient are fairly unrelated or are used in specific types of desserts with little to no sharing. Aya found out that condensed milk has the lowest clustering coefficient among the bunch. This implies that condensed milk is used for many desserts that themselves have few shared ingredients amongst each other.

This greatly relates to condensed milk being the ingredient with the highest betweenness centrality. Condensed milk is necessary for making your creamy salads, soft flans, and sticky kakanins.

Ingredients with a clustering coefficient of 1.0

On the other hand, those with a clustering coefficient of 1.0 highlight one-off ingredients that only have uses for a single dessert or closely related group of desserts in the menu. The clustering coefficient is high since the ingredient is connected to other ingredients for that specific dessert, so all its ingredients are automatically connected to each other.
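The two extremes can be reproduced with a short sketch of the local clustering coefficient: the number of links among a node's neighbors divided by the number of possible links among them. The recipes are made-up toy data and `clustering` is a hypothetical helper.

```python
from itertools import combinations

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
    "leche flan": {"egg yolk", "vanilla", "condensed milk"},
    "fruit salad": {"fruit cocktail", "cream", "condensed milk"},
}

# Unweighted ingredient projection as a dict of neighbor sets.
adj = {}
for items in recipes.values():
    for a, b in combinations(sorted(items), 2):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)


def clustering(adj, node):
    """Share of neighbor pairs that are themselves directly connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


coefficients = {node: clustering(adj, node) for node in adj}
print(coefficients["condensed milk"])  # 0.2
print(coefficients["vanilla"])         # 1.0
```

Vanilla appears in a single toy recipe, so its neighbors all know each other. Condensed milk spans three otherwise unrelated recipes, so only 3 of its 15 possible neighbor pairs are linked.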

Desserts with low clustering coefficients imply that they use bits of ingredients from various desserts that are almost unrelated to each other. Customers may find these desserts enticing from a flavor standpoint since they use all kinds of ingredients that are also used in different types of desserts.

Community Detection

Now that Aya has looked at the dynamics of the connectivities in each projection, she is curious if there are certain groupings of desserts in her menu based on their ingredient requirements. She has decided to employ the Louvain method.

The Louvain method is a hierarchical clustering algorithm that starts with every node in its own community and repeatedly merges communities until it finds the partition that maximizes the modularity score of the network. Modularity measures how dense the connections between nodes within a community are compared to the connections between nodes belonging to different communities.
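The sketch below computes only the modularity score that Louvain tries to maximize, not the Louvain search itself. It assumes the standard formula Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the edge weight inside community c, d_c is the total degree of its nodes, and m is the total edge weight. The dessert graph and both partitions are made up for illustration.

```python
from collections import defaultdict

# Toy dessert projection for illustration only: two triangles joined by
# one bridge edge. Each tuple is (dessert, dessert, weight).
edges = [
    ("bananacue", "kamotecue", 1),
    ("bananacue", "turon", 1),
    ("kamotecue", "turon", 1),
    ("buko pandan", "leche flan", 1),
    ("buko pandan", "fruit salad", 1),
    ("leche flan", "fruit salad", 1),
    ("turon", "fruit salad", 1),
]


def modularity(edges, community_of):
    """Q = sum over communities of L_c / m - (d_c / (2 * m)) ** 2."""
    m = sum(w for _, _, w in edges)
    internal = defaultdict(float)    # L_c: edge weight inside each community
    degree_sum = defaultdict(float)  # d_c: total degree of each community
    for u, v, w in edges:
        degree_sum[community_of[u]] += w
        degree_sum[community_of[v]] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


fried = {"bananacue", "kamotecue", "turon"}
nodes = {name for u, v, _ in edges for name in (u, v)}

two_groups = {name: ("fried" if name in fried else "creamy") for name in nodes}
one_group = dict.fromkeys(nodes, "all")

print(round(modularity(edges, two_groups), 3))  # 0.357
print(round(modularity(edges, one_group), 3))   # 0.0
```

Splitting the toy graph along its bridge scores higher than lumping everything together, which is the kind of comparison a modularity-maximizing method makes while merging communities.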

Louvain-based community detection on the dessert projection

Aya found out that she can somehow categorize her dessert offerings into possibly four groups (or communities).

The yellow nodes are desserts made with lots of flour and are typically hard and crumbly in nature. Turquoise nodes have either coconut, banana, or cassava but have strong notes of vanilla. Blue nodes are considerably fewer and feature jackfruit or corn. Lastly, gray nodes are the creamiest desserts, requiring a large amount of condensed milk.

Louvain-based community detection on the ingredients projection

Likewise, she also examined the partitioned communities for her ingredient base. The violet nodes are mostly fundamental ingredients and bases, such as essential sweeteners and thickening agents like glutinous rice. Green nodes are used in fruity, creamy, and cheesy desserts. Turquoise nodes pertain to those used with ube, jackfruit, and banana with lumpia wrappers. Red nodes are the traditional kakanin, which are mainly flour-based. Lastly, yellow nodes appear to be diverse and widely used in various desserts.

Robustness Analysis

Finally, Aya wanted to simulate the potential effects should she run out of one random ingredient she has in her cafe kitchen. She wanted to see how robust (or vulnerable) her current dessert lineup is based on the necessary ingredients on Sweet Escape’s menu.

The effect of random ingredient node deletion on dessert unavailability follows a power-law distribution

After 100,000 simulations of a random ingredient outage (just one ingredient), she found out that, on average, 3–4 desserts could become unavailable. Sure, most of the time it would just be 1 or 2 desserts affected, as illustrated by the histogram above. However, for Aya, one unavailable dessert during a cafe run is one too many.
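A simulation of this kind can be sketched as below. It assumes that a dessert becomes unavailable as soon as any one of its ingredients is missing, and it uses made-up toy recipes, a smaller trial count, and a fixed seed. The function name `simulate_outages` is hypothetical.

```python
import random
from collections import Counter

# Toy recipes for illustration only; they are not the article's dataset.
recipes = {
    "bananacue": {"saba banana", "brown sugar", "cooking oil"},
    "kamotecue": {"sweet potato", "brown sugar", "cooking oil"},
    "buko pandan": {"young coconut", "pandan gelatin", "condensed milk"},
    "leche flan": {"egg yolk", "condensed milk", "brown sugar"},
}


def simulate_outages(recipes, trials=10_000, seed=0):
    """Remove one random ingredient per trial and record the desserts lost."""
    rng = random.Random(seed)
    ingredients = sorted(set().union(*recipes.values()))
    lost_counts = []
    dessert_hits = Counter()
    for _ in range(trials):
        missing = rng.choice(ingredients)
        lost = [d for d, items in recipes.items() if missing in items]
        lost_counts.append(len(lost))
        dessert_hits.update(lost)
    return lost_counts, dessert_hits


lost_counts, dessert_hits = simulate_outages(recipes)
average_lost = sum(lost_counts) / len(lost_counts)

print(f"average desserts lost per outage: {average_lost:.2f}")
print("outage size counts:", sorted(Counter(lost_counts).items()))
print("most affected desserts:", dessert_hits.most_common(2))
```

For this toy menu the exact expectation is 1.5 desserts per outage, since 12 dessert-ingredient links are spread over 8 ingredients, so the simulated average should land close to that value.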

She further identified which desserts are most likely to be affected by a random ingredient removal.

Desserts most likely to be affected mainly due to ingredient complexity

Aya was able to identify that the most vulnerable desserts in her menu were part of the yellow node community from the dessert community detection she performed earlier. It made sense that the pastries would be vulnerable, considering the number of ingredients necessary to make them.

Moving forward…

Aya has realized that she has to ensure a steady stream of essential ingredients with high betweenness centrality such as condensed milk and coconut. These are always present in various desserts on her menu. She has decided to forge collaborations with local suppliers and even neighboring cafes to ensure a steady supply of ingredients in her kitchen during Sweet Escape’s operating hours.

Now that she has also identified items with low clustering coefficients, she is doubly aware that these ingredients, such as condensed milk, sugar, and coconut, are high-stakes.

Understanding the robustness of her ingredient network also helped Aya optimize her supply chain. Considering the vulnerability of desserts in the yellow node community, Aya will develop contingency plans in case of ingredient shortages. She will identify any possible alternative ingredients and possible backup recipes for those desserts to avoid unavailability in dire times.

Aya is also considering dropping the desserts in the yellow node community (mostly hard pastries) from her regular menu. She might instead offer them as time-limited offerings to minimize cases where a customer orders and the dessert is unavailable.

In this next chapter of Sweet Escape, Aya has vowed to commit to excellence and never compromise customer satisfaction. Thanks to profound insights using network science, her sweet reality is waiting to unfold. With greater passion and wisdom, Aya invites you to a greater journey with network science so that together, all of us can savor the sweetness of success!

The dataset and Jupyter notebook can be found in my GitHub repo.

References

[1] Dessert Recipes Archives. Panlasang Pinoy. (n.d.). https://panlasangpinoy.com/categories/recipes/dessert-and-pastry-recipes/

[2] GeeksforGeeks. (2022, October 31). Clustering coefficient in graph theory. GeeksforGeeks. https://www.geeksforgeeks.org/clustering-coefficient-graph-theory/

[3] Network science by Albert-László Barabási. BarabásiLab. (n.d.). http://networksciencebook.com/chapter/8

Cyril Benedict Lugod

Aspiring Data Scientist | MS Data Science @ Asian Institute of Management