How to Analyze Subscriber Behavior with Kentik
Summary
Learn how to analyze subscriber behavior using Kentik. In this post, we focus on the challenges and solutions of identifying and tracking the customers in an IP network while complying with regulations such as GDPR, show how Kentik Custom Dimensions and Data Explorer provide the analysis, and finally touch on how the associated APIs help automate and ease the entire process.
Introduction
Running an access network? Knowing what your customers are doing with the product or service they buy from you is key to success – a universal truth that applies to almost every business, but perhaps even more to those that sell access to the internet.
In a previous blog post, we discussed what broadband subscriber behavior analysis can teach you and how those insights are used. Here we will explore how Kentik can help perform these analyses.
The challenge: Identifying the subscriber
The first step in analyzing customer behavior is identifying the subscriber. In IP networks, the usual method is through the IP address assigned to the customer’s connection. However, since these IP addresses are often dynamically assigned, the challenge is tracking which customer is using which IP address at any given time. Once you decide to track, it is crucial to do so in a way that complies with the regulations in force where your network operates. For example, inside the EU these actions fall under GDPR. At a high level, this means that:
- You must be transparent with the customer that you do this.
- The customer must be able to refuse to let their data be included.
- They must be able to get the data you collect about them.
Address assignments and logging
No matter what method you use to assign IP addresses to your customers – DHCP, RADIUS, PPPoE, or combinations – the servers involved will be able to log which customer is assigned which IP address. You can use these logs to build a system that tracks each customer’s current IP address and, whenever it changes, triggers an update to Kentik so that traffic flows are attributed to the right customer. Part of this system should also include anonymization of the customer ID, so the privacy of the individual users is protected.
A practical and secure method to achieve this is through hashing of customer IDs. Hashing transforms the original customer ID into a fixed-size string of characters that is, for practical purposes, unique and infeasible to reverse. Because customer IDs are often short or predictable, mix a secret key or salt into the hash so the original IDs cannot be recovered simply by hashing every possible ID.
By replacing the actual customer IDs with their hashed counterparts before they are stored or processed, you can ensure that all subsequent analyses are conducted on anonymized data. This approach not only enhances data security but also aligns with privacy regulations like GDPR, ensuring that the customer’s privacy is maintained without compromising the integrity of the analysis.
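As a minimal sketch of the hashing step, the snippet below derives a stable pseudonym from a customer ID with a keyed hash (HMAC-SHA256) from the Python standard library. The function name, the `USER` prefix, the truncation length, and the way the key is supplied are all illustrative assumptions, not something Kentik prescribes.

```python
import hashlib
import hmac


def pseudonymize_customer_id(customer_id: str, secret_key: bytes, length: int = 16) -> str:
    """Return a keyed hash of the customer ID, suitable as a populator value.

    The same (customer_id, secret_key) pair always yields the same output,
    so flows for one customer keep the same label over time.
    """
    digest = hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating the digest shortens the label but raises the collision risk;
    # the length used here is an arbitrary choice for illustration.
    return "USER" + digest[:length].upper()


if __name__ == "__main__":
    key = b"replace-with-a-secret-kept-outside-the-code"  # hypothetical key
    print(pseudonymize_customer_id("customer-42", key))
```

Keeping the key separate from the analytics data means that someone who only sees the hashed values cannot rebuild the mapping by hashing a list of candidate customer IDs.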
Custom dimensions
Kentik’s custom dimensions feature is the key to this approach. Custom dimensions allow you to add custom columns to your organization’s main tables in the Kentik Data Engine. The Kentik Data Engine is where the flow data is enhanced and stored so it can be queried for all the applications of the data the system offers.
Like Kentik-provided dimensions, a custom dimension may be used for group-by in a query or as a filter in a filter group.
Implementation
A dimension in the Kentik Data Engine can be viewed as a column in the main table of the database. A Custom Dimension is populated by a set of rules, called “Populators,” and it is the ability to specify these yourself that defines the “custom” in Custom Dimensions.
A Populator consists of a “value” and a set of criteria that are matched against predefined fields of each flow. When a flow is ingested into Kentik and its fields satisfy the criteria, the value is assigned to that flow.
In this specific case, we will use the hashed CustomerID as “value” for a Populator for each customer and add the corresponding IP address to the “address” field. We will also specify that the direction of the flow can be either destination or source.
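To make the matching logic concrete, here is a small conceptual model of a populator that carries a value, a set of addresses, and a direction. It only illustrates the idea described above; the class, its field names, and the `tag_flow` helper are hypothetical and do not reflect how the Kentik Data Engine is implemented.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Populator:
    value: str                   # hashed customer ID
    addresses: Tuple[str, ...]   # IP addresses or CIDR blocks
    direction: str = "either"    # "src", "dst", or "either"

    def matches(self, src_ip: str, dst_ip: str) -> bool:
        networks = [ip_network(a, strict=False) for a in self.addresses]

        def hit(ip: str) -> bool:
            addr = ip_address(ip)
            return any(addr in net for net in networks)

        if self.direction == "src":
            return hit(src_ip)
        if self.direction == "dst":
            return hit(dst_ip)
        return hit(src_ip) or hit(dst_ip)


def tag_flow(populators: Iterable[Populator], src_ip: str, dst_ip: str) -> Optional[str]:
    """Return the value of the first populator whose criteria match the flow."""
    for populator in populators:
        if populator.matches(src_ip, dst_ip):
            return populator.value
    return None


if __name__ == "__main__":
    pops = [Populator("USERZUDW4GB0", ("1.2.3.4",))]
    print(tag_flow(pops, "8.8.8.8", "1.2.3.4"))
```

With the direction left at "either", both the traffic a customer sends and the traffic they receive end up labeled with the same hashed ID.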
But how would this work in real life? Even with a small customer base using static addresses, it would be a tedious task to configure one populator per customer using the UI. Like many other tasks, this can be automated by using the Kentik API.
Automated management via the API
The Kentik API offers a number of methods that enable programmatic control of Kentik. In this case, we need to use one of the Customization APIs, namely the Batch API.
This API supports batch updates of flow tags or of populators for Custom Dimensions; here we focus on using it to update populators.
Batch updates are what this scenario calls for, because the mapping of hashed CustomerID to IP address is a very large dataset that has to be kept current.
The Batch API uses a single POST method called “Batch Request” to add, update, or delete a set of populators and a GET method called “Batch Get Request” to check the progress of the batch operations.
For the create/update/delete method, multiple requests – each referred to as a “part” – are supported within one batch operation. This way, very large datasets can be managed in a staged and controlled manner.
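One way to stage a large update is to cut the list of populators into fixed-size parts and mark only the last one as complete. The sketch below does just that. The field names `replace_all`, `complete`, and `upserts` follow the sample request in this post, while the function name, the part size, and the reading of `complete` as "this is the final part" are assumptions to check against the Batch API documentation. Tying later parts to the batch identifier returned for the first part is not shown.

```python
from typing import Dict, Iterator, List


def build_parts(upserts: List[Dict], part_size: int) -> Iterator[Dict]:
    """Yield one request body per part, flagging only the last as complete."""
    if part_size < 1:
        raise ValueError("part_size must be at least 1")
    # Always produce at least one (possibly empty) part so the batch can be closed.
    chunks = [upserts[i:i + part_size] for i in range(0, len(upserts), part_size)] or [[]]
    for index, chunk in enumerate(chunks):
        yield {
            "replace_all": False,
            "complete": index == len(chunks) - 1,
            "upserts": chunk,
        }


if __name__ == "__main__":
    entries = [
        {"value": f"USER{n:08d}", "criteria": [{"addr": [f"10.0.0.{n}"]}]}
        for n in range(1, 6)
    ]
    for part in build_parts(entries, part_size=2):
        print(part["complete"], len(part["upserts"]))
```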
The POST method requires you to specify which criteria need to be matched for a specific value to be assigned to a flow when it is ingested into the platform. It is constructed so that you do not need to keep track of populator IDs; instead, the API matches on the values to identify the populator that needs to be created, modified, or deleted. You only need to keep track of the Batch Global Unique Identifier so you can control which parts belong to which Batch request and follow the progress.
A sample of the JSON to do this could look like this:
{
  "replace_all": false,
  "complete": true,
  "upserts": [
    {
      "value": "USERZUDW4GB0",
      "criteria": [
        {
          "addr": ["1.2.3.4"]
        }
      ]
    },
    {
      "value": "USERV0R2OOLU",
      "criteria": [
        {
          "addr": ["1.2.3.7"]
        }
      ]
    }
  ],
  "deletes": [
    {
      "value": "USERU7EEATTH"
    }
  ]
}
Note that the direction is not specified. The default value is “either”, which is exactly what we need for this use case.
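A request body like this can be generated by comparing the previous and current customer-to-address mappings. The following is a minimal sketch under the assumption that each mapping is a plain dictionary from hashed customer ID to a single IP address; the function name and the input shape are hypothetical.

```python
import json
from typing import Dict


def diff_assignments(previous: Dict[str, str], current: Dict[str, str]) -> Dict:
    """Build a batch request body from two {hashed_customer_id: ip} mappings."""
    # New customers and customers whose address changed become upserts.
    upserts = [
        {"value": customer, "criteria": [{"addr": [ip]}]}
        for customer, ip in sorted(current.items())
        if previous.get(customer) != ip
    ]
    # Customers that no longer hold an address become deletes.
    deletes = [{"value": customer} for customer in sorted(previous) if customer not in current]
    return {
        "replace_all": False,
        "complete": True,
        "upserts": upserts,
        "deletes": deletes,
    }


if __name__ == "__main__":
    before = {"USERZUDW4GB0": "1.2.3.9", "USERU7EEATTH": "1.2.3.5"}
    after = {"USERZUDW4GB0": "1.2.3.4", "USERV0R2OOLU": "1.2.3.7"}
    print(json.dumps(diff_assignments(before, after), indent=2))
```

Sending only the differences keeps each request small, which matters when the address logs produce changes continuously.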
Data analysis and application
Once the flow data is enhanced with hashed customer IDs, you can start the analysis. Kentik’s Data Explorer is the tool for this, capable of creating complex queries.
Example
An example query could be identifying the top 50 users of the top 100 most-used OTT services. Such insights can be instrumental for tailored customer outreach and special offers based on their OTT service usage.
Once you are happy with the results, you can use the API to automate the queries and retrieve the data for post-processing – for example, preparing an outreach that offers customers products based on their use of OTT services.
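As an illustration of such post-processing, the sketch below ranks the heaviest users per OTT service from rows that have already been retrieved. The row shape (hashed customer ID, service name, byte count) and the function name are assumptions made for this example; the actual fields depend on the query you run.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (hashed customer ID, OTT service, bytes)


def top_users_per_service(rows: Iterable[Row], top_n: int) -> Dict[str, List[Tuple[str, int]]]:
    """Sum bytes per (service, customer) and keep the top_n customers per service."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for customer, service, byte_count in rows:
        totals[service][customer] += byte_count
    return {
        # Sort by descending volume, then by customer ID for a stable order.
        service: sorted(users.items(), key=lambda item: (-item[1], item[0]))[:top_n]
        for service, users in totals.items()
    }


if __name__ == "__main__":
    sample = [
        ("USERZUDW4GB0", "video-service", 10),
        ("USERV0R2OOLU", "video-service", 30),
        ("USERZUDW4GB0", "video-service", 25),
    ]
    print(top_users_per_service(sample, top_n=1))
```

Because the rows carry only hashed IDs, mapping the result back to real customers for outreach has to happen in a separate, access-controlled system that holds the key.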
Need more analysis?
While Kentik’s Data Explorer is powerful, some analyses might require more complex statistical calculations. For such cases, the Kentik Firehose feature allows for the export of data from the Kentik Data Engine to external analytics systems or data lakes, facilitating deeper analysis.
Conclusion
In summary, Kentik offers a comprehensive and detailed way to understand how customers use their internet access. By identifying subscribers through dynamic IP address tracking while complying with regulations like GDPR, the operator can gather essential insights, respect customer privacy, and use those insights to find new revenue streams and enhance the customer experience.