Using Customer Segmentation to Create Customer Personas

A Case Study Using the K-Means Algorithm in the Banking/Financial Industry.

Lucas O
11 min read · Jul 1



In this analysis project, I explore the use of the k-means algorithm to perform customer segmentation in the banking/financial industry. The output of the customer segmentation is then used to create customer personas and to recommend personalized products based on those personas.

Disclaimer: Please note that I am not a banking/financial industry expert. This data analysis project is based on a general understanding of customer analytics. The recommendations provided regarding financial products are based on online research. Implementing these recommendations in the real world requires expert opinion and consideration of specific regulatory, market, and business factors.

Problem Statement

The banking industry struggles with understanding and meeting the diverse needs of customers, resulting in generic marketing approaches to financial product offerings that fall short of creating personalized experiences. To address this challenge, a data-driven solution is required to effectively segment customers and create targeted personas for improved customer engagement and satisfaction, thus leading to increased revenue for banks/financial institutions.


Enter Customer Segmentation. Customer segmentation refers to the process of dividing a customer base into distinct groups or segments based on shared characteristics, behaviors, preferences, or needs. This can be achieved through the powerful k-means algorithm. By leveraging data analytics and machine learning techniques, financial institutions, including banks and credit card companies, gain valuable insights into customer behavior, preferences, and characteristics. This enables the identification of distinct customer segments and the development of personalized offerings. This has several advantages including enhanced customer understanding, improved customer engagement with product offerings, a healthier customer base, and increased revenue for financial institutions.

Implementation Strategy

To implement customer segmentation using the k-means algorithm, financial institutions should follow these steps:

1. Data Collection and Integration: Gather customer data from various sources, including transaction records, demographic information, and customer interactions.

2. Preprocessing and Feature Selection: Cleanse and preprocess the data, selecting relevant features for analysis, such as transaction frequency, average balance, age, and customer preferences.

3. K-Means Clustering: Apply the k-means algorithm to segment customers based on similar behavioral patterns.

4. Persona Creation for Targeted Marketing: Develop personas for each customer segment, incorporating demographic information, behaviors, preferences, and goals.

5. Implementation and Iteration: Implement personalized marketing campaigns and track the performance of the tailored strategies. Continuously evaluate and refine the segments and personas based on customer feedback and evolving market dynamics.

The following sections walk through an analysis project on customer segmentation and persona creation using the k-means algorithm.

Data Understanding / EDA

The dataset used in this analysis is part of the resource files for the Udemy course Data Science for Business. It contains 8,950 observations and 18 features, including customer ID, balance, purchase frequency, and cash advance amount. For the full data description, see the detailed write-up in the GitHub project.

What are some potential ways to segment customers? We can explore a few options by performing simple exploratory data analysis.

Option 1: Segment by Credit Utilization

Although not present in the dataset, one feature I was particularly interested in was credit utilization. Credit utilization is a measure of how much of a customer’s available credit they are currently using. It is calculated by dividing the amount of credit currently being utilized by the total available credit. Credit utilization is an important factor for banks and credit card companies because it is one of the key metrics used to assess a borrower’s creditworthiness.
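As a minimal sketch of the calculation described above, the snippet below derives credit utilization on a toy table. The column names BALANCE and CREDIT_LIMIT are assumptions for illustration, and the values are made up rather than taken from the dataset.

```python
import pandas as pd

# Toy data for illustration only. The column names BALANCE and CREDIT_LIMIT
# are assumptions, not confirmed by the article.
df = pd.DataFrame({
    "BALANCE": [200.0, 1500.0, 5200.0, 900.0],
    "CREDIT_LIMIT": [2000.0, 3000.0, 5000.0, 9000.0],
})

# Credit utilization = credit currently used / total available credit.
df["CREDIT_UTILIZATION"] = df["BALANCE"] / df["CREDIT_LIMIT"]

# Share of customers above 100% utilization and above a 30% threshold.
share_over_limit = (df["CREDIT_UTILIZATION"] > 1.0).mean()
share_above_healthy = (df["CREDIT_UTILIZATION"] > 0.30).mean()

print(df)
print(f"Above 100%: {share_over_limit:.1%}, above 30%: {share_above_healthy:.1%}")
```

On real data, rows with a missing or zero credit limit would need to be handled before dividing.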

The histogram below shows the distribution of credit utilization for customers in the dataset, split by credit utilization below 100% and above 100%.


  • 2.6% of customers have credit utilization above 100%. These customers would be considered extremely risky and have high chances of defaulting on their payments.
  • A general threshold for healthy credit utilization is 30% or below. This would mean about half (51%) of customers in this dataset are above the healthy threshold.

Option 2: Segment by Cash Advance

Another way we may want to segment customers is by looking at cash advances. Given the high credit utilization we saw for some customers, it is likely that some customers are using their credit cards as a means to take out loans. The plot below shows the average cash advance for 3 cohorts of customers taking out cash advances: Low = less than $1,000, Medium = $1,001 to $5,000, High = above $5,000.


  • We can see certain customers with cash advances upwards of $10,000. The maximum cash advance for a customer is $47,000. These customers appear to be using their credit card as a means to take out loans for other purposes. Cash advances are generally considered bad financial practice for several reasons, including high fees, high interest rates, and a negative impact on credit scores. Financial institutions may need to target these customers with better financial products, perhaps low-interest loans. We’ll explore this later on.
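The cohort split described for this option could be sketched roughly as follows. The CASH_ADVANCE column name and the toy values are assumptions, and treating exactly $1,000 as "Low" is my choice, since the text leaves that boundary open.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only. CASH_ADVANCE is an assumed column name.
df = pd.DataFrame(
    {"CASH_ADVANCE": [0.0, 250.0, 999.0, 1200.0, 4800.0, 7500.0, 12000.0]}
)

# Keep only customers who actually took out a cash advance.
takers = df[df["CASH_ADVANCE"] > 0].copy()

# Bin into the three cohorts. Bins are right-closed, so exactly 1000 falls
# in "Low" (an assumption about the boundary).
takers["COHORT"] = pd.cut(
    takers["CASH_ADVANCE"],
    bins=[0, 1000, 5000, np.inf],
    labels=["Low", "Medium", "High"],
)

# Average cash advance per cohort, the quantity the plot summarizes.
avg_by_cohort = takers.groupby("COHORT", observed=True)["CASH_ADVANCE"].mean()
print(avg_by_cohort)
```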

Option 3: Segment by Purchase/Balance Amount & Frequency

Taking a look at purchase/balance amounts and frequency of purchases helps understand the spending habits of customers. The plot below categorizes customers into high/low purchase/balance cohorts based on median values of purchases and balance, for customers with 0 cash advance.

Observation: We can observe a few trends:

  • High Purchase / High Balance: These customers are potentially using their credit card for all of their purchases, perhaps to accumulate points, and are maintaining a high balance.
  • Low Purchase / High Balance: These customers are potentially making some high-value purchases on their credit card and are taking time to pay off the balance.
  • High Purchase / Low Balance: These customers are potentially making frequent purchases with their credit card but are paying off the balance immediately.
  • Low Purchase / Low Balance: These customers are potentially making very few purchases and paying off the balance immediately. These could be young customers looking to build their credit.
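The four cohorts above come from a median split, which might look like the sketch below. The column names and values are assumptions, and computing the medians on the zero-cash-advance subset (rather than on the full dataset) is also an assumption.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only. Column names are assumptions.
df = pd.DataFrame({
    "PURCHASES": [50.0, 3000.0, 120.0, 2500.0, 800.0, 60.0],
    "BALANCE": [100.0, 4000.0, 3500.0, 200.0, 900.0, 50.0],
    "CASH_ADVANCE": [0.0, 0.0, 0.0, 0.0, 500.0, 0.0],
})

# Restrict to customers with no cash advance, as in the text.
no_advance = df[df["CASH_ADVANCE"] == 0].copy()

# Medians computed on this subset (assumption).
purchase_median = no_advance["PURCHASES"].median()
balance_median = no_advance["BALANCE"].median()

# Values strictly above the median are "High", everything else is "Low".
no_advance["PURCHASE_LEVEL"] = np.where(
    no_advance["PURCHASES"] > purchase_median, "High Purchase", "Low Purchase"
)
no_advance["BALANCE_LEVEL"] = np.where(
    no_advance["BALANCE"] > balance_median, "High Balance", "Low Balance"
)
no_advance["COHORT"] = (
    no_advance["PURCHASE_LEVEL"] + " / " + no_advance["BALANCE_LEVEL"]
)

print(no_advance["COHORT"].value_counts())
```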

We have used exploratory data analysis to understand some characteristics of customers and to get ideas for how we could segment them. This initial step will help shed light on how a k-means algorithm may cluster customers later on in the project.

K-Means Algorithm

The k-means algorithm is a popular unsupervised machine learning technique used for clustering data. The algorithm works by dividing a set of data points into k clusters based on their similarity. It is widely used in various applications, such as customer segmentation and anomaly detection. To learn more about the k-means algorithm, you can read up on the following sources:

Data Prep

To prepare the data for k-means, a few feature engineering steps were applied:

  1. Data Filtering: Before proceeding with k-means, the dataset was filtered down to a particular cohort of customers: those with tenure = 10 months. This was done to reduce computation time and make it easier to analyze and visualize the clusters.
  2. Feature Addition: Add the credit utilization feature.
  3. Feature Removal: Remove unwanted features (such as customer ID).
  4. Normalization: Scaling the features prevents those with larger values from dominating the clustering results.
  5. Dimensionality Reduction (PCA): Reducing the dimensionality of the data can reduce the computational complexity and improve the interpretability of the clustering results. For this analysis, I used a PCA threshold of 80%, that is, I kept enough components to capture 80% of the variance in the features.

After applying PCA, 5 principal components were generated.
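The preparation steps above could be sketched as follows with pandas and scikit-learn. This is not the project's actual code: the column names and synthetic data are assumptions, StandardScaler stands in for whichever normalization was used, and dropping the now-constant TENURE column is my own choice. Because the data is synthetic, the number of components will not necessarily be 5.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data. All column names are assumptions.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "CUST_ID": [f"C{i:05d}" for i in range(n)],
    "BALANCE": rng.gamma(2.0, 800.0, n),
    "PURCHASES": rng.gamma(1.5, 600.0, n),
    "CASH_ADVANCE": rng.gamma(1.0, 500.0, n),
    "CREDIT_LIMIT": rng.uniform(1000, 10000, n),
    "PURCHASES_FREQUENCY": rng.uniform(0, 1, n),
    "TENURE": rng.choice([10, 12], size=n),
})

# 1. Filter to the cohort with tenure = 10 months.
cohort = df[df["TENURE"] == 10].copy()

# 2. Add the credit utilization feature.
cohort["CREDIT_UTILIZATION"] = cohort["BALANCE"] / cohort["CREDIT_LIMIT"]

# 3. Remove unwanted features. TENURE is constant after filtering, so it is
#    dropped here as well (my assumption).
features = cohort.drop(columns=["CUST_ID", "TENURE"])

# 4. Scale features so large-valued columns do not dominate distances.
scaled = StandardScaler().fit_transform(features)

# 5. Keep enough principal components to explain at least 80% of variance.
pca = PCA(n_components=0.80, svd_solver="full")
components = pca.fit_transform(scaled)

print(components.shape, round(pca.explained_variance_ratio_.sum(), 3))
```

Passing a float between 0 and 1 as n_components asks scikit-learn to choose the smallest number of components whose cumulative explained variance exceeds that fraction.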

Optimal Number of Clusters

A scree plot (often called an elbow plot in the k-means context) was used to determine the optimal number of clusters. It displays the within-cluster sum of squares (WCSS) on the y-axis, plotted against the number of clusters on the x-axis, and helps identify the elbow point: the point on the plot where adding more clusters does not significantly improve the clustering performance, i.e., the WCSS is not significantly reduced by adding more clusters.
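A minimal sketch of how the WCSS values behind such a plot might be computed with scikit-learn is shown below. The data is synthetic (make_blobs stands in for the PCA components), and the range of 1 to 10 clusters is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the PCA components (assumption).
X, _ = make_blobs(n_samples=400, centers=4, n_features=5, random_state=0)

# Fit k-means for each candidate k and record the WCSS, which scikit-learn
# exposes as the fitted model's inertia_ attribute.
wcss = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = model.inertia_

for k, value in wcss.items():
    print(k, round(value, 1))
```

Plotting these values against k gives the curve on which the elbow is read off by eye.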

Observation: We can see that the elbow point appears to be at 5 clusters. However, after running the first iteration of the k-means algorithm, I noticed that only 1 customer was being assigned to one of the clusters. I would assume that this customer has certain characteristics that stand out from the rest of the cohort. For this particular analysis, I decided to exclude this customer and re-cluster using 4 clusters.
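The exclude-and-re-cluster step could look roughly like the sketch below. The data is synthetic, with one extreme point added by hand to mimic the stand-out customer, so this illustrates the idea rather than the project's code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: four well-separated groups plus one extreme point
# (all values are made up for illustration).
centers = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [8.0, 8.0, 0.0, 0.0, 0.0],
    [0.0, 8.0, 8.0, 0.0, 0.0],
    [8.0, 0.0, 8.0, 8.0, 0.0],
])
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=1)
X = np.vstack([X, np.full((1, 5), 1000.0)])

# First pass with 5 clusters, then count members per cluster.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
sizes = np.bincount(labels, minlength=5)

# Identify clusters that contain a single customer and drop their members.
singleton_clusters = np.flatnonzero(sizes == 1)
keep = ~np.isin(labels, singleton_clusters)
X_kept = X[keep]

# Second pass with 4 clusters on the remaining customers.
final_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_kept)
print(sizes, np.bincount(final_labels))
```

Dropping a point this way is a pragmatic choice; the excluded customer may still deserve a separate look.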

Cluster Results

Observation: Although not perfect, the k-means algorithm does a decent job of segmenting customers. We can see 4 distinct segments, with a few customers that should probably be re-clustered, although a 3-dimensional plot may present a different perspective.

Next we can analyze these clusters in-depth.

Cluster Analysis

Now that we have our clusters of customers, we can begin analyzing them to understand trends in customer characteristics across the different clusters. Note that we have already seen some of these trends during the exploratory data analysis phase, so we can better understand how the k-means algorithm is working. To start, we can plot the distribution of some features of interest by cluster using boxplots. Let’s start with balance, cash advance, credit utilization, and purchases.

Observation: 2 distinct clusters stand out at first glance, cluster 4 and cluster 2:

  • Cluster 4: Includes customers with high balance, high cash advance and high credit utilization. However, these customers have very low purchases. As we outlined earlier, these are customers that appear to be using their credit cards as a means to take out loans.
  • Cluster 2: Includes customers with high credit utilization, slightly high balance and high purchases. These customers appear to be using their credit cards strictly for making purchases, and tend to pay off the balance frequently.
  • Cluster 3: These customers have lower values for all these features. However, we know that this analysis cohort includes customers with a tenure of 10 months, so these customers have had their credit cards for a while. This cluster appears to include customers who are cautious about using their credit cards and tend to make some low-value purchases.
  • Cluster 1: These customers are similar to those in cluster 2; however, they tend to have higher credit utilization and a lower credit limit (median of $2,500) compared to cluster 4 ($5,000) and cluster 3 ($3,500). Cluster 2 has the lowest median credit limit ($1,350).
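A per-cluster summary like the one described above can be produced with a simple group-by, sketched below. The column names and all numbers are made up for illustration and are not the results of this analysis.

```python
import pandas as pd

# Toy data for illustration only. The values are NOT the article's results
# and the column names are assumptions.
df = pd.DataFrame({
    "CLUSTER": [1, 1, 2, 2, 3, 3, 4, 4],
    "BALANCE": [800.0, 1200.0, 1500.0, 1700.0, 100.0, 300.0, 4000.0, 5000.0],
    "CASH_ADVANCE": [0.0, 100.0, 0.0, 0.0, 0.0, 50.0, 3000.0, 5000.0],
    "PURCHASES": [400.0, 600.0, 2500.0, 3500.0, 100.0, 200.0, 50.0, 150.0],
})

features = ["BALANCE", "CASH_ADVANCE", "PURCHASES"]

# The median per cluster is the center line each boxplot would show.
summary = df.groupby("CLUSTER")[features].median()
print(summary)
```

The same frame could be passed to a plotting library to draw the boxplots themselves.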

For cluster analysis using other features like credit limit, minimum payments, and purchase transactions, see the detailed write-up in the GitHub project.

Customer Personas / Product Recommendations

We have seen how the k-means algorithm helps with customer segmentation by grouping similar customers together based on shared attributes or behaviors. We can now come up with strategies to tailor product offerings and customer experiences to specific segments by building customer personas.

Cluster 4: The Loanee Customer

These customers tend to use their credit cards as a means of taking out loans. They use their credit card to bridge gaps in meeting their financial needs, resulting in high cash advances, high balance and high credit utilization. Financial product recommendations for this group could include:

  • Balance Transfer Card Options: Option to transfer their balance to 0% APR cards to save on interest.
  • Debt Consolidation / Personal Loans: Provide debt consolidation loans with lower interest rates, allowing these customers to combine their high-interest debts into a single manageable payment, thus reducing their interest charges and simplifying their repayment strategy.
  • Credit Counseling: Provide guidance on managing debt, creating budgets and financial improvements.

Cluster 2: The Purchasing Customer

These customers use their credit cards as a means of making frequent purchases. This may be due to the convenience and rewards earned by using their credit cards. Financial product recommendations for this group could include:

  • Rewards Programs: Reward programs where customers earn points, cash back, or airline miles based on their spending. These programs encourage customers to use their credit cards and provide incentive for loyalty.
  • Premium Cards: This bank could offer premium cards with enhanced benefits for frequent users. While these cards typically come with higher annual fees (value for the bank), they also provide additional perks such as airport lounge access, concierge services, travel insurance, etc. (value for the customer).
  • Installment Plans: This bank could offer installment plans to these customers, enabling them to convert larger purchases into smaller, more manageable monthly payments.

Cluster 3: The Prudent Customer

These customers take a cautious and thoughtful approach to their credit card usage. They have a sense of responsible financial behavior and consider spending decisions carefully. Financial products for this group could include:

  • Low Interest Cards: Offer credit cards with low APRs, enabling customers to manage balances with minimal interest charges. Banks benefit from interest earned, while customers can maintain a cautious approach without worrying about excessive fees.
  • Secured Credit Cards: Customers can build or rebuild credit with control over spending by providing a security deposit as collateral. This helps establish a positive credit history, while banks benefit from the security deposit and potential conversion to unsecured cards in the future.
  • Fraud Protection and Monitoring: Banks provide enhanced fraud protection, including real-time alerts, identity theft protection, and advanced security features. Customers gain peace of mind with closely monitored accounts, while banks reduce the risk of financial losses from fraud.

Cluster 1: The Selective Customer

These customers also have a careful and deliberate approach to credit card usage, primarily for infrequent low/medium-value purchases. They are selective in their spending choices and prioritize making low/medium-value purchases rather than frequent small transactions. Financial products for this group could include:

  • Value Purchase Financing: Banks can offer specialized financing options for infrequent purchases, allowing customers to pay for these purchases over time with low or 0% interest rates. This benefits customers by providing affordable installment plans and flexibility in managing their cash flow, while banks generate interest revenue and foster customer loyalty.
  • Extended Warranty Protection: Banks can provide extended warranty protection as an added benefit for customers making certain infrequent high-value purchases. This extends the manufacturer’s warranty of these high-value purchases, offering coverage against unexpected repairs or replacements, providing peace of mind to customers. The bank benefits from increased card usage and customer satisfaction, potentially leading to long-term loyalty.
  • Personalized Spending Insights: Banks can offer personalized spending insights and analysis for customers who make infrequent purchases. This includes detailed transaction categorization, spending trends, and recommendations tailored to their unique spending patterns. It benefits customers by helping them make informed financial decisions and optimize their spending, while banks enhance customer engagement and strengthen relationships by providing valuable financial insights.
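Once each customer carries a cluster label, attaching the persona names used above is a simple lookup, sketched below. The customer IDs and column names are hypothetical. Note that scikit-learn numbers clusters from 0, so raw labels would need shifting or remapping to match the 1 to 4 numbering used in this write-up.

```python
import pandas as pd

# Persona names taken from the sections above, keyed by cluster number.
PERSONAS = {
    4: "The Loanee Customer",
    2: "The Purchasing Customer",
    3: "The Prudent Customer",
    1: "The Selective Customer",
}

# Hypothetical customers with already-assigned cluster labels.
customers = pd.DataFrame({
    "CUST_ID": ["C00001", "C00002", "C00003", "C00004", "C00005"],
    "CLUSTER": [4, 2, 3, 1, 2],
})

# Map each cluster label to its persona name.
customers["PERSONA"] = customers["CLUSTER"].map(PERSONAS)
print(customers)
```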

These are just a few examples of how a financial institution can craft customer personas and product recommendations based on segmentation. In the real world, crafting financial products that align with customer personas derived from various segments/clusters requires deep industry expertise. It is essential to ensure that these products are thoughtfully designed to create a win-win scenario, benefiting both the bank and the customers. By applying industry knowledge and understanding customer needs, financial institutions can develop tailored offerings that meet customer expectations while driving business growth.


In conclusion, customer segmentation and persona creation using the k-means algorithm have emerged as powerful tools in the banking industry. By leveraging data analytics and machine learning techniques, banks can gain a deeper understanding of their customers, tailor their offerings to specific segments, and deliver personalized experiences.


For the code used to produce this analysis and a more detailed write-up, check out the GitHub project.



Lucas O

Analytics professional, passionate about using data to solve business problems. Interested in Marketing Analytics, AB Testing and Causal Inference.