MeltData
Best Marketing Research Company in India
info@meltdata.org
+91-928-912-9281


Cluster Analysis

Cluster analysis is a statistical technique widely used in the market research industry to develop business insights from large volumes of data and many variables. It combines observations into groups (clusters) whose members have similar properties.

The clusters formed should have two important characteristics. First, each group should be homogeneous with respect to certain characteristics, i.e. the observations within a group should be similar to each other on those attributes. Second, each group should differ from the other groups with respect to the same characteristics, i.e. different groups should be heterogeneous.

Before going into the details of how cluster analysis is performed, I would like you to understand the scenarios in which it is applicable and can produce useful insights.

A few examples are:

1. The business development manager of a watch company wants to identify groups of consumers with similar preferences in terms of taste, durability, brand, looks, etc.

2. A financial analyst at a leading investment bank wants to identify a group of potential firms that the bank can target as partners.

3. An electronic gadget company wants to identify similar zones/cities in the country as a pre-launch activity for a new product.

Hopefully you now have a feel for where cluster analysis is used. It is mostly applicable when you want to divide the data into clusters with respect to some characteristics. It reduces a huge volume of data to a few groups that are easier to analyze. The definition of similarity varies from analysis to analysis and depends on the objectives of the study.

We will now go through the steps involved in cluster analysis and the methods used for the statistical analysis.

Let us consider a plot of hypothetical data, which will help you understand this better.
The diagram below gives a geometrical view of cluster analysis. In general, each observation can be represented as a point in n-dimensional space, where n is the number of characteristics. Here we have plotted a 2-dimensional space, as we are interested in only 2 characteristics, and we have 6 observations. You can clearly see that there are 3 groups of two observations each. We call the characteristics the clustering variables.

[Figure img002: plot of the six hypothetical observations on the two clustering variables]


Steps in cluster analysis:


  • Select a measure of similarity.
  • Choose the type of clustering technique (hierarchical or non-hierarchical; we will consider them in detail later).
  • Choose the clustering method.
  • Decide how many clusters to retain and interpret the cluster solution.

We will deal with each of these steps one by one.

STEP 1:

As we saw in the geometrical interpretation, we judge similarity by the closeness of points (observations). So here we calculate the distance between two observations, the Euclidean distance, as the measure of similarity. The smaller the value, the closer the observations, and hence the more likely they are to be combined.

The squared Euclidean distance between two points i and j is

$D_{ij}^2 = (X_i - X_j)^2 + (Y_i - Y_j)^2$

We then build a matrix of the Euclidean distances between all pairs of observations. The distance between two observations shows how closely they are related to each other.
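The distance matrix can be sketched in a few lines of Python (used here only for illustration; the article itself relies on SAS). The six observations and their values are hypothetical, as are the names `points`, `squared_euclidean` and `distance`.

```python
from itertools import combinations

# Hypothetical data: six observations measured on two clustering variables.
points = {
    "S1": (5, 5), "S2": (6, 6),
    "S3": (15, 14), "S4": (16, 15),
    "S5": (25, 20), "S6": (30, 19),
}

def squared_euclidean(p, q):
    """Sum of squared differences across all clustering variables."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# The matrix is symmetric, so each unordered pair is stored once.
distance = {
    frozenset((i, j)): squared_euclidean(points[i], points[j])
    for i, j in combinations(points, 2)
}

# The pair with the smallest distance is the first candidate for merging.
closest = min(distance, key=distance.get)
print(sorted(closest), distance[closest])
```

With these made-up values, S1 and S2 are among the closest pairs, so they would be combined first.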

STEP 2:

Next we move to the technique used to group the observations. These techniques fall into two broad categories, hierarchical and non-hierarchical clustering. Here we will deal only with the first.

 

Now we need a rule for combining the observations into groups. Some of the popular methods are:


  • Centroid method
  • Farthest-neighbor or complete-linkage method
  • Nearest-neighbor or single-linkage method
  • Average-linkage method
  • Ward’s method

For now we will deal mainly with the centroid method.

In the centroid method, a set of close observations is replaced by an average object, the centroid of that group. We take the average of each variable over the observations that are close to each other and form a new cluster represented by those averaged values. Step by step, we combine the observations, first forming clusters of 2 observations, then 3, and so on, gradually reducing the observations to a few groups/clusters.

Consider “S1, S2”, “S3, S4” and “S5, S6” as the similar groups formed at the first step.

[Figure img004]
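A minimal Python sketch of the centroid method follows, assuming squared Euclidean distance between centroids and hypothetical data for S1 to S6. It is an illustration of the idea, not the procedure SAS runs internally; the function names are made up for this example.

```python
def centroid(members, data):
    """Average each clustering variable over the members of a cluster."""
    rows = [data[m] for m in members]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid_clustering(data):
    """Repeatedly merge the two clusters with the closest centroids."""
    clusters = [(name,) for name in data]
    history = []  # one (merged cluster, centroid distance) entry per step
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i], data),
                            centroid(clusters[j], data))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        history.append((merged, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

# Hypothetical observations on two clustering variables.
data = {
    "S1": (5, 5), "S2": (6, 6),
    "S3": (15, 14), "S4": (16, 15),
    "S5": (25, 20), "S6": (30, 19),
}
history = centroid_clustering(data)
for merged, d in history:
    print(merged, d)
```

For these values the first three merges produce the pairs (S1, S2), (S3, S4) and (S5, S6); later steps combine those pairs into larger clusters.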
 



Methods other than the centroid method:

Single-linkage method: the distance between two clusters is the minimum of the distances between all possible pairs of subjects, one from each cluster.

Complete-linkage method: the exact opposite of single linkage; the distance between two clusters is the maximum of the distances between all such pairs.

Average-linkage method: the distance between two clusters is the average of the distances between all pairs of subjects, one from each cluster.
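The three linkage rules differ only in how they summarize the pairwise distances, which a short Python sketch makes concrete. The two clusters below are hypothetical, and plain (unsquared) Euclidean distance is assumed.

```python
from itertools import product
from math import dist

def pair_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one subject from each cluster."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Nearest neighbor: the smallest pairwise distance."""
    return min(pair_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Farthest neighbor: the largest pairwise distance."""
    return max(pair_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Mean of all pairwise distances."""
    d = pair_distances(cluster_a, cluster_b)
    return sum(d) / len(d)

# Hypothetical clusters with two subjects each.
left = [(0, 0), (0, 3)]
right = [(4, 0), (4, 3)]
print(single_linkage(left, right))    # 4.0
print(complete_linkage(left, right))  # 5.0
print(average_linkage(left, right))   # 4.5
```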

Short tip: the details above are important for building an understanding of the solution, but you can use SAS / SPSS to generate the output with 7-8 lines of code. I will provide the SAS syntax for generating a cluster solution.

 

Interpretation of SAS output for cluster analysis


Statistic    Measured Value                 Comments
RMSSTD       Homogeneity of new cluster     Value should be small
SPR          Homogeneity of new cluster     Value should be small
RS           Heterogeneity of new cluster   Value should be high
CD           Homogeneity of new cluster     Value should be small


  • RMSSTD: root-mean-square standard deviation of the new cluster
  • SPR: semi-partial R-squared
  • RS: R-squared
  • CD: distance between the two clusters being merged
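To make these statistics concrete, here is a hedged Python sketch using their usual textbook definitions, which are an assumption here since the article does not spell out the formulas: RMSSTD pools the within-cluster sum of squares over the variables, RS is the between-cluster share of the total sum of squares, and SPR is the loss of homogeneity from a merge relative to the total sum of squares. The data and function names are hypothetical.

```python
from math import sqrt

def ss(rows):
    """Sum of squared deviations from the mean, pooled over all variables."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return sum((x - m) ** 2 for row in rows for x, m in zip(row, means))

def rmsstd(rows):
    """Pooled standard deviation of a cluster (needs at least 2 observations)."""
    n, p = len(rows), len(rows[0])
    return sqrt(ss(rows) / (p * (n - 1)))

def r_squared(clusters):
    """Share of the total sum of squares that lies between the clusters."""
    total = ss([row for c in clusters for row in c])
    within = sum(ss(c) for c in clusters)
    return (total - within) / total

def semipartial_r_squared(a, b, all_rows):
    """Increase in within-cluster SS caused by merging a and b, over total SS."""
    return (ss(a + b) - ss(a) - ss(b)) / ss(all_rows)

# Hypothetical three-cluster solution on two clustering variables.
c1 = [(5, 5), (6, 6)]
c2 = [(15, 14), (16, 15)]
c3 = [(25, 20), (30, 19)]
everything = c1 + c2 + c3

print(rmsstd(c1))                 # small value: a tight cluster
print(r_squared([c1, c2, c3]))    # close to 1: clusters are well separated
print(semipartial_r_squared(c2, c3, everything))
```

A small RMSSTD and SPR together with a high RS indicate, as in the table above, that a merge has kept the clusters homogeneous and distinct.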

 

You can use SAS to do hierarchical clustering. Here is a piece of SAS code that you can try on your own data.


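* Read an ID (TAT) and two clustering variables (AGE, WEIGHT);
DATA TABLE;
INPUT TAT $ 3-4 AGE 5-6 WEIGHT 7-8;
CARDS;
Insert data here
;
RUN;
* Hierarchical clustering with the centroid method;
PROC CLUSTER DATA=TABLE SIMPLE NOEIGEN METHOD=CENTROID RMSSTD RSQUARE NONORM OUT=TREE;
ID TAT;
VAR AGE WEIGHT;
RUN;
* Cut the tree into clusters. NCLUSTERS=3 is an assumed value, change it to suit your data;
PROC TREE DATA=TREE OUT=CLUS NCLUSTERS=3 NOPRINT;
ID TAT;
COPY AGE WEIGHT;
RUN;
PROC SORT DATA=CLUS; BY CLUSTER;
PROC PRINT DATA=CLUS; BY CLUSTER;
VAR TAT AGE WEIGHT;
RUN;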

 

For any queries or further explanation, mail info@meltdata.org.


Watch out for more very soon!
