Data visualization lets us detect patterns, trends and correlations that might go unnoticed if we only inspected the raw data in a CSV reader (usually Microsoft Excel).
In this investigation we’re going to work with 2 software tools that we recommend:
GlueViz, or simply Glue, is a Python library for exploring relationships within and between related data sets.
Here are 2 videos about these 2 amazing tools:
Let’s move on!
For this first cross of variables, I decided to plot the variable Sessions against our O.V. (objective variable = Incomes) to try to identify patterns in the data points.
Why use Sessions and Incomes in this first cross of variables?
In the e-commerce world, digital experts often claim that Sessions are related to Incomes, which has produced the rule of thumb: the higher the Sessions, the higher the Incomes.
Here are some data visualizations made in Glue:
As a reminder, each data point is an activity done within the e-commerce website by one person.
Of the 4 charts above, 2 caught my attention:
1.- The Sessions vs Avg. Order chart shows a particular behavior: most of the Avg. Order values occur where Sessions are lowest. If we zoom in on the chart, we can see that, in this particular case, higher Sessions do not necessarily generate higher Avg. Orders.
2.- The second interesting chart is Sessions vs conversion rate. In this representation of each data point, we can see that the distribution is right-skewed: most occurrences are concentrated where Sessions are lowest.
The low-Sessions region also contains high conversion rate values (circle):
If we draw a horizontal line on the y-axis at a conversion rate of 1.09% and compare the number of data points above it for low and high Sessions, we get:
27 occurrences in the purple block (high Sessions) vs 30 occurrences in the blue block (low Sessions).
In this particular case, more occurrences of a conversion rate above 1.09% fall in the low-Sessions region, which tells us that higher Sessions do not necessarily generate higher conversion rates.
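The count behind this comparison can be reproduced outside Glue. The following is a minimal sketch in plain Python: the data points, the `SESSION_SPLIT` boundary between low and high Sessions, and the function name `count_above` are all illustrative assumptions, not values taken from the charts. Only the 1.09% threshold comes from the analysis above.

```python
# Hypothetical (sessions, conversion rate in %) pairs, NOT the article's data.
points = [(120, 0.8), (150, 1.5), (900, 1.2), (200, 2.1), (1100, 0.9), (950, 1.3)]

SESSION_SPLIT = 500    # assumed boundary between "low" and "high" Sessions
RATE_THRESHOLD = 1.09  # horizontal line drawn on the chart, in percent


def count_above(points, session_split, rate_threshold):
    """Count points above the rate threshold, split into low and high Sessions."""
    low = sum(1 for s, r in points if r > rate_threshold and s < session_split)
    high = sum(1 for s, r in points if r > rate_threshold and s >= session_split)
    return low, high


low_count, high_count = count_above(points, SESSION_SPLIT, RATE_THRESHOLD)
print(f"low Sessions: {low_count}, high Sessions: {high_count}")
```

With the real data set, the same two counts would correspond to the blue and purple blocks in the chart.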
For those of you who want to know more about conversion rate, here is a definition from Wikipedia:
Now it’s time to move further into exploratory data analysis:
After many correlation exercises done in GlueViz (working assumptions based on data visualization only), we present the most interesting results:
First interesting correlation:
The conversion rate variable (x-axis) behaves very similarly when crossed with incomes and with transactions (y-axis). By construction, the higher the conversion rate, the higher the incomes and transactions, since digital marketing experts defined this metric as:
Conversion rate = (Objectives or transactions / Total visits) * 100.
In the case of our e-commerce site, a fulfilled objective is a transaction. However, it is interesting to note that when we crossed Sessions with incomes and with conversion rate (in the first phase), neither cross showed any correlation.
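As a quick illustration of the formula, here is a minimal Python sketch. The function name `conversion_rate`, the zero-visit guard and the sample numbers are assumptions made for the example, not part of the original analysis.

```python
def conversion_rate(transactions, total_visits):
    """Conversion rate in percent: (transactions / total visits) * 100."""
    if total_visits == 0:
        # Assumption: report 0% when there were no visits, to avoid dividing by zero.
        return 0.0
    return transactions / total_visits * 100


# Hypothetical figures: 12 transactions out of 1,100 visits.
print(f"{conversion_rate(12, 1100):.2f}%")
```

Because transactions sit in the numerator, a relationship between conversion rate and transactions is partly built into the metric itself, which is worth keeping in mind when reading the charts.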
Second interesting correlation:
At this point, have you noticed how interesting the conversion rate variable is, given all its correlations with other variables?
Another interesting finding involved, once again, the conversion rate variable, this time analyzed against page views per session and average session duration.
Reading the graphs logically, we would say that people who visit more pages within the e-commerce site do so because they are interested in one or more products, which makes them stay longer on the site, increasing the conversion rate and generating more income.
From a digital marketing point of view these graphs are good signals, since they suggest the content presented on the site is very relevant to the audience: “the more pages users visit and the longer they stay on the site, the more transactions and income they generate.”
Third interesting correlation:
A very important metric for e-commerce sites is the bounce rate, since it tells us whether or not our site is relevant to our visitors.
The higher the bounce rate, the less our e-commerce site is interacting with customers, which generates fewer transactions and less income.
In the graphs above, we can see that income and transactions behave almost identically when analyzed against the bounce rate.
In this particular case, we see that the highest transactions and revenue are generated when the bounce rate is below 30%.
With these techniques we’ve detected some important correlations among our variables:
Income and transactions with Conversion Rate
Pages/Sessions and Avg. session duration with Bounce rate
At this point we assume that all these variables are correlated and will be useful when we design our Machine Learning model. Nevertheless, in Part 2 of this investigation we’re going to run statistical techniques to confirm or reject these correlations and to determine which variables will help us predict the incomes and/or sales of the e-commerce website.
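One common way to put a number on a visually detected correlation is the Pearson correlation coefficient. The sketch below is only an illustration of that idea in plain Python. It is not necessarily the technique used in Part 2, and the sample values are hypothetical rather than taken from the e-commerce data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
conversion_rates = [0.5, 1.0, 1.5, 2.0]
incomes = [100, 210, 290, 405]
print(round(pearson(conversion_rates, incomes), 3))
```

A value close to 1 or -1 would support what the charts suggest, while a value close to 0 would indicate no linear relationship.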