Chief Data and Analytics Officers Mexico 2019

The CDAO (Chief Data and Analytics Officers) Mexico 2019 summit took place on May 29-30, 2019, and I had the privilege of attending this amazing event.

CDAO Mexico 2019

CDAO Mexico is the most senior event for the growing data and analytics community in Mexico, as described by Corinium Connected Thinking (the organizer).

At CDAO Mexico 2019, we saw diverse topics related to Data and Analytics, from databases and data structure to Machine Learning and Artificial Intelligence.

Of course, the most important and relevant topics for the audience were A.I. and Machine Learning.

Talking about these topics gave us the opportunity to discuss what A.I. and ML really are, because most of the time those terms are misused in the industry.

A.I. proposed by Adobe

A.I. from IBM – Watson

Personally, one of the most interesting talks for me was the one by John J. Thomas, Distinguished Engineer & Director at IBM Data and AI, because he presented the challenges he has overcome while working with A.I.

He mentioned that being an expert at coding in Python, for example, does not always mean we are going to develop great Machine Learning and A.I. algorithms.

What do you think?

Personally, I love coding in Python and R to create my Machine Learning models.

Of course, we have great platforms like Amazon Web Services, IBM Watson, Microsoft Azure Machine Learning and Google Cloud Machine Learning to help us develop models at a larger scale, but at the moment I feel really comfortable coding my Machine Learning models with Python 🙂

Another of the biggest and most-mentioned topics at Chief Data & Analytics Officers Mexico 2019 was Digital Transformation.

Digital Transformation is a big concept that everyone likes to use nowadays, but in essence, what is digital transformation?

Most companies think that digital transformation only means jumping into digital marketing and focusing on new consumer behavior (adopting new technologies), but those big fields are only part of the digital transformation ecosystem.

For me, Digital Transformation is like this picture proposed by Microsoft News Center:

You have to work on these 4 pillars to run a successful Digital Transformation, and keep in mind that Digital Marketing is only one part of the process.

For me, the Chief Data and Analytics Officers Mexico 2019 was a really great experience. I had the opportunity to talk with colleagues from other industries and to network with strategic partners like:

I’m looking forward to the 2020 event!

Data Day in ISDI Digital Business School

Given the current importance of data analysis, in April 2019,

DAP – Data Analytics Programs ISDI

The Data Analytics Programs at ISDI are a series of Master’s programs that cover the analysis, use and management of data with a strategic and business vision.

These programs cover various important Data Analytics topics, such as:

  • Strategy in Analytics
  • Digital Marketing Analytics
  • Fundamentals
  • Tools and Business Analytics
  • Big Data & Cloud
  • Artificial Intelligence

Obtaining, analyzing and using data for decision making

Given the new needs of the market and of all industries today, ISDI created an analytics portfolio consisting of two Master’s programs, with a methodology that gives students a strong professional competitive advantage and the skills needed to cope with the changes of this era.

Here is the full interview that ISDI did with me after my Masterclass:

Thank you so much to ISDI Digital Business School for letting dataismm.ai and me be part of this great experience, raising awareness of the importance of data science among future students.

More info:

https://www.isdi.education/es/master-data-analytics-programs-mexico

Data Science and Machine Learning – Google Analytics

Data Science and Machine Learning with e-commerce

If you’ve worked with digital marketing tools like Google Analytics, you may have noticed that the data exported by GA (Google Analytics) is unstructured, or at least not ready to be processed directly by Machine Learning and Data Science tools.

In this first article of a two-part investigation, we intend to apply Data Science and Machine Learning techniques to Google Analytics data from an e-commerce website.

The purpose is to find insights related to the income and sales of the e-commerce website.

So, let’s get started!

Step 1.- Origin of Data

We’re going to work with a CSV file extracted directly from Google Analytics 360.

To get good data granularity, we’re going to download the daily activity of users, from the launch of the e-commerce website up to the current date.

The variables we’re going to use are the following:

  • Date
  • Sessions
  • Users
  • New Users
  • Bounce Rate
  • Pages / Sessions
  • Avg. Session Duration
  • Conversion Rate
  • Transactions
  • Avg. Order
  • Revenue – Dependent variable or Objective Variable

So, our raw CSV file is going to look like this (sorry, the CSV file is in Spanish):


At this point, I did a little data pre-processing directly in the CSV file, converting the % variables into plain numbers (removing the % sign) in the bounce rate and conversion rate columns.


Those simple changes are going to be really helpful when we use this CSV file as input for our Data Science and Machine Learning tools.
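The same clean-up can also be done in code instead of by hand. Here is a minimal sketch using only the Python standard library. The column names, the sample rows and the helper names (percent_to_number, clean_rows) are assumptions for illustration, not the actual export, and the decimal-comma handling is a guess about what a Spanish-language file might contain.

```python
import csv
import io


def percent_to_number(text):
    """Turn a string such as '45.3%' into the float 45.3.

    Assumption: a Spanish-language export may write '45,3%',
    so a decimal comma is converted to a decimal point.
    """
    cleaned = text.strip().rstrip("%").strip().replace(",", ".")
    return float(cleaned)


def clean_rows(csv_file, percent_columns):
    """Read a CSV file object and convert the given % columns to floats."""
    rows = []
    for row in csv.DictReader(csv_file):
        for column in percent_columns:
            row[column] = percent_to_number(row[column])
        rows.append(row)
    return rows


# Hypothetical two-day sample standing in for the real export.
sample = io.StringIO(
    "Date,Sessions,Bounce Rate,Conversion Rate\n"
    "2019-01-01,120,45.3%,1.09%\n"
    "2019-01-02,98,38.0%,0.75%\n"
)

rows = clean_rows(sample, ["Bounce Rate", "Conversion Rate"])
for row in rows:
    print(row["Date"], row["Bounce Rate"], row["Conversion Rate"])
```

Doing the conversion in code keeps the raw export untouched, which makes the step easy to repeat when a new export is downloaded.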

We’re ready to move on to the FIRST step of our Data Science process: data visualization and exploratory analysis.

Step 2.- Data visualization & Exploratory data analysis

The data visualization process is fundamental because it helps us interpret our data better and understand the significance of the variables.

By exploring our data visually, we can detect patterns, trends and correlations that might go undetected if we analyzed the data directly in a CSV reader (generally Microsoft Excel).

In our investigation we’re going to work with 2 software tools that we recommend:

GlueViz, or simply Glue, is a Python library to explore relationships within and between related data sets.

https://www.cs.waikato.ac.nz/~ml/index.html

Weka is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.


Here are 2 videos about these 2 amazing tools:

Glue (Linked – View Visualization in Python)

Getting Started with Weka – Machine Learning Recipes #10


Let’s move on!

In this first cross-variable analysis, I decided to plot the variable Sessions against our O.V. (Objective Variable = Income), trying to identify patterns in the data points.

Why use Sessions and Income in this first cross-variable analysis?

In the e-commerce world, digital experts sometimes express the idea that Sessions are related to Income, so they’ve created the rule: the higher the Sessions, the higher the Income.

Is that true?

Here are some data visualizations done in Glue:

As a reminder, each data point is an activity done within the e-commerce website by one person.

Analyzing the 4 charts above, 2 of them caught my attention:

1.- The Sessions vs Avg. Order chart shows a particular behavior: most occurrences of Avg. Order values happen where Sessions are lower. If we zoom in on the chart, we can see in this particular case that higher Sessions do not necessarily generate higher Avg. Orders.

2.- The second interesting chart is Sessions vs Conversion Rate. In this representation of each data point, we can see that the distribution is right-skewed: almost all occurrences are concentrated where Sessions are lower.

Also, in the region of lower Sessions we can see high conversion rate values (circled):

If we draw a horizontal line on the y-axis (at a conversion rate of 1.09%) and compare the number of data points between low and high Sessions:

27 occurrences for the purple block (high Sessions) vs 30 occurrences in the blue block (low Sessions)

In this particular case, we have more occurrences of conversion rates higher than 1.09% in the low-Sessions region, which tells us that higher Sessions do not necessarily generate high conversion rates.

For those of you who want to know more about conversion rate, here is a definition from Wikipedia:
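The counting behind this comparison can be sketched in a few lines of Python. The sample points and the session_split value are made up for illustration (the charts don’t state where the low and high blocks are divided); only the 1.09% threshold comes from the analysis above.

```python
def count_above_threshold(points, session_split, rate_threshold=1.09):
    """Count points whose conversion rate exceeds rate_threshold,
    separately for low-Sessions and high-Sessions points.

    points: iterable of (sessions, conversion_rate_percent) pairs.
    session_split: hypothetical boundary between low and high Sessions.
    """
    counts = {"low_sessions": 0, "high_sessions": 0}
    for sessions, conversion_rate in points:
        if conversion_rate <= rate_threshold:
            continue  # at or below the line, not counted
        key = "low_sessions" if sessions < session_split else "high_sessions"
        counts[key] += 1
    return counts


# Invented sample data, not the real Google Analytics export.
sample_points = [
    (150, 1.50),
    (200, 0.80),
    (320, 2.10),
    (900, 1.20),
    (1100, 0.95),
    (1300, 1.10),
]

counts = count_above_threshold(sample_points, session_split=500)
print(counts)
```

With the real data, the two counts would correspond to the blue (low Sessions) and purple (high Sessions) blocks.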

https://en.wikipedia.org/wiki/Conversion_marketing

Now, it’s time to move further in exploratory data analysis:

After many data correlation exercises done in GlueViz (as assumptions, based on data visualization only), we present the most interesting results:

First interesting correlation:

The conversion rate variable (x-axis) shows very similar behavior when cross-analyzed with income and transactions (y-axis). It is expected that the higher the conversion rate, the higher the income and transactions, since this metric was defined (by digital marketing experts) like this:

Conversion rate = (Objectives or transactions / Total visits) * 100.

In the case of our e-commerce site, a fulfilled objective is a transaction. However, it is interesting to note that when we crossed Sessions with income and with conversion rate (in the first phase), neither pairing showed any correlation.
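The conversion rate formula above can be written as a small Python function. This is just a sketch: the function name and the example numbers are assumptions for illustration.

```python
def conversion_rate(transactions, total_visits):
    """Return the conversion rate as a percentage.

    Implements: (transactions / total visits) * 100
    """
    if total_visits <= 0:
        raise ValueError("total_visits must be greater than zero")
    return transactions * 100 / total_visits


# Invented example: 12 transactions out of 1,100 visits.
rate = conversion_rate(12, 1100)
print(f"{rate:.2f}%")
```

Because transactions sit in the numerator, conversion rate and transactions will tend to move together whenever the number of visits stays roughly stable.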

Second interesting correlation:

At this point, have you noticed how conversion rate is a really interesting variable due to all its correlations with other variables?

Another interesting finding was, once again, the conversion rate variable analyzed with page views per session and the average session duration.

If we analyze the graphs logically, we would say that people who visit more pages within the e-commerce site do so because they are interested in one or more products at the same time, which makes them stay longer on the site, increasing the conversion rate and generating more income.

From a digital marketing point of view, these graphs are good signals, since they suggest that the content presented within the site is very relevant for the audience: “the more pages users visit and the longer they stay on the site, the more transactions and income they generate.”

Third interesting correlation:

A very important metric for e-commerce sites is the bounce rate variable, since this metric tells us if our site is relevant to our visitors or not.

The higher the bounce rate, the less our e-commerce site is interacting with customers, generating fewer transactions and less income.

In the graphs above, we can see how the behavior of income and transactions is almost identical when these variables are analyzed against the bounce rate.

In this particular case, we see that the highest transactions and revenues are generated when the bounce rate is less than 30%.

Well, so far we’ve done the first step of a Data Science process: data visualization and exploratory data analysis.

With these techniques we’ve detected some important correlations between our variables:

Income and transactions with Conversion Rate

Pages/Sessions and Avg. session duration with Bounce rate

At this point we assume that all these variables are correlated and will be useful when we design our Machine Learning model. Nevertheless, in Part 2 of this investigation we’re going to run statistical techniques to go further and confirm or reject whether those variables are really correlated, and which of them will help us predict the income and/or sales of the e-commerce website.
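To give an idea of what such a check can look like, here is a minimal sketch of the Pearson correlation coefficient in plain Python. Pearson is only one possible technique, chosen here for illustration and not necessarily the one used in Part 2, and the numbers are invented rather than taken from the real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented daily values, for illustration only.
conversion_rates = [0.5, 0.9, 1.1, 1.6, 2.0]
transactions = [3, 6, 7, 11, 13]

r = pearson(conversion_rates, transactions)
print(f"r = {r:.3f}")
```

A value close to 1 or -1 would support what the charts suggest, while a value close to 0 would tell us that the visual impression does not hold up as a linear relationship.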

See you there!