New Users by month chart – BigQuery

One of the objectives we can set for a new marketing campaign, whether online or offline, may be to attract new users to our website or to acquire new customers for our company.

In this article, I’ll show you how to create a chart that tracks new users visiting our website. Keep in mind that the same code can be applied if we have a list of all the orders placed in previous years and we want to keep track of how many new customers we have had each month.

What do I mean by “new users”? In fact, it is possible to customize the underlying SQL query according to your own personal definition, but in this example I mean “users who have visited the ecommerce site in the last month, but have not done so in the previous 11 months”.

This means that if a user visited us 2 years ago and then disappeared into the cosmic void, only to reappear in the last month, they will be considered a “new user”.

Benefits:

  • In this way, it is possible to keep track of the results of activities carried out with the aim of attracting new users/customers.

Data:

The tools I used for this analysis are BigQuery and a few lines of Python to obtain the final chart.

New users in BigQuery – The function

The first step towards creating this report is to obtain the number of new users for a single month, and then repeat the same procedure for the previous months.

Precisely because of this “repetition”, I decided to use a User Defined Function in BigQuery. It is possible, in fact, to create functions, save them and then use them in a subsequent query.

Here is the complete function, we will break it down immediately afterwards.

CREATE OR REPLACE FUNCTION `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set INT64, date_last_day DATE) RETURNS INT64 AS (


(#all users
  with t1 as (
    select distinct PARSE_DATE("%Y%m%d", event_date) as data_ordine, user_pseudo_id as ID_Utente
    from `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.events_*`
  )


  select count(distinct t1.ID_Utente) as new_users_count
  from t1
  where t1.ID_Utente in (
    #users last month
    SELECT distinct ID_Utente
    FROM t1
    WHERE t1.data_ordine BETWEEN DATE_SUB(date_last_day, INTERVAL 1+off_set MONTH) AND DATE_SUB(date_last_day, interval off_set MONTH)
  )

  and t1.ID_Utente not in (
    #users of the 11 months before the previous month
    SELECT distinct ID_Utente
    FROM t1
    WHERE t1.data_ordine BETWEEN DATE_SUB(date_last_day, INTERVAL 12+off_set MONTH) AND DATE_SUB(date_last_day, INTERVAL 1+off_set MONTH))
  )
);

The parameters that the function expects as input are 2:

  • off_set -> it is an INT64, so it only accepts integers as a value. This variable is used to tell the function how many months to move back from the current date to start its calculations. Perhaps it is not easy to understand why this is necessary now, but trust me, it will be very clear later on.
  • date_last_day -> here we pass the date of the last day of the current month in DATE format

With the first query we save in a table, which I creatively named t1, all the users who have visited our website together with the date on which they did so.

We don’t need duplicates in this case, so we use a SELECT DISTINCT.

If we wanted to extract only the data related to a specific period, we can use the _table_suffix variable that is created when we use an asterisk in the FROM parameter, when we want to search multiple tables with a single query.

SELECT distinct PARSE_DATE("%Y%m%d", event_date) as data_ordine, user_pseudo_id as ID_Utente
FROM `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.events_*`
WHERE REGEXP_CONTAINS(_table_suffix, '^(2020|2021).*')

Here we tell BigQuery to give us the date and user code, without duplicates, of all users who have visited our website between 2020 and 2021.

Note:

In this example, I treat every GA4 event as if it were a ‘purchase’ for simplicity

If you want to learn more about what the _table_suffix variable is and how it can be used, here is the link to the Google documentation:

https://cloud.google.com/bigquery/docs/querying-wildcard-tables

Now let’s move on to the first subquery:

#users last month
SELECT distinct ID_Utente
FROM t1
WHERE t1.data_ordine BETWEEN DATE_SUB(date_last_day, INTERVAL 1+off_set MONTH) AND DATE_SUB(date_last_day, interval off_set MONTH)

Here we select only the users who visited the ecommerce site last month. To do this, we use the BETWEEN control, which tells us if t1.data_ordine, i.e. the date on which the user placed the order, is between the two dates we specify.

However, to make the chart more understandable to humans, we do not simply subtract 30 days from the current date to calculate the month, but we maintain the normal division of months throughout the year. This is where the previously defined parameter date_last_day comes into play. By always using end-of-month dates and the DATE_SUB function of BigQuery to subtract months from those dates, we are sure to maintain the calendar division between months.

The second parameter used in the DATE_SUB function is off_set, i.e. the number of months we want to shift our function back. Let me try to explain with the help of a diagram:

diagram function python

As we can see here, off_set is a sort of block that we put between the current end-of-month date and the date from which we want to start the function with its calculations. In this way, it is possible to always call the same function, but going back one month at a time.

Now let’s see the second subquery.

#users of the 11 months before the previous month
SELECT distinct ID_Utente
FROM t1
WHERE t1.data_ordine BETWEEN DATE_SUB(date_last_day, INTERVAL 12+off_set MONTH) AND DATE_SUB(date_last_day, INTERVAL 1+off_set MONTH))

The reasoning is the same as the previous subquery, only this time we pick up the users who have visited the website in the 11 months prior to the last month. We can see that the values of the second parameter passed to the DATE_SUB function are slightly different.

Finally, everything is taken up by the main query, where we select the user codes of users belonging to the first group (website visited in the last month) but not to the second (website visited in the 11 months prior to the last month).

The function is now complete, let’s now see how to call it from the query we will use to create the table with the final results.

New users in BigQuery – Final query

Now, to get the data for the chart, we need a table with the results of the SQL code seen above but related to the past 12 months.

Precisely because we created a function, we can call it 12 times but with the off_set value always different.

However, let’s also remember the second parameter, date_last_day. With the first line of code we create it using the LAST_DAY function, already present in BigQuery.

declare date_to_use date default LAST_DAY(CURRENT_DATE(), MONTH);

SELECT 
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(28,date_to_use) as month_0,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(29,date_to_use) as month_1,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(30,date_to_use) as month_2,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(31,date_to_use) as month_3,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(32,date_to_use) as month_4,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(33,date_to_use) as month_5,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(34,date_to_use) as month_6,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(35,date_to_use) as month_7,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(36,date_to_use) as month_8,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(37,date_to_use) as month_9,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(38,date_to_use) as month_10,
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(39,date_to_use) as month_11

The resulting table is as follows:

Note:

month_0 is the present month. Therefore, month_1 does not refer to January, but to the previous month for which the function started doing the calculations (therefore with offset_month=1)

Perfect, we have everything we need. Let’s move on to the final step.

New users in BigQuery – The chart

Now all we have to do is take any other tool to create the chart. In my case, I use a Python library. One small peculiarity here is given by “invert_xaxis()”. Precisely because we want to give the idea of going backwards with the months, we start the chart from the right instead of the left.

import matplotlib.pyplot as plt
plt.bar(range(12), result.iloc[0])
plt.title('Bar Chart')
plt.xlabel('Offset from current month')
plt.ylabel('New users')

plt.gca().invert_xaxis()

plt.show()

And here it is, in all its monochromatic beauty. Let’s imagine that we launched this query in August and pushed hard with strong summer discounts in a July campaign.

Thanks to this chart, we can see that the effect was actually there given the difference between July and June, but that already in August we are returning to the lower pre-marketing campaign values.

This is where the article should have ended, but in rereading it, I imagined all the people who have never seen the code behind this chart and who would therefore have a hard time understanding which months we are talking about. So I improved everything, always thanks to Python and its countless free libraries that allow us to do anything.

So let’s import the necessary libraries:

import calendar
from datetime import date
from dateutil.relativedelta import relativedelta
off_set_months = 28

Then let’s modify the query we pass to BigQuery through the API:

query = """

declare date_to_use date default LAST_DAY(CURRENT_DATE(), MONTH);
declare off_set_months int64 default """+str(off_set_months)+""";

SELECT 
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+1,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-1)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+2,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-2)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+3,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-3)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+4,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-4)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+5,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-5)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+6,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-6)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+7,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-7)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+8,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-8)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+9,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-9)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+10,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-10)).month])+""",
  `analyticstraining-358318.ga4_obfuscated_sample_ecommerce_copy.func_users`(off_set_months+11,date_to_use) as """+str(calendar.month_name[(date.today()+relativedelta(months=-off_set_months-11)).month])+"""



        """

result = client.query(query).result().to_dataframe()

What are the changes?

First of all we see how the variable off_set_months, called in the same way In SQL and in Python, is now initialized in Python and then passed to BigQuery inserting it inside the query manually. Why this? Because having that number available in Python allows us to do something very interesting that otherwise would not be possible in SQL: give dynamic names to columns.

We see in fact that the column names are no longer “month_0, month_1” etc, but are dynamically created by Python through this matryoshka of functions.

calendar.month_name[(date.today()+relativedelta(months=-off_set_months)).month]

It sounds complicated but it really isn’t.

To explain what this line of code does, let’s start from what should be the final goal: without remembering the off_set that we have inserted (or that another guy from the IT department has inserted) we must be able to immediately understand what months the graph refers to. In fact this is possible since Python and SQL work directly with dates.

First I use date.today()+relativedelta(months=-off_set_months) to get the date the SQL query is pointing to and then I use the method “.month” in order to get the number of that month.

It might be enough, but with a simple library we can print the name of that month, so that the graph is even more readable. So I use the calendar.month_name function.

Let’s see the final result:

import matplotlib.pyplot as plt
plt.bar(result.columns, result.iloc[0])
plt.title('Bar Chart')
plt.xlabel('Offset from current month')
plt.ylabel('New users')

plt.gca().invert_xaxis()
plt.xticks(rotation=45)

plt.show()

Now this graph is much easier to read💪

The article ends here, this time for real. I hope you enjoyed this reading. For any feedback or questions, I leave you to the comments area.

Lascia un commento