Started week 8 with new courses in the digital analytics Mini degree course.
The name of the course in this mini degree program was Introduction to the Google Cloud Platform and Google BigQuery.
Start exploring Google BigQuery (without entering your credit card), using Sandbox and public datasets. And learn how to build your data schema to prepare your own BigQuery data structure.
Topics covered are:
1) Examine the free BigQuery trainings available through Google Cloud
2) Learn to use free BigQuery features for your SQL and data management skills
3) Build your own data structure for your future BigQuery data warehouse
Thanks to this course, you will be able to know how to consolidate your data in one place, to automate your marketing reports, to learn how to be a better data-drive marketer, to learn more about Google Cloud solutions, and to discover the full analytics process, from data collection, to data visualization. So, let’s examine the journey of this course. So, we will start on how to get all your data into BigQuery, how to work inside BigQuery, and then how to build the marketing reports based on your BigQuery data. During this lesson, you will review the analytics structure and BigQuery place in it. You will create and navigate your Google Cloud account. You will use BigQuery SandBox and public datasets. You will discover the BigQuery data structure and BigQuery Interface. During this lesson, you will also be able to build your data schema to start working with BigQuery using your company data. So, let’s talk about data analytics infrastructure. Usually, if you work with your analytics tool, like Google Analytics or Adobe Analytics, usually, you have only two parts. You have data collection and you have data visualization. You cannot interfere on how your data exactly is collected, what happens in the data warehouse where your data is stored, and what happens afterwards. Well, you basically have the first process and the last process and you cannot interfere in other parts of the analytics process. So, with Google BigQuery, you are able to collect the data, consolidate it in one place, have a data warehouse, then work with this data to use some formula, some functions to transform this data to what you need. And then, you can visualize this data. So, you can find plenty of different schema of data analytics structure in BigQuery. But, basically, it’s just you have BigQuery in the middle. You can put every data you can imagine to BigQuery. And then, you can retrieve only the data you need, combining different tables, dataset, etc. And, you can visualize this data to do your marketing analytics.
In this lesson, we are going to learn about the different possibilities of getting your data into BigQuery, set up a manual or automatic import to BigQuery, and learn a little bit about data flow or data transfer tools .Let’s discover some Google services that can help you to get your data into BigQuery. So first, this is a Google Data Transfer tool. You can get your data from Google Storage, Google Ads, YouTube, and other tools to Google BigQuery automatically. So you will need to configure it exactly in the BigQuery interface, and we will see during our screencast how exactly you can do it. Another option, you can use Google Cloud Function to automatically transfer your data from Google Storage to Google BigQuery. Of course, to get your data into Google Storage, to get your files into Google Storage, you will need to ask your developers to send automatically the files into Google Storage. There is also a Google Dataflow. We will not cover it fully during our lesson or during this course because it’s more for streaming data. It means when you need to pool a lot of data every time something happens, like data from web application from mobile application, for example, you will need to configure a dataflow. This is a pipeline that says, hey, we got some data there, we will take it and then we will send it to BigQuery. So you can load the data to BigQuery manually, but this is not an option because our main goal is to automate our reports. So to load data automatically to Google Cloud Storage or to Google BigQuery, we will need to use an API.API basically is the language which you can use to speak to BigQuery. So for example, connecting to BigQuery API, your developers can send automatically the files to BigQuery or to Cloud Storage. So when you’re sending the data to BigQuery or to Cloud Storage, there are two options: send it in batch, so like one time per week, one time per day, one time per hour, and there’s streaming option. Streaming option is like you send an event every time something happens. We will not cover the streaming option in this lesson because it’s quite complicated for this level. So basically, if you’re a marketer or digital analyst, usually, you will need a batch option. It will be quite sufficient. The data transfer option, you will find it in BigQuery interface. As I said previously, it gives you a possibility to connect to some of the Google tools, some of the Amazon tools as well, to transfer your data automatically to Google BigQuery. We will see how it works on our screen shares. So when we talk about BigQuery Data Transfer, it’s quite simple. You will see in the BigQuery interface, you can configure the sending, automatic sending of data from Cloud Storage or Google Ads, YouTube, and some other tools, as well as some Amazon tools. You can configure automatically the data transfer from these tools to Google BigQuery. It’s quite simple so you can play with it with your data transfer in your BigQuery interface. If I’m not mistaken, you will pay like two euros per Google Ads transfer or YouTube transfer. Some of the transfers are free. For BigQuery data streaming, you can develop your own solutions. You can use some created solutions like OWOX, Stitch Data, Fivetran, and others. You can also use some Google solution as Dataflow, but still you will need to develop something from your site to do the streaming of the data, even through the Dataflow. Some of the use cases for data streaming is mobile applications. There are plenty events that happen in mobile applications. You will need to stream this data every time something happens. You will need to try this data in the real time. So this is one of the use cases where, well, you will need the streaming of the data. Usually, Google Analytics data is streamed to Google BigQuery as well because you will want to analyse your Google Analytics data practically in real time, so the Google Analytics data is streamed. If you want to go further with streaming data, you will need to check on some tools like Pub/Sub or Dataflow. These are Google Cloud tools. Pub/Sub is basically a tool that receives messages from an application and publish these messages to another Google Cloud tools like Dataflow. Dataflow takes this information and creates a pipeline that gives you the possibility to transfer correctly your streaming data to Google BigQuery. I wanted to share with you a real use case on how you can use your Google BigQuery data in your marketing reports.
You will learn some basic SQL language, and you will also learn how to prepare your data for analysing it in your reports. SQL, or SQL, is a language, programming language that gives you the possibility to communicate with your data. It’s quite straightforward, this language. Basically, if you need an example, you will usually see that you have to put SELECT some data, for example, you put the name of the columns, FROM this table. So this is just an example over an SQL query. It’s quite simple if you know the English language. In this lesson, we will learn some basic SQL functions, like SELECT, SELECT EXCEPT, ORDER BY,GROUP BY, WHERE, et cetera. The important thing in SQL language is JOIN. So JOINs gives you the possibility to join your tables in order to retrieve some data that you need from several tables. For example, LEFT JOIN, it means that you take the data from the first table, and then you add the data from the second table but only the data that is present in the first table. When you have a RIGHT JOIN, so it’s quite the same, it’s just that you prioritize the right table, the B table. You will only retrieve the data that is present in B table and the data that corresponds to it in the A table. When you use a FULL OUTER JOIN, you will combine the two tables, and it doesn’t matter if you don’t have some data in the A table, or you don’t have some data in B table. It doesn’t matter. You will join the two tables anyway. I will give you a real use case example of a JOIN. For example, I have my user IDs in my Google Analytics data, and I have the visits, the information about this user’s ID in my Google Analytics data. I have this user’s ID in my BackOffice data as well, but some people, they block Google Analytics sometimes, so I will have more user IDs in my BackOffice data than in my Google Analytics data. So when I combine this data, I will use LEFT JOIN because I will prioritize my BackOffice data and I will retrieve the data from Google Analytics only where user IDs are available and only for user IDs that correspond to my BackOffice data. So then I can say that this real person who visited my website, Buddy House, came from Google Ads, for example. If you’re used to Google Analytics or Adobe Analytics, we can compare, for example, BigQuery queries with Google Analytics segments. Basically you go to your data, and you filter only the data that you need, like in segments, you will filter it by saying, I want to know only visitors from this country.
And if ever Google want to, delete your Google Analytics, your property for any reasons, you will be sure to have your data stored in Google BigQuery. Google Analytics 360, gives you the possibility to store your data in Google BigQuery. If you have a Google Analytics standard version, you will need to use external connectors to put your data from Google Analytics to BigQuery. You can use some of them like a works super metrics, Fivetran, stitch or others. For this lesson, I choose works. But you can always choose another one if you want. So partition tables. These are not used only for Google Analytics, but for every data, you can always store your data in partition tables. It means that BigQuery, are using your injection time or data column. It starts a table like sub tables, like segments, divided per date. So you have one table, but you have this one table segmented by date. So this is a partition table. So whenever you want to query data for example, last 30 days, it will choose only the 30 last tables and you’ll retrieve the data from them. This is made for making a BigQuery query cheaper, and more performer. Nested fields. So the advice usually given by a BigQuery expert is that you have to denormalize your data. It means that in your usual database, you will have a one table for addresses, one table for orders, etc. But the BigQuery recommendation is to have, all data in one place. So the nested fields, is the fields that are repeated and nested in one record. For example, you can have visits, you can have information about every visit. But in every Google Analytics visit, you have several hits, you have page views, you have events, etc. So this events, this hits pages, etc, they are stored as nested fields, for this particular session. So let’s dive in, and let’s do some queries with your Google Analytics data, using Google Analytics sample in the public data sets. Let’s learn how to work with a Google Analytics data in your Google BigQuery. So we can, we can use Google Analytics data set, which you can find in the public data set. If you search for Google Analytics, you will see that there’s a Google Analytics sample data. And this is a Google merchandise store, or which gives you the access to the BigQuery data, which you can explore. If you click on view data set. You can check here, if you search for, analytics. You can see here Google Analytics sample data set. So let’s explore this data set, which is interesting about this one, is that, it built with partition table. It means that these are the tables segmented by the date of injection or by the date, indicated in the date column in the table. So, you will need to work with partition tables in some of your data sets. Which is interesting, for this data set is that, there are plenty of nested fields.
To create your reports in Google Spreadsheet, you will have two possibilities. You can use a natural data connector with BigQuery that is available only for G Suite user. So, if you have a paid version of Google Drive, you will see this available and you can connect to BigQuery directly from your Google Spreadsheet like this. So, you will click on Connect to BigQuery. You will select a project for which the billing is activated. And then, you will see the query editor where you can edit a query. And then, click on Insert results and you will see the results here. Here it is. So, whenever you see this icon, it means that this data is from BigQuery. So, now you have your data in Google Sheet and you can reuse it to create monthly reports like this one. Here, I have the data from Google Analytics directly. This one is from Google Analytics as well. And, this one is from Google BigQuery, not from Google BigQuery directly, but from these Google Sheets that are connected to BigQuery. So, I will say, hey, some O, the Contacts from where the brand is Brand1when the month is January because in my data, I have months that is equal to January is one, February is two, and so on, so forth. And, this hits, so I put a formula and I retrieve the data. So, for every first month, it will calculate all the data here. So, here, I indicated the real brand name, so I have the data. So, it’s simple as this. The disadvantage of the natural data connector from Google Sheet to BigQuery is that it is only available for G Suite users, so if you have a free version of Google Drive, it’s not possible. And, the second is that you cannot really schedule the queries in this data connector. So, it is better now if I’m not mistaken, so maybe Google will add this possibility, but for now, you will have to actualize your queries every time you need new data in your Google Sheet.