Google Cloud Manual

Getting Started With Google Cloud

Cloud OnBoard
Module 1: Introducing Google Cloud Platform
Module 2: Compute & Storage Fundamentals
Module 3: Data Analysis on the Cloud
Module 4: Scaling Data Analysis
Module 5: Machine Learning
Module 6: Data Processing Architecture
Summary: Continue learning with Google Cloud

Big Data & Machine Learning
Cloud OnBoard
Introducing Google Cloud Platform: Big Data and Machine Learning
Version #1.1

Agenda
- What is Google Cloud Platform
- Google Cloud Big Data products

Cloud computing is a continuation of a long-term shift in how computing resources are managed
- First wave (1980s): Servers on-premises. You own everything. It is yours to manage.
- Second wave (2000): Data centers. You pay for the hardware but rent the space. Still yours to manage.
- First-generation cloud (now): Virtualized data centers. You don't rent hardware and space, but you still control and configure virtual machines. Pay for what you provision.
- Third wave (next): Managed services. Completely elastic storage, processing, and machine learning so that you can invest your energy in great apps. Pay for what you use.

Google's mission is to organize the world's information and make it universally accessible and useful.

To organize the world's information, Google has been building the most powerful infrastructure on the planet.

In terms of hardware, Google Cloud has the largest cloud network, with over 100 points of presence and hundreds of thousands of miles of fiber optic cable.
- Network sea cable investments: Unity (US, JP) 2010; SJC (JP, HK, SG) 2013; FASTER (US, JP, TW) 2016; Monet (US, BR) 2017; Junior (Rio, Santos) 2017; Tannat (BR, UY, AR) 2017; PLCN (HK, LA) 2019; Indigo (SG, ID, AU) 2019
- Edge points of presence: >100
- Edge node locations: >1000

The network connects 15 regions, with 3 more coming.
[Map: current and future regions with their number of zones (mostly 3; Iowa has 4), including Oregon, Los Angeles, Iowa, South Carolina, Northern Virginia, Montreal, São Paulo, London, Belgium, Netherlands, Frankfurt, Finland, Mumbai, Singapore, Hong Kong, Taiwan, Tokyo, and Sydney]

In terms of software, organizing the world's information has meant that Google needed to invent data processing methods.
[Timeline, 2002 to 2018, of Google technologies including GFS, MapReduce, Bigtable, Dremel, Colossus, Flume, Megastore, Spanner, MillWheel, Pub/Sub, F1, TensorFlow, and TPU]
http://research.google.com/pubs/papers.html

Google Cloud opens up that innovation and infrastructure to you
[Timeline, 2002 to 2018, of the corresponding Google Cloud products: Cloud Storage, Dataproc, Bigtable, BigQuery, Datastore, Dataflow, Pub/Sub, Cloud Spanner, ML Engine, and AutoML]

A suite of products that can be put together for data processing
- Foundation: Compute Engine, Cloud Storage
- Databases: Cloud Spanner, Cloud SQL, Cloud Bigtable
- Analytics and ML: BigQuery, Datalab, Cloud ML APIs
- Data-handling frameworks: Cloud Pub/Sub, Cloud Dataflow, Cloud Dataproc
- ...

Spotify illustrates the typical journey of companies that come to Google Cloud: from lower costs to increased reliability to business transformation
1. Spend less: no-ops, pay for use, secure
2. Flexible: complete
3. Innovative: powerful

A suite of products that can be put together for data processing
Change where you compute → Improve scalability and reliability → Change how you compute

Atomic Fiction lowered their costs with per-minute (now per-second) billing
Stage: Change where you compute
https://www.youtube.com/watch?v=mBY-RjE15WA
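Per-second billing matters most for short, bursty jobs such as rendering. The sketch below compares per-minute and per-second billing for a single workload. Every number in it is a hypothetical assumption: the $0.10 per VM-hour rate, the 1,000 tasks of 95 seconds each, and a 1-minute minimum charge applied under both schemes. Check current Compute Engine pricing for the real rates and minimums.

```python
import math


def billed_seconds(runtime_s, increment_s, minimum_s=60):
    """Seconds actually charged for one VM run.

    Applies the (assumed) minimum charge first, then rounds up to the billing
    increment: 60 for per-minute billing, 1 for per-second billing.
    """
    charged = max(runtime_s, minimum_s)
    return math.ceil(charged / increment_s) * increment_s


def run_cost(runtime_s, increment_s, hourly_rate, minimum_s=60):
    """Cost of one VM run at a hypothetical hourly rate."""
    return billed_seconds(runtime_s, increment_s, minimum_s) / 3600 * hourly_rate


# Hypothetical workload: 1,000 render tasks of 95 seconds each at $0.10 per VM-hour.
HOURLY_RATE = 0.10
task_runtimes = [95] * 1000

per_minute = sum(run_cost(t, 60, HOURLY_RATE) for t in task_runtimes)
per_second = sum(run_cost(t, 1, HOURLY_RATE) for t in task_runtimes)

if __name__ == "__main__":
    print(f"per-minute billing: ${per_minute:.2f}")
    print(f"per-second billing: ${per_second:.2f}")
```

Under these assumptions, per-minute billing charges each 95-second task as 120 seconds, so the per-minute total is about 26% higher. The gap shrinks as tasks get longer.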

FIS was able to improve reliability and scalability on a massive data-processing challenge
[Infographic: 1.7 GB written per second; 6 billion and 10 billion market events per hour; 6 TB and 10 TB (bursts) written per hour]
The Consolidated Audit Trail (CAT) is a data repository of all equities and options orders, quotes, and events. FIS processed the CAT to organize 100 billion market events into an "order lifecycle" in a 4-hour window using Cloud Bigtable.

Rooms to Go transformed its business with data and machine learning
[Diagram: collect data from Google Analytics Premium (landing pages, data views) and from the CRM (Customer Relationship Manager: customer demographics, past purchases); combine and analyze the data in BigQuery; the result is completely designed room packages]
https://www.thinkwithgoogle.com/case-studies/rooms-to-go-improves-the-shopper-experience.html

In summary, Google Cloud offers you ways to…
- Spend less on ops and administration
- Incorporate real-time data into apps and architectures
- Apply machine learning broadly and easily
- Become a truly data-driven company

Module review
Google Cloud Platform is:
(select all of the correct options)
❏ Operated by Google on the same infrastructure it uses
❏ Most cost-effective if you pre-purchase instances on a yearly basis
❏ A set of modular services from which you can compose cloud-based applications
❏ A platform on which to host scalable and fast distributed applications

Resources
- Google Cloud Platform: https://cloud.google.com/
- Datacenters: https://www.google.com/about/datacenters/
- Google IT security: https://cloud.google.com/files/Google-CommonSecurity-WhitePaper-v1.4.pdf
- Why Google Cloud Platform? https://cloud.google.com/why-google/
- Pricing Philosophy: https://cloud.google.com/pricing/philosophy/

Compute & Storage Fundamentals

Agenda
- CPUs on demand + Demo
- A global filesystem + Demo

Google Cloud provides an earth-scale computer: compute power, data storage, and networking.

Custom/changeable machine types, preemptible machines, and automatic discounts lead to simplicity and agility
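The automatic discounts include sustained-use discounts, which lower the price the longer a VM runs in a month, with no up-front commitment. The sketch below assumes the tier schedule published for N1 machine types at the time of writing: each successive quarter of the month is billed at 100%, 80%, 60%, and 40% of the base rate. Treat these tiers as an assumption and check the current pricing documentation.

```python
# Assumed N1 sustained-use tiers: (fraction of the month, multiplier on the base rate).
SUSTAINED_USE_TIERS = [(0.25, 1.00), (0.25, 0.80), (0.25, 0.60), (0.25, 0.40)]


def effective_rate_fraction(usage_fraction):
    """Average fraction of the base price paid by a VM that ran for usage_fraction of the month."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    remaining = usage_fraction
    paid = 0.0
    for width, multiplier in SUSTAINED_USE_TIERS:
        used = min(width, remaining)
        paid += used * multiplier  # this slice of the month is billed at the tier's multiplier
        remaining -= used
        if remaining <= 0:
            break
    return paid / usage_fraction


if __name__ == "__main__":
    for usage in (0.25, 0.5, 0.75, 1.0):
        discount = 1 - effective_rate_fraction(usage)
        print(f"running {usage:.0%} of the month: {discount:.0%} discount")
```

With these assumed tiers, a VM that runs the whole month pays 70% of the base rate on average, which is a 30% discount.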

Demo: Create a Compute Engine instance
In this demo, we will:
1. Create a Compute Engine instance
2. SSH into the instance
3. Install the software package git (for source code version control)

Use Cloud Storage for persistent storage and as a staging ground for import to other Google Cloud products
[Diagram: 1. Ingest/Extract raw data (any format) → 2. Transform on Compute Engine + Disk → 3. Store/Stage in Cloud Storage → 4. Load into Cloud SQL, BigQuery, or Dataproc]

Create a bucket and copy the data over using the Cloud SDK; blobs are referenced through a gs://.../ URL
[Diagram: a Google Cloud Platform project contains buckets; each bucket contains objects (data and metadata)]
gsutil cp sales*.csv gs://acme-sales/data/
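Objects are addressed by gs:// URLs, so it helps to see how such a URL breaks down. This is a minimal local sketch and is not part of the Cloud SDK. split_gcs_url and planned_uploads are hypothetical helpers. planned_uploads assumes the destination ends in "/", so that each copied file keeps its base name under that prefix, as in the gsutil cp command above.

```python
import fnmatch


def split_gcs_url(url):
    """Split a gs://bucket/object URL into (bucket, object_name).

    Cloud Storage has a flat namespace: "data/sales1.csv" is a single object
    name, and the "/" only looks like a directory separator.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"not a gs:// URL: {url!r}")
    bucket, _, object_name = url[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    return bucket, object_name


def planned_uploads(local_files, pattern, dest_url):
    """gs:// URLs the matching local files would get when copied under dest_url.

    Hypothetical helper; assumes dest_url ends with "/" so each file keeps its
    base name under that prefix.
    """
    bucket, prefix = split_gcs_url(dest_url)
    matches = sorted(fnmatch.filter(local_files, pattern))
    return [f"gs://{bucket}/{prefix}{name}" for name in matches]


if __name__ == "__main__":
    files = ["sales1.csv", "sales2.csv", "notes.txt"]  # hypothetical local directory listing
    for url in planned_uploads(files, "sales*.csv", "gs://acme-sales/data/"):
        print(url, "->", split_gcs_url(url))
```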

Cloud Storage gives you durability, reliability, and global reach
- Control access at the project, bucket, and/or object level
- Transfer services are useful for ingest
- Use Cloud Storage as a staging area: ingest, store in Cloud Storage, then import into Compute Engine or Cloud SQL, or publish

Control latency and availability with zones and regions
- Choose the closest zone/region to reduce latency.
- Distribute your apps and data across zones to reduce service disruptions.
- Distribute your apps and data across regions for global availability.
[Diagram: regions such as North America (e.g., zone us-central1-a) and Europe (e.g., zone europe-west1-b), each containing several zones]
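Zone names encode their region: us-central1-a is zone a of region us-central1. The sketch below uses that naming convention to check whether a deployment spans several zones, which protects against a zone outage, or several regions, which gives global availability. The regular expression is an assumption based on names like us-central1-a and europe-west1-b. It is not an official grammar.

```python
import re

# Assumed naming convention: "<region>-<zone letter>", e.g. us-central1-a.
ZONE_PATTERN = re.compile(r"^([a-z]+-[a-z]+[0-9]+)-([a-z])$")


def region_of(zone):
    """Return the region part of a zone name, e.g. us-central1 for us-central1-a."""
    match = ZONE_PATTERN.match(zone)
    if match is None:
        raise ValueError(f"unexpected zone name: {zone!r}")
    return match.group(1)


def tolerates_zone_outage(zones):
    """True if the deployment spans at least two distinct zones."""
    return len(set(zones)) >= 2


def tolerates_region_outage(zones):
    """True if the deployment spans at least two distinct regions."""
    return len({region_of(z) for z in zones}) >= 2


if __name__ == "__main__":
    deployment = ["us-central1-a", "us-central1-b"]  # hypothetical placement
    print(tolerates_zone_outage(deployment), tolerates_region_outage(deployment))
```

In this example, two zones in the same region survive the loss of one zone but not the loss of the whole region.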

Demo: Interact with Cloud Storage
In this demo, we carry out the steps of an ingest-transform-and-publish data pipeline manually:
1. Ingest data into a Compute Engine instance
2. Transform data on the Compute Engine instance
3. Store the transformed data on Cloud Storage
4. Publish Cloud Storage data to the web
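The four demo steps can be mimicked locally to make the data flow concrete. In this sketch, the input CSV and the Fahrenheit-to-Celsius transform are hypothetical. A Python dict stands in for a Cloud Storage bucket. public_url builds the https://storage.googleapis.com/BUCKET/OBJECT form under which a publicly readable object is served. The object must actually be shared publicly for that URL to work.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical raw data, as if downloaded onto the Compute Engine instance.
RAW_CSV = """station,temp_f
A,68
B,
C,77
"""


def ingest(raw_csv):
    """Step 1: parse the raw CSV into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Step 2: drop incomplete records and convert Fahrenheit to Celsius."""
    out = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip records with a missing reading
        temp_c = (float(row["temp_f"]) - 32) * 5 / 9
        out.append({"station": row["station"], "temp_c": round(temp_c, 1)})
    return out


def store(rows, bucket, object_name):
    """Step 3: write CSV into a dict standing in for a Cloud Storage bucket."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["station", "temp_c"])
    writer.writeheader()
    writer.writerows(rows)
    bucket[object_name] = buf.getvalue()


def public_url(bucket_name, object_name):
    """Step 4: the URL a publicly readable object would be served from."""
    return f"https://storage.googleapis.com/{bucket_name}/{quote(object_name)}"


if __name__ == "__main__":
    fake_bucket = {}
    store(transform(ingest(RAW_CSV)), fake_bucket, "weather/clean.csv")
    print(fake_bucket["weather/clean.csv"])
    print(public_url("my-demo-bucket", "weather/clean.csv"))  # hypothetical bucket name
```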

Ingest-Transform-Publish using core infrastructure
[Diagram, Steps 1 to 4: data is ingested/extracted on Compute Engine, stored in Cloud Storage, imported into Cloud SQL, and published]

Cloud Shell gives you an easy command line
Do now: click the Cloud Shell icon.
Cloud Shell comes pre-installed with the tools, libraries, and so on that you need to interact with Google Cloud Platform.

Module review (1 of 2)
Compute nodes on GCP are:
(select the correct option)
❏ Allocated on demand, and you pay for the time that they are up
❏ Expensive to create and tear down
❏ Pre-installed with all the software packages you might ever need
❏ One of ~50 choices in terms of CPU and memory

Module review answers (1 of 2)
Compute nodes on GCP are:
(select the correct option)
➔ Allocated on demand, and you pay for the time that they are up
❏ Expensive to create and tear down
❏ Pre-installed with all the software packages you might ever need
❏ One of ~50 choices in terms of CPU and memory

Module review (2 of 2)
Google Cloud Storage is a good option for storing data that:
(select all of the correct options)
❏ Is ingested in real time from sensors and other devices
❏ Will be frequently read/written from a compute node
❏ May be required to be read at some later time
❏ May be imported into a cluster for analysis

Module review answers (2 of 2)
Google Cloud Storage is a good option for storing data that:
(select all of the correct options)
❏ Is ingested in real time from sensors and other devices
❏ Will be frequently read/written from a compute node
➔ May be required to be read at some later time
➔ May be imported into a cluster for analysis

Resources
- Compute Engine: https://cloud.google.com/compute/
- Cloud Storage: https://cloud.google.com/storage/
- Pricing: https://cloud.google.com/pricing/
- Cloud Launcher: https://cloud.google.com/launcher/
- Pricing Philosophy: https://cloud.google.com/pricing/philosophy/

Data Analysis on the Cloud

Agenda
- Stepping stones to transformation
- Your SQL database in the cloud + Demo
- Managed Hadoop in the cloud + Demo

Google Cloud Platform began in 2008 with App Engine, a serverless way to run web applications
[Diagram: 1. Develop; 2. Upload your code to App Engine; 3. App Engine autoscales and is reliable]
http://googleappengine.blogspot.com/2008/04/introducing-google-app-engine-our-new.html
http://googleappengine.blogspot.com/2013/05/the-google-app-engine-blog-is-moving.html

[Diagram: App Engine, App Engine Flex, Container Engine, and Compute Engine]
"There [was] something fundamentally wrong with what we were doing in 2008 … We didn't get the right stepping stones into the cloud …"
-- Eric Schmidt, Executive Chairman, Google

GCP now consists of a suite of products that together provide these stepping stones in a business's transformative journey
- Change where you compute: cost-effective virtual machines, storage, Hadoop, and MySQL to migrate your current workloads to the public cloud.
- Flexibility, scalability, and reliability: reliable, autoscaling messaging, data processing, and storage.
- Change how you compute: fully managed products for data warehousing, data analysis, streaming, and machine learning.

"Machine learning. This is the next transformation … the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want."
-- Eric Schmidt, Executive Chairman, Google

WIRED's headline
"If you want to teach a neural network to recognize a cat, for instance, you don't tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out."

Machine learning is not new, but it is now mainstream
Examples: Search; "People who bought ..."; spam filtering; suggest next video; route planning; Smart Reply
What's common to all of these use cases of machine learning?

There are three components in a recommendation system
1. Rating: Users rate a few houses, explicitly or implicitly.
2. Training: A machine learning model is created to predict a user's rating of a house.
3. Recommending: For each user, the model is applied to every unrated house, and the top 5 houses for that user are saved.
What else is needed?

The ML algorithm essentially clusters users and items
1. Who is like this user?
2. Is this a good house?
3. Predict rating: Is this house similar to houses that people similar to this user like?
Predicted rating = user-preference * item-quality
How often do you need to compute the predicted ratings? Where would you save them?
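The formula predicted rating = user-preference * item-quality becomes a dot product once preferences and qualities are vectors of factors. This is how matrix-factorization recommenders score items. The sketch skips training. The users, houses, and factor values are hypothetical; in practice, the factors would be learned from the ratings (for example, with an alternating-least-squares model). It then applies the model to every unrated house and keeps the top 5 per user.

```python
import heapq

# Hypothetical factors on two made-up dimensions: [likes "cozy", likes "spacious"].
user_factors = {"alice": [0.9, 0.1], "bob": [0.2, 0.8]}
# Hypothetical house "quality" on the same two dimensions.
house_factors = {
    "h1": [5.0, 1.0],
    "h2": [1.0, 5.0],
    "h3": [4.0, 4.0],
    "h4": [2.0, 2.0],
}
# Rating step: ratings users have already given (explicitly or implicitly).
known_ratings = {("alice", "h1"): 5}


def predict(user, house):
    """Predicted rating = user preference . item quality (dot product over factors)."""
    return sum(u * h for u, h in zip(user_factors[user], house_factors[house]))


def recommend(user, n=5):
    """Recommending step: score every house the user has not rated, keep the top n."""
    unrated = [h for h in house_factors if (user, h) not in known_ratings]
    return heapq.nlargest(n, unrated, key=lambda h: predict(user, h))


def batch_recommend(n=5):
    """Precompute top-n lists for every user, e.g. to save in a table for serving."""
    return {user: recommend(user, n) for user in user_factors}


if __name__ == "__main__":
    print(batch_recommend())
```

batch_recommend shows one possible answer to the slide's questions. It recomputes the lists periodically in batch and saves them (for example, in a Cloud SQL table) for the serving layer to read.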

In addition to the ML algorithm, you also need sophisticated data management
- Data collection: a scalable front end to collect customer actions
- Data analysis: data that is accessible and not siloed
- Machine learning: (re-)training and experimentation
- Serving: a scalable, real-time system to serve recommendations

Choose your storage solution based on your access pattern
- Cloud Storage: capacity petabytes+; access metaphor: like files in a file system; read: have to copy to local disk; write: one file; update granularity: an object (a "file"); usage: store blobs
- Cloud SQL: capacity gigabytes; access metaphor: relational database; read: SELECT rows; write: INSERT row; update granularity: field; usage: no-ops SQL database on the cloud
- Datastore: capacity terabytes; access metaphor: persistent hashmap; read: filter objects on property; write: put object; update granularity: attribute; usage: structured data from App Engine apps
- Bigtable: capacity petabytes; access metaphor: key-value(s), HBase API; read: scan rows; write: put row; update granularity: row; usage: no-ops, high throughput, scalable, flattened data
- BigQuery: capacity petabytes; access metaphor: relational; read: SELECT rows; write: batch/stream; update granularity: field; usage: interactive SQL* querying, fully managed warehouse

Cloud SQL is a fully managed database service
Cloud SQL (MySQL or Postgres): familiar, Google-managed, flexible pricing, managed backups, automatic replication, fast connection from GCE & GAE, connect from anywhere, Google security

Demo: Set up rentals data in Cloud SQL
In this demo, we populate rentals data in Cloud SQL for the recommendation engine to use:
1. Create a Cloud SQL instance
2. Create database tables by importing .sql files from Cloud Storage
3. Populate the tables by importing .csv files from Cloud Storage
4. Allow access to Cloud SQL
5. Explore the rentals data using SQL statements from Cloud Shell
[Diagram: files in Cloud Storage are imported into Cloud SQL, which is then accessed from an external machine]
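Step 5 explores the data with ordinary SQL. To try the same kind of statements without a Cloud SQL instance, the sketch below uses Python's built-in sqlite3 module as a local stand-in for MySQL. The table names, columns, and rows are hypothetical placeholders, not the demo's actual schema. The SELECT statements are standard SQL that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # local stand-in for the Cloud SQL instance
conn.executescript("""
CREATE TABLE Accommodation (id TEXT PRIMARY KEY, title TEXT, location TEXT, price INTEGER);
CREATE TABLE Rating (userId TEXT, accoId TEXT, rating INTEGER);
""")
# Hypothetical rows, standing in for the .csv imports from Cloud Storage.
conn.executemany(
    "INSERT INTO Accommodation VALUES (?, ?, ?, ?)",
    [("a1", "Cozy loft", "Austin", 120), ("a2", "Beach house", "Miami", 300)],
)
conn.executemany(
    "INSERT INTO Rating VALUES (?, ?, ?)",
    [("u1", "a1", 4), ("u2", "a1", 2), ("u1", "a2", 5)],
)
conn.commit()

# Typical exploration: row counts first, then the average rating per listing.
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Accommodation", "Rating")
}
rows = conn.execute("""
    SELECT a.title, AVG(r.rating) AS avg_rating, COUNT(*) AS num_ratings
    FROM Rating AS r
    JOIN Accommodation AS a ON a.id = r.accoId
    GROUP BY a.id, a.title
    ORDER BY avg_rating DESC
""").fetchall()

if __name__ == "__main__":
    print(counts)
    for title, avg_rating, num_ratings in rows:
        print(f"{title}: {avg_rating:.1f} from {num_ratings} rating(s)")
```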

There is a rich open-source ecosystem for big data
- http://hadoop.apache.org/
- http://pig.apache.org/
- http://hive.apache.org/
- http://spark.apache.org/

Dataproc reduces the cost and complexity associated with Spark and Hadoop clusters
Dataproc (Hadoop, Pig, Hive, Spark): familiar, Google-managed, image versioning, resize in seconds, automated cluster management, integrates with Google Cloud, flexible VMs, Google security

Demo: Recommendations ML with Cloud Dataproc
In this demo, we implement machine learning recommendations using Cloud Dataproc:
1. Launch Dataproc
2. Train and apply an ML model written in PySpark to create product recommendations
3. Explore inserted rows in Cloud SQL
[Diagram: Dataproc trains the model, writes recommendations to Cloud SQL, and the recommendations are shown from there]

Module review (1 of 2)
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
❏ Transactional updates on relatively small datasets

Module review answers (1 of 2)
Relational databases are a good choice when you need:
(select all of the correct options)
❏ Streaming, high-throughput writes
❏ Fast queries on terabytes of data
❏ Aggregations on unstructured data
✓ Transactional updates on relatively small datasets

Module review (2 of 2)
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It's the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
❏ Fully managed versions of the software offer no-ops
❏ Running it on Google infrastructure offers reliability and cost savings

Module review answers (2 of 2)
Cloud SQL and Cloud Dataproc offer familiar tools (MySQL and Hadoop/Pig/Hive/Spark). What is the value-add provided by Google Cloud Platform?
(select all of the correct options)
❏ It's the same API, but Google implements it better
❏ Google-proprietary extensions and bug fixes to MySQL, Hadoop, and so on
✓ Fully managed versions of the software offer no-ops
✓ Running it on Google infrastructure offers reliability and cost savings

Resources
- Cloud SQL: https://cloud.google.com/sql/
- Cloud Dataproc: https://cloud.google.com/dataproc/
- Cloud Solutions: https://cloud.google.com/solutions/

Scaling Data Analysis

Agenda
- Fast random access
- Warehouse and interactively query petabytes
- Interactive, iterative development + Demo

Choosing where to store data on GCP
- Unstructured data: Cloud Storage (need mobile SDKs? Firebase Storage)
- Structured data, transactional workload, SQL: Cloud SQL if one database is enough; Cloud Spanner if you need horizontal scalability
- Structured data, transactional workload, NoSQL: Cloud Datastore (need mobile SDKs? Firebase Realtime DB)
- Structured data, analytics workload: Cloud Bigtable for millisecond latency; BigQuery for latency in seconds
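The slide's decision tree can be read as a small function. This is a sketch with hypothetical parameter names. Real choices also depend on cost, capacity, and the access patterns covered in the comparison tables.

```python
def choose_storage(structured, analytics=False, millisecond_latency=False,
                   sql=True, horizontal_scaling=False, mobile_sdks=False):
    """Return the product the decision tree points to for the given answers."""
    if not structured:
        return "Firebase Storage" if mobile_sdks else "Cloud Storage"
    if analytics:
        # Data analytics workload: choose by the query latency you need.
        return "Cloud Bigtable" if millisecond_latency else "BigQuery"
    # Transactional workload.
    if sql:
        # One database is enough -> Cloud SQL; need horizontal scalability -> Spanner.
        return "Cloud Spanner" if horizontal_scaling else "Cloud SQL"
    return "Firebase Realtime DB" if mobile_sdks else "Cloud Datastore"


if __name__ == "__main__":
    print(choose_storage(structured=True, analytics=True))  # an analytics workload
```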

Use Cloud Spanner if you need globally consistent data or more than one Cloud SQL instance
Source: https://quizlet.com/blog/quizlet-cloud-spanner

Comparing storage options: technical details
- Cloud Datastore: NoSQL document; transactions: yes; complex queries: no; capacity: terabytes+; unit size: 1 MB/entity
- Cloud Bigtable: NoSQL wide column; transactions: single-row; complex queries: no; capacity: petabytes+; unit size: ~10 MB/cell, ~100 MB/row
- Cloud Storage: blobstore; transactions: no; complex queries: no; capacity: petabytes+; unit size: 5 TB/object
- Cloud SQL: relational, SQL for OLTP; transactions: yes; complex queries: yes; capacity: 500 GB; unit size: determined by DB engine
- Cloud Spanner: relational, SQL for OLTP; transactions: yes; complex queries: yes; capacity: petabytes; unit size: 10,240 MiB/row
- BigQuery: relational, SQL for OLAP; transactions: no; complex queries: yes; capacity: petabytes+; unit size: 10 MB/row

Comparing storage options: use cases
- Cloud Datastore (NoSQL document): best for getting started and App Engine applications; use cases: getting started, App Engine applications
- Cloud Bigtable (NoSQL wide column): best for "flat" data, heavy read/write, events, and analytical data; use cases: AdTech, financial and IoT data
- Cloud Storage (blobstore): best for structured and unstructured binary or object data; use cases: images, large media files, backups
- Cloud SQL (relational, SQL for OLTP): best for web frameworks and existing applications; use cases: user credentials, customer orders
- Cloud Spanner (relational, SQL for OLTP): best for large-scale database applications (> ~2 TB); use cases: whenever high I/O and global consistency are needed
- BigQuery (relational, SQL for OLAP): best for interactive querying and offline analytics; use cases: data warehousing

Bigtable is meant for high-throughput data where access is primarily for a range of row key prefixes
Row key: NASDAQ#1426535612045
Column data: MD:SYMBOL = ZXZZT, MD:LASTSALE = 600.58, MD:LASTSIZE = 300, MD:TRADETIME = 1426535612045, MD:EXCHANGE = NASDAQ
- Tables should be tall and narrow
- Store changes as new rows
- Bigtable will automatically compact the table
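Bigtable keeps rows sorted by row key, so a query for one stock becomes a scan of a contiguous key range. The toy store below imitates that behavior with a sorted list and binary search. It is not the Bigtable API. The key schema exchange#symbol#timestamp is a hypothetical choice. The trailing "#" delimiter stops a scan for GOOG from also returning GOOGL. Timestamps are assumed to have the same number of digits, so that they sort correctly as strings.

```python
import bisect


def prefix_end(prefix):
    """Smallest key greater than every key that starts with prefix.

    Assumes a non-empty prefix; increments its last character.
    """
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


class SortedRowStore:
    """Toy stand-in for a Bigtable table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row_key, cells):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(cells)

    def scan_prefix(self, prefix):
        """(row_key, cells) for every key starting with prefix, in key order."""
        lo = bisect.bisect_left(self._keys, prefix)
        hi = bisect.bisect_left(self._keys, prefix_end(prefix))
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def trade_key(exchange, symbol, timestamp_ms):
    # Hypothetical schema. Putting the symbol before the timestamp spreads new
    # writes across many key ranges instead of always appending at one hotspot.
    return f"{exchange}#{symbol}#{timestamp_ms}"


table = SortedRowStore()
table.put(trade_key("NASDAQ", "ZXZZT", 1426535612045), {"MD:SYMBOL": "ZXZZT", "MD:LASTSALE": "600.58"})
table.put(trade_key("NASDAQ", "GOOG", 1426535612999), {"MD:SYMBOL": "GOOG"})
table.put(trade_key("NYSE", "IBM", 1426535612500), {"MD:SYMBOL": "IBM"})
table.put(trade_key("NASDAQ", "GOOGL", 1426535612100), {"MD:SYMBOL": "GOOGL"})
table.put(trade_key("NASDAQ", "GOOG", 1426535612045), {"MD:SYMBOL": "GOOG"})

if __name__ == "__main__":
    for key, cells in table.scan_prefix("NASDAQ#GOOG#"):
        print(key, cells)
```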

Short, meaningful column names reduce storage and RPC overhead
(Same NASDAQ example row: row key NASDAQ#1426535612045 with columns MD:SYMBOL, MD:LASTSALE, MD:LASTSIZE, MD:TRADETIME, MD:EXCHANGE)
- Design the row key with the most common query in mind
- Design the row key to minimize hotspots
- Column families are a quick way to get some hierarchy
- Use short column names
- Bigtable is designed for sparse tables

You can work with Bigtable using the HBase API

    import org.apache.hadoop.hbase.*;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.*;

    byte[] CF = Bytes.toBytes("MD"); // column family
    // TABLE_NAME is a TableName defined elsewhere; connection settings are elided (...)
    try (Connection connection = ConnectionFactory.createConnection(...);
         Table table = connection.getTable(TABLE_NAME)) {
        // Row key combining exchange, symbol, and a timestamp-like suffix
        Put p = new Put(Bytes.toBytes("NASDAQ#GOOG#1234561234561"));
        p.addColumn(CF, Bytes.toBytes("SYMBOL"), Bytes.toBytes("GOOG"));
        p.addColumn(CF, Bytes.toBytes("LASTSALE"), Bytes.toBytes(742.03d));
        ...
        table.put(p);
    } // try-with-resources closes both the table and the connection

BigQuery is a fully managed data warehouse that lets you run ad hoc SQL queries on massive volumes of data
[Diagram: the BigQuery service contains projects (Project X, Project Y); each project contains datasets (Datasets A and B in Project X; Datasets C and D in Project Y); each dataset contains tables (Table 1, Table 2)]

A demo of BigQuery on a 10-billion-row dataset shows what it is and what it can do

    #standardSQL
    SELECT
      language,
      SUM(views) AS views
    FROM `bigquery-samples.wikipedia_benchmark.Wiki10B`
    WHERE
      title LIKE "%google%"
    GROUP BY language
    ORDER BY views DESC

- Familiar SQL 2011 query language
- Interactive ad hoc analysis of petabyte-scale databases
- No need to provision clusters

Three ways of loading data into BigQuery
1. Files on disk or Cloud Storage: CSV, JSON, Avro
2. Stream data: POST requests, serverless ETL
3. Federated data source: e.g., Google Sheets
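On the file-loading path, BigQuery reads CSV directly, but JSON files must be newline-delimited: one JSON object per line, not a single array. The sketch converts CSV text to that format using only the standard library. The sample data and the converters argument are hypothetical. A file produced this way could then be loaded with, for example, the bq tool's --source_format=NEWLINE_DELIMITED_JSON option.

```python
import csv
import io
import json


def csv_to_ndjson(csv_text, converters=None):
    """Convert CSV with a header row to newline-delimited JSON (one object per line).

    converters maps a column name to a function (e.g. int) applied to its values;
    columns without a converter stay strings.
    """
    converters = converters or {}
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {col: converters.get(col, str)(value) for col, value in row.items()}
        lines.append(json.dumps(record) + "\n")
    return "".join(lines)


# Hypothetical sample shaped like the Wikipedia query's columns.
SAMPLE_CSV = "language,title,views\nen,Google,120\nde,Google Maps,30\n"

if __name__ == "__main__":
    print(csv_to_ndjson(SAMPLE_CSV, {"views": int}), end="")
```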

With federated data sources, you can directly query files on Cloud Storage without having to ingest them into BigQuery
- Also: Google Drive, Bigtable
- Also: JSON/Avro/Google Sheets
- You can also pass in a schema

Comparing storage options: use cases

| | Cloud Datastore | Cloud Bigtable | Cloud Storage | Cloud SQL | Cloud Spanner | BigQuery |
|---|---|---|---|---|---|---|
| Type | NoSQL document | NoSQL wide column | Blobstore | Relational SQL for OLTP | Relational SQL for OLTP | Relational SQL for OLAP |
| Best for | Getting started, App Engine applications | "Flat" data, heavy read/write, events, analytical data | Structured and unstructured binary or object data | Web frameworks, existing applications | Large-scale database applications (> ~2 TB) | Interactive querying, offline analytics |
| Use cases | Getting started, App Engine applications | AdTech, financial and IoT data | Images, large media files, backups | User credentials, customer orders | Whenever high I/O, global consistency is needed | Data warehousing |

Increasingly, data analysis and machine learning are carried out in self-descriptive, shareable, executable notebooks
A typical notebook contains code, output, charts, and explanations (markup), and can be shared.
Image source: Git logo from Wikipedia

Datalab is an open-source notebook built on Jupyter (IPython)
● Analyze data in BigQuery, Compute Engine, or Cloud Storage
● Use existing Python packages
● Datalab is free: just pay for Google Cloud resources

Datalab notebooks are developed in an iterative, collaborative process
Development process in Cloud Datalab:
1. Write code in Python
2. Run cell (Shift+Enter)
3. Examine output
4. Write commentary in markdown
5. Share and collaborate

Datalab supports BigQuery
Diagram: a %%sql cell queries BigQuery, and the results can be converted to Pandas.

Demo: Create ML dataset with BigQuery

In this demo, we use BigQuery to create a dataset that we later use to build a taxi demand forecast system using machine learning.
● What kinds of things affect taxi demand?
● What are some ways to measure "demand"?

Demo steps:
1. Use BigQuery and Datalab to explore and visualize data
2. Build a Pandas dataframe that will be used as the training dataset for machine learning using TensorFlow

Module Review

Match the use case on the left with the product on the right

Use cases:
● Global consistency needed
● High-throughput writes of wide-column data
● Warehousing structured data
● Develop Big Data algorithms interactively in Python

Products: 1. Datalab, 2. Bigtable, 3. BigQuery, 4. Spanner

Answers:
● Global consistency needed (4, Spanner)
● High-throughput writes of wide-column data (2, Bigtable)
● Warehousing structured data (3, BigQuery)
● Develop Big Data algorithms interactively in Python (1, Datalab)

Module 5: Machine Learning

Agenda
● Machine learning with TensorFlow + Demo
● Pre-built machine learning models + Demo

TensorFlow is an open-source library that underlies many Google products

Demo: Playing with neural networks to learn what they are
http://playground.tensorflow.org/

Supervised machine learning requires features and labels
Diagram: input features feed a neural network that outputs a prediction; the cost measures how far the prediction is from the label.
Neural network image by Dake, Mysid [CC BY 1.0], via Wikimedia Commons

Machine learning with TensorFlow involves four steps:
1. Gather data: gather training data (input features and labels)
2. Create: create the model
3. Train: train the model based on input data
4. Use: use the model on new data

Gather training data and select input features (step 1, Gather data)
Diagram: some columns of the training data are selected as input features, some are discarded, and one is the target.

All input features need to be numeric
● Numeric columns: use as-is
● Categorical columns: use one-hot encoding
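One-hot encoding turns a categorical column into several 0/1 columns, one per category. A minimal stdlib sketch follows; the function name `one_hot` and the day-name data are hypothetical, and real pipelines would normally use a library feature column instead.

```python
def one_hot(values):
    """Map each categorical value to a list of 0/1 indicators (one column per category)."""
    categories = sorted(set(values))              # fixed, deterministic column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                         # exactly one indicator is set per row
        encoded.append(row)
    return categories, encoded

# Hypothetical example: day names become indicator columns, while an
# already-numeric feature such as max temperature would be used as-is.
days = ["Mon", "Tue", "Mon", "Sun"]
categories, encoded = one_hot(days)
print(categories)  # ['Mon', 'Sun', 'Tue']
print(encoded)     # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Sorting the categories keeps the column order stable between training and prediction, which matters because the model only sees column positions.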

Create a neural network model, defining the number of feature columns and hidden units (step 2, Create)
Diagram: npredictors input features, nhidden hidden units, noutputs outputs.

    estimator = DNNRegressor(hidden_units=[5], feature_columns=[...])

Train the model on the collected data (step 3, Train)
Diagram: the model maps the predictors to a predicted value of taxicab demand; the cost compares it with the true value, and the model is updated based on the cost.

    estimator.fit(predictors, targets, steps=1000)
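The create / predict / cost / update loop can be illustrated without TensorFlow. This sketch is a deliberate simplification and not what `DNNRegressor` does internally: a one-feature linear model (no hidden layer) trained by plain gradient descent on a mean-squared-error cost, with made-up data and a hypothetical function name `train_linear`. It only shows the same idea of repeatedly nudging parameters to reduce the cost.

```python
def train_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error (toy stand-in)."""
    w, b = 0.0, 0.0                          # "create": initial model parameters
    n = len(xs)
    cost = None
    for _ in range(steps):                   # "train": predict, measure cost, update
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        cost = sum(e * e for e in errors) / n
        # Gradients of the MSE cost with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Update the model based on the cost
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, cost                        # cost is from the last step before its update

# Hypothetical data generated from y = 2x + 1
w, b, cost = train_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(w, 3), round(b, 3))  # 2.0 1.0
print(round(w * 5 + b, 3))       # "use": predict for x = 5 -> 11.0
```

The learning rate is kept small enough for this data to converge; too large a step makes gradient descent diverge.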

Step 4, Use: the same model diagram with named inputs (rain, max temp, ...) producing a predicted value of taxicab demand; during training, the cost against the true value of taxicab demand is used to update the model.

Use the trained model on new data

    import pandas as pd

    # New observations to predict on (one row per day)
    inputs = pd.DataFrame.from_dict(data={
        'dayofweek': [4, 5, 6],
        'mintemp': [60, 15, 60],
        'maxtemp': [80, 80, 65],
        'rain': [0, 0.8, 0]})

    # Read the trained model from /tmp/trained_model. DNNRegressor is the
    # TensorFlow 1.x estimator from the previous slides; its configuration
    # (e.g. feature_columns) must match the one used at training time.
    estimator = DNNRegressor(model_dir='/tmp/trained_model',
                             hidden_units=[5])

    pred = estimator.predict(inputs.values)
    print(list(pred))  # predict may return a generator, so materialize it

Demo 2, Part 2: Carry out ML with TensorFlow

In this demo, we build a neural network to predict taxicab demand on a day-by-day basis using TensorFlow.
Diagram: inputs → neural network → prediction

The accuracy of an ML model is driven largely by the size and quality of the dataset; this is why ML requires massive compute
Chart: accuracy versus size of dataset, labeled "Scale of Compute Problem".
Source: https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html

Cloud ML Engine simplifies the use of distributed TensorFlow
Diagram: the training workload scales out as the size of the dataset grows.

ML APIs are pre-trained ML models (trained on Google's data) for common tasks; they are accessible through REST APIs
● Use your own data to train models: TensorFlow, Cloud Machine Learning Engine
● Machine learning as an API: Cloud Vision API, Cloud Speech API, Cloud Natural Language API, Cloud Translation API, Cloud Video Intelligence API
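Because the ML APIs are REST endpoints, a request is just JSON. The sketch below only builds the body of a Cloud Vision `images:annotate` request with the standard library; sending it would additionally need an API key or OAuth credentials (not shown). The helper name `build_annotate_request`, the placeholder image bytes, and the `maxResults` value are assumptions for illustration.

```python
import base64
import json

# Cloud Vision v1 annotation endpoint (the request body is POSTed here)
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes, feature_types, max_results=5):
    """Build a JSON-ready body asking for several detections on one image (hypothetical helper)."""
    return {
        "requests": [
            {
                # Inline images are sent base64-encoded
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": t, "maxResults": max_results} for t in feature_types
                ],
            }
        ]
    }

# Placeholder bytes; a real call would read an image file
body = build_annotate_request(
    b"fake-image-bytes", ["FACE_DETECTION", "LOGO_DETECTION", "WEB_DETECTION"])
print(json.dumps(body, indent=2))
```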

Logo Detection (image example)

Face detection (excerpt from a Vision API response)

    "faceAnnotations": [
      {
        "headwearLikelihood": "VERY_UNLIKELY",
        "surpriseLikelihood": "VERY_UNLIKELY",
        "rollAngle": -4.6490049,
        "angerLikelihood": "VERY_UNLIKELY",
        "landmarks": [
          {
            "type": "LEFT_EYE",
            "position": {
              "x": 691.97974,
              "y": 373.11096,
              "z": 0.000037421443
            }
          },
          ...
        ],
        "boundingPoly": {
          "vertices": [
            {
              "x": 743,
              "y": 449
            },
            ...
          ]
        },
        "detectionConfidence": 0.93568963,
        "joyLikelihood": "VERY_LIKELY",
        "panAngle": 4.150538,
        "sorrowLikelihood": "VERY_UNLIKELY",
        "tiltAngle": -19.377356,
        "underExposedLikelihood": "VERY_UNLIKELY",
        "blurredLikelihood": "VERY_UNLIKELY"
        ...
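A client typically reduces such a response to the few fields it needs. This sketch parses a trimmed response built from the values on the face-detection slide (the outer `responses` wrapper is assumed from the v1 `images:annotate` format) and reports, per face, the detection confidence and which emotions are rated `LIKELY` or `VERY_LIKELY`; the helper name `summarize_faces` and that threshold are hypothetical choices.

```python
import json

# Trimmed response using values from the face-detection slide; real responses have more fields
response_text = """
{"responses": [{"faceAnnotations": [{
  "detectionConfidence": 0.93568963,
  "joyLikelihood": "VERY_LIKELY",
  "sorrowLikelihood": "VERY_UNLIKELY",
  "angerLikelihood": "VERY_UNLIKELY",
  "surpriseLikelihood": "VERY_UNLIKELY",
  "rollAngle": -4.6490049,
  "panAngle": 4.150538,
  "tiltAngle": -19.377356
}]}]}
"""

EMOTION_FIELDS = ("joyLikelihood", "sorrowLikelihood", "angerLikelihood", "surpriseLikelihood")

def summarize_faces(response):
    """Return (confidence, likely emotions) for each face in the first image's result."""
    summaries = []
    for face in response["responses"][0].get("faceAnnotations", []):
        emotions = [
            name.replace("Likelihood", "")
            for name in EMOTION_FIELDS
            if face.get(name) in ("LIKELY", "VERY_LIKELY")
        ]
        summaries.append((face["detectionConfidence"], emotions))
    return summaries

summary = summarize_faces(json.loads(response_text))
print(summary)  # [(0.93568963, ['joy'])]
```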

Web annotations

    {
      "entityId": "/m/0gff2yr",
      "score": 5.92256,
      "description": "ArtScience Museum"
    }
    {
      "entityId": "/m/0h898pd",
      "score": 7.4162,
      "description": "Harry Potter (Literary Series)"
    }
    {
      "entityId": "/m/016ms7",
      "score": 1.44038,
      "description": "Ford Anglia"
    }

Image: CC-BY 2.0 Rev Stan: https://www.flickr.com/photos/revstan/6865880240

Try it in the browser with your own images: cloud.google.com/vision

The Translation API supports 100+ languages
https://cloud.google.com/translate/

Wootric uses the Cloud Natural Language API (entity and sentiment analysis) to make sense of qualitative customer feedback

Extracted entities are tied into a knowledge graph

Input text: Joanne "Jo" Rowling, pen names J. K. Rowling and Robert Galbraith, is a British novelist, screenwriter and film producer best known as the author of the Harry Potter fantasy series

    {
      "name": "Joanne 'Jo' Rowling",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/042xh",
        "wikipedia_url": "http://en.wikipedia.org/wiki/J._K._Rowling"
      }
    }
    {
      "name": "British",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/07ssc",
        "wikipedia_url": "http://en.wikipedia.org/wiki/United_Kingdom"
      }
    }
    {
      "name": "Harry Potter",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/078ffw",
        "wikipedia_url": "http://en.wikipedia.org/wiki/Harry_Potter"
      }
    }

When you analyze sentiment, you get a score (positive or negative) as well as a magnitude (how intense?)

Input text: The food was excellent, I would definitely go back!

    {
      "documentSentiment": {
        "score": 0.8,
        "magnitude": 0.8
      }
    }
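The application still has to turn score and magnitude into a decision. This sketch reads the `documentSentiment` object shown above and applies thresholds; the cut-offs (0.25 for the score, 1.0 for the magnitude) and the label names are hypothetical choices for illustration, not values defined by the API. The "mixed" branch reflects the idea that a near-zero score with a large magnitude suggests strong but conflicting feelings rather than neutral text.

```python
import json

def label_sentiment(response, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Map a documentSentiment result to a coarse label (hypothetical thresholds)."""
    sentiment = response["documentSentiment"]
    score, magnitude = sentiment["score"], sentiment["magnitude"]
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: a large magnitude hints at mixed emotions, a small one at neutral text
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"

# The response from the slide for "The food was excellent, I would definitely go back!"
slide_response = json.loads('{"documentSentiment": {"score": 0.8, "magnitude": 0.8}}')
print(label_sentiment(slide_response))  # positive
```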

The Cloud Speech API can be used to transcribe audio to text
http://cloud.google.com/speech

Like the Vision API, the Video Intelligence API can identify labels in a video, along with a timestamp

    {
      "description": "Bird's-eye view",
      "language_code": "en-us",
      "locations": {
        "segment": {
          "start_time_offset": 71905212,
          "end_time_offset": 73740392
        },
        "confidence": 0.96653205
      }
    }

https://cloud.google.com/video-intelligence/
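The segment offsets in this label response are plain integers. Assuming they are microseconds (as in the early v1beta1 response format this example appears to use; newer API versions express offsets differently), the "Bird's-eye view" label spans roughly 71.9 s to 73.7 s of the video. The hypothetical helper below does that conversion, with the unit as a parameter so the assumption is explicit.

```python
def segment_seconds(label, units_per_second=1_000_000):
    """Convert a label's segment offsets to seconds (assumes microsecond offsets)."""
    seg = label["locations"]["segment"]
    start = seg["start_time_offset"] / units_per_second
    end = seg["end_time_offset"] / units_per_second
    return start, end, end - start

# Label copied from the slide's response
label = {
    "description": "Bird's-eye view",
    "language_code": "en-us",
    "locations": {
        "segment": {"start_time_offset": 71905212, "end_time_offset": 73740392},
        "confidence": 0.96653205,
    },
}

start, end, duration = segment_seconds(label)
print(f"{start:.2f}s to {end:.2f}s ({duration:.2f}s)")  # 71.91s to 73.74s (1.84s)
```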

Demo 2, Part 3: Machine Learning APIs

Use several of the Machine Learning APIs (Vision, Translate, Natural Language Processing, Speech) from Python

"How much is this car worth?"

"Thanks to the Google Cloud Platform, Ocado was able to use the power of cloud computing and train our models in parallel."

Example customer email:
"Hi Ocado,
I love your website. I have children so it's easier for me to do the shopping online.
Many thanks for saving my time!
Regards"

Diagram: improves natural language processing of customer service claims → feedback → customer is happy

"50% of enterprises will be spending more per annum on bots and chatbot creation than traditional mobile app development by 2021." (Gartner)

● Custom image model to price cars
● Build off the NLP API to route customer emails
● Use the Vision API as-is to find text in memes
● Use Dialogflow to create a new shopping experience

Introducing Cloud AutoML: a technology that can automatically create a machine learning model
ML workflow: data preprocessing, ML model design, tune ML model parameters, evaluate, deploy, update

Cloud AutoML Vision
1. Upload and label images
2. Train your model in minutes or one day
3. Evaluate
Example labels: Handbag, Shoe, Hat
The model is now trained and ready to make predictions. This model can scale as needed to adapt to customer demands.

Module Review

Match the use case on the left with the product on the right

Use cases:
● Create, test new machine learning methods
● No-ops, custom machine learning applications at scale
● Automatically reject inappropriate image content
● Build application to monitor Spanish Twitter feed
● Transcribe customer support calls

Products: 1. Vision API, 2. TensorFlow, 3. Speech API, 4. Cloud ML, 5. Translation API

Answers:
● Create, test new machine learning methods (2, TensorFlow)
● No-ops, custom machine learning applications at scale (4, Cloud ML)
● Automatically reject inappropriate image content (1, Vision API)
● Build application to monitor Spanish Twitter feed (5, Translation API)
● Transcribe customer support calls (3, Speech API)

Resources (1 of 2)
● Cloud Spanner: https://cloud.google.com/spanner/
● Cloud Bigtable: https://cloud.google.com/bigtable/
● Google BigQuery: https://cloud.google.com/bigquery/
● Cloud Datalab: https://cloud.google.com/datalab/
● TensorFlow: https://www.tensorflow.org/

Resources (2 of 2)
● Cloud Machine Learning: https://cloud.google.com/ml/
● Vision API: https://cloud.google.com/vision/
● Translation API: https://cloud.google.com/translate/
● Speech API: https://cloud.google.com/speech/
● Video Intelligence API: https://cloud.google.com/video-intelligence

Module 6: Data Processing Architecture

Agenda
● Message-oriented architectures
● Serverless data pipelines
● GCP reference architecture

Asynchronous processing is useful for long-lived tasks or to have loose coupling between two systems
Diagram: producers (P1, P2, P3) → message queue → consumers (C1, C2, C3)
Potential use cases:
1. Send an SMS
2. Train ML model
3. Process data from multiple sources
4. Weekly reports …
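The producer/queue/consumer pattern can be sketched in a single process with Python's thread-safe `queue.Queue` standing in for a managed message service. It shows the loose coupling (producers never call consumers directly and do not wait for them) but none of Pub/Sub's durability or global reach; the function name `run_pipeline`, the counts, and the message names are hypothetical.

```python
import queue
import threading

def run_pipeline(num_producers=3, num_consumers=3, messages_per_producer=4):
    """Producers publish to a shared queue; consumers process whatever arrives."""
    q = queue.Queue()
    processed = []
    lock = threading.Lock()

    def producer(pid):
        for i in range(messages_per_producer):
            q.put(f"P{pid}-msg{i}")          # publish and move on; no waiting for a consumer

    def consumer():
        while True:
            msg = q.get()
            if msg is None:                  # sentinel: no more work for this consumer
                q.task_done()
                break
            with lock:
                processed.append(msg)        # stand-in for a long-lived task (SMS, report, ...)
            q.task_done()

    consumers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in consumers:
        t.start()
    producers = [threading.Thread(target=producer, args=(p,))
                 for p in range(1, num_producers + 1)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()                             # all real messages are now queued
    for _ in consumers:
        q.put(None)                          # one sentinel per consumer, after the real messages
    for t in consumers:
        t.join()
    return processed

processed = run_pipeline()
print(len(processed))  # 12
```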

For robust asynchronous processing, you need:
1. A global, highly available queue
2. Scale without over-provisioning
3. Queue must be interoperable
4. Reliable delivery of messages

Pub/Sub provides a no-ops, serverless global message queue
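Reliable delivery is usually achieved with acknowledgements: a message keeps being redelivered until a consumer confirms it was processed (at-least-once delivery). The in-memory `AckQueue` class below is a hypothetical toy illustrating that idea; Pub/Sub's real ack deadlines, leasing, ordering, and persistence are not modeled, and "expiring deadlines" is simulated by an explicit method call.

```python
from collections import deque

class AckQueue:
    """Toy at-least-once queue: a pulled message stays outstanding until acked."""

    def __init__(self):
        self._ready = deque()
        self._outstanding = {}
        self._next_id = 0

    def publish(self, data):
        self._ready.append((self._next_id, data))
        self._next_id += 1

    def pull(self):
        """Return one (ack_id, data) pair, or None if nothing is ready."""
        if not self._ready:
            return None
        ack_id, data = self._ready.popleft()
        self._outstanding[ack_id] = data     # remember it until the consumer acks
        return ack_id, data

    def ack(self, ack_id):
        self._outstanding.pop(ack_id, None)

    def expire_deadlines(self):
        """Simulate ack deadlines passing: unacked messages become ready again."""
        for ack_id, data in sorted(self._outstanding.items()):
            self._ready.append((ack_id, data))
        self._outstanding.clear()

q = AckQueue()
q.publish("train-model")
q.publish("send-sms")
first = q.pull()          # (0, 'train-model'); the consumer crashes before acking
second = q.pull()         # (1, 'send-sms')
q.ack(second[0])          # processed successfully
q.expire_deadlines()      # deadline passes for the unacked message
redelivered = q.pull()
print(redelivered)        # (0, 'train-model')
```

Because a message can arrive more than once, consumers in this style are best written to be idempotent.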

Dataflow offers NoOps data pipelines in Java and Python
The open-source API (Apache Beam) can also be executed on Flink, Spark, etc. Each of these steps is run in parallel and autoscaled by the execution framework.

    import apache_beam as beam

    # options: pipeline options configured elsewhere (not shown on the slide);
    # parse_data, get_speedbysensor, avg_speed: user-defined helpers (not shown)
    p = beam.Pipeline(options=options)

    # Input: read
    lines = p | beam.io.ReadFromText('gs://…')

    traffic = (lines
        | beam.Map(parse_data).with_output_types(str)  # Transform 1
        | beam.Map(get_speedbysensor)   # Transform 2 (Map): (sensor, speed)
        | beam.GroupByKey()             # Group-By: (sensor, [speed])
        | beam.Map(avg_speed)           # Transform 3 (Reduce): (sensor, avgspeed)
        | beam.Map(lambda tup: '%s: %d' % tup))  # Transform 4: format

    # Output: write
    output = traffic | beam.io.WriteToText('gs://...')
    p.run()
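The slide does not show `parse_data`, `get_speedbysensor`, or `avg_speed`. The plain-Python sketch below mimics the same Map, GroupByKey, and Map (reduce) shape on in-memory data so the flow of records can be traced; the CSV layout `timestamp,sensor,speed` and all helper bodies are hypothetical, and unlike Beam nothing here runs in parallel or scales out.

```python
from collections import defaultdict

# Hypothetical input lines: "timestamp,sensor_id,speed"
lines = [
    "2017-01-01T00:00:00,s1,60",
    "2017-01-01T00:00:05,s2,30",
    "2017-01-01T00:00:10,s1,70",
    "2017-01-01T00:00:15,s2,50",
]

def parse_data(line):                            # hypothetical helper
    return line.split(",")

def get_speedbysensor(fields):                   # hypothetical helper
    return fields[1], float(fields[2])           # (sensor, speed)

def group_by_key(pairs):
    """Local stand-in for beam.GroupByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())                # (sensor, [speed])

def avg_speed(kv):                               # hypothetical helper
    sensor, speeds = kv
    return sensor, sum(speeds) / len(speeds)     # (sensor, avgspeed)

parsed = map(parse_data, lines)                  # Transform 1
keyed = map(get_speedbysensor, parsed)           # Transform 2 (Map)
grouped = group_by_key(keyed)                    # Group-By
averaged = map(avg_speed, grouped)               # Transform 3 (Reduce)
output = ['%s: %d' % tup for tup in averaged]    # Transform 4 (format)
print(output)  # ['s1: 65', 's2: 40']
```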

Same code does real-time and batch

    import apache_beam as beam
    from apache_beam import window
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    # pipeline_args, input_topic, output_topic, and the helper functions
    # are defined elsewhere (not shown on the slide)
    options = PipelineOptions(pipeline_args)
    options.view_as(StandardOptions).streaming = True
    p = beam.Pipeline(options=options)

    # ReadStringsFromPubSub/WriteStringsToPubSub are older Beam helpers,
    # deprecated in later releases in favor of beam.io.ReadFromPubSub/WriteToPubSub
    lines = p | beam.io.ReadStringsFromPubSub(input_topic)
    traffic = (lines
        | beam.Map(parse_data).with_output_types(str)
        | beam.Map(get_speedbysensor)                  # (sensor, speed)
        | beam.WindowInto(window.FixedWindows(15, 0))  # 15-second windows
        | beam.GroupByKey()                            # (sensor, [speed])
        | beam.Map(avg_speed)                          # (sensor, avgspeed)
        | beam.Map(lambda tup: '%s: %d' % tup))
    traffic | beam.io.WriteStringsToPubSub(output_topic)
    p.run()

Diagram: Cloud Pub/Sub and Cloud Storage feed Cloud Dataflow, which writes to BigQuery, Cloud Pub/Sub, and Cloud Storage.

Dataflow does ingest, transform, and load; consider using it instead of Spark
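`WindowInto(window.FixedWindows(15, 0))` splits the unbounded stream into non-overlapping 15-second windows (offset 0), so GroupByKey and the average run per window instead of over the whole stream. The stdlib sketch below reproduces just that window assignment and per-window grouping; the event tuples, integer-second timestamps, and function names are hypothetical, and real Beam windowing also handles watermarks, triggers, and late data, which this ignores.

```python
from collections import defaultdict

def fixed_window_start(ts, size=15, offset=0):
    """Start of the fixed window [start, start + size) containing timestamp ts (seconds)."""
    return ts - (ts - offset) % size

def windowed_avg_speed(events, size=15, offset=0):
    """events: iterable of (timestamp_seconds, sensor, speed); averages per (window, sensor)."""
    groups = defaultdict(list)
    for ts, sensor, speed in events:
        start = fixed_window_start(ts, size, offset)
        groups[(start, sensor)].append(speed)    # group by window *and* key
    return {
        (start, start + size, sensor): sum(v) / len(v)
        for (start, sensor), v in sorted(groups.items())
    }

events = [(3, "s1", 60), (14, "s1", 70), (16, "s1", 40), (20, "s2", 30)]
print(windowed_avg_speed(events))
# {(0, 15, 's1'): 65.0, (15, 30, 's1'): 40.0, (15, 30, 's2'): 30.0}
```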

Choosing where to store data on GCP
● Unstructured data: Cloud Storage
● Structured data, transactional workload:
  ● SQL, one database enough: Cloud SQL
  ● SQL, horizontal scalability needed: Cloud Spanner
  ● NoSQL: Cloud Datastore
● Structured data, analytics workload:
  ● Millisecond latency: Cloud Bigtable
  ● Latency in seconds: BigQuery
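The decision tree can be written as a small function. The function and parameter names are hypothetical, and a real choice would also weigh cost, capacity, and the other properties in the storage-comparison tables; this only encodes the branches of the flowchart.

```python
def choose_storage(structured, analytics=False, sql=True,
                   horizontal_scale=False, millisecond_latency=False):
    """Follow the slide's decision tree and return a GCP storage product name."""
    if not structured:
        return "Cloud Storage"
    if analytics:                                  # data analytics workload
        return "Cloud Bigtable" if millisecond_latency else "BigQuery"
    # Transactional workload
    if not sql:
        return "Cloud Datastore"
    return "Cloud Spanner" if horizontal_scale else "Cloud SQL"

print(choose_storage(structured=True, analytics=True))  # BigQuery
```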

Run Spark/Hadoop jobs on Cloud Dataproc
Diagram: applications on the Cloud Dataproc cluster reach input and output data sources (Cloud Storage, Cloud Bigtable, BigQuery) either by direct access or through API client input and output connectors.

On GCP, you can have the same data processing pipeline for processing both batch and stream data
Diagram:
● Stream: events, metrics, and so on → Cloud Pub/Sub → Cloud Dataflow
● Batch: raw logs, files, assets, Google Analytics data, and so on → Cloud Storage → Cloud Dataflow
● Cloud Dataflow → BigQuery, Cloud Bigtable → Cloud Datalab, Cloud ML Engine, Data Studio dashboards/BI → co-workers, applications, and reports

Module Review

Match the use case on the left with the product on the right
A. Decoupling producers and consumers of data in large organizations and complex systems
B. Scalable, fault-tolerant multi-step processing of data

Products: 1. Cloud Dataflow, 2. Cloud Pub/Sub

Answers: A (2, Cloud Pub/Sub); B (1, Cloud Dataflow)

Resources (1 of 2)
● Cloud Pub/Sub: https://cloud.google.com/pubsub/
● Cloud Dataflow: https://cloud.google.com/dataflow/
● Processing media using Cloud Pub/Sub and Compute Engine: https://cloud.google.com/solutions/media-processing-pub-sub-compute-engine

Resources (2 of 2)
● Reverse Geocoding of Geolocation Telemetry in the Cloud Using the Maps API: https://cloud.google.com/solutions/reverse-geocoding-geolocation-telemetry-cloud-maps-ap
● Using Cloud Pub/Sub for Long-running Tasks: https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks

Summary

An Evolving Cloud
● 1st Wave: Your kit, someone else's building. Yours to manage.
● 2nd Wave: Standard virtual kit, for rent. Still yours to manage.
● 3rd Wave: Invest your energy in great apps.

Google Cloud provides a way to take advantage of Google's investments in infrastructure and data processing innovation
Timeline (2002 to 2018) of products: Cloud Storage, Datastore, Pub/Sub, Cloud Spanner, Dataproc, Bigtable, BigQuery, Dataflow, ML Engine, AutoML

Typical Big Data Processing
Programming, resource provisioning, handling growing scale, reliability, deployment & configuration, utilization improvements, performance tuning, monitoring

Big Data with Google: Focus on insight, not infrastructure.
Diagram: of the tasks above, only programming remains.

In summary, GCP offers you ways to...
● Spend less on ops and administration: we've "automated out" the complexity of building and maintaining data and analytics systems.
● Incorporate real-time data into apps and architectures: to get the most out of data and secure competitive advantage.
● Apply machine learning broadly and easily: we make it simple and practical to incorporate machine learning models within custom applications.
● Create citizen data scientists: transform your organization into a truly data-driven company, putting tools into the hands of domain experts.

Next Steps on your Google Cloud learning journey
1. Today: Google Cloud Platform Fundamentals: Big Data and Machine Learning
2. Tomorrow: Complete hands-on labs: Baseline: Data, ML, AI quest (google.qwiklabs.com)
3. Future: Find more training online (cloud.google.com/training)

Complete 10 hands-on labs free on Qwiklabs by 30 April, and receive $200 in GCP credits
[Only for Cloud OnBoard attendees]
1. Receive a follow-up email after the event
2. Create a Qwiklabs account with the email you used to register for Cloud OnBoard
3. Open your email and confirm your account
4. Return to Qwiklabs and log in (username and password)
5. Enroll in the Baseline: Data, ML, AI quest and take your first lab!
6. Complete all 10 labs and we will send you an email after 30 April with instructions to redeem the $200 credits. Make sure you opt in to receive emails from Qwiklabs.

To help you get started
Activate your voucher now for a free course worth $99!
1. Go to https://www.coursera.org/voucher/CloudOnBoardML
2. Activate the voucher and sign up for a free account
3. Enroll in Serverless Data Analysis with Google BigQuery and Cloud Dataflow for free (limited-period offer!)
Explore other courses at Coursera.org/Googlecloud

Make Google Cloud certification your goal!
Find study guides, tips, practice exams, and testing sites: cloud.google.com/certification

Google Cloud Startup Program: $3,000 in credits
Google Cloud is a perfect fit for launching and scaling your early-stage startup.
A special offer for Cloud OnBoard Singapore attendees: visit https://goo.gl/bmJwwk before May 18th to enroll, and eligible startups* receive $3,000 in Google Cloud and Firebase credits.
*What's an eligible startup? One that:
● Has raised no more than a Series A
● Is less than 5 years old
● Is located in one of our approved countries
● Has not participated in the Google Cloud Startup Program before
More information: g.co/cloudstartups

Be part of the GCP User Group SG Community!
Network, share, learn: all about Google Cloud
● Connect
● Learn: learn from leads, users, and tech experts
● Access: gain access to the Google Cloud team and the latest capabilities
Link: bit.ly/gcpusergroupsg

Resources
● Big data and machine learning blog: https://cloud.google.com/blog/big-data/
● Google Cloud Platform blog: https://cloudplatform.googleblog.com/
● Google Cloud Platform curated articles: https://medium.com/google-cloud

https://cloud.google.com/training/