Big Data & Machine Learning (Cloud OnBoard)

Agenda
  Message-oriented architectures
  Serverless data pipelines
  GCP Reference Architecture

Dataflow offers NoOps data pipelines in Java and Python

The open-source API (Apache Beam) can also be executed on Flink, Spark, etc.

    p = beam.Pipeline(options=options)

    # Read
    lines = p | beam.io.ReadFromText('gs://…')

    # Transforms 1-4 (unicode is the Python 2 text type; str on Python 3)
    traffic = (lines
        | beam.Map(parse_data).with_output_types(unicode)
        | beam.Map(get_speedbysensor)   # (sensor, speed)      Map
        | beam.GroupByKey()             # (sensor, [speed])    Group-By
        | beam.Map(avg_speed)           # (sensor, avgspeed)   Reduce
        | beam.Map(lambda tup: '%s: %d' % tup))

    # Write
    output = traffic | beam.io.WriteToText('gs://...')

    p.run()

Pipeline stages: Input -> Read -> Transform 1 -> Transform 2 -> Group -> Transform 3 -> Transform 4 -> Write -> Output

Each of these steps is run in parallel and autoscaled by the execution framework.
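The pipeline above names parse_data, get_speedbysensor, and avg_speed but does not show them. The following is a minimal sketch of what they might look like, assuming (this is not stated on the slide) that each input line is a "sensor_id,speed" CSV record. To keep it runnable without a Beam installation, the Map, GroupByKey, and Reduce steps are emulated in plain Python; group_by_key and run_locally are hypothetical stand-ins, not Beam APIs.

```python
from collections import defaultdict

# Hypothetical helpers: the slide references these names but does not define them.
# Assumed input format: one "sensor_id,speed" record per line.

def parse_data(line):
    """Split a CSV line into its fields."""
    return line.strip().split(',')

def get_speedbysensor(fields):
    """Emit a (sensor, speed) key-value pair (the Map step)."""
    return fields[0], float(fields[1])

def avg_speed(kv):
    """Reduce (sensor, [speed, ...]) to (sensor, average speed)."""
    sensor, speeds = kv
    speeds = list(speeds)
    return sensor, sum(speeds) / len(speeds)

def group_by_key(pairs):
    """Local stand-in for beam.GroupByKey: collect all values per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sorted only to make this demo deterministic.
    return sorted(groups.items())

def run_locally(lines):
    """Apply the same Map -> Group-By -> Reduce -> format chain in memory."""
    pairs = [get_speedbysensor(parse_data(line)) for line in lines]
    averages = [avg_speed(kv) for kv in group_by_key(pairs)]
    return ['%s: %d' % tup for tup in averages]

sample = ['s1,60', 's2,30', 's1,70', 's2,50']
result = run_locally(sample)
print(result)
```

In a real pipeline the same three helpers would be passed to beam.Map unchanged; the runner, rather than a local loop, would then distribute each step across workers.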
