Big Data & Machine Learning Cloud OnBoard 1 2 1 Same code does real-time and batch 3 2 5 options = PipelineOptions(pipeline_args) 3 6 options.view_as(StandardOptions).streaming = True 5 7 p = beam.Pipeline(options=options) 6 lines = p | beam.io.ReadStringsFromPubSub(input_topic) 8 BigQuery 7 traffic = (lines 9 | 8 10 Cloud beam.Map(parse_data).with_output_types(unicode) 9 Pub/Sub | beam.Map(get_speedbysensor) # (sensor, 11 10 Cloud speed) 12 Cloud | beam.WindowInto(window.FixedWindows(15, 0)) 11 Dataflow Pub/Sub | beam.GroupByKey() # (sensor, [speed]) 13 12 | beam.Map(avg_speed) # (sensor, avgspeed) 14 | beam.Map(lambda tup: '%s: %d' % tup)) 13 Cloud 15 Storage traffic | beam.io.WriteStringsToPubSub(output_topic) 14 Cloud 16 15 Storage p.run() 1716 17 18 Big Data & Machine Learning Cloud OnBoard 1 2 1 Dataflow does ingest, transform, and load; consider using it 3 2 instead of Spark 5 3 6 5 7 6 8 7 9 8 10 9 11 10 12 11 13 12 14 13 15 14 16 15 1716 17 18 74
Google Cloud Manual Page 75 Page 77