This is the website for the "Production Scale Big Data Implementation" course taught in Purdue's Krannert BAIM program. The course syllabus can be found here.


Knowing how to do ad hoc analytics on your laptop is one thing. Knowing how to productionize analytics in the real world is something completely different. If our analytics, machine learning, visualization, and statistics stay on our laptops, they can’t produce long-term value in an organization.

In this class, we will explore how companies implement and use analytics at production scale. To this end, we will work hands-on with a real-world data pipeline for object detection. This pipeline is a replica of one used by a large enterprise to detect threats in data from motion detectors. We will work as a team to implement, discuss, and document this pipeline, and, by the end of the course, you will have a practical understanding of analytics in the real world.

The data pipeline that we will implement together will include the following elements:

  • Data Storage and Organization - We will utilize common cloud-based object stores to store the data that we will be analyzing.
  • Analyses and Aggregation - We will leverage the latest tooling for detecting objects in image data and aggregating statistics, including TensorFlow.
  • Analysis Orchestration - We will utilize projects from the container ecosystem (Docker and Kubernetes) to schedule and manage our workloads.
  • Distributed Data Pipelining - We will construct multi-stage workflows that are automatically triggered and process streaming data.
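
To make the idea of an automatically triggered, multi-stage workflow concrete, here is a minimal sketch in plain Python. It is not the course pipeline: the `Bucket` class is a hypothetical in-memory stand-in for a cloud object store, `detect_objects` is a placeholder where a real model (for example, one built with TensorFlow) would run, and all names and keys are assumptions made for illustration. The point is only that writing new data into one stage's input triggers the next stage.

```python
from collections import Counter
from typing import Callable, Dict, List


class Bucket:
    """Hypothetical in-memory stand-in for a cloud object store bucket."""

    def __init__(self) -> None:
        self.objects: Dict[str, object] = {}
        self.subscribers: List[Callable[[str, object], None]] = []

    def subscribe(self, callback: Callable[[str, object], None]) -> None:
        # Register a stage to run whenever a new object is written.
        self.subscribers.append(callback)

    def put(self, key: str, value: object) -> None:
        # Store the object, then trigger every subscribed stage.
        self.objects[key] = value
        for callback in self.subscribers:
            callback(key, value)


def detect_objects(image: List[str]) -> List[str]:
    # Placeholder detector (assumption): the "image" is already a list of
    # labels. A real stage would run an object detection model here.
    return list(image)


def build_pipeline():
    images, detections, stats = Bucket(), Bucket(), Bucket()

    def detection_stage(key: str, image: object) -> None:
        # Stage 1: detect objects in each newly arrived image.
        detections.put(key, detect_objects(image))

    def aggregation_stage(key: str, labels: object) -> None:
        # Stage 2: fold the new detections into running label counts.
        totals = Counter(stats.objects.get("totals", {}))
        totals.update(labels)
        stats.put("totals", dict(totals))

    images.subscribe(detection_stage)
    detections.subscribe(aggregation_stage)
    return images, detections, stats


images, detections, stats = build_pipeline()
images.put("frame-001", ["person", "dog"])  # triggers both stages
images.put("frame-002", ["person"])
print(stats.objects["totals"])
```

In a production setting, each stage would typically run as its own containerized workload and the trigger would come from the storage or orchestration layer rather than an in-process callback; the sketch only shows the data flow.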


(Slides will be posted here after each class)