I am a PhD-trained data scientist with industry experience developing and deploying predictive models, optimization services, and data pipelines. My main interests are:
- Data science workflows on modern, cloud-native infrastructure (e.g., multi-tenant clusters orchestrated by Kubernetes)
- Reproducibility and provenance for complex, distributed data pipelines
- Machine learning interpretability
- Portability and open standards for neural network architectures
- Leveraging modern language primitives (e.g., in Go) for machine learning and data science (see the sketch after this list)
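To illustrate that last interest, here is a minimal sketch of parallelizing a per-record computation with nothing but Go's built-in goroutines and a `sync.WaitGroup`; `extractFeature` is a hypothetical stand-in for any real feature-extraction or scoring step:

```go
package main

import (
	"fmt"
	"sync"
)

// extractFeature is a hypothetical stand-in for any per-record
// computation, e.g., scoring a record with a trained model.
func extractFeature(x float64) float64 {
	return x * x
}

func main() {
	records := []float64{1.2, 3.4, 5.6, 7.8}
	results := make([]float64, len(records))

	var wg sync.WaitGroup
	for i, r := range records {
		wg.Add(1)
		// Each record is processed in its own goroutine;
		// the language primitives handle the scheduling.
		go func(i int, r float64) {
			defer wg.Done()
			results[i] = extractFeature(r)
		}(i, r)
	}
	wg.Wait()

	fmt.Println(results)
}
```

No external framework required: fan-out, synchronization, and collection are all in the standard library.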
Experience:
- Lead Data Scientist and Advocate, Pachyderm, 2016-Present
- Freelance Data Scientist/Engineer, 2016-Present, working with a variety of clients (selected projects below)
- Data Scientist, Telnyx, 2015-2016
- Data Science Instructor, Thinkful, 2015-2016
- Technical Specialist, Marshall, Gerstein & Borun, 2012-2015
- Research Assistant, Purdue University, 2008-2015
- High Performance Computing Intern, NCAR, 2008
- Languages: Go, Python, Julia, MATLAB, Mathematica, LaTeX
- Machine Learning: Classification, Clustering, Regression, Anomaly Detection, Recommendation, Time Series
- Big Data: Pachyderm, Spark, ELK, Cassandra, Redshift, S3
- Databases: Postgres, MySQL, BoltDB, RethinkDB, SQLite, MongoDB, Cassandra, Elasticsearch, etcd
- Visualization: Matplotlib, Seaborn, Bokeh, Dashing, Kibana, Domo, gonum/plot
- DevOps/Misc.: Docker, Kubernetes, Jenkins, Ansible, Travis CI, Git, Jupyter, RabbitMQ
A next-gen telephony company wanted an efficient pricing service that would predict operating costs dynamically over time. I developed this service (initially in Python, eventually in Go), including a custom probability-based model, and reduced cost-prediction error from 45% to under 10%.
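The production model isn't something I can share here, but a minimal sketch of the general idea, estimating an expected cost as a probability-weighted average over historically observed costs, looks like this (all names and numbers are hypothetical, not the actual model):

```go
package main

import "fmt"

// predictCost estimates the expected operating cost for a route
// as a probability-weighted average of historically observed costs.
// costs[i] was observed with relative frequency freq[i].
func predictCost(costs, freq []float64) float64 {
	var total, expected float64
	for _, f := range freq {
		total += f
	}
	for i, c := range costs {
		// Weight each cost by its empirical probability.
		expected += c * (freq[i] / total)
	}
	return expected
}

func main() {
	// Hypothetical historical costs (USD/min) and observation counts.
	costs := []float64{0.004, 0.007, 0.012}
	freq := []float64{120, 45, 10}
	fmt.Printf("expected cost: $%.4f/min\n", predictCost(costs, freq))
}
```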
A company developing a scheduling application wanted a recommendation service for its users. I built the service to integrate with their existing infrastructure (React, RethinkDB, Go, REST), asynchronously processing recommendation requests from all users with custom machine learning algorithms.
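A minimal sketch of the asynchronous processing pattern, a fixed pool of goroutines draining a channel of requests; the request type and `recommend` function are hypothetical stand-ins for the production types and ML algorithms:

```go
package main

import (
	"fmt"
	"sync"
)

// recRequest is a hypothetical recommendation request from a user.
type recRequest struct {
	UserID string
}

// recommend is a stand-in for the custom ML algorithm that
// scored and ranked items for a user.
func recommend(userID string) []string {
	return []string{"item-a", "item-b"} // hypothetical output
}

func main() {
	requests := make(chan recRequest)
	var wg sync.WaitGroup

	// A fixed pool of workers drains the request channel
	// asynchronously, one goroutine per worker.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range requests {
				fmt.Println(req.UserID, "->", recommend(req.UserID))
			}
		}()
	}

	for _, id := range []string{"alice", "bob", "carol"} {
		requests <- recRequest{UserID: id}
	}
	close(requests)
	wg.Wait()
}
```

In production the channel would be fed by the REST layer rather than a hardcoded loop, but the decoupling of request intake from model scoring is the same.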
A startup wanted immediate visibility into real-time revenue and profit. I built a service that processed data streaming into Redshift and logged aggregates to Logstash; the data was then immediately searchable via Elasticsearch and visualized with Kibana and Dashing.
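For the Logstash leg, a minimal sketch, assuming a Logstash `tcp` input with a `json_lines` codec; the address and field names below are hypothetical:

```go
package main

import (
	"encoding/json"
	"log"
	"net"
)

// aggregate is a hypothetical revenue rollup for one time window.
type aggregate struct {
	Window  string  `json:"window"`
	Revenue float64 `json:"revenue"`
	Profit  float64 `json:"profit"`
}

func main() {
	// Assumes a Logstash tcp input listening at this
	// (hypothetical) address with a json_lines codec.
	conn, err := net.Dial("tcp", "logstash.internal:5000")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Encode writes one JSON value per line, which is exactly
	// the newline-delimited framing json_lines expects.
	enc := json.NewEncoder(conn)
	agg := aggregate{Window: "2016-06-01T12:00", Revenue: 1250.0, Profit: 310.5}
	if err := enc.Encode(&agg); err != nil {
		log.Fatal(err)
	}
}
```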