DataOps Summit: Streaming Real-time Telemetry With OmniSci and StreamSets

Randy Zwitch × October 31, 2019

In this talk from the StreamSets DataOps 2019 conference, I provide an overview of the data pipeline for the OmniSci F1 Demo. Using StreamSets Data Collector in concert with Apache Kafka and OmniSciDB, you can create a full real-time data pipeline for telemetry data using only open-source components.

The talk walks through each stage of the pipeline: using the StreamSets UDP listener to collect packets from the F1 2018 game, writing the packets to Kafka, reading them back from Kafka and parsing them with Groovy, and finally using the OmniSci JDBC driver to insert the data into one of nine OmniSciDB tables. The result is a robust platform for accelerated analytics that uses GPUs for fast computation.
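In the demo, packet parsing happens in Groovy inside StreamSets. As a language-neutral illustration only, here is a minimal Python sketch of the first parsing step: decoding the fixed-size header that every F1 2018 packet starts with, so that each packet can be routed by type. The header layout and packet-ID mapping are assumed from Codemasters' published F1 2018 UDP specification and should be checked against it. The type names (e.g. `car_telemetry`) and the helpers `parse_header` and `packet_type` are hypothetical, not the demo's actual table names or code.

```python
import struct
from typing import NamedTuple

# ASSUMED F1 2018 header layout (little-endian, no padding):
# uint16 packetFormat, uint8 packetVersion, uint8 packetId,
# uint64 sessionUID, float sessionTime, uint32 frameIdentifier,
# uint8 playerCarIndex -> 21 bytes in total.
HEADER_FORMAT = "<HBBQfIB"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# ASSUMED packet-type IDs; the names are hypothetical routing labels.
PACKET_TYPES = {
    0: "motion",
    1: "session",
    2: "lap_data",
    3: "event",
    4: "participants",
    5: "car_setups",
    6: "car_telemetry",
    7: "car_status",
}


class PacketHeader(NamedTuple):
    packet_format: int
    packet_version: int
    packet_id: int
    session_uid: int
    session_time: float
    frame_identifier: int
    player_car_index: int


def parse_header(packet: bytes) -> PacketHeader:
    """Decode the fixed-size header at the start of a UDP packet."""
    if len(packet) < HEADER_SIZE:
        raise ValueError(f"packet too short: {len(packet)} < {HEADER_SIZE} bytes")
    return PacketHeader(*struct.unpack_from(HEADER_FORMAT, packet, 0))


def packet_type(packet: bytes) -> str:
    """Map a packet to its (assumed) type name so it can be routed downstream."""
    return PACKET_TYPES.get(parse_header(packet).packet_id, "unknown")


if __name__ == "__main__":
    # Build a fake car-telemetry header (packetId 6) followed by a dummy payload.
    fake = struct.pack(HEADER_FORMAT, 2018, 1, 6, 123456789, 12.5, 42, 0) + b"\x00" * 8
    hdr = parse_header(fake)
    print(hdr.packet_format, packet_type(fake), hdr.frame_identifier)
```

Reading only the header first keeps the routing decision cheap: the body of each packet type can then be handed to a type-specific parser and, in a pipeline like the one described above, written to the matching table.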

GitHub: https://github.com/omnisci/vehicle-telematics-analytics-demo

Speakerdeck: https://speakerdeck.com/omnisci/the-f1-demo-streaming-real-time-telemetry-using-apache-kafka-and-streamsets

ODSC webinar: End-to-End Data Science Without Leaving the GPU

Randy Zwitch × July 11, 2019 × DataScience

In this webinar sponsored by the Open Data Science Conference (ODSC), I outline a brief history of GPU analytics and the problems GPU analytics solves relative to other parallel computation methods such as Hadoop. I also demonstrate how OmniSci fits into the broader GPU-accelerated data science workflow, with examples in Python.

Check out the video, grab the Jupyter Notebook from the odscwebinar repo, and get started with OmniSci and GPU-accelerated data science!

PyData NYC 2018: End-to-End Data Science Without Leaving the GPU

Randy Zwitch × February 1, 2019 × DataScience

This talk is from October 2018, and the GOAI/RAPIDS ecosystem has changed so much since then that it's comical to watch now! Regardless, the high-level concepts of how OmniSci works and the ideas behind GPU dataframes (then: pygdf, now: cudf) remain the same, so the talk is still worth watching if you are interested in an end-to-end GPU workflow.

With the release of pymapd 0.7 a few days ago, all you need to get started with GPU data science is an NVIDIA GPU, OmniSci Core (OSS), and a quick conda command to set up your environment:

conda install -c conda-forge -c nvidia -c rapidsai -c numba -c defaults pymapd cudf python=3.6

So check out the video, grab the Jupyter Notebook from the pydatanyc2018 GitHub repo, and get started with GPU-accelerated data science!
