With chess on my mind thanks to Netflix’s The Queen’s Gambit, this week I used PySpark to build a multi-label classifier to predict chess match outcomes from the Chess Game Dataset.


The process of configuring your local environment to use Apache Spark is arguably the most arduous part of this process. First you must install Docker, and then download the container image for PySpark, specialized for Jupyter Notebooks. If you already use Docker and your Docker Hub is up to date, the setup process will likely be expedited, but unfortunately my configuration process involved not only downloading Docker, but also…

Thoughts and opinions shared in this post are my own, and are unaffiliated with Amazon Web Services.

My first job out of university was as a Solutions Architect for a cloud computing platform. Solutions Architecture is at the crossroad of business, technology, and communication. It involves communicating with stakeholders to understand their business problem, and then helping them plan an implementation in the cloud.

During my time as an solutions architect, I learned a lot about cloud infrastructure, common business and technological problems that companies face today, and the ways that cloud computing has been used to revolutionize Machine Learning…

Want to help me improve the map? Fill out the 3-question survey here!

Original map: Average Jeans Color per State, 2018

This month I bought the first pair of blue jeans I’ve owned in a decade. I don’t know what inspired me to stray from my usual black-on-black, but regardless, I’m never looking back; I’ve been dressing like an androgynous Jerry Seinfeld every day since.

With denim on my mind, I found myself revisiting a meme I had seen on twitter months ago: Average Jeans Color Per State. In this post, I’m going to show you how I recreated this map with Python.

Data Collection

The first step in this…

The prison is a facet of society that is taken, unquestionably, for granted. Most people cannot fathom a world in which the prison does not exist, yet to most, it is nothing more than an abstract concept that exists “over there.” In truth, justification for the existence of the Prison Industrial Complex (PIC) is supported by little to no data; bail amount, sentence length, mandatory minimums, solitary confinement, parole terms, etc. are often arbitrarily chosen numbers that are not derived from any scientific methodology. …

Khyatee Desai

music lover, picture taker, aspiring data scientist based in nyc

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store