Data Science in the Age of Cloud Computing

Khyatee Desai
Published in DataDrivenInvestor
4 min read · Nov 9, 2020

Thoughts and opinions shared in this post are my own, and are unaffiliated with Amazon Web Services.

My first job out of university was as a Solutions Architect for a cloud computing platform. Solutions Architecture sits at the crossroads of business, technology, and communication. It involves talking with stakeholders to understand their business problem, and then helping them plan an implementation in the cloud.

During my time as a Solutions Architect, I learned a lot about cloud infrastructure, the common business and technological problems that companies face today, and the ways that cloud computing has been used to revolutionize Machine Learning and Artificial Intelligence.

What is Cloud Computing?

Cloud computing is the delivery of on-demand IT resources, over the internet, using a pay-as-you-go pricing model.

You can think of cloud computing the same way you think about a utility such as electricity: all of the minutiae of how electricity gets from point A to point B are abstracted away from you. That is to say, the infrastructure, maintenance, security, and fault tolerance of the power plant are not things you need to worry about, because your electric company manages them on your behalf.

When you flip on a light switch, you begin to receive electricity, you pay for the amount of electricity you use, and you’re able to flip the switch off whenever you want. Cloud computing follows the same delivery model as our electric company example, but instead of delivering electricity, cloud providers deliver IT resources such as compute, storage, databases, networking, and security capabilities.
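To make the pay-as-you-go idea concrete, here is a minimal sketch of the arithmetic. The hourly rate is a made-up number for illustration, not a real price from any provider, and the function names are my own.

```python
# Hypothetical on-demand price per server-hour (an assumption, not a real rate).
HOURLY_RATE_USD = 0.10


def on_demand_cost(hours_used, rate=HOURLY_RATE_USD):
    """Pay-as-you-go: the bill is proportional to the hours actually used."""
    return hours_used * rate


def always_on_cost(days, rate=HOURLY_RATE_USD):
    """Cost of leaving the same server running around the clock."""
    return days * 24 * rate


# A job that needs a server for 3 hours a day over a 30-day month.
pay_as_you_go = on_demand_cost(hours_used=3 * 30)
always_on = always_on_cost(days=30)

print(f"Pay-as-you-go: ${pay_as_you_go:.2f}")
print(f"Always on:     ${always_on:.2f}")
```

Like flipping the light switch off, you stop paying for the server the moment you shut it down.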

What is The Cloud?

The foundational unit of the Cloud is the server. In the broadest sense, a server is any piece of hardware or software that provides a service to other programs or devices. By that definition, your large, clunky desktop computer from 1997 can act as a server, and so can your iPhone or even your Peloton.

In reality, the internet is nothing more than all of the servers on the planet, constantly connecting to one another and exchanging information. When I store my iPhone pictures in iCloud, it simply means they are being stored on servers managed by Apple, rather than on my iPhone.

AI and ML in the Cloud

Beyond just servers, cloud computing also encompasses storage, networking, security, databases, and more. One area of cloud computing that has picked up momentum recently is Artificial Intelligence and Machine Learning. By harnessing many machines working on a task concurrently, data scientists can cut complex training and testing jobs from hours or days down to a fraction of that time.
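Here is a small sketch of why concurrency helps, assuming a toy setup of my own: each "training run" is just a short sleep, and threads on one machine stand in for separate machines in the cloud. The function `train_and_score` and its scoring rule are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train_and_score(learning_rate):
    """Stand-in for training one model: sleeps instead of doing real work."""
    time.sleep(0.1)
    # Toy score that peaks when the learning rate is 0.1.
    return learning_rate, -abs(learning_rate - 0.1)


candidates = [0.001, 0.01, 0.1, 0.5, 1.0]

start = time.perf_counter()
# One worker per candidate, so all of the "training runs" overlap in time.
with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
    results = list(pool.map(train_and_score, candidates))
elapsed = time.perf_counter() - start

best_rate, best_score = max(results, key=lambda r: r[1])
print(f"Best learning rate: {best_rate} (finished in {elapsed:.2f}s)")
```

Run one after another, the five runs would take roughly five times as long as a single run. Run side by side, they take about as long as one, which is the same idea a cloud provider applies across many machines.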

One cloud service built specifically for machine learning is Amazon SageMaker. SageMaker is a fully managed service that allows data scientists to build, train, and deploy machine learning models in the cloud. It provides hosted Jupyter notebooks, pre-built algorithms and models, 1-click hyperparameter tuning, and much more.

Amazon SageMaker Workflow

Another set of services rapidly gaining traction is AI services such as Amazon Rekognition, Amazon Comprehend, Amazon Polly, and Amazon Lex. Under the hood, these services use the same core compute infrastructure of the cloud: virtualized servers. They abstract away all of the complex machine learning work, giving the end user access to incredible artificial intelligence capabilities such as image and video recognition, text-to-speech, and natural language processing.
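As a rough sketch of what "abstracting away the machine learning" looks like in practice, here is a sentiment check using Amazon Comprehend through boto3, the AWS SDK for Python. The helper `sentiment_of` is my own name, and the usage lines assume boto3 is installed, AWS credentials are configured, and the region shown is one you actually use.

```python
def sentiment_of(client, text):
    """Ask Amazon Comprehend for the overall sentiment of English text.

    `client` is expected to be a boto3 Comprehend client. No model is
    trained or hosted here; the service does that work behind the API.
    """
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    # The response labels the text POSITIVE, NEGATIVE, NEUTRAL, or MIXED.
    return response["Sentiment"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   comprehend = boto3.client("comprehend", region_name="us-east-1")
#   print(sentiment_of(comprehend, "I love how quickly this model trained."))
```

The whole natural language model sits behind a single function call, which is what makes these services approachable for people who are not machine learning specialists.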

The Amazon Web Services AI/ML Stack

The Good and the Ugly

With technological breakthroughs inevitably comes the potential for abuse and infringement on personal autonomy. Cloud computing has enabled the development of widespread surveillance through tools like facial recognition, which are used to identify and detain people, disproportionately members of minority groups, at alarming rates.

The compute and storage power of the cloud also makes it possible to mine and analyze Big Data: data at the petabyte and exabyte scale. This is common practice among data science teams at social media companies, where algorithms learn from users’ personal data and then create the most efficient model to monopolize their attention. Using data to build a product that is addictive, manipulative of users’ emotions, and thus potentially harmful walks a fine line where morality is concerned.

Cloud computing has revolutionized the way we exchange information. Conveniences we take for granted today, such as Netflix and Twitch, are able to function thanks to immense technological feats that people would not have dreamed of just 50 years ago. However, it is important to remember that as the technology grows and evolves at an exponential rate, so must the manner in which we regulate it and evaluate its impact on the greater good.
