Data Scientist / Data Engineer

As part of my assignment, I have been researching two hot roles in the digitization, or digital transformation, sphere: the Data Engineer and the Data Scientist. This fascinating trek down the rabbit hole has been quite illuminating.

One of the backbones of digital transformation is the advent and rise of this little thing called “Big Data”. With software taking over the world and compute power becoming ubiquitous and practically free, just about every part of modern life creates data that can be leveraged in myriad ways. Take browsing history: if you research Chicago deep dish pizza pans, the next time you log into Facebook you will see ads from Amazon or Sur la Table for hard-anodized, non-stick deep dish pizza pans.

Creepy, but it has become expected. When you think about how that happens, you begin to grasp the scope of big data, and it is natural to think about the roles that make it all work.

Data Engineer

The first key role is the Data Engineer. Somewhat analogous to the network engineer of the late 1990s and early ’aughts, this is someone who really gets down and dirty. They create the systems that capture the data, store it, and perform some rudimentary manipulations. This is the world of Hadoop, MongoDB, NoSQL, and a seemingly endless stream of buzzwords. Really good data engineers not only create and maintain the repositories of big data, but also work to optimize access and improve speed, and, as in the network engineering days, make sure that the “speeds and feeds” are appropriate for the task at hand.

This is plenty challenging, as they often need to connect to a plethora of data sources, including order processing systems, e-commerce applications, web APIs, and even physical devices such as IoT data loggers. They are often required to groom, transform, log, and index potentially massive data sets. Consider a specific case: Uber, the ride-sharing or TNC (transportation network company) business. Think about the number of riders they have each day. The number of drivers. The location of their drivers. Each transaction. Each time a user clicks on the app. All of this data (and I mean ALL of it) is gathered, groomed, stored, collated, indexed, and made available to their internal systems.
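To make “groom, transform, log, and index” a little more concrete, here is a minimal Python sketch of that kind of pipeline. It is purely illustrative and assumes a hypothetical feed of ride records with made-up fields (`ride_id`, `driver_id`, `timestamp`, `fare`); it is not how Uber or any real company does it, and a production system would run on tools like those mentioned above rather than in-memory Python.

```python
import logging
from collections import defaultdict
from datetime import datetime

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

# Hypothetical field names, chosen only for this illustration.
REQUIRED_FIELDS = ("ride_id", "driver_id", "timestamp", "fare")


def groom(records):
    """Groom: drop records missing any required field, logging each one dropped."""
    for rec in records:
        if all(rec.get(k) not in (None, "") for k in REQUIRED_FIELDS):
            yield rec
        else:
            log.warning("dropping malformed record: %r", rec)


def transform(rec):
    """Transform: normalize IDs, parse ISO-8601 timestamps, store fares as floats."""
    return {
        "ride_id": str(rec["ride_id"]),
        "driver_id": str(rec["driver_id"]).strip().upper(),
        "timestamp": datetime.fromisoformat(rec["timestamp"]),
        "fare": round(float(rec["fare"]), 2),
    }


def build_index(records):
    """Index: group cleaned rides by driver so per-driver lookups are cheap."""
    index = defaultdict(list)
    for rec in records:
        index[rec["driver_id"]].append(rec)
    log.info("indexed %d driver(s)", len(index))
    return dict(index)


def run_pipeline(raw_records):
    """Chain the stages lazily: groom -> transform -> index."""
    return build_index(transform(r) for r in groom(raw_records))


if __name__ == "__main__":
    raw_events = [
        {"ride_id": 1, "driver_id": " d42 ", "timestamp": "2019-03-01T08:15:00", "fare": "12.5"},
        {"ride_id": 2, "driver_id": "D42", "timestamp": "2019-03-01T09:40:00", "fare": 8},
        {"ride_id": 3, "driver_id": "", "timestamp": "2019-03-01T10:05:00", "fare": 20},
    ]
    index = run_pipeline(raw_events)
    print({driver: len(rides) for driver, rides in index.items()})  # → {'D42': 2}
```

The third record is dropped during grooming because it has no driver ID, and the first two land under the same normalized key. Even in a toy like this, much of the work is simply deciding what “clean” means before anyone runs an analysis.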

The data engineer may not be an expert in the deep analytics that are run on the data, but make no mistake: their contribution to Uber’s success is not to be trivialized.

Data Scientist

The Data Scientist is a breed apart. Typically a classically educated STEM (science, technology, engineering, and mathematics) person, they almost always have a graduate degree. They often come from one of the hard sciences where modeling large data sets is a common skill (think astrophysics, climate modeling, or complex reaction dynamics modeling, and you get the idea). They are expected to be well versed in programming with Python or other analytical languages, in R (the statistical language), and in data visualization, along with an alphabet soup of competencies in machine learning and artificial intelligence. One surprise: I expected MATLAB from MathWorks to be requested, but it is missing from every job requisition I have reviewed. Odd.

Many of these people are expected to have PhDs in computer science and extensive experience creating models that extract value from the underlying data set(s).
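At its very simplest, “creating a model” means fitting a mathematical relationship to observed data so it can make predictions. The sketch below, in Python, fits a straight line by ordinary least squares; the idea of predicting a fare from trip distance and the numbers in the usage example are hypothetical, chosen only to illustrate the concept. Real data science work involves far richer models, validation, and tooling than this.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Use the fitted line to estimate y for a new x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: trip distance in miles vs. fare in dollars.
    distances = [1, 2, 3, 4]
    fares = [5, 7, 9, 11]
    intercept, slope = fit_line(distances, fares)
    print(intercept, slope)                    # → 3.0 2.0
    print(predict(intercept, slope, 6))        # → 15.0
```

The fitted line says each extra mile adds about two dollars on top of a three-dollar base, which is exactly the kind of “value extracted from the data” a model is meant to surface, only on a vastly larger and messier scale in practice.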


Clearly, these two co-dependent roles can’t exist in a vacuum. Both are crucial to the modern business’s digitization strategy and to its ability to provide a superior customer experience.

In the world of business transformation, the evolving roles and importance of classic information technology make it ever more important to understand the stakeholders, their needs, and the overall business value that the solutions bring. The capabilities matter more than the speeds and feeds. Knowing how the stakeholders work is key to our success.
