The DAIML Engineer

Look, I hate to be the guy introducing a new term, especially one as dumb as DAIML Engineer. But times are changing, and I’ve never been good at naming things.

Engineers are expected to do more. Companies are creating significant financial value with significantly smaller teams. As this trend plays out, engineers will be expected to broaden their toolkit and take on a wider scope of responsibility.

DAIML Skillset

If it wasn’t obvious, DAIML = Data, Artificial Intelligence, and Machine Learning Engineer. This is someone who possesses the skills to build robust data pipelines, develop Generative AI applications, and deploy ML models into production following MLOps best practices.

I have to call out that I’m using AI to mean Generative AI. We, of course, all know that Generative AI is a specialized type of Deep Learning, which in turn is a specialized type of Machine Learning, which is a form of Artificial Intelligence. “DGAIML” didn’t have the same ring to it, so just assume AI = Gen AI.

To get a sense of the skills possessed by each of these groups, here’s a non-exhaustive list:

  • Data

    • Data modeling

    • Data ingestion

    • Data transformation logic (SQL, PySpark, etc.)

    • Orchestration

  • AI 

    • RAG

    • Fine-tuning

    • Safety Guardrails

    • Evals

  • ML

    • Feature Engineering

    • Modeling Frameworks

    • Model Evaluation

    • Model Serving

  • Engineer (core skills that span all groups)

    • Cloud computing

    • DevOps

    • Infrastructure as Code

    • Python

You may balk at this list and think there’s no way an individual could know all these topics, but I disagree. It’s absolutely feasible for an engineer to know all these topics and more - and many already do. I’m not saying they’re experts in all of them. The DAIML engineer’s skillset is wide across these categories and often deep in one of them (known as the T-shaped skillset).

Why DAIML?

I’d argue this skillset has already been developing for some time in the data space. It’s pretty common for a Data Engineer to also build pipelines for batch inference of a machine learning model, or for an AI Engineer building a RAG system to also build the data pipeline that converts unstructured data into embeddings.

However, there have been a number of advancements that I believe will make the DAIML engineer more commonplace going forward.

  1. Platform capabilities - it’s easier than ever to go to the cloud platform of your choice and spin up some resources to get started on any of the concepts listed above in no time. These platforms offer so many capabilities out of the box that, a lot of the time, all you have to do is stitch a few things together and you’re done. When things are easier, you can do more things.

  2. Generative AI tooling - like it or not, it’s hard to deny its ability to make engineers more productive. I’ve seen these tools slowly win over engineers - starting with the optimistic engineer who just wants to experiment with new tech, all the way to the grizzled veteran who knows more about database design than the entirety of Zach Wilson’s LinkedIn audience. These tools enable engineers not only to get work done faster, but also to learn new concepts and codebases faster.

  3. Evolving company shape - this is tangential to Generative AI, but there’s been a sudden boom of companies doing millions in annual revenue with one or two employees. While I take these examples with a grain of salt, I do believe we’re going through a paradigm shift where engineering teams will be smaller and therefore expected to do more.

All that said, I don’t believe every company will expect this broad skillset from their Data and ML Engineers. Generally, the larger the company, the more likely there is to be a specialized role where you can focus on one of these areas. But I believe the majority of companies will shift away from specialized Data and ML Engineers over time and move towards this concept of a DAIML Engineer who can handle the end-to-end data lifecycle.

My Goal Here

I spent five years as a data engineer, three years building ML systems, and now I’m in a consulting role - solidly living the DAIML life.

My goals with this blog are simple:

  1. Work on solidifying the thoughts in my head and gain the confidence to share them with the world

  2. Help other engineers working in this space to expand their knowledge and grow in their careers

You can expect to see content comparing platforms for DAIML engineering, exploring new tools and techniques, reviewing the latest news, and more. You won’t see regurgitated AI-hype posts. I’m focused on real engineering solving real problems, not vibe-coding a new SaaS to schedule a post on Twitter/X/Bluesky.

Feel free to subscribe if you want to follow along. I’ll eventually start summarizing the posts from each week and sending them as a newsletter if that’s more your thing.