7 August 2024 - By Col Jung
In the 1980s, Wall Street discovered that physicists were great at solving complex financial problems that made their firms a bucket load of money. Becoming a “quant” meant joining the hottest profession of the time.
Twenty years later, in the late 2000s, with the world on the cusp of a big data revolution, a similar trend emerged: businesses sought a new breed of professionals capable of sifting through all that juicy data for lucrative insights.
This emerging field became known as data science.
In 2018, while completing my PhD in modeling frontier cancer treatments, I transitioned from academia to industry and began working for one of the largest banks in Australia.
I was joined by seven other STEM doctoral candidates from top universities across the country, whose specialties ranged from diabetes research and machine learning to neuroscience and rocket engineering.
Despite being scattered across every corner of the company, we all eventually ended up in the bank’s big data division — a twist we still joke about…
That convergence highlighted the interdisciplinary nature of data science. Each of us brought different perspectives and skills to the table, which strengthened our collective ability to tackle complex data problems.
Over the years, the skill set expected of data scientists has evolved significantly. Initially, proficiency in statistical analysis and in programming languages such as Python and R was sufficient. As the field has matured, however, the demands have expanded to include data engineering, MLOps, and, most recently, generative AI.
Python remains a cornerstone of data science thanks to its versatility and extensive ecosystem of libraries. Modern data scientists, however, are expected to go beyond basic scripting and master advanced techniques in data manipulation, visualization, and machine learning, using libraries such as pandas, Matplotlib, and scikit-learn.
Data engineering has become a critical part of the data science workflow. Data scientists now need to understand how to design and maintain robust data pipelines that keep data clean, accessible, and ready for analysis. This typically involves big data frameworks such as Apache Spark and Hadoop, along with cloud platforms such as AWS and Azure.
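To make "clean, accessible, and ready for analysis" a little more concrete, here is a minimal sketch of a parse-then-clean pipeline stage in plain Python. The `Transaction` schema, the function names, and the cleaning rules are all hypothetical, chosen only for illustration; at bank scale the same logic would more likely run inside a framework such as Spark than in a single process.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema, for illustration only.
    customer_id: str
    amount: float


def parse(rows: Iterable[dict]) -> Iterator[Optional[Transaction]]:
    """Turn raw rows into typed records, yielding None for rows that cannot be parsed."""
    for row in rows:
        try:
            yield Transaction(str(row["customer_id"]).strip(), float(row["amount"]))
        except (KeyError, TypeError, ValueError):
            yield None


def clean(records: Iterable[Optional[Transaction]]) -> list[Transaction]:
    """Drop unparseable rows, blank IDs, non-finite or negative amounts, and exact duplicates."""
    seen: set[Transaction] = set()
    out: list[Transaction] = []
    for rec in records:
        if rec is None or not rec.customer_id:
            continue
        if not math.isfinite(rec.amount) or rec.amount < 0:
            continue
        if rec in seen:  # frozen dataclasses are hashable, so exact duplicates can be tracked
            continue
        seen.add(rec)
        out.append(rec)
    return out


def run_pipeline(rows: Iterable[dict]) -> list[Transaction]:
    """Compose the stages; a real pipeline would add load/write steps and logging."""
    return clean(parse(rows))


raw = [
    {"customer_id": "C1", "amount": "12.50"},
    {"customer_id": "C1", "amount": "12.50"},  # exact duplicate
    {"customer_id": " ", "amount": "3.00"},    # blank ID
    {"customer_id": "C2", "amount": "oops"},   # unparseable amount
    {"customer_id": "C3", "amount": -5},       # negative amount
    {"amount": "7.00"},                        # missing customer_id
]
print(run_pipeline(raw))  # [Transaction(customer_id='C1', amount=12.5)]
```

Keeping parsing and cleaning as separate stages makes each rule easy to test on its own and to swap out as the data changes.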
The rise of machine learning operations (MLOps) reflects the need for scalable and reliable deployment of machine learning models. Data scientists must be proficient in version control, continuous integration/continuous deployment (CI/CD) pipelines, and monitoring systems to ensure that models perform well in production environments. Tools like Docker, Kubernetes, and MLflow are becoming standard in the industry.
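Monitoring is the least visible of these MLOps duties, so here is one illustration of what a monitoring check can look like. It is a minimal sketch of the Population Stability Index (PSI), a drift metric often used in credit-risk modelling that compares a model's score distribution at training time with the one seen in production. PSI is swapped in here as an example. The article does not prescribe it, and the 0.25 alert threshold is a commonly quoted rule of thumb rather than a standard. The function name and binning scheme are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a_i - e_i) * ln(a_i / e_i).

    e_i and a_i are the fractions of the reference and production samples in bin i.
    Bins are equal-width over the reference range; out-of-range values are clipped
    into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if every reference value is equal

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps stops empty bins from causing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]      # e.g. model scores at training time
shifted = [0.5 + i / 200 for i in range(100)]  # production scores crowded into the upper half

print(psi(reference, reference))       # 0.0: identical distributions
print(psi(reference, shifted) > 0.25)  # True: flag for investigation
```

In practice, a check like this would run on a schedule against logged predictions, with the result recorded alongside the model version so that alerts can be traced back to a specific deployment.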
Generative AI (GenAI) represents the cutting edge of artificial intelligence, enabling the creation of new content, from text and images to music and code. Data scientists are now exploring how GenAI can automate tasks, generate insights, and power innovative solutions. Understanding the principles of neural networks, generative adversarial networks (GANs), and transformers is crucial for using GenAI effectively.
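To show what "the principles of transformers" boils down to at their core, here is a minimal, dependency-free sketch of scaled dot-product attention, softmax(QKᵀ / √d_k)·V, which is the operation transformers are built around. This is a teaching sketch under simplifying assumptions: a single head, no masking, and no learned projection matrices. Production implementations typically use tensor libraries such as PyTorch or JAX and process whole batches at once. The helper names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product: (n x d) @ (d x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                      # how strongly each query matches each key
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]            # each row sums to 1
    return matmul(weights, v)                             # weighted average of the value rows


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors, d_k = 2
values = [[1.0], [2.0], [3.0]]
out = attention(tokens, tokens, values)  # self-attention: Q and K are the same vectors
print(out)
```

Because every row of attention weights sums to 1, each output is a weighted average of the value vectors. The model "attends" more to the tokens whose keys best match the query.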
The role of the modern data scientist is more comprehensive than ever before. They are expected to handle the entire data lifecycle, from data collection and preprocessing to model development, deployment, and monitoring. This end-to-end approach requires a blend of technical skills, domain knowledge, and business acumen.
Looking back on my own journey, the biggest lesson is that the landscape of data science is constantly evolving, and staying ahead requires continuous learning and adaptation. It's not just about mastering new tools and techniques, but also about understanding how to apply them to solve real-world problems.
The evolution of data science has transformed it into a multifaceted discipline that demands a diverse skill set. From Python scripting and data engineering to MLOps and generative AI, modern data scientists must be equipped to navigate a complex and rapidly changing landscape. As the field continues to grow, the ability to adapt and innovate will be key to success.