Perform research and design of algorithmic models

Data Scientist

About

This unit is about performing research and designing a variety of algorithmic models for internal and external clients.

Compulsory, NOS Code: SSC/N8104, NSQF Level: 7

Scope

define hypothesis, select model, prototype and design

Define hypothesis

identify the objective of the analysis
develop a hypothesis based on the objective of the analysis
identify suitable libraries, packages, frameworks, applications to address the objective
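
As an illustration of turning an objective into a testable hypothesis, the sketch below runs a two-sided permutation test on the difference in means between two groups. The function name `permutation_test`, the sample values and the number of resamples are assumptions made for this example, not part of the standard; only the Python standard library is used.

```python
import random

def permutation_test(group_a, group_b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution,
    so the group labels are exchangeable.
    """
    rng = random.Random(seed)
    size_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign labels at random
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical measurements for two variants of a process.
variant_a = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
variant_b = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8]
p_value = permutation_test(variant_a, variant_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the groups were interchangeable. The significance threshold is a choice to be fixed when the hypothesis is defined, before the data are examined.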

Select model

identify mode of learning, i.e. supervised or unsupervised
conduct research on existing statistical models to evaluate fitment with the objective
depending on the use case, identify if neural networks or deep learning models can be built
optimize the existing statistical models as per need
identify suitable statistical models on the basis of data volumes and key variables
define connectors or combinations of key variables for each statistical model
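
One possible way to evaluate how well candidate models fit the objective is to compare them against a simple baseline using k-fold cross-validation. The sketch below is a minimal, standard-library-only illustration; the helper names (`fit_mean`, `fit_line`, `cv_mse`) and the synthetic data are assumptions made for this example.

```python
def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least squares for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=5):
    """Mean squared error over k interleaved cross-validation folds."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)

# Synthetic data: a linear trend plus a small deterministic disturbance.
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + (0.5 if int(x) % 2 else -0.5) for x in xs]

baseline_mse = cv_mse(fit_mean, xs, ys)
linear_mse = cv_mse(fit_line, xs, ys)
print(f"baseline MSE: {baseline_mse:.3f}, linear MSE: {linear_mse:.3f}")
```

A candidate model that cannot beat the baseline on held-out data is unlikely to be a good fit for the objective, whatever its complexity.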

Prototype and design

determine and collect the training data
design and prototype algorithmic model
identify and resolve overfitting or underfitting of algorithmic model
identify and resolve residual and dispersion errors with data
define data flows such as human-in-the-loop constraints required to reinforce algorithmic models
define and quantify success metrics for the algorithmic model
create documentation on designed algorithmic models for future references and versioning
retrain models on a continuous basis using refreshed versions of the datasets used for supervised learning
validate designed models using appropriate tools and processes
iterate the process to fine-tune the model until the desired quality of output or performance is achieved
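
Overfitting and underfitting are commonly diagnosed by comparing error on the training data with error on held-out validation data. The sketch below encodes one simple heuristic for that comparison. The function name `diagnose_fit`, the threshold parameters and the example error values are assumptions made for illustration; suitable thresholds depend on the success metrics defined for the model.

```python
def diagnose_fit(train_error, val_error, target_error, max_gap):
    """Classify a model from its training and validation errors.

    All errors are "lower is better". This is a heuristic, not a formal test:
    - underfitting: the model cannot reach the target even on training data
    - overfitting: training error meets the target, but validation error
      misses it or is much worse than training error
    """
    if train_error > target_error:
        return "underfitting"
    if val_error > target_error or val_error - train_error > max_gap:
        return "overfitting"
    return "acceptable"

# Hypothetical (train_error, val_error) pairs from successive iterations.
iterations = [(0.30, 0.32), (0.02, 0.25), (0.06, 0.08)]
for train_error, val_error in iterations:
    verdict = diagnose_fit(train_error, val_error, target_error=0.10, max_gap=0.05)
    print(train_error, val_error, verdict)
```

In an iterative workflow, an "underfitting" verdict usually points toward a richer model or better features, and an "overfitting" verdict toward more data, regularization or a simpler model.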

Required Knowledge & Understanding

Technical Skills

how to develop experimental and analytical plans for data modeling
the use of strong baselines
how to accurately determine cause and effect relations
different probability theory concepts such as probability distributions, statistical significance, hypothesis testing and regression
different Bayesian thinking concepts such as conditional probability, priors and posteriors, and maximum likelihood
strong research experience in deep learning, reinforcement learning and other machine learning algorithms and their usage
different programming languages that can be used to design algorithmic models such as Python, Ruby, C, Java, C++ or C#
different use cases and the suitability of various algorithmic models to address them
how to build and test a hypothesis
when to use supervised or unsupervised learning
how to evaluate data volumes and key variables
how to define combinations of key variables
how to identify and resolve overfitting or underfitting of algorithmic models
how to identify and resolve residual and dispersion errors in algorithmic models
how to define data flows such as human-in-the-loop constraints required to reinforce algorithmic models
different cloud or distributed computing platforms such as AWS, Azure, Hadoop, their affiliated services and how to use these
how to identify and report anomalies in data
how to work on various operating systems such as Linux (for example, Ubuntu) or Windows
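
The Bayesian concepts listed above (conditional probability, priors and posteriors) can be illustrated with Bayes' rule for a binary event. This is a minimal sketch; the function name `posterior_probability` and the example rates are assumptions chosen for illustration.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: P(condition | positive signal).

    prior               = P(condition)
    true_positive_rate  = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    # Total probability of a positive signal (the evidence).
    evidence = true_positive_rate * prior + false_positive_rate * (1 - prior)
    return true_positive_rate * prior / evidence

# Hypothetical anomaly detector: 1% of records are anomalous, the detector
# flags 95% of anomalies and 5% of normal records.
posterior = posterior_probability(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.05)
print(f"P(anomalous | flagged) = {posterior:.3f}")
```

Even with a fairly accurate detector, a low prior keeps the posterior modest, which is why base rates matter when interpreting model outputs.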

Soft Skills

Analytical Thinking

analyze the impact of the various actions performed and disseminate relevant information to others
analyze data and models and understand their implications for business performance

Attention to Detail

check that your work is complete and free from errors