Perform exploratory data analysis as per specifications

Data Scientist

About

This unit is about using a variety of techniques to perform exploratory analysis to describe and summarize data for internal and external clients.

Scope

Define the dataset; summarize and optimize the dataset

Define the dataset
  • identify the data types for each variable of the dataset
  • identify the key variables required for modelling or analysis
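
As a rough illustration of the two steps above, the sketch below uses pandas to list the data type of each variable and to separate quantitative from qualitative columns. The dataset, the column names, and the rule for choosing key variables (dropping the identifier) are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are made up for illustration.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "age": [34, 51, 27, 45],
    "monthly_spend": [120.5, 80.0, 210.25, 95.75],
    "segment": ["retail", "corporate", "retail", "retail"],
})

# Data type of each variable, as inferred by pandas.
dtypes = df.dtypes

# Split the variables into quantitative (numeric) and qualitative (non-numeric).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Assumption: which variables count as "key" is an analyst decision;
# here the identifier column is simply excluded.
key_variables = [c for c in df.columns if c != "customer_id"]

print(dtypes)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
print("key variables:", key_variables)
```

In practice the choice of key variables would depend on the modelling or analysis goal rather than on a fixed rule like the one assumed here.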
Summarize and optimize the dataset
  • use statistical techniques to summarize the key variables in the dataset
  • describe summary statistics for key variables using graphical formats
  • perform dimension reduction to optimize the variables in the dataset, if required
  • define the correlation factors using clustering and other techniques
  • validate data using appropriate tools and processes
  • repeat the analysis iteratively to arrive at optimal results
  • validate the final output in consultation with the relevant stakeholders
  • gain inferences from the final output of the data analysis
  • develop a hypothesis model to explain the discovered inferences
  • evaluate the results of the analysis and define business outcomes
  • define prescriptive actions based on the defined business outcomes
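
The summarizing and dimension-reduction steps above could be sketched as follows. This is a minimal example using NumPy only: it computes summary statistics and then performs PCA through a singular value decomposition of the standardized data. The synthetic data and the 95% explained-variance threshold are assumptions chosen for illustration, not part of the unit.

```python
import numpy as np

# Synthetic data for illustration: 200 rows, 3 variables,
# where the second variable is almost a multiple of the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Summary statistics for each variable (column).
means = X.mean(axis=0)
stds = X.std(axis=0, ddof=1)  # sample standard deviation
medians = np.median(X, axis=0)

# PCA via SVD of the standardized data.
Z = (X - means) / stds
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Keep the fewest components whose cumulative explained variance
# reaches 95% (the threshold is an assumption for this sketch).
cumulative = np.cumsum(explained_variance_ratio)
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

# Project the standardized data onto the retained components.
X_reduced = Z @ Vt[:n_components].T

print("means:", means)
print("medians:", medians)
print("explained variance ratio:", explained_variance_ratio)
print("components kept:", n_components, "reduced shape:", X_reduced.shape)
```

Because two of the three synthetic variables carry nearly the same information, two components are enough here. On real data the number of components, and whether reduction is needed at all, would be revisited as the analysis is repeated.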

Required Knowledge & Understanding

Technical Skills
  • the difference between various types of data, for example, qualitative vs quantitative data, discrete vs continuous data, processed vs unprocessed data
  • different statistical analysis software, packages, libraries and tools that can be used to summarize data, such as R, NumPy, statsmodels, or pandas
  • different functions to summarize variables across different data types such as integer, float, or character
  • different graphical formats to describe summary statistics
  • different methodological approaches for dimension reduction such as PCA, LDA, or NMF
  • different methodological approaches for defining correlations between variables, such as the scatter diagram method, correlation coefficients, or the method of least squares
  • multivariate visualizations for mapping and understanding interactions between different fields in the data
  • how to make inferences from analysed data and explain them using a hypothesis model
  • different types of prescriptive actions
  • how to identify anomalies in data and refer them onward
  • how to work on various operating systems, such as Linux (for example, Ubuntu) or Windows
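
To make two of the correlation approaches listed above concrete, the sketch below computes a Pearson correlation coefficient and fits a straight line by the method of least squares with NumPy. The paired observations are made up for illustration.

```python
import numpy as np

# Made-up paired observations for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson correlation coefficient between x and y.
r = np.corrcoef(x, y)[0, 1]

# Method of least squares: fit y ~ slope * x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals show how far each observation lies from the fitted line.
residuals = y - (slope * x + intercept)

print(f"r = {r:.4f}")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print("residuals:", np.round(residuals, 3))
```

A coefficient close to 1 suggests a strong linear relationship in this toy data. A scatter plot of x against y would typically be inspected alongside these numbers.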
Soft Skills
Analytical Thinking
Analyze the impact of the various actions performed and disseminate relevant information to others. Analyze data and understand its implications for the business.