Course Outline:

1. Introduction to Data Science

  • CRISP-DM Framework
  • Technology stack for Data Science

2. RDBMS (Oracle) with SQL

  • SQL Introduction (DDL, DML)
  • Joins
  • Views, Triggers and Procedures
  • Advanced SQL for Analytics

3. Python Programming

  • Variables and data types
  • Standard I/O
  • Operators
  • Control flow (if else, for, while, break and continue)
  • Data Structures (Lists, Tuples, Sets, Dictionaries and Strings)
  • Functions (recursive functions, lambda functions, map, filter and reduce)
  • Modules and Packages
  • Working with Python Libraries (os, datetime, sys)
  • Exception Handling
  • Object-Oriented Programming (Classes, Objects, OOP concepts)

4. Exploratory Data Analysis

  • Basic statistics
  • Hypothesis testing
  • Data distributions (Central Limit Theorem)
  • Introduction to visualization
  • Plotting with Matplotlib and Seaborn
  • Introduction to Tableau for Reporting
  • Percentiles and Quartiles
  • IQR, box-plot and whiskers
  • Bar charts, pie charts, line charts and pair plots
  • Univariate, bivariate and multivariate analysis
  • EDA case study

5. Python for Data Science

  • Introduction to NumPy and NumPy operations
  • Getting started with pandas and pandas operations
  • Sampling techniques
  • Data Preprocessing with pandas (Excel, CSV and PDF)
  • Missing value analysis (NULL value treatment)
  • Data Normalization and standardization
  • Outlier analysis and treatment
  • Web scraping using Beautiful Soup; word clouds

6. Machine Learning with Python

a) Linear Regression:

  • Algebra for regression
  • Assumptions of linear regression
  • Multiple regression
  • Feature Selection (VIF and p-values)
  • Model building
  • Parameter tuning for regression
  • Model validation (Accuracy, Variance, R-squared)
  • Bias-variance tradeoff
  • Case study on regression

b) Logistic Regression:

  • Logistic regression intuition
  • Sigmoid function, mathematics behind logistic regression
  • Feature engineering and collinearity
  • Regularization (L1 and L2) and parameter tuning
  • Case study on logistic regression

c) Decision Trees:

  • Decision trees introduction
  • Homogeneity, Gini index and information gain
  • Building decision trees and parameter tuning
  • Truncating and Pruning trees
  • Random forest (ensembles)
  • OOB (out-of-bag) error
  • Cross-validation, bagging and boosting (XGBoost, AdaBoost and GBM)
  • Case study on decision tree and random forest

d) K-nearest neighbors for classification

e) Model deployment with PMML, H5 and pickle
