## Description

__Foundations of Statistics for Data Science:__

- Understanding the properties of an attribute: Central tendencies (Mean, Median, Mode); Measures of spread (Range, Variance, Standard Deviation); Basics of Probability Distributions; Expectation and Variance of a variable
- Introduction to random variables, probability theory, conditional probability, and Bayes' theorem, one of the most powerful results in probability theory.
- Discrete probability distributions: Bernoulli, Binomial, Geometric, and Poisson, with the properties of each.
- Continuous probability distributions: Exponential; Special emphasis on Normal distribution; t-distribution
- Statistical hypothesis testing: how to conduct a test, with detailed coverage of the chi-square test, t-test, z-test, F-test, and ANOVA.
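
As a small illustration of the conditional probability and Bayes' theorem topics listed above, the following sketch computes a posterior probability for a diagnostic-test scenario. The function name `bayes_posterior` and all of the numbers are hypothetical, chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

Even with an accurate test, the posterior stays low here because the prior is small, which is the kind of intuition this part of the course builds.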

__R & Python:__

- R and Python basics, understanding data structures, functions, control structures, data manipulations, date and string manipulations, etc.
- Pre-processing techniques: binning, filling missing values, standardization and normalization, type conversions, train-test data split, ROCR
- Hands-on implementation of all the pre-processing techniques in R and Python.
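
The pre-processing steps listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the course's lab code: the helper names and the `ages` data are hypothetical, missing values are assumed to be encoded as `None`, and the scaling helpers assume the column is not constant (otherwise they would divide by zero).

```python
import random
import statistics


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Z-score scaling: zero mean, unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def normalize(values):
    """Min-max scaling into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


ages = [22, None, 35, 41, None, 58]  # hypothetical attribute with gaps
filled = fill_missing(ages)
print(filled)
print(equal_width_bins(filled, n_bins=3))
train, test = train_test_split(filled)
print(len(train), len(test))
```

In practice these steps are usually done with library routines in R or Python; writing them out by hand mainly helps to see what each transformation does to the data.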

**Machine Learning Models:**

**KNN Model:**

- Computational geometry; Voronoi Diagrams; Delaunay Triangulations
- K-Nearest Neighbor algorithm; Wilson editing and triangulations
- Aspects to consider while designing K-Nearest Neighbor
- Hands-on example of K-Nearest Neighbor using R
- Collaborative filtering and its application areas
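
To make the K-Nearest Neighbor idea concrete, here is a minimal sketch of a majority-vote classifier. The course's hands-on example uses R; this Python version, the function name `knn_predict`, and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) returns the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, query=(1.1, 1.0)))
print(knn_predict(points, labels, query=(5.1, 5.0)))
```

The design aspects mentioned above show up directly in this sketch: the choice of `k`, the distance metric, and whether the features have been scaled all change which neighbors are considered nearest.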

**SVM**

- Support Vector Machines (SVMs) are among the most elegant machine learning techniques developed in recent decades. This session introduces how they work and why they are so effective.
- Linear learning machines and kernel space; making kernels and working in feature space
- Demonstration of SVMs on classification and regression problems using a business case in R.

**Decision Trees**

In machine learning, ensemble methods combine multiple learning algorithms to obtain better predictive performance than any single algorithm could achieve alone. Bagging and boosting, the two basic ensemble approaches, are covered in detail first, followed by machine learning methods that use either or both of them to build ensemble models.

- Bagging and boosting, and their impact on bias and variance
- C5.0 boosting
- Random forest
- Gradient Boosting Machines and XGBoost, which are popular winning recipes in data science competitions.
- Architecting ML solutions
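
The bagging idea described above can be sketched as follows: several weak models are each fitted on a bootstrap sample (drawn with replacement) and their predictions are combined by majority vote. This is a minimal illustration only. The one-feature decision stump, the function names, and the toy data are all hypothetical, and real ensembles such as random forests use full decision trees rather than stumps.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-feature decision stump: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold in sorted(set(xs)):
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_bagged(thresholds, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagged_stumps(xs, ys)
print(predict_bagged(model, 1.5), predict_bagged(model, 8.5))
```

Each stump sees a slightly different sample and so picks a slightly different threshold. Averaging over them by vote is what reduces variance, whereas boosting instead fits models sequentially so that each one focuses on the errors of its predecessors.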

**Clustering:**

You will learn clustering, one of the most commonly used unsupervised learning techniques.

- Different clustering methods; review of several distance measures
- Iterative distance-based clustering
- Dealing with continuous and categorical values in K-Means
- Constructing hierarchical clusters; K-Medoids, K-Modes, and density-based clustering to handle different data types in practice
- Tests for checking cluster stability
- Hands-on implementation of each of these methods will be conducted in R.

**Business case analysis:**

- The objective of this session is to provide an application-oriented, end-to-end view of solving a data science problem and defending your analysis.
- We provide a business case in advance, for which you will apply all the data pre-processing steps and prepare the input for one or more of the ML algorithms learned thus far.
- The lab is designed so that everyone participates in the discussion, designs the solution approach for the given business case, and defends the analysis approach.

**Text Mining:**

- Introduction to the fundamentals of information retrieval
- TF and IDF (term frequency and inverse document frequency)
- Thinking about the math behind text; Properties of words; Vector Space Model

- Matrix factorization: SVD
- Text Indexing
- Inverted Indexes
- Boolean query processing
- Handling phrase queries and proximity queries
- LSA
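
Several of the topics above (inverted indexes, Boolean query processing, TF and IDF) fit into one short sketch. The toy corpus, the function names, and the particular weighting (raw term frequency times `log(N / df)`) are assumptions made for illustration; many other TF-IDF variants exist.

```python
import math
from collections import Counter, defaultdict

docs = {  # hypothetical toy corpus
    "d1": "data science uses statistics",
    "d2": "text mining uses information retrieval",
    "d3": "information retrieval ranks text by relevance",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> set of documents containing it.
inverted_index = defaultdict(set)
for doc_id, words in tokens.items():
    for word in words:
        inverted_index[word].add(doc_id)


def boolean_and(*terms):
    """Documents containing every query term (intersection of postings)."""
    postings = [inverted_index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def tf_idf(term, doc_id):
    """Raw term frequency times log(N / document frequency)."""
    tf = Counter(tokens[doc_id])[term]
    df = len(inverted_index.get(term, ()))
    return tf * math.log(len(docs) / df) if df else 0.0


print(sorted(boolean_and("information", "retrieval")))
print(round(tf_idf("statistics", "d1"), 3))
print(tf_idf("uses", "d3"))
```

A term that appears in only one document (such as "statistics" here) gets a higher weight than one spread across several, which is the intuition behind using IDF for ranking.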
