CSPB 3022 - Introduction to Data Science with Probability and Statistics

*Note: This course description applies only to the Computer Science Post-Baccalaureate program. Students should always refer to the course syllabus for the most up-to-date information.

  • Credits: 3.0
  • Prerequisites: CSPB or CSCI 1300 Computer Science 1: Starting Computing, with a minimum grade of C-.
  • Minimum Passing Grade: C-
  • Textbooks: "A Modern Introduction to Probability and Statistics" by Dekking et al. and "Introduction to Statistical Learning with Applications in R" by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

[video:https://youtu.be/8Jw3JWoVK_o]

Brief Description of Course Content

Introduces students to the tools, methods, and theory behind extracting insights from data. Covers algorithms for cleaning and munging data, probability theory and common distributions, statistical simulation, drawing inferences from data, and basic statistical modeling.

Specific Goals for the Course

  • Recognize the importance of data collection, identify limitations in data collection methods and other sources of statistical bias, and determine how they affect the scope of inference.
  • Use statistical software to summarize data numerically and visually, and to perform data analysis.
  • Have a conceptual understanding of the unified nature of statistical inference.
  • Apply estimation and testing methods to analyze single variables or the relationship between two variables in order to understand natural phenomena and make data-based decisions.
  • Model numerical response variables using one or more explanatory variables in order to investigate relationships between variables.
  • Interpret results correctly, effectively, and in context without relying on statistical jargon.
  • Critique data-based claims and evaluate data-based decisions.
  • Data Exploration and Probability
  • Conditional probability and Bayes' rule
  • Discrete/continuous random variables and computing with distributions
  • Joint distributions, covariance, correlation and sums of random variables
  • Using the Jupyter Python environment
  • Python tools for data science – NumPy and Pandas
  • Basic statistical estimation, random samples, bootstrap and resampling techniques, unbiased estimators, and confidence intervals for measurement data
  • t-test
  • Linear regression and classification
  • Maximum likelihood estimation and analysis of variance
Counting theory, Probabilities, Integration
