Lecture 2

CPaT

Computing Practice and Theory


Topics for the quarter

  • What are concepts, instances, and attributes?
  • Knowledge representation
  • Rule Lists, Trees, Linear Models
  • Training a machine learning system
  • Cleaning and Transforming Data
  • Bayesian Networks
  • Clustering
  • Neural Networks
  • Regression

Readings for this week

  • Chapter 1 in Witten

Additional resources

  • Artificial Intelligence, Russell and Norvig
  • Reinforcement Learning, Sutton and Barto
  • Principles of Computer Security Lab Manual, Nestler, White, Conklin — not the one by Conklin and White
  • Seattle networking environment: seattle.cs.washington.edu

Algorithms

  • Rules (p. 6): the table has 24 rows. How many possible functions are there from a set of 24 instances to a set of 3 outcomes?
  • In general, the data is noisy
  • Decision list vs. decision tree (p. 13)
  • generalization
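
The counting question in the first bullet can be checked directly. Each of the 24 instances can be mapped to any of the 3 outcomes independently of the others, so the number of functions is 3 multiplied by itself 24 times, that is 3^24. The sketch below only does that arithmetic; the name count_functions is made up for illustration and does not come from the readings.

```python
def count_functions(num_instances: int, num_outcomes: int) -> int:
    """Count the functions from a set of instances to a set of outcomes.

    Each instance independently maps to one of the outcomes, so the
    count is num_outcomes ** num_instances.
    """
    return num_outcomes ** num_instances


# 24 instances, 3 possible outcomes, as in the question above.
total = count_functions(24, 3)
print(total)  # 282429536481
```

That is roughly 2.8 x 10^11 candidate functions, far too many to list by hand.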

Data

  • What kinds of data are there?
  • Why data needs to be cleaned/preprocessed: missing values, inconsistent values
  • Summarizing data: mean, standard deviation, min, max, quartiles
  • Attribute subset selection: finding a minimum set of attributes that adequately describes the concept.
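
The cleaning and summarizing bullets above can be tried on a small example. The sketch below is only an illustration: the raw_ages column is invented, and dropping missing or unreadable entries is just one possible policy (filling them in with an estimate is another). It uses Python's standard statistics module; clean and summarize are hypothetical helper names.

```python
import statistics

# Hypothetical raw column. None marks a missing value; the strings show
# inconsistent formatting. The data is made up for illustration.
raw_ages = [23, None, "31", 45, " 52 ", None, 38, 29, "unknown", 61]


def clean(values):
    """Return the entries that can be read as numbers, as floats.

    Missing entries (None) and unparseable strings are dropped.
    """
    cleaned = []
    for v in values:
        if v is None:
            continue
        try:
            cleaned.append(float(v))  # float() tolerates surrounding spaces
        except (TypeError, ValueError):
            continue
    return cleaned


def summarize(values):
    """Compute mean, standard deviation, min, max, and quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


ages = clean(raw_ages)
summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Of the ten raw entries, seven survive cleaning here. Whether to drop or repair the other three depends on the data set and the learning task.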