Session: Scikit-learn tutorial

Target-audience:
Advanced

Machine learning is the branch of computer science concerned with developing algorithms that learn from previously seen data in order to make predictions about future data, and it has become an important part of research in many scientific fields. This set of tutorials will introduce the basics of machine learning and show how these learning tasks can be accomplished using scikit-learn, a machine learning library written in Python and built on NumPy, SciPy, and Matplotlib. By the end of the tutorials, participants will be poised to take advantage of scikit-learn’s wide variety of machine learning algorithms to explore their own data sets.

I am planning to cover a subset of the material from the scikit-learn tutorial given at PyCon 2015.

Install

This tutorial requires the following packages:

  • Python version 2.7 or 3.4+
  • numpy version 1.8 or later: http://www.numpy.org/
  • scipy version 0.15 or later: http://www.scipy.org/
  • matplotlib version 1.3 or later: http://matplotlib.org/
  • scikit-learn version 0.15 or later: http://scikit-learn.org
  • ipython/jupyter version 3.0 or later, with notebook support: http://ipython.org
  • seaborn version 0.5 or later, used mainly for plot styling

The easiest way to get these packages is to use the conda environment manager. I suggest downloading and installing Miniconda. The following command will install all required packages:

$ conda install numpy scipy matplotlib scikit-learn ipython-notebook seaborn

Alternatively, you can download and install the (very large) Anaconda software distribution, found at https://store.continuum.io/.
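
Once the packages are installed, a quick check can confirm that the versions meet the minimums listed above. The following is a minimal sketch, not part of the tutorial material: the helper names (parse_version, check_requirements) are made up for illustration, the import names are assumptions (scikit-learn is imported as "sklearn", ipython as "IPython"), and the version parsing is deliberately naive (it ignores suffixes such as "rc1").

```python
import importlib
import re

# Minimum versions copied from the requirements list above.
# Keys are import names, which are assumptions: scikit-learn -> "sklearn",
# ipython -> "IPython".
REQUIREMENTS = {
    "numpy": "1.8",
    "scipy": "0.15",
    "matplotlib": "1.3",
    "sklearn": "0.15",
    "IPython": "3.0",
    "seaborn": "0.5",
}


def parse_version(version):
    """Turn '1.8.2' or '0.15.0rc1' into a tuple of ints, ignoring suffixes."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(requirements):
    """Return {import name: (installed version or None, meets-minimum flag)}."""
    report = {}
    for name, minimum in requirements.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not installed (or not importable under this name).
            report[name] = (None, False)
            continue
        installed = getattr(module, "__version__", "0")
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in sorted(check_requirements(REQUIREMENTS).items()):
        status = "OK" if ok else "MISSING or too old"
        print("{:<12} {:<12} {}".format(name, str(installed), status))
```

Run it with the same Python interpreter you will use for the notebooks. Any line marked "MISSING or too old" points to a package to install or upgrade before the tutorial.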

The tutorial will be based on Jupyter notebooks. In order to follow along, you will need to do one of the following:

  • clone this GitHub repo: https://github.com/lesteve/sklearn_tutorial
  • or, if you don't have git installed, download a zip archive from https://github.com/lesteve/sklearn_tutorial/archive/master.zip