## Why Python?

For most people, I recommend getting started with R, because the tools in R for exploratory data analysis and visualization are easier and more comprehensive than the tools in Python. However, if you have a computer science background or if you want to jump on the fast track to high-performance machine learning, then you might want to start with Python. Python is an awesome programming language, because it is easy to write, readable, and well-documented, and it can be very fast if you do it right, which mostly means letting vectorized libraries do the heavy lifting instead of writing explicit loops. It also has extensive libraries for scientific computing, statistics, and machine learning.

## The Python scientific stack

Python has a comprehensive, integrated scientific computing stack that offers an incredible combination of performance, ease of use, and depth. It is made up of several libraries and utilities, including:

- *numpy*: Fast and easy array computations and manipulations. Includes “broadcasting” and “fancy” indexing, which give Python arrays some of the simple syntax of R vectors.
- *scipy*: Scientific computing functionality, including optimized matrix operations and data structures, numerical optimization, calculus functions, etc.
- *pandas*: Data frames in Python. A little more complicated than R data frames, but with much better performance (more like R data *tables* than data frames in practice).
- *scikit-learn*: Machine learning in Python. Fast and easy to use.
- *Jupyter (previously IPython Notebook)*: A browser-based notebook for scientific computing in Python and other languages.
- *Matplotlib*: A nice plotting library. Many people use Seaborn as a user-friendly alternative.
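To make “broadcasting” and “fancy” indexing a little more concrete, here is a minimal sketch. The array values and variable names are made up for illustration; only the numpy calls themselves are real.

```python
import numpy as np

# A small 3x4 array of made-up measurements (rows = samples, columns = features).
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
])

# Broadcasting: the length-4 vector of column means is stretched across
# all 3 rows, so no explicit loop is needed to center each column.
col_means = data.mean(axis=0)
centered = data - col_means

# Fancy indexing with a list of integers: pick rows 2 and 0, in that order.
picked_rows = data[[2, 0]]

# Indexing with a boolean mask (usually grouped with fancy indexing):
# keep only the values greater than 6, returned as a flat array.
large_values = data[data > 6]

print(centered)
print(picked_rows)
print(large_values)
```

If you know R, `data[data > 6]` should look familiar: it is essentially the same idea as subsetting a vector with a logical condition.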

## A couple of quick tutorials to get started

Kaggle has a great tutorial series on getting started with Python. It takes you through the basics of loading data, manipulating data, transforming data, and building a random forest machine learning model. It uses the Titanic survival dataset to walk you through all of these skills in a practical case study. It is best to start at part I, although you can skip straight to part II if you are impatient.

- Getting Started with Python Part I: Covers basic CSV loading and *numpy* arrays
- Getting Started with Python Part II: Introduces the extremely useful *pandas* data frame concept
- Getting Started with Random Forests: Introduces the *scikit-learn* library and shows you how to build a basic machine learning model
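To give a rough feel for the workflow these tutorials cover (data frame in, transformed features, random forest out), here is a small sketch. It is not the tutorial's code and it does not use the real Titanic data: the table is synthetic, and the column names and the labeling rule are assumptions invented for this example. With a real file you would typically start from `pd.read_csv(...)` instead of building the frame by hand.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a passenger table (NOT the real Titanic dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(1, 80, size=n),
    "fare": rng.uniform(5, 100, size=n).round(2),
    "sex": rng.choice(["male", "female"], size=n),
})

# Made-up labeling rule, just so the model has a pattern to learn.
df["survived"] = ((df["sex"] == "female") | (df["fare"] > 70)).astype(int)

# Transform: scikit-learn needs numbers, so encode the text column as 0/1.
df["is_female"] = (df["sex"] == "female").astype(int)

features = ["age", "fare", "is_female"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["survived"], test_size=0.25, random_state=0
)

# Fit a random forest on the training rows, then score it on held-out rows.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Holding out a test set, as done here with `train_test_split`, is one simple way to check that the model has learned something general rather than memorizing the training rows.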

The Kaggle tutorials should get you up and running and start building your confidence. After these, I recommend the scikit-learn quick start tutorial. This gives a bit of the bigger picture on scikit-learn and the concepts of machine learning. The scikit-learn documentation taught me so much, and I highly recommend it. The scikit-learn website also contains lots of code examples, although the examples can seem a bit complex at first.

## Where to go from here

If you really sit down and work through these tutorials, you will be ready to try some more examples on your own. I recommend checking out a couple of the more straightforward Kaggle competitions, such as Give Me Some Credit. *Machine learning is a craft*, and it takes practice to get good at it, but the payoff can be huge.
