Getting Started with Data Science for non-programmers

Data Science

Main subjects

  • Data Analysis
  • Data Science
  • Machine Learning
  • Deep Learning
  • AI (Artificial Intelligence)
  • Big Data

Self introduction

About you

https://code-maven.com/slides/data-science

  • (Full) Name
  • Workplace/Organization
  • Field of expertise (job title?)
  • Why are you interested in Data Science?
  • What are your hobbies?

Overview of this material

  • Install Tools (Python, Jupyter Notebook, NumPy, Pandas, Matplotlib, and Seaborn).
  • Data Source: Read in some CSV file (and maybe also a JSON file).
  • Apply some simple filtering and transformations to the data.
  • Create some simple graphs.
  • Gain some understanding of the data.

Tools

Visualization

Install Python on Windows

  • Anaconda
  • Download and install it, accepting the defaults with one exception: select the "Add Anaconda to PATH" option.

Install Python on Linux

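Use the terminal. First check whether Python 3 is already installed, then install it and virtualenv with the package manager of your distribution: apt-get on Debian and Ubuntu, yum on Fedora, CentOS, and RHEL. Run only the command that matches your system.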

$ which python3

$ sudo apt-get install python3
$ sudo yum install python3

$ sudo apt-get install virtualenv
$ sudo yum install virtualenv

Install Python on Mac OSX

Use the terminal.

Install Homebrew if you don't have it yet.

$ which python3

$ brew install python3
$ brew install virtualenv

Install Jupyter notebook and Python modules on Linux and OSX

$ virtualenv -p python3 ~/venv3
$ source ~/venv3/bin/activate
$ pip install jupyter pandas seaborn
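To confirm that the installation worked, a short check like the one below can be run with the Python of the activated virtual environment. It is only a sketch: the module list is an assumption based on the pip command above (pip normally installs numpy and matplotlib as dependencies of pandas and seaborn, and notebook as part of jupyter), and check_modules is a hypothetical helper name.

```python
import importlib.util

# Assumed list of modules the pip command above should have made available.
MODULES = ["notebook", "numpy", "pandas", "matplotlib", "seaborn"]


def check_modules(names):
    """Return a dict mapping each module name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(MODULES).items():
        print(f"{name:12} {'OK' if found else 'MISSING'}")
```

If a module is reported as MISSING, make sure the virtual environment is activated and run the pip install command again.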

Install Python modules on Windows (Anaconda)

  • Open the "Anaconda Prompt"
  • Type conda list seaborn to check whether seaborn is already installed.
  • If it is installed, the output will look something like this:
# packages in environment at C:\ProgramData\Anaconda3:
#
# Name                    Version                   Build  Channel
seaborn                   0.9.0                    py37_0
  • Install seaborn by typing conda install seaborn

  • If all else fails you can also try pip install seaborn

Start Jupyter notebook

  • Windows: start "Jupyter Notebook" from the Anaconda group in the Start menu
  • Linux, OSX: $ jupyter notebook

Reading CSV file (Planets)

Exercise 1

  • Download the planets.csv file and follow the steps above.
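As a rough sketch of what reading, filtering, and transforming such a file can look like with Pandas: the example below uses a small inline stand-in for planets.csv so that it runs on its own. The column names and the rows are assumptions made for illustration; the downloaded file may be laid out differently.

```python
import io

import pandas as pd

# Stand-in for planets.csv. The column layout is an assumption; mass is
# given relative to Earth and distance from the Sun in astronomical units.
SAMPLE_CSV = """Planet name,Distance (AU),Mass
Mercury,0.4,0.055
Venus,0.7,0.815
Earth,1,1
Mars,1.5,0.107
"""

# With the real file this would be: pd.read_csv("planets.csv")
planets = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(planets.head())   # the first rows of the table
print(planets.dtypes)   # the column types Pandas inferred

# Filtering: keep only the rows where the mass is below that of Earth.
light = planets[planets["Mass"] < 1]
print(light)

# Transformation: add a column with the distance in million km
# (1 AU is roughly 149.6 million km).
planets["Distance (million km)"] = planets["Distance (AU)"] * 149.6
print(planets)
```

In a Jupyter notebook the print calls can be dropped: putting the name of the DataFrame on the last line of a cell displays it as a table.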

Seaborn

"""
Source : https://seaborn.pydata.org/introduction.html
"""

import seaborn as sns

sns.set()  # Apply the default seaborn theme, scaling, and color palette. Optional.

tips = sns.load_dataset("tips")  # Load example dataset into Pandas DataFrame
#print(type(tips))

# print(tips)

plot = sns.relplot(
    x = "total_bill",
    y = "tip",
    col = "time",
    hue = "smoker",
    style = "smoker",
    size = "size",
    data = tips)

# print(type(plot))    # seaborn.axisgrid.FacetGrid
plot.savefig("tips.png")

Temperatures

Exercise 2

Salaries

Stack Overflow survey

Exercise 3

  • Download the Stack Overflow data set
  • Compare the average salary in different countries
  • Are there outliers? Can you remove them?
  • Analyze the data in the same way SO did, but restricted to a specific country.
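One possible way to approach the comparison and the outlier question is sketched below. The rows are made-up toy values, not survey results, and the column names "Country" and "Salary" are assumptions: the real data set uses different column names depending on the survey year. The outlier rule shown (keep values within 1.5 times the interquartile range of the quartiles) is one common choice, not the only one.

```python
import pandas as pd

# Made-up toy rows, NOT real survey results. The column names and the
# country labels "A" and "B" are placeholders.
survey = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Salary": [50_000, 52_000, 48_000, 1_000_000,
               30_000, 32_000, 28_000, 31_000],
})

# Average salary per country, with the suspicious value still included.
mean_all = survey.groupby("Country")["Salary"].mean()

# The median is much less sensitive to a single extreme value.
median_all = survey.groupby("Country")["Salary"].median()

# Flag outliers: values further than 1.5 * IQR from the quartiles.
q1 = survey["Salary"].quantile(0.25)
q3 = survey["Salary"].quantile(0.75)
iqr = q3 - q1
in_range = survey["Salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = survey[in_range]

# Average salary per country after removing the outliers.
mean_cleaned = cleaned.groupby("Country")["Salary"].mean()

# Restrict the analysis to a single country.
one_country = cleaned[cleaned["Country"] == "A"]

print(mean_all)
print(median_all)
print(mean_cleaned)
print(one_country)
```

Comparing mean_all with mean_cleaned shows how strongly a single extreme value can shift an average, which is why it is worth looking at the median as well.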

Other Materials

Data Sources (Competitions)

Thank You