Getting Started with Data Science for non-programmers

Data Science


Main subjects

Data Analysis
Data Science
Machine Learning
Deep Learning
AI (Artificial Intelligence)
Big Data

Self introduction

Gábor Szabó @szabgab
Helps organizations improve their development by creating faster feedback loops
Training courses
Code Maven Workshops
Code Maven in English
Code Maven in Hebrew
Code Mavens Meetup group

About you

https://code-maven.com/slides/data-science

(Full) Name
Workplace/Organization
Field of expertise (job title?)
Why are you interested in Data Science?
What are your hobbies?

Overview of this material

Install Tools (Python, Jupyter Notebook, NumPy, Pandas, Matplotlib, and Seaborn).
Data Source: Read in some CSV file (and maybe also a JSON file).
Do some simple filtering and transformation of the data.
Create some simple graphs.
Gain some understanding of the data.
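
The steps listed above (without the graphs) can be sketched in a few lines of pandas. This is only a minimal sketch under stated assumptions: the CSV content, the column names (city, year, temperature_c), and all the values are made up for illustration, and the data is read from an in-memory string so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content (made-up values), used instead of a real file.
csv_text = """city,year,temperature_c
Budapest,2018,12.5
Budapest,2019,12.9
Haifa,2018,21.0
Haifa,2019,21.4
Oslo,2018,7.0
Oslo,2019,7.3
"""

# Data source: read the CSV into a pandas DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Filtering: keep only the rows of one year.
recent = df[df["year"] == 2019].copy()

# Transformation: add a column with the temperature in Fahrenheit.
recent["temperature_f"] = recent["temperature_c"] * 9 / 5 + 32

# Understanding: average temperature per city over all the years.
mean_by_city = df.groupby("city")["temperature_c"].mean()

print(recent)
print(mean_by_city)
```

With a real file, the io.StringIO(...) argument would be replaced by the path of the CSV file.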

Tools

Visualization

Install Python on Windows

Anaconda
Download and install it. Accept the defaults, with one exception: select the option to add Anaconda to PATH.

Install Python on Linux

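Use the terminal.

Check whether Python 3 is already installed:

$ which python3

If it is not, install it with the package manager of your distribution (apt-get on Debian and Ubuntu, yum on Fedora, CentOS, and RHEL):

$ sudo apt-get install python3
$ sudo yum install python3

Install virtualenv the same way:

$ sudo apt-get install virtualenv
$ sudo yum install virtualenv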

Install Python on Mac OSX

Use the terminal.

Install Homebrew if you don't have it yet.

$ which python3

$ brew install python3
$ brew install virtualenv

Install Jupyter notebook and Python modules on Linux and OSX

$ virtualenv -p python3 ~/venv3
$ source ~/venv3/bin/activate
$ pip install jupyter pandas seaborn

Install Python modules on Windows (Anaconda)

Open the "Anaconda Prompt"
Type in conda list seaborn to see if seaborn is already installed
(it will show something like this, if it is installed):

# packages in environment at C:\ProgramData\Anaconda3:
#
# Name                    Version                   Build  Channel
seaborn                   0.9.0                    py37_0

Install seaborn by typing conda install seaborn
If all else fails you can also try pip install seaborn
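
On any of the platforms above, a short Python script can report which of the required packages are installed. This is a minimal sketch that uses only the standard library (importlib.metadata, available since Python 3.8). The helper name installed_versions and the REQUIRED list are made up for this example, and depending on how a package was installed (especially with conda) it may be reported as missing even though it can be imported.

```python
from importlib import metadata

# Packages this material relies on (names as used by pip).
REQUIRED = ["jupyter", "numpy", "pandas", "matplotlib", "seaborn"]


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:12} {version or 'NOT INSTALLED'}")
```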

Start Jupyter notebook

Windows: open "Jupyter Notebook" from the Anaconda group in the Start menu.
Linux, OSX: activate the virtualenv, then run $ jupyter notebook

Reading CSV file (Planets)
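
A minimal sketch of reading a CSV file with pandas and taking a first look at it. The file name planets_sample.csv, the column names, and the approximate values are assumptions made for this example; the real planets.csv may be laid out differently. The sample is written to a temporary file first so that the code runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample rows; the real planets.csv may use other columns.
sample = """name,distance_au,mass_earths
Mercury,0.39,0.055
Venus,0.72,0.815
Earth,1.00,1.000
Mars,1.52,0.107
"""

# Write the sample to a temporary file so that read_csv has a file to open.
csv_path = os.path.join(tempfile.mkdtemp(), "planets_sample.csv")
with open(csv_path, "w") as fh:
    fh.write(sample)

planets = pd.read_csv(csv_path)

print(planets.shape)       # number of rows and columns
print(planets.columns)     # column names taken from the header line
print(planets.head(2))     # first two rows
print(planets.describe())  # basic statistics of the numeric columns

# Rows can be selected with a condition on a column.
heavy = planets[planets["mass_earths"] > 0.5]
print(heavy["name"].tolist())
```

In a Jupyter notebook the print calls are optional: the value of the last expression in a cell is displayed automatically.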

Exercise 1

Download the planets.csv file and follow the steps above.

Seaborn

examples/seaborn_tips.ipynb

"""
Source : https://seaborn.pydata.org/introduction.html
"""

import seaborn as sns

sns.set()  # Apply the default seaborn theme, scaling, and color palette. Optional.

tips = sns.load_dataset("tips")  # Load example dataset into Pandas DataFrame
#print(type(tips))

# print(tips)

plot = sns.relplot(
    x="total_bill",
    y="tip",
    col="time",
    hue="smoker",
    style="smoker",
    size="size",
    data=tips)

# print(type(plot))    # seaborn.axisgrid.FacetGrid
plot.savefig("tips.png")
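
sns.load_dataset fetches the example data over the network. When that is not possible, the same kind of plot can be drawn from a DataFrame built by hand. The following is a minimal sketch with made-up numbers (not the real tips dataset); the output file name is also an assumption, and the Agg backend of Matplotlib is selected so that the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to files only, no window needed

import pandas as pd
import seaborn as sns

# Made-up rows shaped like the tips dataset (not the real data).
small_tips = pd.DataFrame({
    "total_bill": [10.5, 21.0, 15.2, 30.4, 18.7, 25.1],
    "tip": [1.5, 3.5, 2.0, 5.0, 3.0, 4.2],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Lunch"],
    "smoker": ["No", "Yes", "No", "No", "Yes", "Yes"],
})

# One scatter plot per value of "time", colored by "smoker".
grid = sns.relplot(data=small_tips, x="total_bill", y="tip",
                   col="time", hue="smoker")

# Save the figure into the system's temporary directory.
out_file = os.path.join(tempfile.gettempdir(), "small_tips.png")
grid.savefig(out_file)
print(out_file)
```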

Temperatures

Exercise 2

Download another CSV file and analyze it. (Search for public data sets.)
Create some nice graphs.
Possible data sources:
UN
UCR - Uniform Crime Reporting
Tons of others

