Getting Started with Data Science for non-programmers
Data Science
Main subjects
- Data Analysis
- Data Science
- Machine Learning
- Deep Learning
- AI (Artificial Intelligence)
- Big Data
Self introduction
- Gábor Szabó @szabgab
- Helps organizations improve their development by creating faster feedback loops
About you
https://code-maven.com/slides/data-science
- (Full) Name
- Workplace/Organization
- Field of expertise (job title?)
- Why are you interested in Data Science?
- What are your hobbies?
Overview of this material
- Install Tools (Python, Jupyter Notebook, NumPy, Pandas, Matplotlib, and Seaborn).
- Data Source: Read in some CSV file (and maybe also a JSON file).
- Do some simple filtering and transformation of the data.
- Create some simple graphs.
- Gain some understanding of the data.
Tools
Visualization
Install Python on Windows
- Anaconda
- Download, install, accept the defaults (except: add Anaconda to PATH)
Install Python on Linux
Use the terminal
$ which python3
$ sudo apt-get install python3       # Debian, Ubuntu
$ sudo yum install python3           # Fedora, CentOS, RHEL
$ sudo apt-get install virtualenv    # Debian, Ubuntu
$ sudo yum install virtualenv        # Fedora, CentOS, RHEL
Install Python on Mac OSX
Use the terminal.
Install Homebrew if you don't have it yet.
$ which python3
$ brew install python3
$ brew install virtualenv
Install Jupyter notebook and Python modules on Linux and OSX
$ virtualenv -p python3 ~/venv3
$ source ~/venv3/bin/activate
$ pip install jupyter pandas seaborn
Install Python modules on Windows (Anaconda)
- Open the "Anaconda Prompt"
- Type in
conda list seaborn
to see if seaborn is already installed. If it is installed, it will show something like this:
# packages in environment at C:\ProgramData\Anaconda3:
#
# Name Version Build Channel
seaborn 0.9.0 py37_0
- Install seaborn by typing
conda install seaborn
- If all else fails you can also try
pip install seaborn
Start Jupyter notebook
- Windows: Anaconda Jupyter notebook
- Linux, OSX:
$ jupyter notebook
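Once the notebook is running, a quick check in the first cell can confirm that the modules used in this material are installed. This is only a sketch: the helper name missing_modules is made up for this example, and the list of module names is taken from the overview.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the overview of this material
required = ["numpy", "pandas", "matplotlib", "seaborn"]
missing = missing_modules(required)
print("Missing modules:", ", ".join(missing) or "none")
```

If anything is listed as missing, go back to the install step for your operating system.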
Reading CSV file (Planets)
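A minimal sketch of reading a CSV file with Pandas. To keep it runnable without any download, it reads a small inline sample instead of planets.csv. The column names "name" and "moons" and the helper load_csv are assumptions for this example; the real file may have different columns.

```python
import io

import pandas as pd

# Small inline stand-in for planets.csv (hypothetical column names).
SAMPLE_CSV = """name,moons
Mercury,0
Venus,0
Earth,1
Mars,2
"""


def load_csv(source):
    """Read CSV data from a file path or a file-like object into a DataFrame."""
    return pd.read_csv(source)


# With the downloaded file this would be: planets = load_csv("planets.csv")
planets = load_csv(io.StringIO(SAMPLE_CSV))

print(planets.shape)   # number of rows and columns
print(planets.dtypes)  # the type Pandas picked for each column
print(planets.head())  # the first few rows
```

Looking at the shape, the column types, and the first rows is usually a good first step with any new data file.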
Exercise 1
- Download the planets.csv file and follow the steps above.
Seaborn
"""
Source: https://seaborn.pydata.org/introduction.html
"""
import seaborn as sns

sns.set()  # Apply the default seaborn theme, scaling, and color palette. Optional.

tips = sns.load_dataset("tips")  # Load example dataset into a Pandas DataFrame
# print(type(tips))
# print(tips)

# One scatter plot per value of "time"; color, marker style, and point size
# each encode another column of the data.
plot = sns.relplot(
    x="total_bill",
    y="tip",
    col="time",
    hue="smoker",
    style="smoker",
    size="size",
    data=tips,
)
# print(type(plot))  # seaborn.axisgrid.FacetGrid
plot.savefig("tips.png")
Temperatures
Exercise 2
- Download another CSV file and analyze it. (Search for public data sets.)
- Create some nice graphs.
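One possible shape for this exercise, as a sketch only: read the data, add a derived column, filter some rows, and save a graph. The numbers and the column names "month" and "temperature" are made up so the example runs on its own; with a real file, replace the inline sample with pd.read_csv("your_file.csv").

```python
import io

import pandas as pd

# Made-up illustrative values, not real measurements.
SAMPLE_CSV = """month,temperature
1,5.0
2,7.5
3,11.0
4,16.0
"""

data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Transformation: derive a Fahrenheit column from the Celsius one
data["temperature_f"] = data["temperature"] * 9 / 5 + 32

# Filtering: keep only the rows warmer than 6 degrees Celsius
warm = data[data["temperature"] > 6]
print(warm)

# Graph: a simple line plot, saved to a file (Pandas uses Matplotlib here)
ax = data.plot(x="month", y="temperature", marker="o", legend=False)
ax.set_xlabel("month")
ax.set_ylabel("temperature (C)")
ax.figure.savefig("temperature.png")
```

In a Jupyter notebook the graph also shows up below the cell, so saving it to a file is optional.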
Salaries
Stack Overflow survey
Exercise 3
- Download the Stack Overflow data set
- Compare the average salary in different countries
- Are there outliers? Can you remove them?
- Analyze the data in the same way Stack Overflow did, but restricted to a specific country.
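A sketch of how the comparison and the outlier removal might look in Pandas. Everything here is an assumption for illustration: the rows are made up, the column names "country" and "salary" are hypothetical (check the real names in the downloaded file), and the 1.5 * IQR rule in the helper remove_outliers_iqr is just one common way to define an outlier.

```python
import pandas as pd

# Made-up rows, only to make the sketch runnable.
survey = pd.DataFrame({
    "country": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "salary": [50, 55, 60, 1000, 30, 32, 34, 36],
})


def remove_outliers_iqr(df, column, k=1.5):
    """Drop rows whose value lies more than k * IQR outside the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    keep = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


# Average salary per country, before and after removing outliers
print(survey.groupby("country")["salary"].mean())
cleaned = remove_outliers_iqr(survey, "salary")
print(cleaned.groupby("country")["salary"].mean())

# Restrict the analysis to a single country
only_a = cleaned[cleaned["country"] == "A"]
print(only_a["salary"].describe())
```

In the made-up data a single very large value pulls the average of country A far up, which is why it is worth looking for outliers before comparing averages.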
Other Materials
- DataCamp
- Machine Learning at Coursera
- Machine Learning in Hebrew
- Books
- Courses
- Tips about courses
- On Facebook