A pandas DataFrame is a two-dimensional, tabular data structure in which both the rows and the columns can be labeled.
This recipe includes the following topics:
- Create a Pandas DataFrame
- Access entire DataFrame
- Access single column data
- Access top 3 rows
- Find unique values in a given column and count their occurrences
# import modules
import pandas as pd
import numpy as np
# create row data
salary_ds = [70000, 85000, 150000]
salary_web = [65000, 90000, 120000]
salary_x = [75000, 81000, 110000]
salary_y = [85000, 93000, 100000]
salary_z = [65000, 75000, 90000]
salary_a = [55000, 68000, 990000]
salary_b = [70000, 91000, 110000]
salaries = np.array([salary_ds, salary_web, salary_x, salary_y, salary_z, salary_a, salary_b])
# define row names (used as the DataFrame index)
rownames = ['Data Science', 'Web Development', 'Career X', 'Career Y', 'Career Z', 'Career A', 'Career B']
# define column names
colnames = ['1 year', '3 years', '5 years']
# create DataFrame
df = pd.DataFrame(salaries, index=rownames, columns=colnames)
# display DataFrame
print('Display entire DataFrame')
print(df)
# Select single column
print('Display single column')
print(df['1 year'])
# Select first 3 rows
print('Display first 3 rows')
print(df.head(3))
# Count how many times each unique value occurs in a given column
print('Display unique values in a given column')
print(df['5 years'].value_counts())
Display entire DataFrame
                 1 year  3 years  5 years
Data Science      70000    85000   150000
Web Development   65000    90000   120000
Career X          75000    81000   110000
Career Y          85000    93000   100000
Career Z          65000    75000    90000
Career A          55000    68000   990000
Career B          70000    91000   110000
Display single column
Data Science       70000
Web Development    65000
Career X           75000
Career Y           85000
Career Z           65000
Career A           55000
Career B           70000
Name: 1 year, dtype: int64
Display first 3 rows
                 1 year  3 years  5 years
Data Science      70000    85000   150000
Web Development   65000    90000   120000
Career X          75000    81000   110000
Display unique values in a given column
110000    2
990000    1
100000    1
90000     1
120000    1
150000    1
Name: 5 years, dtype: int64
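The recipe uses `value_counts()`, which reports each distinct value together with the number of times it occurs. If only the distinct values themselves are needed, or just how many there are, `Series.unique()` and `Series.nunique()` may be a better fit. The following is a minimal sketch rather than part of the original recipe: it rebuilds the '5 years' column as a standalone Series (the variable name `five_years` is an illustrative assumption) so that it runs on its own.

```python
import pandas as pd

# Same '5 years' figures as in the recipe, rebuilt here so the snippet is self-contained
five_years = pd.Series(
    [150000, 120000, 110000, 100000, 90000, 990000, 110000],
    index=['Data Science', 'Web Development', 'Career X', 'Career Y',
           'Career Z', 'Career A', 'Career B'],
    name='5 years',
)

# Distinct values as a NumPy array, in order of first appearance
unique_values = five_years.unique()

# Number of distinct values
n_unique = five_years.nunique()

# Each distinct value with its number of occurrences (what the recipe prints)
counts = five_years.value_counts()

print(unique_values)
print(n_unique)
print(counts)
```

With the DataFrame from the recipe, the equivalent calls would be `df['5 years'].unique()` and `df['5 years'].nunique()`.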