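Four ways to select data:
- Select specific rows with loc[] and iloc[]
- Find and select matching data with isin()
- Pick out unique values with unique()
- Use df.nlargest() and df.nsmallest()
Selecting specific rows with loc and iloc
Selecting data by row index label with loc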
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=6),
                       'B': ['a', 'b', 'c', 'd', 'e', 'f'],
                       'C': np.arange(10, 16)})
    df = df.set_index('A')
    df
    df.loc['2019-01-01':'2019-01-03']
    df.loc['2019-01-01':'2019-01-03', 'B']
Output:
    A
    2019-01-01    a
    2019-01-02    b
    2019-01-03    c
    Name: B, dtype: object
    df.loc[df['B'] == 'c']
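Selecting data by index position with iloc
Output for the first row (consistent with a positional lookup such as df.iloc[0]):
    B     a
    C    10
    Name: 2019-01-01 00:00:00, dtype: object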
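The code for the positional examples did not survive in this section, so what follows is a minimal sketch rather than the original. It rebuilds the same df used in the loc examples and shows a few common iloc patterns; the variable names (first_row, first_three, cell, picked, by_label) are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the loc examples
df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=6),
                   'B': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'C': np.arange(10, 16)}).set_index('A')

first_row = df.iloc[0]          # single row, returned as a Series
first_three = df.iloc[0:3]      # rows at positions 0, 1, 2 (end is exclusive)
cell = df.iloc[2, 1]            # row position 2, column position 1
picked = df.iloc[[0, 2], [0]]   # rows 0 and 2, column 0 only, as a DataFrame

# loc slices by label and includes the end label;
# iloc slices by position and excludes the end position
by_label = df.loc['2019-01-01':'2019-01-03']
print(first_three.equals(by_label))
```

Note the difference in slice semantics: the label slice ending at '2019-01-03' and the positional slice 0:3 select the same three rows.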
Finding and selecting matching data with isin()
    df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=5),
                       'B': ['a', 'b', 'c', 'd', 'e']})
    df
    # Convert the date strings to Timestamps so they compare cleanly with column A
    data = pd.to_datetime(['2019-01-01', '2019-01-03'])
    df = df.loc[df['A'].isin(data), ['A', 'B']]
    df
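isin() returns a boolean mask, so it can also be inverted to exclude values. The following is a small sketch on the same kind of frame; the names wanted, mask, matched and excluded are illustrative and not from the original.

```python
import pandas as pd

df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=5),
                   'B': ['a', 'b', 'c', 'd', 'e']})

wanted = ['a', 'c', 'z']        # 'z' is absent and simply matches nothing
mask = df['B'].isin(wanted)     # boolean Series, one entry per row

matched = df.loc[mask]          # rows where B is 'a' or 'c'
excluded = df.loc[~mask]        # ~ inverts the mask: every other row

print(matched['B'].tolist())    # ['a', 'c']
print(excluded['B'].tolist())   # ['b', 'd', 'e']
```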
Picking out unique values with unique()
    import numpy as np

    A = [1, 2, 2, 5, 3, 4, 3]
    a = np.unique(A)   # sorted array of the unique values
    a
    # s holds, for each unique value, the index of its first occurrence in A
    a, s = np.unique(A, return_index=True)
    s
    # p maps every element of A to its position in a, so a[p] rebuilds A
    a, s, p = np.unique(A, return_index=True, return_inverse=True)
    p
Output:
    array([0, 1, 1, 4, 2, 3, 2])
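The examples above use NumPy's np.unique(). As a complement, here is a minimal sketch of the pandas counterparts on the same values; the Series s and the name firsts are introduced here for illustration. One difference worth noting: Series.unique() keeps values in order of first appearance, whereas np.unique() returns them sorted.

```python
import pandas as pd

s = pd.Series([1, 2, 2, 5, 3, 4, 3])

# Series.unique() preserves the order of first appearance
print(s.unique().tolist())      # [1, 2, 5, 3, 4]

# drop_duplicates() keeps the original index labels of the first occurrences
firsts = s.drop_duplicates()
print(firsts.index.tolist())    # [0, 1, 3, 4, 5]

# nunique() counts the distinct values
print(s.nunique())              # 5
```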
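Using df.nlargest() and df.nsmallest()
Previously, finding the rows with the largest values usually took two steps: sort df, then take the first n rows (for example with df.head(), which returns the first rows of a frame). nlargest() and nsmallest() do this in one call, returning the n rows with the largest or the smallest values in the given column, which is handy in practice.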
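To make the comparison concrete, the sketch below runs both approaches on a small made-up frame (the data and the names top_sorted and top_direct are hypothetical) and checks that they select the same rows when there are no ties.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e'],
                   'score': [3, 9, 1, 7, 5]})

# Two-step approach: sort descending, then take the first rows
top_sorted = df.sort_values('score', ascending=False).head(2)

# One-step approach
top_direct = df.nlargest(2, 'score')

print(top_direct)
print(top_sorted.equals(top_direct))
```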
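Parameters
tips.nlargest(n, columns, keep='first')
- n: the number of rows to return (int)
- columns: the column name, or list of column names, to order by
- keep: 'first' (the default) or 'last'. When there are duplicate values, keep='first' selects the rows that appear earlier in the original DataFrame, while keep='last' selects those that appear later. pandas also accepts keep='all', which keeps every tied row even if that returns more than n rows.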
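The effect of keep only shows up when values tie at the cutoff. The following sketch uses a small made-up frame (not the tips data) in which three rows share the second-largest score; the names first, last and all_ties are illustrative.

```python
import pandas as pd

# Hypothetical data: 'b', 'c' and 'd' tie for second place
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'],
                   'score': [10, 8, 8, 8]})

first = df.nlargest(2, 'score', keep='first')    # tie broken by earliest row
last = df.nlargest(2, 'score', keep='last')      # tie broken by latest row
all_ties = df.nlargest(2, 'score', keep='all')   # every tied row is kept

print(first['name'].tolist())    # ['a', 'b']
print(last['name'].tolist())     # ['a', 'd']
print(len(all_ties))             # 4
```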
We will again use the tips dataset to demonstrate.
    import seaborn as sns
    from pandas import Series, DataFrame
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

    tips = sns.load_dataset('tips')
    tips.head()
    tips.nlargest(5, 'total_bill')
    tips.nsmallest(5, 'total_bill', keep='last')