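Four ways to select data:
- Select specific rows with loc and iloc
- Find and select matching data with isin()
- Extract unique values with unique()
- Take the largest or smallest rows with df.nlargest() and df.nsmallest()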
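Selecting specific rows with loc and iloc
Selecting data by row index label with loc
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=6),
                   'B': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'C': np.arange(10, 16)})
# Use the date column as the row index so rows can be selected by date label
df = df.set_index('A')
df
```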
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605165871551-de19839c-229d-4db3-8ef9-d05faf7c480c.png#align=left&display=inline&height=217&margin=%5Bobject%20Object%5D&name=image.png&originHeight=217&originWidth=136&size=5852&status=done&style=none&width=136)
```python
df.loc['2019-01-01':'2019-01-03']
```
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605165902027-9abeb27a-df8d-444d-a55f-65e1baefa21e.png#align=left&display=inline&height=140&margin=%5Bobject%20Object%5D&name=image.png&originHeight=140&originWidth=129&size=4043&status=done&style=none&width=129)
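```python
# Same row slice, restricted to column 'B'; the result is a Series
df.loc['2019-01-01':'2019-01-03', 'B']
```
```
A
2019-01-01    a
2019-01-02    b
2019-01-03    c
Name: B, dtype: object
```
```python
# Boolean mask: keep the rows where column 'B' equals 'c'
df.loc[df['B'] == 'c']
```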
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605165931557-e1f6ec09-36a1-47c6-b83d-f5c42f24b01e.png#align=left&display=inline&height=83&margin=%5Bobject%20Object%5D&name=image.png&originHeight=83&originWidth=130&size=2324&status=done&style=none&width=130)
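As a minimal sketch of the loc patterns above, the block below rebuilds the same example frame and shows two points worth checking for yourself: a label slice includes both endpoints, and a boolean row mask can be combined with a list of column labels. The names first_three and late_rows are illustrative only and do not come from the original.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: DatetimeIndex 'A', columns 'B' and 'C'
df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=6),
                   'B': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'C': np.arange(10, 16)}).set_index('A')

# Label slices with loc include BOTH the start and the end label
first_three = df.loc['2019-01-01':'2019-01-03']

# Boolean mask for the rows, list of labels for the columns
late_rows = df.loc[df['C'] >= 14, ['B']]

print(first_three)
print(late_rows)
```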
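Selecting data by integer position with iloc
Selecting a single row by position, for example with df.iloc[0], returns that row as a Series:
```
B     a
C    10
Name: 2019-01-01 00:00:00, dtype: object
```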
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605165958232-c0607b50-80df-4604-99e1-acdb0fc6ecec.png#align=left&display=inline&height=82&margin=%5Bobject%20Object%5D&name=image.png&originHeight=82&originWidth=124&size=2208&status=done&style=none&width=124)
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605165988596-0e2429d9-d071-450d-af74-464df0b25a0e.png#align=left&display=inline&height=107&margin=%5Bobject%20Object%5D&name=image.png&originHeight=107&originWidth=127&size=3333&status=done&style=none&width=127)
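A minimal sketch of positional selection, assuming the same df as in the loc example (rebuilt here so the block runs on its own). The specific calls and variable names are illustrative and are not necessarily the ones that produced the outputs above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=6),
                   'B': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'C': np.arange(10, 16)}).set_index('A')

# A single integer returns one row as a Series
first_row = df.iloc[0]

# Positional slices exclude the end position, like ordinary Python slices
first_two = df.iloc[0:2]

# Lists of positions select arbitrary rows and columns (here rows 0 and 2, column 1)
picked = df.iloc[[0, 2], [1]]

print(first_row)
print(first_two)
print(picked)
```

Unlike a loc label slice, df.iloc[0:2] stops before position 2, so it returns two rows rather than three.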
Finding and selecting matching data with isin()
```python
df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=5),
                   'B': ['a', 'b', 'c', 'd', 'e']})
df
```
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605166024593-990c5d8d-06c1-417a-8909-83c5903276ae.png#align=left&display=inline&height=165&margin=%5Bobject%20Object%5D&name=image.png&originHeight=165&originWidth=119&size=4569&status=done&style=none&width=119)
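```python
# Cast the date strings to datetimes so they match the dtype of column 'A'
data = pd.to_datetime(['2019-01-01', '2019-01-03'])
# Keep the rows whose 'A' value appears in data, and select columns 'A' and 'B'
df = df.loc[df['A'].isin(data), ['A', 'B']]
df
```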
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605166049429-c0f3edc2-e957-44c9-b9c5-8604ea099b0d.png#align=left&display=inline&height=76&margin=%5Bobject%20Object%5D&name=image.png&originHeight=76&originWidth=117&size=2846&status=done&style=none&width=117)
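The same mask can be reused in other ways. The sketch below, which rebuilds the five-row frame so it runs on its own, inverts the mask with ~ to exclude the matching rows and applies isin() to the string column as well. The names wanted, selected, excluded and by_letter are illustrative assumptions, and the lookup dates are cast with pd.to_datetime so they share the column's dtype.

```python
import pandas as pd

df = pd.DataFrame({'A': pd.date_range('2019/01/01', periods=5),
                   'B': ['a', 'b', 'c', 'd', 'e']})

# Cast the lookup values to datetimes so they match column 'A'
wanted = pd.to_datetime(['2019-01-01', '2019-01-03'])
mask = df['A'].isin(wanted)

selected = df.loc[mask]    # rows whose date is in wanted
excluded = df.loc[~mask]   # ~ inverts the mask: every other row

# isin() works the same way on a string column
by_letter = df.loc[df['B'].isin(['b', 'd'])]

print(selected)
print(excluded)
print(by_letter)
```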
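Extracting unique values with unique()
```python
import numpy as np

A = [1, 2, 2, 5, 3, 4, 3]
# np.unique returns the unique values in sorted order
a = np.unique(A)
a
```
```python
# s holds the index in A of the first occurrence of each unique value
a, s = np.unique(A, return_index=True)
s
```
```python
# p holds, for each element of A, its position in the unique array a
a, s, p = np.unique(A, return_index=True, return_inverse=True)
p
```
```
array([0, 1, 1, 4, 2, 3, 2])
```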
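To make the roles of the three return values concrete, here is a small sketch using the same list A. It checks that indexing A with s recovers the unique values and that indexing a with p rebuilds the original list. The comparison with the pandas Series.unique() method is an addition for contrast, not part of the original example; rebuilt, first_seen and in_order are illustrative names.

```python
import numpy as np
import pandas as pd

A = [1, 2, 2, 5, 3, 4, 3]
a, s, p = np.unique(A, return_index=True, return_inverse=True)

# s points at the first occurrence of each unique value in A
first_seen = np.asarray(A)[s]

# p maps every element of A to its slot in a, so a[p] rebuilds A
rebuilt = a[p]

# pandas keeps order of first appearance instead of sorting
in_order = pd.Series(A).unique()

print(a, s, p)
print(first_seen, rebuilt, in_order)
```

np.unique sorts its result, while Series.unique() preserves the order in which values first appear, so the two are not interchangeable when order matters.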
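Using df.nlargest() and df.nsmallest()
Previously, finding the largest values usually took two steps: sort df, then take the first few rows with df.head(). df.nlargest() and df.nsmallest() do this in a single call, returning the n rows with the largest or the smallest values in the given column.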
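The two approaches can be compared side by side. This is a minimal sketch on a made-up four-row frame (the scores data and all variable names are hypothetical); with no tied values, the sort-then-head version and nlargest() select the same rows in the same order.

```python
import pandas as pd

# Hypothetical data, chosen so that no two scores are tied
scores = pd.DataFrame({'name': ['w', 'x', 'y', 'z'],
                       'score': [3, 9, 1, 7]})

# Two steps: sort descending, then take the first rows
two_step = scores.sort_values('score', ascending=False).head(2)

# One step: the 2 rows with the largest score
one_step = scores.nlargest(2, 'score')

# The 2 rows with the smallest score
lowest = scores.nsmallest(2, 'score')

print(two_step)
print(one_step)
print(lowest)
```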
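Parameters
tips.nlargest(n, columns, keep='first')
- n: the number of rows to return (int).
- columns: the column name, or list of column names, to order by.
- keep='first': how ties are handled, 'first' or 'last'. When values are duplicated, keep='first' selects the rows that appear earlier in the original DataFrame, and keep='last' selects the rows that appear later. pandas also accepts keep='all', which keeps every tied row, so the result may contain more than n rows.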
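The effect of keep only shows up when there are ties at the cut-off. The sketch below uses a made-up frame (the items data and variable names are hypothetical) in which two rows share the value 5, so asking for the top 2 forces a choice between them.

```python
import pandas as pd

# Hypothetical data: 'p' and 'r' are tied on value 5
items = pd.DataFrame({'item': ['p', 'q', 'r', 's'],
                      'value': [5, 8, 5, 2]})

# Tie resolved in favor of the row that appears first ('p')
top_first = items.nlargest(2, 'value', keep='first')

# Tie resolved in favor of the row that appears last ('r')
top_last = items.nlargest(2, 'value', keep='last')

# Both tied rows are kept, so 3 rows come back even though n is 2
top_all = items.nlargest(2, 'value', keep='all')

print(top_first)
print(top_last)
print(top_all)
```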
Once again, the tips dataset serves as the demonstration.
```python
import seaborn as sns
from pandas import Series, DataFrame
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline in the notebook
%matplotlib inline

tips = sns.load_dataset('tips')
tips.head()
```
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605166272569-8c787fb4-af24-4569-a314-383ed5cb5a20.png#align=left&display=inline&height=163&margin=%5Bobject%20Object%5D&name=image.png&originHeight=163&originWidth=343&size=15042&status=done&style=none&width=343)
```python
tips.nlargest(5, 'total_bill')
```
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605166292121-6cd66fdf-4fcd-4425-a28e-e9b0b658c3c9.png#align=left&display=inline&height=164&margin=%5Bobject%20Object%5D&name=image.png&originHeight=164&originWidth=351&size=15896&status=done&style=none&width=351)
```python
tips.nsmallest(5, 'total_bill', keep='last')
```
![image.png](https://cdn.nlark.com/yuque/0/2020/png/613759/1605166320920-00e3d9da-e574-439a-b51d-57652707a87e.png#align=left&display=inline&height=164&margin=%5Bobject%20Object%5D&name=image.png&originHeight=164&originWidth=360&size=15028&status=done&style=none&width=360)