pandas.unstack() Explained

I. Overview

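pandas is an open-source toolkit for data analysis, and unstack() is one of its core reshaping tools. It pivots one or more levels of a hierarchical row index (by default the innermost level) into the column labels, turning a "long" table into a "wide" one. A Series with a MultiIndex becomes a DataFrame, and a DataFrame with a MultiIndex becomes a DataFrame whose columns gain an extra level. This makes unstack() handy for building pivot-table-style summaries, for laying out time series with one column per period, and for moving information between the row index and the columns. Its fill_value parameter can also fill the gaps that the reshape itself leaves behind.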
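A minimal sketch of the basic idea, using a made-up Series indexed by (city, year): calling unstack() without arguments moves the innermost level (year) into the columns, so the one-dimensional Series becomes a two-dimensional DataFrame.

```python
import pandas as pd

# Toy data (made up for illustration): one value per (city, year) pair
index = pd.MultiIndex.from_tuples(
    [("x", 2020), ("x", 2021), ("y", 2020), ("y", 2021)],
    names=["city", "year"],
)
s = pd.Series([1, 2, 3, 4], index=index)

# The innermost level (year) becomes the column labels
wide = s.unstack()
print(wide)
```

Each remaining index value (city) now labels a row, and each year labels a column.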

II. Basic syntax of unstack()

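unstack() is a method of DataFrame and Series rather than a top-level pandas function:

DataFrame.unstack(level=-1, fill_value=None)
Series.unstack(level=-1, fill_value=None)

level: the row-index level(s) to move into the columns, given as a position, a name, or a list of these. The default -1 means the innermost level.
fill_value: the value that replaces the missing entries produced by the reshape, i.e. index/column combinations that do not occur in the data.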
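To illustrate the level parameter, here is a small sketch on a made-up Series with three index levels (the level names L0, L1 and L2 are hypothetical). The default, the position 2 and the name "L2" all refer to the same innermost level, while level=0 moves the outermost level instead.

```python
import pandas as pd

# Toy Series with a three-level index (labels made up for illustration)
index = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y"], [1, 2]], names=["L0", "L1", "L2"]
)
s = pd.Series(range(8), index=index)

default = s.unstack()              # level=-1: the innermost level, L2
by_position = s.unstack(level=2)   # the same level, by position
by_name = s.unstack(level="L2")    # the same level, by name
outer = s.unstack(level=0)         # the outermost level L0 becomes the columns

print(default)
print(outer)
```

The levels that are not unstacked stay in the row index, in their original order.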

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': ['A', 'B', 'C'] * 4,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D': np.random.randn(12),
                   'E': np.random.randn(12)})
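print(df)
# With a flat (non-MultiIndex) row index, unstack() returns a Series
# indexed by (column label, row label)
print(df.unstack())
# Each (A, B, C) combination is unique, so together they can serve as the row index
indexed = df.set_index(['A', 'B', 'C'])
print(indexed)
# Move the innermost level (C) into the columns
print(indexed.unstack())
# Move levels 0 and 1 (A and B) into the columns at once
print(indexed.unstack(level=[0, 1]))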

III. Examples

1. Rows to columns (long to wide)

unstack() turns the values of a row-index level into column labels, converting a long table into a wide one.

import pandas as pd

data = {
    "type": ["A", "A", "B", "B"],
    "value": [10, 20, 30, 40],
    "name": ["Tom", "Bob", "Tom", "Bob"]
}
df = pd.DataFrame(data, columns=["type", "value", "name"])
print(df)
# Each (type, name) pair must be unique; the name level becomes the columns
print(df.set_index(['type', 'name'])['value'].unstack())
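unstack() needs every (row, column) combination to occur at most once. A minimal sketch with made-up data in which (A, Tom) appears twice: unstack() raises a ValueError, while aggregating the duplicates first (here with the mean, an arbitrary choice for illustration) makes the reshape possible, which is essentially what a pivot table does.

```python
import pandas as pd

# Made-up data: the pair (A, Tom) appears twice
df = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "name": ["Tom", "Tom", "Bob", "Tom"],
    "value": [10, 20, 30, 40],
})

failed = False
try:
    df.set_index(["type", "name"])["value"].unstack()
except ValueError as exc:
    # pandas refuses to guess which duplicate should win
    failed = True
    print("unstack failed:", exc)

# Collapse the duplicates first, then unstack; absent pairs become NaN
wide = df.groupby(["type", "name"])["value"].mean().unstack()
print(wide)
```

The (A, Bob) cell is NaN because that pair never occurs in the data.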

2. Filling missing values

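Once the data is unstacked, missing values line up in a wide table where they are easy to fill, for example by carrying the previous period's value forward. stack() then converts the result back to the original long layout and index.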

import pandas as pd
import numpy as np

data = {
    "type": ["A", "A", "A", "B", "B", "B"],
    "time": ["2020-01", "2020-02", "2020-03", "2020-01", "2020-02", "2020-03"],
    "value": [10, np.nan, 30, 40, 50, np.nan]
}
df = pd.DataFrame(data, columns=["type", "time", "value"])
print(df)
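# Long to wide: one row per type, one column per month
wide = df.set_index(["type", "time"])["value"].unstack()
print(wide)
# Forward-fill along each row: a missing month takes the previous month's value
# (ffill() replaces the deprecated fillna(method="ffill"))
wide = wide.ffill(axis=1)
# Back to long format, indexed by (type, time) as in the original data
print(wide.stack())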
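The fill_value parameter works differently from fillna(): it only fills the cells that the reshape creates because an index/column combination is absent, and leaves NaN values already present in the data untouched. A minimal sketch with made-up data, where (B, 2020-02) is absent and (A, 2020-02) exists but holds NaN:

```python
import numpy as np
import pandas as pd

# Made-up data: (B, 2020-02) is absent, (A, 2020-02) is present but NaN
df = pd.DataFrame({
    "type": ["A", "A", "B"],
    "time": ["2020-01", "2020-02", "2020-01"],
    "value": [10.0, np.nan, 40.0],
})
s = df.set_index(["type", "time"])["value"]

# fill_value only applies to the absent (B, 2020-02) cell
wide = s.unstack(fill_value=0)
print(wide)
```

Pre-existing NaN values therefore still need fillna() or ffill(), as in the example above.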

3. Unstacking selected levels of a MultiIndex

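When the row index has several levels, the level parameter chooses which of them move into the columns, either by position or by name; a list moves several levels at once.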

import pandas as pd
import numpy as np

data = {
    "type": ["A", "A", "A", "B", "B", "B"],
    "time": ["2020-01", "2020-02", "2020-03", "2020-01", "2020-02", "2020-03"],
    "value1": [10, 20, 30, 40, 50, 60],
    "value2": [20, 30, 40, 50, 60, 70]
}
df = pd.DataFrame(data, columns=["type", "time", "value1", "value2"])
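indexed = df.set_index(["type", "time"])
print(indexed)
# By position: level 0 (type) moves into the columns
print(indexed.unstack(level=0))
# By name: the time level moves into the columns
print(indexed.unstack(level="time"))
# Both levels at once: with no row-index level left, pandas returns a Series
# indexed by (column, type, time)
print(indexed.unstack(level=[0, 1]))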
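stack() is the inverse of unstack(): it moves the innermost column level back into the row index. A minimal round-trip sketch on a reduced, fully populated version of the data above (no missing combinations, so the values come back unchanged):

```python
import pandas as pd

data = {
    "type": ["A", "A", "B", "B"],
    "time": ["2020-01", "2020-02", "2020-01", "2020-02"],
    "value1": [10, 20, 40, 50],
}
s = pd.DataFrame(data).set_index(["type", "time"])["value1"]

wide = s.unstack()    # long -> wide: time becomes the columns
back = wide.stack()   # wide -> long: time returns to the row index
print(back)
```

Thinking of the pair this way helps when choosing between them: unstack() widens a table, stack() lengthens it.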

IV. Summary

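unstack() is an important reshaping tool in pandas: it turns a long table with a hierarchical row index into a wide table by moving index levels into the columns. It is well suited to pivot-table-style summaries, to laying out time series with one column per period, and to moving information between the index and the columns. Combined with fillna()/ffill() and its inverse stack(), it also makes handling missing values straightforward. Keep in mind that the index combinations being unstacked must be unique; aggregate duplicates first if necessary.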

Original article by 小蓝 (Xiao Lan). If you repost it, please credit the source: https://www.506064.com/n/244282.html