The Complete Guide to Reshaping Rows and Columns in Pandas

I. Basic Concepts

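Before going through the individual reshaping methods, we need a few basics. Pandas has two core data structures: Series and DataFrame. A Series is a one-dimensional labeled array made up of values and an index. A DataFrame is a two-dimensional table made up of rows and columns.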
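As a quick illustration of the two structures, here is a minimal sketch. The names `s` and `frame` and the sample values are made up for this example.

```python
import pandas as pd

# A Series: one-dimensional values paired with an index of labels.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# A DataFrame: a two-dimensional table; each column is itself a Series.
frame = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]},
                     index=['a', 'b', 'c'])

print(s['b'])            # look up a value by its index label
print(frame.shape)       # (number of rows, number of columns)
print(type(frame['x']))  # selecting a single column returns a Series
```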

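Reshaping a DataFrame usually goes in one of two directions: rows to columns, or columns to rows. Rows to columns spreads values stored down the rows into new columns (long to wide). Columns to rows does the opposite and folds several columns into new rows (wide to long).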
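The two directions can be seen as a round trip. The sketch below is only an illustration: the frame `wide` and its values are invented, and it uses melt() and pivot(), both covered later in this guide.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per measurement.
wide = pd.DataFrame({'id': [1, 2], 'C': [10, 20], 'D': [30, 40]})

# Columns to rows: C and D are folded into rows, one (id, variable) pair each.
long_form = wide.melt(id_vars='id', value_vars=['C', 'D'])

# Rows to columns: the labels stored in 'variable' become columns again.
back = long_form.pivot(index='id', columns='variable', values='value')

print(wide.shape, long_form.shape, back.shape)
print(back)
```

Note that `back` uses `id` as its index rather than as an ordinary column, so it matches `wide` in content but not in layout.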

II. Columns to Rows

1. stack()

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 8],
                   'D': [9, 10, 11, 12, 13, 14, 15, 16]})
stacked = df.set_index(['A', 'B']).stack()
print(stacked)

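stack() moves the column labels into the innermost level of the row index, which turns the columns into rows. In the example above, set_index() first makes A and B the index. stack() then folds the remaining columns C and D into a third index level and returns a Series with a three-level MultiIndex.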
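To see the extra index level and how unstack() reverses the operation, here is a small sketch. The frame `small` is a made-up two-row example with unique (A, B) pairs, because unstack() cannot rebuild a table from duplicate index entries.

```python
import pandas as pd

# Hypothetical two-row frame; each (A, B) pair appears only once.
small = pd.DataFrame({'A': ['bar', 'foo'], 'B': ['one', 'two'],
                      'C': [1, 2], 'D': [9, 10]}).set_index(['A', 'B'])

stacked_small = small.stack()
print(stacked_small.index.nlevels)  # A, B, plus the former column labels

# unstack() moves the innermost index level back into the columns.
restored = stacked_small.unstack()
print(restored)
```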

2. melt()

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 8],
                   'D': [9, 10, 11, 12, 13, 14, 15, 16]})
melted = df.melt(id_vars=['A', 'B'], value_vars=['C', 'D'])
print(melted)

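melt() turns several columns of a DataFrame into rows. It takes two main parameters: value_vars lists the columns to unpivot, and id_vars lists the columns to keep unchanged. In the example above, A and B are kept, while C and D are folded into two new columns: variable holds the original column name and value holds the corresponding value.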
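The default output names variable and value can be replaced through melt()'s var_name and value_name parameters. A minimal sketch, with an invented frame `scores` and invented labels 'metric' and 'score':

```python
import pandas as pd

# Hypothetical data: one row per name, two measurement columns.
scores = pd.DataFrame({'name': ['x', 'y'], 'C': [1, 2], 'D': [9, 10]})

tidy = scores.melt(id_vars='name', value_vars=['C', 'D'],
                   var_name='metric',    # holds the old column names
                   value_name='score')   # holds the old cell values
print(tidy)
```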

III. Rows to Columns

1. transpose()

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 8],
                   'D': [9, 10, 11, 12, 13, 14, 15, 16]})
transposed = df.transpose()
print(transposed)

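transpose() swaps the rows and columns of a DataFrame. In the example above, the original row labels 0 to 7 become the column labels, and the original columns A, B, C and D become the rows.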
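One side effect worth checking is the dtype. When a frame mixes text and numbers, each transposed column has to hold both kinds of value. The sketch below uses an invented frame `mixed` to inspect this; `.T` is the shorthand attribute for transpose().

```python
import pandas as pd

# Hypothetical frame with one text column and one integer column.
mixed = pd.DataFrame({'A': ['foo', 'bar'], 'C': [1, 2]})

flipped = mixed.T  # same result as mixed.transpose()

print(flipped)
print(mixed.dtypes)    # one dtype per original column
print(flipped.dtypes)  # each new column now holds mixed values
```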

2. pivot()

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 8],
                   'D': [9, 10, 11, 12, 13, 14, 15, 16]})
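# pivot() needs every (index, columns) pair to be unique. The pairs (foo, one)
# and (foo, two) each occur twice here, so keep only the first row of each pair.
pivoted = df.drop_duplicates(subset=['A', 'B']).pivot(index='A', columns='B', values='C')
print(pivoted)

pivot() reshapes a DataFrame by turning the values of one column into new column labels. The index parameter names the column that supplies the row index of the result, columns names the column whose values become the new column labels, and values names the column used to fill the cells. In the example above, A supplies the row index, B supplies the column labels, and C supplies the values. pivot() does not aggregate, so it raises a ValueError if any (index, columns) pair appears more than once. That is why the example drops the duplicate (A, B) rows first.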
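The uniqueness requirement of pivot() is easy to trip over, so here is a small sketch of the failure and one possible workaround. The frame `dup` is invented, and keeping the first duplicate is an arbitrary choice made for illustration.

```python
import pandas as pd

# Hypothetical data in which the pair (foo, one) appears twice.
dup = pd.DataFrame({'A': ['foo', 'foo', 'bar'],
                    'B': ['one', 'one', 'two'],
                    'C': [1, 7, 2]})

try:
    dup.pivot(index='A', columns='B', values='C')
    raised = False
except ValueError as exc:
    raised = True
    print('pivot failed:', exc)

# Keep the first row of each (A, B) pair, then pivot again.
unique = dup.drop_duplicates(subset=['A', 'B'])
table = unique.pivot(index='A', columns='B', values='C')
print(table)  # combinations with no data are filled with NaN
```

If the duplicates should be combined rather than discarded, pivot_table() with an aggregation function is the better fit.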

IV. Other Methods

1. wide_to_long()

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C_1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'C_2': [9, 10, 11, 12, 13, 14, 15, 16]})
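# The id columns in i must identify each row uniquely. A and B alone do not,
# so reset_index() adds the row number as an extra 'index' column.
# sep='_' tells pandas that the stub 'C' and the suffix are joined by '_'.
long_df = pd.wide_to_long(df.reset_index(), stubnames='C',
                          i=['index', 'A', 'B'], j='number', sep='_')
print(long_df)

The wide_to_long() function converts wide-format data into long format. The stubnames parameter gives the shared prefix of the columns to unpivot, i gives the identifier columns to keep (together they must identify each row uniquely), j gives the name of the new column that holds the suffixes, and sep gives the separator between prefix and suffix. In the example above, C_1 and C_2 are merged into a single column named C, and the suffixes 1 and 2 are stored in a new index level named number.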
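Here is a smaller sketch of the same idea that makes the resulting index easier to read. The frame `wide` and its `id` column are invented for the example, and sep='_' is passed because the column names join the prefix and suffix with an underscore.

```python
import pandas as pd

# Hypothetical wide table: 'id' is unique, C_1 and C_2 share the stub 'C'.
wide = pd.DataFrame({'id': [0, 1], 'C_1': [1, 2], 'C_2': [9, 10]})

long_df = pd.wide_to_long(wide, stubnames='C', i='id', j='number', sep='_')

print(long_df)              # a single column C
print(long_df.index.names)  # the id column plus the new suffix level
```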

2. pivot_table()

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [1, 2, 3, 4, 5, 6, 7, 8],
                   'D': [9, 10, 11, 12, 13, 14, 15, 16]})
# Pass the aggregation by name; the original np.sum needed a NumPy import that was missing.
pivoted_table = df.pivot_table(values='C', index='A', columns='B', aggfunc='sum')
print(pivoted_table)

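pivot_table() aggregates a DataFrame and returns the result in a new row and column layout. The values parameter names the column to aggregate, index names the column used for the row index, columns names the column used for the column labels, and aggfunc names the aggregation function. In the example above, column C is summed, with A as the row index and B as the column labels. Unlike pivot(), pivot_table() accepts repeated (index, columns) pairs because it combines them with the aggregation function.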
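Combinations that never occur in the data come out as NaN by default. The sketch below uses an invented frame `sales` and passes fill_value=0 to replace those gaps, assuming a zero is the right placeholder for the analysis at hand.

```python
import pandas as pd

# Hypothetical data: (foo, one) appears twice and is summed.
sales = pd.DataFrame({'A': ['foo', 'foo', 'bar'],
                      'B': ['one', 'one', 'two'],
                      'C': [1, 7, 2]})

summary = sales.pivot_table(values='C', index='A', columns='B',
                            aggfunc='sum',
                            fill_value=0)  # use 0 where a pair has no rows
print(summary)
```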

V. Summary

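This article covered the common ways to reshape rows and columns in Pandas: stack, melt, transpose, pivot, wide_to_long and pivot_table. Reshaping is a routine step in data analysis and data cleaning, and knowing these methods well makes working with data more efficient. We hope this article is useful to you.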

Original article by 小藍. If you repost it, please credit the source: https://www.506064.com/zh-hant/n/303604.html