What Are Python Itertools Functions?

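A Closer Look at the Python Itertools Module

According to the official documentation, Python's itertools module is a collection of fast, memory-efficient tools for working with iterators. The tools can be used on their own or in combination, which makes it possible to create and manipulate iterators concisely without holding entire sequences in memory.

The functions in itertools make iterators much easier to work with, especially when processing large amounts of data, and they let you build more complex iterators on top of existing ones.

Itertools can also help you avoid common mistakes when working with iterators, and it tends to make code clearer, more readable, and easier to maintain.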
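
To make the idea of combining tools concrete, here is a minimal sketch that chains count() and islice() with a generator expression. The variable names are illustrative only; nothing is computed until list() pulls values through the pipeline.

```python
from itertools import count, islice

# Lazily describe an infinite stream of even numbers: 0, 2, 4, ...
evens = count(0, 2)
# Square each value on demand; no work happens yet
squares = (n * n for n in evens)
# Pull only the first five results through the pipeline
first_five = list(islice(squares, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

Because every stage is lazy, memory use stays constant even though the source sequence is unbounded.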

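Based on what they do, the iterators in the itertools module fall into the following categories:

#1. Infinite iterators

These iterators produce unbounded sequences, so a loop over them keeps running unless you supply your own exit condition. They are especially useful for simulating endless loops or generating unbounded sequences. Itertools provides three infinite iterators: count(), cycle(), and repeat().

#2. Combinatoric iterators

Combinatoric iterators compute Cartesian products as well as the permutations and combinations of the elements in an iterable. They are the natural choice whenever you need every possible arrangement or selection of the elements in an iterable. Itertools provides four combinatoric iterators: product(), permutations(), combinations(), and combinations_with_replacement().

#3. Iterators terminating on the shortest input sequence

These iterators work on finite inputs and terminate on their own; what they yield depends on the specific function used. Examples include accumulate(), chain(), chain.from_iterable(), compress(), dropwhile(), filterfalse(), groupby(), islice(), pairwise(), starmap(), takewhile(), tee(), and zip_longest().

Let's look at examples of how the different itertools functions work: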

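Infinite iterators

The itertools module provides the following three infinite iterators:

#1. count()

count(start, step) generates an infinite sequence of numbers beginning at start. Both arguments are optional: start sets the first value and defaults to 0, and step sets the difference between consecutive values and defaults to 1.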

import itertools
# Count from 4 in steps of 2
for i in itertools.count(4, 2):
    # Exit condition so the loop does not run forever
    if i == 14:
        break
    print(i)

Output:

4
6
8
10
12
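
As a small sketch of another common pattern, count() can be paired with zip() to number items. The task names below are made-up sample data; the key assumption is that zip() stops at its shortest input, so the infinite counter never runs away.

```python
from itertools import count

tasks = ["write", "test", "deploy"]  # hypothetical sample data
# zip() stops when tasks is exhausted, so the infinite counter is safe here
numbered = list(zip(count(1), tasks))
print(numbered)  # [(1, 'write'), (2, 'test'), (3, 'deploy')]
```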

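#2. cycle()

cycle(iterable) takes an iterable and loops over it endlessly, yielding its items in the order in which they appear.

For example, if you pass ["red", "green", "yellow"] to cycle(), the first iteration yields "red", the second yields "green", and the third yields "yellow". On the fourth iteration every element has been visited, so the iterator starts again from "red" and continues indefinitely.

When you call cycle(), store the result in a variable so that the iterator keeps its state. Calling cycle() again on each pass would create a fresh iterator every time, and you would only ever see the first element.

import itertools

colors = ["red", "green", "yellow"]
# Pass the list of colors to cycle()
color_cycle = itertools.cycle(colors)

# Use range() to limit the number of iterations and avoid an infinite loop
# next() returns the next element from the iterator
for i in range(7):
    print(next(color_cycle))

Output: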

red
green
yellow
red
green
yellow
red
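
One way to put cycle() to work is round-robin assignment. The sketch below uses made-up job and worker names and relies on zip() stopping once the finite list of jobs runs out.

```python
from itertools import cycle

jobs = ["job1", "job2", "job3", "job4", "job5"]  # hypothetical jobs
workers = ["alice", "bob"]                        # hypothetical workers
# zip() ends when jobs is exhausted, so cycling workers forever is safe
assignments = list(zip(jobs, cycle(workers)))
print(assignments)
# [('job1', 'alice'), ('job2', 'bob'), ('job3', 'alice'), ('job4', 'bob'), ('job5', 'alice')]
```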

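#3. repeat()

repeat(elem, n) takes two arguments: the element to repeat, elem, and the number of times to repeat it, n. The element can be a single value or an iterable, in which case the whole iterable is yielded as one item each time. If n is omitted, the element is repeated indefinitely.

import itertools

for i in itertools.repeat(10, 3):
    print(i)

Output: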

10
10
10
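
repeat() without a count is mostly useful for feeding a constant argument to map() or zip(). A minimal sketch, assuming you want the squares of 0 through 4:

```python
from itertools import repeat

# repeat(2) supplies the exponent 2 for every call to pow();
# map() stops as soon as range(5) is exhausted
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```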

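Combinatoric iterators

The combinatoric iterators are:

#1. product()

product() computes the Cartesian product of the iterables passed to it. For example, given two iterables or sets such as x = {7, 8} and y = {1, 2, 3}, the Cartesian product of x and y contains every possible pairing in which the first element comes from x and the second from y. Here the Cartesian product is [(7, 1), (7, 2), (7, 3), (8, 1), (8, 2), (8, 3)].

product() accepts an optional keyword argument, repeat, for computing the product of an iterable with itself. repeat specifies how many times the input iterables are repeated in the product, so product(A, repeat=2) means the same as product(A, A).

For example, product('ABCD', repeat=2) yields tuples such as ('A', 'A'), ('A', 'B'), ('A', 'C'), and so on. With repeat set to 3, it yields tuples such as ('A', 'A', 'A'), ('A', 'A', 'B'), ('A', 'A', 'C'), ('A', 'A', 'D'), and so on.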

from itertools import product
# product() with the optional repeat argument
print("product() with the optional repeat argument")
print(list(product('ABC', repeat = 2)))

# product() without the optional argument
print("product() without the optional argument")
print(list(product([7,8], [1,2,3])))

Output:

product() with the optional repeat argument
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]
product() without the optional argument
[(7, 1), (7, 2), (7, 3), (8, 1), (8, 2), (8, 3)]
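
To check what repeat actually does, the following sketch compares product() with repeat=2 against the same iterable written out twice. The two-letter input is only there to keep the output short.

```python
from itertools import product

with_repeat = list(product("AB", repeat=2))
spelled_out = list(product("AB", "AB"))
print(with_repeat)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
print(with_repeat == spelled_out)  # True
```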

#2. permutations()

permutations(iterable, r) returns all possible permutations of the iterable passed to it. A permutation is one particular ordering of elements taken from a collection. The optional argument r sets the length of each permutation; if r is not specified, the permutations have the same length as the iterable.

import itertools
numbers = [1, 2, 3]
sized_permutations = list(itertools.permutations(numbers,2))
unsized_permutations = list(itertools.permutations(numbers))

print("Permutations of length 2")
print(sized_permutations)
print("Permutations with no length given")
print(unsized_permutations)

Output:

Permutations of length 2
[(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
Permutations with no length given
[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

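#3. combinations()

combinations(iterable, r) returns every possible combination of the given length from the elements of the iterable. The r argument sets the size of each combination.

Combinations are emitted in lexicographic order according to the positions of the input elements, so if the input is sorted, the output tuples are sorted as well. Combinations differ slightly from permutations: for permutations the order of the elements matters, while for combinations it does not. For example, [A, B, C] has 6 permutations of length 2 (AB, AC, BA, BC, CA, CB) but only 3 combinations of length 2 (AB, AC, BC).

import itertools
numbers = [1, 2, 3, 4]
size2_combination = list(itertools.combinations(numbers, 2))
size3_combination = list(itertools.combinations(numbers, 3))

print("Combinations of length 2")
print(size2_combination)
print("Combinations of length 3")
print(size3_combination)

Output:

Combinations of length 2
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
Combinations of length 3
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]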
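
The difference between the two functions is easiest to see side by side. This sketch lists the length-2 permutations and combinations of the letters A, B, and C, joining each tuple into a string for readability.

```python
from itertools import combinations, permutations

letters = "ABC"
perms = ["".join(p) for p in permutations(letters, 2)]
combs = ["".join(c) for c in combinations(letters, 2)]
print(perms)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(combs)  # ['AB', 'AC', 'BC']
```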

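#4. combinations_with_replacement()

combinations_with_replacement(iterable, r) generates every possible combination of the given length from the elements of the iterable, and it allows the same element to appear more than once in a combination. The r argument sets the size of the generated combinations.

This is what distinguishes it from combinations(): an element may be repeated within a combination. For example, it can produce (1, 1), which combinations() never does.

import itertools
numbers = [1, 2, 3, 4]

size2_combination = list(itertools.combinations_with_replacement(numbers, 2))
print("Combinations with repeated elements => length 2")
print(size2_combination)

Output:

Combinations with repeated elements => length 2
[(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)]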

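Terminating iterators

These iterators include:

#1. accumulate()

accumulate(iterable, function) takes an iterable and an optional two-argument function. It returns an iterator that yields the accumulated result of applying the function to the elements seen so far. If no function is passed, the elements are added and the running totals are returned.

import itertools
import operator
numbers = [1, 2, 3, 4, 5]

# Running sum (the default) and running product
accumulated_val = itertools.accumulate(numbers)
accumulated_mul = itertools.accumulate(numbers, operator.mul)
print("Accumulate without a function")
print(list(accumulated_val))
print("Accumulate using multiplication")
print(list(accumulated_mul))

Output:

Accumulate without a function
[1, 3, 6, 10, 15]
Accumulate using multiplication
[1, 2, 6, 24, 120]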
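
Any two-argument function works, not just arithmetic operators. The sketch below uses the built-in max() for a running maximum and, assuming Python 3.8 or later, the initial keyword to seed a running balance. The numbers are made-up sample data.

```python
from itertools import accumulate

readings = [3, 1, 4, 1, 5]  # hypothetical sample data
running_max = list(accumulate(readings, max))
print(running_max)  # [3, 3, 4, 4, 5]

# initial= (Python 3.8+) seeds the accumulation and adds one leading item
balance = list(accumulate([10, -3, 5], initial=100))
print(balance)  # [100, 110, 107, 112]
```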

#2. chain()

chain(iterable_1, iterable_2, ...) takes several iterables and links them together, producing a single iterator that yields the values of every iterable passed to it, one after another.

import itertools

letters = ['A', 'B', 'C', 'D']
numbers = [1, 2, 3]
colors = ['red', 'green', 'yellow']

# Chain the letters, numbers, and colors together
chained_iterable = list(itertools.chain(letters, numbers, colors))
print(chained_iterable)

Output:

['A', 'B', 'C', 'D', 1, 2, 3, 'red', 'green', 'yellow']

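#3. chain.from_iterable()

chain.from_iterable(iterable) is similar to chain(). The difference is that it takes a single iterable whose items are themselves iterables, and it chains those inner iterables together.

import itertools

letters = ['A', 'B', 'C', 'D']
numbers = [1, 2, 3]
colors = ['red', 'green', 'yellow']

# A string is iterable too, so 'hello' is split into characters
iterable = ['hello', colors, letters, numbers]
chained = list(itertools.chain.from_iterable(iterable))
print(chained)

Output: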

['h', 'e', 'l', 'l', 'o', 'red', 'green', 'yellow', 'A', 'B', 'C', 'D', 1, 2, 3]

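#4. compress()

compress(data, selectors) takes two arguments: data is an iterable, and selectors is an iterable of Boolean values (True and False). 1 and 0 can stand in for True and False, as can any other truthy or falsy value. compress() filters data using the corresponding elements of selectors.

Data items whose selector is True or 1 are kept, and those whose selector is False or 0 are dropped. If selectors contains fewer values than data has items, the data items beyond the end of selectors are ignored.

import itertools

# data contains 10 items
data = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
# only 9 selectors are passed
selectors = [True, False, 1, False, 0, 1, True, False, 1]

# Select data elements according to the selectors
filtered_data = list(itertools.compress(data, selectors))
print(filtered_data)

Output: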

['A', 'C', 'F', 'G', 'I']
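
Conceptually, compress() behaves like a filtered zip(). The sketch below compares the two on a short, made-up input and also shows that selectors do not have to be actual Booleans.

```python
from itertools import compress

data = ["A", "B", "C", "D"]
selectors = [1, 0, "yes", ""]  # any truthy/falsy values work
via_compress = list(compress(data, selectors))
via_zip = [d for d, s in zip(data, selectors) if s]
print(via_compress)  # ['A', 'C']
print(via_compress == via_zip)  # True
```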

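#5. dropwhile()

dropwhile(predicate, iterable) takes a predicate, that is, a function whose condition returns True or False, and a sequence of values. It drops elements from the start of the sequence until the predicate returns False for the first time. From that point on, every remaining element is included in the result, whether the predicate would return True or False for it.

import itertools

numbers = [1, 2, 3, 4, 5, 1, 6, 7, 2, 1, 8, 9, 0, 7]

# Drop elements until the condition returns False
filtered_numbers = list(itertools.dropwhile(lambda x: x < 5, numbers))
print(filtered_numbers)

Output: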

[5, 1, 6, 7, 2, 1, 8, 9, 0, 7]

#6. filterfalse()

filterfalse(predicate, iterable) takes a predicate, that is, a function whose condition evaluates to True or False, and a sequence. It returns every element of the sequence for which the predicate is False.

import itertools

numbers = [1, 2, 3, 4, 2, 3, 5, 6, 5, 8, 1, 2, 3, 6, 2, 7, 4, 3]

# Keep only the elements for which the condition is False
filtered_numbers = list(itertools.filterfalse(lambda x: x < 4, numbers))
print(filtered_numbers)

Output:

[4, 5, 6, 5, 8, 6, 7, 4]

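#7. groupby()

groupby(iterable, key) takes an iterable and a key function and returns an iterator that yields consecutive keys and groups. The key function computes a key value for each element of the iterable. Because groupby() starts a new group every time the key value changes, the iterable generally needs to be sorted by the same key function beforehand.

import itertools

input_list = [("Domestic", "Cow"), ("Domestic", "Dog"), ("Domestic", "Cat"), ("Wild", "Lion"), ("Wild", "Zebra"), ("Wild", "Elephant")]
classification = itertools.groupby(input_list, lambda x: x[0])
for key, value in classification:
    print(key, ":", list(value))

Output: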

Domestic : [('Domestic', 'Cow'), ('Domestic', 'Dog'), ('Domestic', 'Cat')]
Wild : [('Wild', 'Lion'), ('Wild', 'Zebra'), ('Wild', 'Elephant')]
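
The sorting requirement is easy to overlook, so here is a small sketch of what happens without it. The word list and the first_letter helper are hypothetical; the point is that an unsorted input yields the key 'a' twice.

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "apricot"]  # hypothetical sample data

def first_letter(word):
    return word[0]

# Unsorted input: a new group starts whenever the key changes
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]
print(unsorted_groups)
# [('a', ['apple', 'avocado']), ('b', ['banana']), ('a', ['apricot'])]

# Sorting by the same key first gives one group per key
by_letter = sorted(words, key=first_letter)
sorted_groups = [(k, list(g)) for k, g in groupby(by_letter, first_letter)]
print(sorted_groups)
# [('a', ['apple', 'avocado', 'apricot']), ('b', ['banana'])]
```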

#8. islice()

islice(iterable, start, stop, step) slices an iterable using the given start, stop, and step values. The step argument is optional. Positions are counted from 0, and the item at the stop position is not included.

import itertools

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

# Select the elements in the given range
selected_numbers = list(itertools.islice(numbers, 2, 10))
selected_numbers_step = list(itertools.islice(numbers, 2, 10, 2))
print("islice without a step value")
print(selected_numbers)
print("islice with a step value of 2")
print(selected_numbers_step)

Output:

islice without a step value
[3, 4, 5, 6, 7, 8, 9, 10]
islice with a step value of 2
[3, 5, 7, 9]
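
islice() also accepts two other call shapes that are worth knowing. As a brief sketch: a single number after the iterable is treated as stop, and passing None as stop means "continue until the input runs out".

```python
from itertools import count, islice

# A single number after the iterable is treated as stop
first_three = list(islice(count(10), 3))
print(first_three)  # [10, 11, 12]

# stop=None means "until the input is exhausted"
from_third = list(islice("ABCDEFG", 2, None))
print(from_third)  # ['C', 'D', 'E', 'F', 'G']
```

Unlike list slicing, this works on any iterator, including infinite ones such as count().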

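#9. pairwise()

pairwise(iterable) returns successive overlapping pairs taken from the iterable passed to it, in the order in which they appear in the iterable. If the iterable has fewer than two values, the result of pairwise() is empty. Note that pairwise() is available from Python 3.10 onward.

from itertools import pairwise

numbers = [1, 2, 3, 4, 5, 6, 7, 8]
word = 'WORLD'
single = ['A']

print(list(pairwise(numbers)))
print(list(pairwise(word)))
print(list(pairwise(single)))

Output: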

[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
[('W', 'O'), ('O', 'R'), ('R', 'L'), ('L', 'D')]
[]

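#10. starmap()

starmap(function, iterable) is used in place of map() when the arguments are already grouped into tuples. It applies the function to each element of the iterable, unpacking the tuple into separate arguments, so the iterable should contain tuples of arguments.

import itertools

iter_starmap = [(123, 63, 13), (5, 6, 52), (824, 51, 9), (26, 24, 16), (14, 15, 11)]
print(list(itertools.starmap(min, iter_starmap)))

Output: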

[13, 5, 9, 16, 11]
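
To see the unpacking more explicitly, this sketch applies pow() to a few made-up (base, exponent) pairs and compares the result with a list comprehension that unpacks each tuple by hand.

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]
# Calls pow(2, 5), pow(3, 2), pow(10, 3)
powers = list(starmap(pow, pairs))
# The same thing spelled out with explicit unpacking
unpacked = [pow(*args) for args in pairs]
print(powers)  # [32, 9, 1000]
print(powers == unpacked)  # True
```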

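#11. takewhile()

takewhile(predicate, iterable) works the opposite way to dropwhile(). It takes a function containing the condition to evaluate and an iterable, and it yields elements of the iterable for as long as the condition holds. As soon as the condition returns False, all remaining elements of the iterable are ignored.

import itertools

numbers = [1, 2, 3, 4, 5, 1, 6, 7, 2, 1, 8, 9, 0, 7]

# Keep elements until the condition returns False
filtered_numbers = list(itertools.takewhile(lambda x: x < 5, numbers))
print(filtered_numbers)

Output: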

[1, 2, 3, 4]
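
Because takewhile() and dropwhile() cut at the same point, applying both with the same condition splits a sequence into a head and a tail. A minimal sketch with a shortened list and a hypothetical is_small helper:

```python
from itertools import dropwhile, takewhile

numbers = [1, 2, 3, 4, 5, 1, 6]

def is_small(x):
    return x < 5

head = list(takewhile(is_small, numbers))
tail = list(dropwhile(is_small, numbers))
print(head)  # [1, 2, 3, 4]
print(tail)  # [5, 1, 6]
print(head + tail == numbers)  # True
```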

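#12. tee()

tee(iterable, n) takes an iterable and returns several independent iterators over it. The number of iterators returned is set by n, which defaults to 2.

import itertools

numbers = [1, 2, 3, 4, 5]

# Create two independent iterators from numbers
iter1, iter2 = itertools.tee(numbers, 2)
print(list(iter1))
print(list(iter2))

Output: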

[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5]
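
tee() matters most for one-shot iterators such as generators, which cannot simply be looped over twice. In this sketch, advancing one copy leaves the other untouched. Once tee() has been called, the original generator should not be consumed directly, since the copies would miss those items.

```python
from itertools import tee

squares = (n * n for n in range(4))  # a one-shot generator
a, b = tee(squares)

first_two = [next(a), next(a)]  # advance only a
print(first_two)  # [0, 1]

rest_of_a = list(a)
all_of_b = list(b)
print(all_of_b)   # [0, 1, 4, 9]
print(rest_of_a)  # [4, 9]
```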

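#13. zip_longest()

zip_longest(iterables, fillvalue) takes several iterables and a fill value, fillvalue. It returns an iterator that aggregates elements from each of the iterables passed to it. If the iterables differ in length, missing values are replaced with the fill value until the longest iterable is exhausted.

import itertools

names = ['John', 'mathew', 'mary', 'Alice', 'Bob', 'Charlie', 'Fury']
ages = [25, 30, 12, 13, 42]

# Combine names and ages, filling in missing ages with a hyphen
combined = itertools.zip_longest(names, ages, fillvalue="-")

for name, age in combined:
    print(name, age)

Output: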

John 25
mathew 30
mary 12
Alice 13
Bob 42
Charlie -
Fury -
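
For contrast, the built-in zip() stops at the shortest input. The sketch below runs both on the same made-up data; when fillvalue is not given, zip_longest() fills the gaps with None.

```python
from itertools import zip_longest

numbers = [1, 2, 3]
letters = "ab"
short = list(zip(numbers, letters))
long = list(zip_longest(numbers, letters))
print(short)  # [(1, 'a'), (2, 'b')]
print(long)   # [(1, 'a'), (2, 'b'), (3, None)]
```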

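Conclusion

The Python itertools module is an important toolkit for Python developers. It is widely used in functional programming, data processing and transformation, filtering and selection, grouping and aggregation, combining iterables, combinatorics, and working with infinite sequences.

As a Python developer, you will benefit a great deal from knowing itertools, so use this article to get familiar with it.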