How to use cProfile to find performance bottlenecks in slow Python code

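When Python code runs slowly, a good first step is deterministic profiling with cProfile, which ships with the standard library. It works best in a development or test environment, or against a reproducible case, for finding the functions where a CPU-bound task spends most of its time. Leaving it enabled for long periods in a high-load production environment is not recommended.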

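Locate first: generate a profile data file with cProfile and confirm which functions take up most of the execution time.
Then fix: prioritize functions that are called often and are slow per call, and avoid prematurely optimizing code that barely matters.
Then verify: after changing the code, profile again and compare the total run time and the per-function timings.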

Quick-start commands

If you only want a quick look at where the time goes, run the script from the command line and write a stats file:

python -m cProfile -o output.prof your_script.py

Once output.prof exists, read it as a text report with the standard library's pstats module:

python -c "import pstats; p=pstats.Stats('output.prof'); p.sort_stats('cumulative').print_stats(10)"
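The same two steps can also be done in-process when you only care about one function rather than a whole script. The following is a minimal sketch, not the only way to do it: pairwise_product_sum and profile_call are hypothetical names introduced for illustration, while cProfile.Profile.runcall and pstats.Stats are standard-library APIs.

```python
import cProfile
import io
import pstats


def pairwise_product_sum(data):
    """Hypothetical workload: sum of data[i] * data[j] over all i < j."""
    total = 0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            total += data[i] * data[j]
    return total


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Send the report to a string instead of stdout so it can be logged or saved.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    value, report = profile_call(pairwise_product_sum, list(range(300)))
    print(value)
    print(report)
```

Profiling a single call this way keeps interpreter start-up and import time out of the report, which can make a small hot spot easier to see.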

Hands-on demo: building deliberately slow code

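To try the profiling workflow right away, we build a small script with a performance problem, slow_demo.py. It uses an inefficient nested loop to add up the products of every pair of elements in a list. You can copy and run it as is: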

# slow_demo.py
def calculate_sum(data):
    total = 0
    # Inefficient nested loop: visits every pair (i, j) with i < j
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            total += data[i] * data[j]
    return total

if __name__ == "__main__":
    nums = list(range(1000))
    result = calculate_sum(nums)
    print(f"Result: {result}")

Locating the bottleneck

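Run your script in a test environment under cProfile. If the script takes arguments, pass them after the script name as usual. Type them as they are, without wrapping them in backticks; --arg1 and value1 below are placeholders:

python -m cProfile -o profile_result.prof slow_demo.py --arg1 value1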

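Explore the file interactively with pstats, or write a short script that prints the top N functions. Focus on the cumtime (cumulative time) column: it includes time spent in sub-functions, so it is the better indicator of where a bottleneck sits: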

import pstats
p = pstats.Stats('profile_result.prof')
p.sort_stats('cumulative')
p.print_stats(5)
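To make the difference between tottime and cumtime concrete, here is a small sketch that writes a profile file and then reads it back with two different sort orders. The functions inner, outer, write_profile, and load_report are hypothetical names used only for this illustration; SortKey, strip_dirs, and the regex restriction passed to print_stats come from the pstats module.

```python
import cProfile
import io
import os
import pstats
import tempfile
from pstats import SortKey


def inner(n):
    """Hypothetical leaf function that does the real work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def outer():
    """Hypothetical caller: cheap itself, expensive through inner()."""
    return [inner(2000) for _ in range(50)]


def write_profile(func, prof_path):
    """Profile one call of func and save the raw stats to prof_path."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(prof_path)


def load_report(prof_path, sort_key, pattern):
    """Return a text report limited to rows matching the regex pattern."""
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    stats.strip_dirs().sort_stats(sort_key).print_stats(pattern)
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "demo.prof")
        write_profile(outer, path)
        # Sorted by tottime: inner ranks first because the work happens in its body.
        print(load_report(path, SortKey.TIME, r"\((inner|outer)\)"))
        # Sorted by cumtime: outer ranks at least as high because it includes inner.
        print(load_report(path, SortKey.CUMULATIVE, r"\((inner|outer)\)"))
```

A function with a high cumtime but a low tottime, like outer here, is usually only the entry point to the slow path; the row with the high tottime is where the time is actually spent.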

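Illustrative output follows. It is abridged: a real report also lists wrapper rows such as {built-in method builtins.exec} and <module>, and the absolute timings depend on your machine and Python version. calculate_sum accounts for almost all of the time, and because the work happens in its own loop body, its tottime is essentially equal to its cumtime:

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    5.230    5.230    5.230    5.230 slow_demo.py:2(calculate_sum)
  1001    0.000    0.000    0.000    0.000 {built-in method builtins.len}
     1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}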

Optimization example and comparison

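According to the profile, calculate_sum is the bottleneck. A mathematical identity lets us reduce the algorithm's complexity from O(n^2) to O(n). Create the optimized script fast_demo.py: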

# fast_demo.py
def calculate_sum(data):
    total = 0
    sum_all = sum(data)
    # Each data[i] * (sum_all - data[i]) counts every pair twice, hence the // 2 below
    for i in range(len(data)):
        total += data[i] * (sum_all - data[i])
    return total // 2

if __name__ == "__main__":
    nums = list(range(1000))
    result = calculate_sum(nums)
    print(f"Result: {result}")
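Before comparing timings, it is worth confirming that the rewrite returns the same result. The identity behind it is that the sum of a[i] * a[j] over all i < j equals (S^2 - sum of a[i]^2) / 2, where S is the sum of the list. The sketch below checks both versions against each other on random integer lists; the function names are hypothetical, and it assumes integer inputs, since // 2 is only exact for integers.

```python
import random


def pairwise_sum_quadratic(data):
    """O(n^2) reference: add data[i] * data[j] for every pair i < j."""
    total = 0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            total += data[i] * data[j]
    return total


def pairwise_sum_linear(data):
    """O(n) version: every pair is counted twice, so halve the total."""
    sum_all = sum(data)
    doubled = sum(x * (sum_all - x) for x in data)
    return doubled // 2  # exact for integers: doubled is always even


def check_equivalence(trials=200, max_len=30, seed=0):
    """Compare both versions on random integer lists; raise on mismatch."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, max_len))]
        expected = pairwise_sum_quadratic(data)
        actual = pairwise_sum_linear(data)
        if expected != actual:
            raise AssertionError(f"mismatch for {data}: {expected} != {actual}")
    return True


if __name__ == "__main__":
    print(check_equivalence())
```

A check like this is cheap to run and guards against the common case where an optimization is faster only because it computes something slightly different.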

Run the profiling commands again and compare the key metrics:

python -m cProfile -o profile_result_opt.prof fast_demo.py
python -c "import pstats; p=pstats.Stats('profile_result_opt.prof'); p.sort_stats('cumulative').print_stats(5)"

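Illustrative output after the optimization (abridged, with timings that depend on your machine) shows a much lower cumulative time:

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.002    0.002    0.002    0.002 fast_demo.py:2(calculate_sum)
     1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
     ...

Comparing the two reports, the cumtime of calculate_sum drops from about 5.230 s to about 0.002 s, which confirms that the optimization took effect.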

How to verify that the change worked

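After optimizing, run the same profiling command again. Compare the two reports: check whether the total run time (the "function calls in ... seconds" line at the top of the report) went down, and whether the target function's tottime (time spent in its own body) or cumtime (cumulative time) decreased. Do not rely on a single run; repeat the measurement several times in the same environment and use a stable value.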
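For the repeated measurements, the standard library's timeit module is a reasonable companion to cProfile because it adds far less overhead. The sketch below times both versions several times and reports the best run; best_time and the two workload functions are hypothetical names, and taking the minimum is one common convention for reducing noise, not the only valid choice.

```python
import timeit


def pairwise_sum_quadratic(data):
    """O(n^2) version from the slow demo."""
    total = 0
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            total += data[i] * data[j]
    return total


def pairwise_sum_linear(data):
    """O(n) version from the optimized demo (integer inputs assumed)."""
    sum_all = sum(data)
    return sum(x * (sum_all - x) for x in data) // 2


def best_time(func, data, repeat=5, number=3):
    """Return the best average seconds per call across `repeat` runs."""
    timings = timeit.repeat(lambda: func(data), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    nums = list(range(1000))
    slow = best_time(pairwise_sum_quadratic, nums)
    fast = best_time(pairwise_sum_linear, nums)
    print(f"quadratic: {slow:.6f} s per call")
    print(f"linear:    {fast:.6f} s per call")
```

Use cProfile to find which function is slow, and a timing loop like this one to decide whether a change to that function made a real difference.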

Common pitfalls

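1. Production overhead: cProfile slows code down noticeably. Do not leave it enabled in production for long periods, or a performance problem may turn into a more serious stall.

2. Misleading results for IO-bound code: if the program mostly waits on network or disk IO, the time cProfile reports may concentrate in IO functions. Optimizing CPU code then has limited effect, and asynchronous or concurrent designs are worth considering instead.

3. Multithreading limits: in a multithreaded program, cProfile by default only profiles the thread it was enabled in, so it may not give a complete picture of concurrency problems. Other threads have to be handled separately, with help from the threading module.

4. Decorator effects: some decorators can hide the real call relationships, so the report shows the name of the decorator's inner function instead of the function you expect. Keep this in mind while investigating.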
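The decorator pitfall in item 4 is easy to reproduce. In the sketch below, log_calls, busy, and profiled_function_names are hypothetical names chosen for illustration. The profiler reports rows by code object, so the decorated call shows up under the inner function's name (wrapper here) in addition to the original function.

```python
import cProfile
import pstats


def log_calls(func):
    """Hypothetical pass-through decorator."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@log_calls
def busy(n):
    """Hypothetical workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profiled_function_names(func, *args):
    """Profile func(*args) and return the set of function names recorded."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # Keys of stats.stats are (filename, line number, function name) tuples.
    return {name for (_filename, _lineno, name) in stats.stats}


if __name__ == "__main__":
    names = profiled_function_names(busy, 10_000)
    print(sorted(names))
```

Because every function wrapped by the same decorator shares the same wrapper code object, their calls are merged into a single wrapper row. When that row dominates a report, look at its callees (for example with the print_callees method of pstats.Stats) to see which decorated function is really responsible.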