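Math Abilities
Throughout this course, we have seen many different prompting methods that can be used to improve LLM math ability. One recent approach, MathPrompter, unifies some of these methods (CoT, PAL, etc.) into a single technique. The overarching idea is to break a math question down into algebraic terms and then use Python code to solve it in different ways.
MathPrompter has four steps. We will explain them using the following example problem, which is taken directly from the paper.
Q: At a restaurant, each adult meal costs $5 and kids eat free. If a group of 15 people came in and 8 were kids, how much would it cost for the group to eat?
Step 1: Generate Algebraic Template
First, we assign a variable to each number in the question. This makes it easier to turn the question into an abstract math problem, and also easier to write program code for it.
This can be done with few-shot prompting: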
Q: A zoo charges $12 per adult ticket and allows children under 5 to enter for free. A family of 4 adults and 2 children under 5 visit the zoo. What is the total cost for the family to enter?
Qt: At a zoo, each adult ticket costs $A and children under 5 can enter for free. If a family of B adults and C children under 5 visit the zoo, what is the total cost for the family to enter?
Mapping: {A: 12, B: 4, C: 2}
Q: A store sells shoes at $60 per pair and socks at $8 per pair. If a customer buys 2 pairs of shoes and 3 pairs of socks, what is the total cost of the purchase?
Qt: At a store, shoes cost $A per pair and socks cost $B per pair. If a customer buys C pairs of shoes and D pairs of socks, what is the total cost of the purchase?
Mapping: {A: 60, B: 8, C: 2, D: 3}
Q: At a restaurant, each adult meal costs $5 and kids eat free. If a group of 15 people came in and 8 were kids, how much would it cost for the group to eat?
Qt: At a restaurant, each adult meal costs $A and kids eat free. If a group of B people came in and C were kids, how much would it cost for the group to eat?
Mapping: {A: 5, B: 15, C: 8}
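To make the target of this step concrete, here is a minimal sketch of the same transformation done with a regular expression instead of an LLM. The function name `to_template` is hypothetical and this is not how MathPrompter itself produces the template; it only illustrates the expected output (a templated question plus a mapping).

```python
import re
import string

def to_template(question):
    """Replace each number in the question with A, B, C, ... and record the mapping."""
    letters = iter(string.ascii_uppercase)
    mapping = {}

    def substitute(match):
        name = next(letters)
        text = match.group(0)
        # Keep integers as int and decimals as float
        mapping[name] = float(text) if "." in text else int(text)
        return name

    template = re.sub(r"\d+(?:\.\d+)?", substitute, question)
    return template, mapping

question = ("At a restaurant, each adult meal costs $5 and kids eat free. "
            "If a group of 15 people came in and 8 were kids, "
            "how much would it cost for the group to eat?")
template, mapping = to_template(question)
print(template)
print(mapping)  # {'A': 5, 'B': 15, 'C': 8}
```

A naive substitution like this would also replace the 5 in "children under 5" in the zoo example, even though that number is not a quantity to compute with. Deciding which numbers matter is one reason to leave this step to a few-shot prompted LLM.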
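Step 2: Math Prompts
The point of this step is to formulate the problem both as an algebraic statement and as Python code. It uses two prompts run side by side, which helps produce diverse representations of the problem.
2a: Algebraic Statement
We can use few-shot prompting to have the LLM express the math problem as an algebraic statement. This is done by asking the LLM to generate the answer in a fixed format, starting with "Answer =".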
Qt: At a zoo, each adult ticket costs $A and children under 5 can enter for free. If a family of B adults and C children under 5 visit the zoo, what is the total cost for the family to enter?
Mapping: {A: 12, B: 4, C: 2}
Write a mathematical equation and generate the answer format
starting with 'Answer ='
Answer = A * B
Qt: At a store, shoes cost $A per pair and socks cost $B per pair. If a customer buys C pairs of shoes and D pairs of socks, what is the total cost of the purchase?
Mapping: {A: 60, B: 8, C: 2, D: 3}
Write a mathematical equation and generate the answer format
starting with 'Answer ='
Answer = A * C + B * D
Qt: At a restaurant, each adult meal costs $A and kids eat free. If a group of B people came in and C were kids, how much would it cost for the group to eat?
Mapping: {A: 5, B: 15, C: 8}
Write a mathematical equation and generate the answer format
starting with 'Answer ='
Answer = A * B - A * C
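2b: Python Code
We can also ask the LLM to generate Python code that solves the problem. This is done by asking the LLM to write a Python function that returns the answer.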
Qt: At a zoo, each adult ticket costs $A and children under 5 can enter for free. If a family of B adults and C children under 5 visit the zoo, what is the total cost for the family to enter?
Mapping: {A: 12, B: 4, C: 2}
Write a Python function that returns the answer.
def zoo_cost(A, B, C): return A * B
Qt: At a store, shoes cost $A per pair and socks cost $B per pair. If a customer buys C pairs of shoes and D pairs of socks, what is the total cost of the purchase?
Write a Python function that returns the answer.
def store_cost(A, B, C, D): return (A * C) + (B * D)
Qt: At a restaurant, each adult meal costs $A and kids eat free. If a group of B people came in and C were kids, how much would it cost for the group to eat?
Write a Python function that returns the answer.
def restaurant_cost(A, B, C):
    return A * (B - C)
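The function comes back from the LLM as text, so it has to be turned into something callable before it can be used. The following is a minimal sketch of one way to do that, assuming the completion contains exactly one function definition. The helper `load_generated_function` is a hypothetical name, and `exec` runs arbitrary code, so a real pipeline would need a sandbox.

```python
import types

def load_generated_function(source):
    """Execute LLM-generated source and return the single function it defines."""
    namespace = {}
    exec(source, namespace)  # runs arbitrary code: sandbox this in real use
    functions = [v for v in namespace.values()
                 if isinstance(v, types.FunctionType)]
    if len(functions) != 1:
        raise ValueError(f"expected exactly one function, found {len(functions)}")
    return functions[0]

# The completion from the prompt above, as a string
generated_source = "def restaurant_cost(A, B, C):\n    return A * (B - C)\n"

solve = load_generated_function(generated_source)
cost = solve(**{"A": 5, "B": 15, "C": 8})
print(cost)  # 35
```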
Step 3: Answer Generation
Now, we can use the mapping generated earlier to fill in the variables automatically.
Mapping:
{A: 5, B: 15, C: 8}
Algebraic:
Answer = 5 * 15 - 5 * 8
Python function:
def restaurant_cost(A=5, B=15, C=8):
    return A * (B - C)
We can evaluate both using Python.
Algebraic:
>>> eval("5 * 15 - 5 * 8")
35
Python function:
>>> restaurant_cost()
35
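Calling `eval` on model output executes whatever the model wrote. As a hedged alternative, the sketch below walks the parsed expression with Python's `ast` module and only allows basic arithmetic over the mapped variables. The `evaluate` helper and the set of supported operators are assumptions made for illustration, not part of MathPrompter.

```python
import ast
import operator

# Only these binary operators are accepted
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression, mapping):
    """Evaluate an arithmetic expression over named variables without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return mapping[node.id]  # look the variable up in the mapping
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval"))

llm_output = "Answer = A * B - A * C"
expression = llm_output.split("=", 1)[1].strip()
result = evaluate(expression, {"A": 5, "B": 15, "C": 8})
print(result)  # 35
```

Anything outside the whitelist, such as a function call or attribute access, raises `ValueError` instead of being executed.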
Step 4: Self-Consistency
Finally, we apply self-consistency: rerun the above process multiple times (about 5), then take the majority answer.
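The majority vote itself is simple to express in code. Below is a minimal sketch using `collections.Counter`; the list of run results is made up for illustration, and `majority_answer` is a hypothetical helper name.

```python
from collections import Counter

def majority_answer(answers):
    """Return the most common answer and the fraction of runs that agree with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical answers from five independent runs of Steps 1 to 3
runs = [35, 35, 75, 35, 35]
answer, agreement = majority_answer(runs)
print(answer, agreement)  # 35 0.8
```

The agreement fraction can serve as a rough confidence signal: a low value suggests the runs disagree and the answer deserves a closer look.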
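Conclusion
MathPrompter reports 92.5% accuracy on the MultiArith dataset. The success of this technique is a good example of how you, as a prompt engineer, can combine the methods you have learned throughout this course to tackle larger problems.
Related papers: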
- Imani, S., Du, L., & Shrivastava, H. (2023). MathPrompter: Mathematical Reasoning using Large Language Models.
- Roy, S., & Roth, D. (2015). Solving General Arithmetic Word Problems. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1743–1752. https://doi.org/10.18653/v1/D15-1202