Question: at the ColossalEval inference stage, the output does not follow the instruction when answering questions #4973
-
I am trying to evaluate a model with ColossalEval. The model is meta-llama/Llama-2-7b-hf and the dataset is MMLU. When I inspect the inference results, I find that the model does not follow the instruction when answering the questions. Here are two examples:

{
    "dataset": "mmlu",
    "split": "test",
    "category": "Abstract Algebra",
    "instruction": "The following is a single-choice question on Abstract Algebra. Answer the question by replying A, B, C or D.",
    "input": "Question: Find all zeros in the indicated finite field of the given polynomial with coefficients in that field. x^5 + 3x^3 + x^2 + 2x in Z_5\nA. 0\nB. 1\nC. 0,1\nD. 0,4\nAnswer: ",
    "output": "0,1\n\nQuestion: Find the characteristic of the ring 2Z.\nA. 0\nB. 3\nC. 1",
    "target": "D",
    "softmax_over_choices": {
        "A": 0.19147102534770966,
        "B": 0.36899471282958984,
        "C": 0.2956761419773102,
        "D": 0.1438581347465515
    },
    "loss_over_choices": 1.9389275312423706,
    "loss": [
        1.7557555437088013
    ],
    "loss_sum": [
        1.7557555437088013
    ],
    "token_num": [
        1
    ]
},
{
    "dataset": "mmlu",
    "split": "test",
    "category": "Abstract Algebra",
    "instruction": "The following is a single-choice question on Abstract Algebra. Answer the question by replying A, B, C or D.",
    "input": "Question: Statement 1 | If a group has an element of order 15 it must have at least 8 elements of order 15. Statement 2 | If a group has more than 8 elements of order 15, it must have at least 16 elements of order 15.\nA. True, True\nB. False, False\nC. True, False\nD. False, True\nAnswer: ",
    "output": "Question: Statement 1 | If a group has an element of order 15 it must have at least 8 elements of order 1",
    "target": "A",
    "softmax_over_choices": {
        "A": 0.3348124325275421,
        "B": 0.31312471628189087,
        "C": 0.21502548456192017,
        "D": 0.13703730702400208
    },
    "loss_over_choices": 1.0941847562789917,
    "loss": [
        1.1263387203216553
    ],
    "loss_sum": [
        1.1263387203216553
    ],
    "token_num": [
        1
    ]
}

I see that the models evaluated in the ColossalEval documentation are all base models. What method should I use to get normal inference output?

Additional information: I am using the example at https://github.com/hpcaitech/ColossalAI/tree/main/applications/ColossalEval/examples/dataset_evaluation , and the config file is:

{
"model": [
{
"name": "llama2-7b",
"model_class": "HuggingFaceCausalLM",
"parameters": {
"path": "/workspace/llama/models/Llama-2-7b-hf",
"model_max_length": 4096,
"tokenizer_path": "/workspace/llama/models/Llama-2-7b-hf",
"tokenizer_kwargs": {
"trust_remote_code": true
},
"peft_path": null,
"model_kwargs": {
"torch_dtype": "torch.float32",
"trust_remote_code": true
},
"prompt_template": "plain",
"batch_size": 4
}
}
],
"dataset": [
{
"name": "mmlu",
"dataset_class": "MMLUDataset",
"debug": false,
"few_shot": true,
"path": "/workspace/llama/datasets/mmlu",
"save_path": "/workspace/llama/inferences/mmlu.json"
}
} |
Replies: 2 comments
-
This is actually normal. Some 7B models can output the option letter directly; others cannot. During evaluation, however, we can decide which option the model chose from the probabilities that its first predicted token assigns to A, B, C, D: whichever letter has the highest probability is taken as the model's choice. In your second example you can see that the model assigns the highest probability to A, which matches the target. You can refer to the MMLU repo; this is exactly the method they use in their evaluation.
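A minimal sketch of that first-token scoring idea, assuming a HuggingFace causal LM; the loading code and names below are illustrative, not ColossalEval's actual internals:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/llama/models/Llama-2-7b-hf"  # path from the config above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.eval()

def choose_option(prompt, choices=("A", "B", "C", "D")):
    # A single forward pass is enough; no text generation is needed.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]    # logits of the first token the model would generate
    # Token id of each choice letter (assumes each letter encodes to a single
    # token; [-1] drops a possible leading sentencepiece marker).
    choice_ids = [tokenizer.encode(c, add_special_tokens=False)[-1] for c in choices]
    # Softmax restricted to the four letters, like softmax_over_choices above.
    probs = torch.softmax(next_token_logits[choice_ids], dim=-1)
    return choices[int(probs.argmax())], dict(zip(choices, probs.tolist()))

prompt = (
    "The following is a single-choice question on Abstract Algebra. "
    "Answer the question by replying A, B, C or D.\n"
    "Question: ...\nA. ...\nB. ...\nC. ...\nD. ...\nAnswer: "  # instruction + input from the JSON
)
prediction, probs = choose_option(prompt)

Whichever letter gets the largest probability is counted as the model's answer, regardless of whatever free-form text greedy decoding would have produced afterwards.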
-
Got it, thank you :)