notesum.ai

Published on October 21, 2024

Self-Explained Keywords Empower Large Language Models for Code Generation

Categories: cs.CV, cs.AI, cs.LG

Release Date: October 21, 2024

Authors: Lishui Fan¹, Mouxiang Chen¹, Zhongxin Liu¹

Aff.: ¹Zhejiang University

Arxiv: https://arxiv.org/abs/2410.15966v1

| Model | Method | HumanEval | HumanEval+ | MBPP | MBPP+ | APPS Introductory | APPS Interview | APPS Competition | Average |
|---|---|---|---|---|---|---|---|---|---|
| Llama-3.1-70B-Instruct | Default | 78.0 | 73.8 | 87.6 | 70.9 | 50.0 | 15.0 | 5.0 | 54.3 |
| | Beam Search | 79.3 | 74.4 | 87.8 | 70.9 | 55.0 | 16.1 | 5.0 | 55.5 |
| | CoT | 79.9 | 74.4 | 87.0 | 71.7 | 43.3 | 16.6 | 6.7 | 54.2 |
| | SelfEvolve | 81.7 | 75.6 | 85.4 | 70.4 | 50.0 | 15.5 | 8.3 | 55.3 |
| | SEK | 84.8 | 79.3 | 88.4 | 71.2 | 61.7 | 20.0 | 8.3 | 59.1 |
| Mixtral-8×22B-Instruct-v0.1 | Default | 76.2 | 72.0 | 73.8 | 64.3 | 28.3 | 7.7 | 1.6 | 46.3 |
| | Beam Search | 78.7 | 73.2 | 81.2 | 70.6 | 33.3 | 8.8 | 6.6 | 50.3 |
| | CoT | 72.0 | 65.9 | 78.0 | 68.0 | 31.6 | 3.8 | 5.0 | 46.3 |
| | SelfEvolve | 56.7 | 50.0 | 68.5 | 60.1 | 33.3 | 7.2 | 5.0 | 40.1 |
| | SEK | 81.1 | 75.6 | 79.1 | 66.9 | 33.3 | 10.0 | 6.6 | 50.4 |
| GPT-3.5-turbo (API) | Default | 72.6 | 67.7 | 84.1 | 71.2 | 46.6 | 18.3 | 0.0 | 51.5 |
| | CoT | 58.5 | 54.9 | 84.1 | 68.8 | 41.6 | 17.2 | 1.6 | 46.7 |
| | SelfEvolve | 73.2 | 67.7 | 82.3 | 66.7 | 45.0 | 19.4 | 1.6 | 51.8 |
| | SEK | 75.6 | 69.5 | 84.1 | 72.5 | 53.3 | 20.6 | 5.0 | 54.4 |
| GPT-4o-mini (API) | Default | 87.8 | 84.1 | 85.7 | 72.8 | 53.3 | 31.6 | 11.6 | 61.0 |
| | CoT | 87.2 | 84.1 | 88.1 | 73.3 | 50.0 | 33.8 | 11.6 | 61.2 |
| | SEK | 87.2 | 84.1 | 87.8 | 74.1 | 58.3 | 35.0 | 13.3 | 62.8 |
| DeepSeekCoder-V2-Instruct (API) | Default | 85.4 | 82.3 | 89.4 | 75.1 | 70.0 | 36.1 | 10.0 | 64.0 |
| | CoT | 88.4 | 82.3 | 90.5 | 75.4 | 60.0 | 40.5 | 10.0 | 63.9 |
| | SEK | 93.3 | 85.4 | 90.2 | 76.2 | 75.0 | 41.1 | 13.3 | 67.8 |
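
The page does not say how the Average column in the table above is computed. The values are consistent with an unweighted mean of the seven benchmark scores in each row, rounded to one decimal place. The sketch below checks that assumption on the two Llama-3.1-70B-Instruct rows for Default and SEK. The names `BENCHMARKS`, `SCORES`, and `row_average` are illustrative and do not come from the paper.

```python
# Assumption: "Average" is the unweighted mean of the seven benchmark
# columns, rounded to one decimal. Scores are copied from the
# Llama-3.1-70B-Instruct rows of the table.
BENCHMARKS = [
    "HumanEval", "HumanEval+", "MBPP", "MBPP+",
    "APPS Introductory", "APPS Interview", "APPS Competition",
]

SCORES = {
    "Default": [78.0, 73.8, 87.6, 70.9, 50.0, 15.0, 5.0],
    "SEK": [84.8, 79.3, 88.4, 71.2, 61.7, 20.0, 8.3],
}


def row_average(values):
    """Unweighted mean of one table row, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


default_avg = row_average(SCORES["Default"])
sek_avg = row_average(SCORES["SEK"])

# Difference between the two averages, in the same units as the table.
gain = round(sek_avg - default_avg, 1)

print(f"Default average: {default_avg}")
print(f"SEK average:     {sek_avg}")
print(f"SEK - Default:   {gain}")
```

Both computed averages match the table entries (54.3 and 59.1), which supports reading the column as a plain mean that weights each benchmark equally regardless of its size.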