Number Pattern Induction LLM Benchmark

Number Sequences Test

A compact benchmark to test how well language models can infer patterns from number series. Each row provides a sequence, the ground‑truth next_term, and a templated explanation of the rule. Series span arithmetic progressions (AP), geometric progressions (GP), powers, and alternating rules.

Overview

The dataset mixes simple progressions (AP/GP) with composed patterns (alternating addition/multiplication/powers). Explanations are auto‑generated by a Python script via template filling, ensuring consistent phrasing. This makes the dataset suitable for evaluating both next‑term prediction and the rule explanations a model produces.

  • Tasks: Next‑term prediction
  • Series types: AP · GP · Powers · Alternating
  • License: CC‑BY 4.0

Dataset Schema

Each row is a single problem with its ground truth.

  • sequence — Array of 5–6 integers composing the series.
  • next_term — Integer ground truth next value.
  • explanation — Human‑readable rule description (templated).
  • type — One of general (AP/GP/Power) or alternating.
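A quick way to check that a row matches the schema above is a small validator. This is a minimal sketch, not part of the dataset tooling: the function name validate_row is hypothetical, and it assumes the type field holds the literal strings "general" or "alternating".

```python
from typing import Any

# Assumption: these are the literal values stored in the "type" field.
ALLOWED_TYPES = {"general", "alternating"}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the row looks valid."""
    problems = []

    seq = row.get("sequence")
    if not isinstance(seq, list) or not all(isinstance(x, int) for x in seq):
        problems.append("sequence must be a list of integers")
    elif not 5 <= len(seq) <= 6:
        problems.append("sequence must have 5-6 terms")

    if not isinstance(row.get("next_term"), int):
        problems.append("next_term must be an integer")

    if not isinstance(row.get("explanation"), str):
        problems.append("explanation must be a string")

    if row.get("type") not in ALLOWED_TYPES:
        problems.append("type must be 'general' or 'alternating'")

    return problems
```

Calling validate_row on each item before evaluation surfaces malformed rows early instead of letting them skew the score.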

Example Row

{
  "sequence": [5, 11, 33, 39, 117],
  "next_term": 123,
  "explanation": "Alternating between +6 and ×3.",
  "type": "alternating"
}
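The example row can be reproduced by replaying its rule. The sketch below is illustrative only: apply_alternating and the template string are hypothetical stand-ins, not the author's generation script, and it assumes the addition step comes first.

```python
def apply_alternating(start, add, mul, n_terms):
    """Build n_terms values by alternating +add and ×mul, starting with +add."""
    terms = [start]
    for i in range(n_terms - 1):
        if i % 2 == 0:
            terms.append(terms[-1] + add)   # even step: addition
        else:
            terms.append(terms[-1] * mul)   # odd step: multiplication
    return terms


# Generate six terms: the first five form the sequence, the sixth is the answer.
series = apply_alternating(start=5, add=6, mul=3, n_terms=6)
sequence, next_term = series[:-1], series[-1]

# Template filling (the template text is assumed from the example row).
explanation = "Alternating between +{add} and ×{mul}.".format(add=6, mul=3)

print(sequence, next_term, explanation)
```

Step by step: 5 + 6 = 11, 11 × 3 = 33, 33 + 6 = 39, 39 × 3 = 117, and 117 + 6 = 123, which matches next_term.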

Series Categories

  • AP (Arithmetic Progression): constant difference d.
  • GP (Geometric Progression): constant ratio r.
  • Powers: exponentiation‑based patterns (e.g., squares, cubes).
  • Alternating: two rules applied in turn, e.g., +k then ×r, or +k then a power such as ^2.
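The three non-alternating categories can be written as one-line generators. This is a sketch under assumptions: the function names and parameters (a, d, r, p) are illustrative, and the dataset's actual generator may choose its parameters differently.

```python
def arithmetic(a, d, n):
    """AP: first term a, constant difference d, n terms."""
    return [a + i * d for i in range(n)]


def geometric(a, r, n):
    """GP: first term a, constant ratio r, n terms."""
    return [a * r ** i for i in range(n)]


def powers(p, n, start=1):
    """Powers: k**p for n consecutive integers k beginning at start."""
    return [k ** p for k in range(start, start + n)]


# Six terms each: five for the sequence, the last one as next_term.
print(arithmetic(3, 4, 6))  # [3, 7, 11, 15, 19, 23]
print(geometric(2, 3, 6))   # [2, 6, 18, 54, 162, 486]
print(powers(2, 6))         # [1, 4, 9, 16, 25, 36]
```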

Intended Use

  • Evaluate LLM reasoning on numerical pattern induction.
  • Test explanations: does the model articulate the correct rule?
  • Curriculum‑style training or unit tests for math agents.

Non‑Goals

  • Not a full curriculum of all number theory topics.
  • Not a replacement for broad math benchmarks.

Usage Snippets

Load with the 🤗 Datasets library and iterate over items. Evaluate your model by predicting next_term and, optionally, generating a textual explanation.

Python

from datasets import load_dataset

ds = load_dataset("Ishank/number-series-problems")
for ex in ds["train"]:
    seq = ex["sequence"]
    truth = ex["next_term"]
    explanation = ex["explanation"]
    # your_model.predict(seq) → compare with truth

Evaluation Idea
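
One simple option is exact-match accuracy on next_term. The sketch below assumes that metric (the source does not specify one); exact_match_accuracy and last_difference_baseline are hypothetical helpers, and the first toy row is made up for illustration.

```python
def exact_match_accuracy(rows, predict):
    """Fraction of rows where predict(sequence) equals next_term exactly."""
    rows = list(rows)
    if not rows:
        return 0.0
    correct = sum(1 for r in rows if predict(r["sequence"]) == r["next_term"])
    return correct / len(rows)


def last_difference_baseline(seq):
    """Naive baseline: repeat the last observed difference (only right for APs)."""
    return seq[-1] + (seq[-1] - seq[-2])


toy_rows = [
    {"sequence": [1, 2, 3, 4, 5], "next_term": 6},           # invented AP row
    {"sequence": [5, 11, 33, 39, 117], "next_term": 123},    # example row above
]
score = exact_match_accuracy(toy_rows, last_difference_baseline)
print(score)  # 0.5: the baseline solves the AP but not the alternating row
```

Replacing last_difference_baseline with a function that queries a model and parses an integer from its output gives the same score for an LLM. Scoring the explanation text would need a separate, looser comparison.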

Attribution & License

Dataset by Ishan. If you use this in academic or industrial work, please link to the dataset page and credit the author.