Number Sequences Test

A compact benchmark for testing how well language models infer patterns from number series. Each row provides a sequence, the ground‑truth next_term, and a templated explanation of the rule. Series span arithmetic progressions (AP), geometric progressions (GP), powers, and alternating rules.


Quick Examples

6, 11, 16, 21, 26, 31 → 36
Rule: Arithmetic progression with first term 6 and common difference 5.

5, 10, 20, 40, 80 → 160
Rule: Geometric progression with common ratio 2.

5, 11, 33, 39, 117 → 123
Rule: Alternating +6 and ×3.

2, 4, 8, 64, 68, 4624 → 4628
Rule: Alternating squaring ( ^2 ) and +4, starting with squaring.

Use these to sanity‑check model reasoning chains.

Overview

The dataset mixes simple progressions (AP/GP) with composed patterns (alternating addition/multiplication/powers). Explanations are auto‑generated by a Python script via template filling, so the phrasing is consistent across rows. This makes it well suited to evaluating:

Pattern discovery and rule induction from short contexts
Arithmetic reasoning and chain‑of‑thought validation
Robustness across single‑rule vs. alternating‑rule sequences
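
The template filling mentioned above can be pictured with a minimal sketch. The generator script itself is not shown on this page, so the function names (make_ap, make_alternating) and the exact template strings below are assumptions modeled on the wording of the Quick Examples, not the author's actual code.

```python
# Hypothetical templates, modeled on the Quick Examples wording.
AP_TEMPLATE = "Arithmetic progression with first term {a} and common difference {d}."
ALT_TEMPLATE = "Alternating +{k} and ×{r}."


def make_ap(a, d, length=6):
    """Build an AP row: `length` visible terms, the next term, and a filled template."""
    terms = [a + i * d for i in range(length + 1)]
    return {
        "sequence": terms[:-1],
        "next_term": terms[-1],
        "explanation": AP_TEMPLATE.format(a=a, d=d),
        "type": "general",
    }


def make_alternating(start, k, r, length=5):
    """Build a row that alternates +k and ×r, starting with +k."""
    terms = [start]
    for i in range(length):
        prev = terms[-1]
        # Even-numbered steps add k, odd-numbered steps multiply by r.
        terms.append(prev + k if i % 2 == 0 else prev * r)
    return {
        "sequence": terms[:-1],
        "next_term": terms[-1],
        "explanation": ALT_TEMPLATE.format(k=k, r=r),
        "type": "alternating",
    }
```

Calling make_ap(6, 5) and make_alternating(5, 6, 3) reproduces the first and third Quick Examples, which is a convenient way to check that a generator and its templates stay in sync.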

Tasks

Next‑term prediction

Series types

AP · GP · Powers · Alternating

License

CC BY 4.0

Dataset Schema

Each row is a single problem with its ground truth.

sequence — Array of 5–6 integers composing the series.
next_term — Integer ground truth next value.
explanation — Human‑readable rule description (templated).
type — Either general (AP, GP, or powers) or alternating.

Example Row

{
  "sequence": [5, 11, 33, 39, 117],
  "next_term": 123,
  "explanation": "Alternating between +6 and ×3.",
  "type": "alternating"
}
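
A row like the one above can be checked against the schema before it is used. The following is a minimal sketch; validate_row is a hypothetical helper, and the 5–6 term limit is taken from the schema description rather than from any published validator.

```python
def validate_row(row):
    """Return a list of problems found in one row (an empty list means it looks valid)."""
    problems = []
    seq = row.get("sequence")
    if not isinstance(seq, list) or not all(isinstance(x, int) for x in seq):
        problems.append("sequence must be a list of integers")
    elif not 5 <= len(seq) <= 6:
        problems.append("sequence should contain 5-6 terms")
    if not isinstance(row.get("next_term"), int):
        problems.append("next_term must be an integer")
    explanation = row.get("explanation")
    if not isinstance(explanation, str) or not explanation:
        problems.append("explanation must be a non-empty string")
    if row.get("type") not in {"general", "alternating"}:
        problems.append("type must be 'general' or 'alternating'")
    return problems


example = {
    "sequence": [5, 11, 33, 39, 117],
    "next_term": 123,
    "explanation": "Alternating between +6 and ×3.",
    "type": "alternating",
}
print(validate_row(example))
```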

Series Categories

AP (Arithmetic Progression): constant difference d.
GP (Geometric Progression): constant ratio r.
Powers: exponentiation‑based patterns (e.g., squares, cubes).
Alternating: two rules alternated, e.g., +k ↔ ×r, or +k ↔ ^2.
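
To make the categories concrete, here is a small rule‑based baseline that tries a single rule first (+k, ×r, or squaring applied at every step) and then a two‑rule alternation. It is a sketch under stated assumptions: predict_next and its helpers are hypothetical names, integer ratios are assumed, and power patterns other than repeated squaring (for example the squares of 1, 2, 3, ...) are not covered.

```python
def _apply(op, arg, x):
    """Apply one step: add a constant, multiply by a constant, or square."""
    if op == "add":
        return x + arg
    if op == "mul":
        return x * arg
    return x * x  # op == "square"


def _fit(pairs):
    """Return an (op, arg) that maps a -> b for every (a, b) pair, or None."""
    a0, b0 = pairs[0]
    candidates = [("add", b0 - a0), ("square", None)]
    if a0 != 0 and b0 % a0 == 0:
        candidates.append(("mul", b0 // a0))
    for op, arg in candidates:
        if all(_apply(op, arg, a) == b for a, b in pairs):
            return op, arg
    return None


def predict_next(seq):
    """Predict the next term of a sequence with at least 3 terms, or return None."""
    steps = list(zip(seq, seq[1:]))
    # Single rule: one operation explains every step (AP, GP, repeated squaring).
    rule = _fit(steps)
    if rule:
        return _apply(*rule, seq[-1])
    # Alternating: even-numbered steps follow one rule, odd-numbered steps another.
    even, odd = _fit(steps[0::2]), _fit(steps[1::2])
    if even and odd:
        next_rule = even if len(steps) % 2 == 0 else odd
        return _apply(*next_rule, seq[-1])
    return None
```

Run on the four Quick Examples, this baseline returns 36, 160, 123, and 4628. A model that cannot beat it on the alternating subset is probably not inducing the rule.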

Intended Use

Evaluate LLM reasoning on numerical pattern induction.
Test explanations: does the model articulate the correct rule?
Curriculum‑style training or unit tests for math agents.

Non‑Goals

Not a full curriculum of all number theory topics.
Not a replacement for broad math benchmarks.

Usage Snippets

Load with the 🤗 Datasets library and iterate over items. Evaluate your model by predicting next_term and, optionally, generating a textual explanation.

Python

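from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
ds = load_dataset("Ishank/number-series-problems")
for ex in ds["train"]:
    seq = ex["sequence"]             # list of integers shown to the model
    truth = ex["next_term"]          # ground-truth next value
    explanation = ex["explanation"]  # templated rule description
    # Call your own model here, e.g. your_model.predict(seq), and compare with truth.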

Evaluation Idea

Accuracy: exact match on next_term.
Rule Consistency: check if the model’s free‑form explanation matches the templated rule (keyword match or regex).
Type Breakdown: report separate scores for general vs. alternating.
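
The three ideas above might be implemented along these lines. This is a sketch, not an official scorer: score and rule_consistent are hypothetical names, and the consistency check is a deliberately loose heuristic that only verifies that every integer in the templated rule also appears in the model's explanation.

```python
import re
from collections import defaultdict


def score(rows, predictions):
    """Exact-match accuracy on next_term, overall and per `type`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row, pred in zip(rows, predictions):
        for key in ("overall", row["type"]):
            totals[key] += 1
            hits[key] += int(pred == row["next_term"])
    return {key: hits[key] / totals[key] for key in totals}


def rule_consistent(model_explanation, templated_rule):
    """Loose check: every integer in the templated rule appears in the model's text."""
    wanted = set(re.findall(r"\d+", templated_rule))
    found = set(re.findall(r"\d+", model_explanation))
    return wanted <= found


rows = [
    {"type": "general", "next_term": 36},
    {"type": "alternating", "next_term": 123},
    {"type": "alternating", "next_term": 4628},
]
print(score(rows, [36, 123, 0]))
print(rule_consistent("Add 6, then multiply by 3, and repeat.",
                      "Alternating between +6 and ×3."))
```

A numbers‑only check accepts explanations that mention the right constants with the wrong operations, so a stricter regex per rule family is worth adding if explanation quality matters for your evaluation.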

Attribution & License

Dataset by Ishan. If you use this in academic or industrial work, please link to the dataset page and credit the author.

License: CC BY 4.0
Dataset: Ishank/number-series-problems