Number Sequences Test
A compact benchmark to test how well language models can infer patterns from number series. Each row provides a sequence, the ground‑truth next_term, and a templated explanation of the rule. Series span AP, GP, powers, and alternating rules.
Overview
The dataset mixes simple progressions (AP/GP) with composed patterns (alternating addition/multiplication/powers). Explanations are auto‑generated by a Python script via template filling, which keeps the phrasing consistent across rows. This makes the dataset well suited to evaluating:
- Pattern discovery and rule induction from short contexts
- Arithmetic reasoning and chain‑of‑thought validation
- Robustness across single‑rule vs. alternating‑rule sequences
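The template filling mentioned above can be pictured with a minimal sketch. The dataset's actual generator script is not shown in this card, so the template strings, the TEMPLATES table, and the fill_explanation helper below are all hypothetical; only the alternating wording is taken from the example row in this card.

```python
# Hypothetical templates: the real generator script is not part of this card.
# Only the "alternating" wording mirrors the example row shown in this card.
TEMPLATES = {
    "ap": "Arithmetic progression with common difference {d}.",
    "gp": "Geometric progression with common ratio {r}.",
    "alternating": "Alternating between +{k} and ×{r}.",
}


def fill_explanation(kind: str, **params: int) -> str:
    """Render the explanation for one rule by filling its template."""
    return TEMPLATES[kind].format(**params)


print(fill_explanation("alternating", k=6, r=3))
# prints: Alternating between +6 and ×3.
```

Because every explanation comes from a fixed template, two rows with the same rule always share the same wording, which is what makes keyword or regex matching against model output practical.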
Dataset Schema
Each row is a single problem with its ground truth.
- sequence: Array of 5–6 integers composing the series.
- next_term: Integer ground truth next value.
- explanation: Human‑readable rule description (templated).
- type: One of general (AP/GP/Power) or alternating.
Example Row
{
  "sequence": [5, 11, 33, 39, 117],
  "next_term": 123,
  "explanation": "Alternating between +6 and ×3.",
  "type": "alternating"
}
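As a sanity check, the example row can be validated against the schema and its rule replayed step by step. This is a sketch only: check_schema and alternating_next are hypothetical helper names, and the assumption that the rule starts with the addition step is read off this one example.

```python
import json

# The example row from this card, parsed as JSON.
row = json.loads("""
{
  "sequence": [5, 11, 33, 39, 117],
  "next_term": 123,
  "explanation": "Alternating between +6 and ×3.",
  "type": "alternating"
}
""")


def check_schema(row):
    """Return True if the row has the four documented fields with plausible types."""
    return (
        isinstance(row.get("sequence"), list)
        and all(isinstance(x, int) for x in row["sequence"])
        and isinstance(row.get("next_term"), int)
        and isinstance(row.get("explanation"), str)
        and row.get("type") in {"general", "alternating"}
    )


def alternating_next(seq, add, mul):
    """Verify seq alternates +add, ×mul (assumed to start with +add); return the next term."""
    for i in range(len(seq) - 1):
        expected = seq[i] + add if i % 2 == 0 else seq[i] * mul
        if seq[i + 1] != expected:
            raise ValueError(f"step {i}: expected {expected}, found {seq[i + 1]}")
    step = len(seq) - 1  # index of the step that produces the next term
    return seq[-1] + add if step % 2 == 0 else seq[-1] * mul


print(check_schema(row))                         # True
print(alternating_next(row["sequence"], 6, 3))   # 123
```

Replaying the rule gives 5 → 11 → 33 → 39 → 117 → 123, which agrees with the stored next_term.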
Series Categories
- AP (Arithmetic Progression): constant difference d.
- GP (Geometric Progression): constant ratio r.
- Powers: exponentiation‑based patterns (e.g., squares, cubes).
- Alternating: two rules alternated, e.g., +k ↔ ×r, or +k ↔ ^2.
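The four categories above can each be produced by a few lines of code. The generators below are an illustrative sketch, not the script used to build the dataset; the function names, parameters, and starting values are assumptions chosen for the example.

```python
def arithmetic(a, d, n):
    """AP: n terms starting at a with constant difference d."""
    return [a + i * d for i in range(n)]


def geometric(a, r, n):
    """GP: n terms starting at a with constant ratio r."""
    return [a * r**i for i in range(n)]


def powers(p, n, start=1):
    """Powers: p-th powers of n consecutive integers beginning at start."""
    return [i**p for i in range(start, start + n)]


def alternating(a, first, second, n):
    """Alternating: apply first, then second, then first again, and so on."""
    seq = [a]
    while len(seq) < n:
        step = len(seq) - 1
        rule = first if step % 2 == 0 else second
        seq.append(rule(seq[-1]))
    return seq


print(arithmetic(3, 4, 5))                                      # [3, 7, 11, 15, 19]
print(geometric(2, 3, 5))                                       # [2, 6, 18, 54, 162]
print(powers(2, 5))                                             # [1, 4, 9, 16, 25]
print(alternating(5, lambda x: x + 6, lambda x: x * 3, 6))      # [5, 11, 33, 39, 117, 123]
print(alternating(2, lambda x: x + 1, lambda x: x**2, 5))       # [2, 3, 9, 10, 100]
```

Passing the two rules as callables lets the same function cover both the +k ↔ ×r and the +k ↔ ^2 variants.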
Intended Use
- Evaluate LLM reasoning on numerical pattern induction.
- Test explanations: does the model articulate the correct rule?
- Curriculum‑style training or unit tests for math agents.
Non‑Goals
- Not a full curriculum of all number theory topics.
- Not a replacement for broad math benchmarks.
Usage Snippets
Load with the 🤗 Datasets library and iterate over items. Evaluate your model by predicting next_term and, optionally, generating a textual explanation.
Python
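from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
ds = load_dataset("Ishank/number-series-problems")

for ex in ds["train"]:
    seq = ex["sequence"]              # list of integers
    truth = ex["next_term"]           # ground-truth next value
    explanation = ex["explanation"]   # templated rule description
    # your_model.predict(seq) → compare with truth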
Evaluation Idea
- Accuracy: exact match on next_term.
- Rule Consistency: check if the model’s free‑form explanation matches the templated rule (keyword match or regex).
- Type Breakdown: report separate scores for general vs alternating.
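One possible way to compute the three metrics above is sketched below. It is an assumption-laden illustration: rule_tokens and evaluate are hypothetical names, the regex treats only operator-number pairs such as +6 or ×3 as rule keywords, and the second row in the demo data is made up for the example rather than taken from the dataset.

```python
import re
from collections import defaultdict

# Operator followed by a number, e.g. "+6", "× 3", "*3", "^2".
OP_TOKEN = re.compile(r"[+×*^]\s*\d+")


def rule_tokens(text):
    """Extract operator-number tokens, dropping spaces and treating '*' as '×'."""
    return {re.sub(r"\s+", "", t).replace("*", "×") for t in OP_TOKEN.findall(text)}


def evaluate(rows, predictions, model_explanations):
    """Return overall accuracy, rule consistency, and accuracy per type."""
    correct, total = defaultdict(int), defaultdict(int)
    consistent = 0
    for row, pred, expl in zip(rows, predictions, model_explanations):
        total[row["type"]] += 1
        correct[row["type"]] += int(pred == row["next_term"])
        # Consistent if every token of the templated rule appears in the model's text.
        # Caveat: a template with no operator tokens is trivially "consistent".
        consistent += int(rule_tokens(row["explanation"]) <= rule_tokens(expl))
    n = sum(total.values())
    return {
        "accuracy": sum(correct.values()) / n,
        "rule_consistency": consistent / n,
        "by_type": {t: correct[t] / total[t] for t in total},
    }


rows = [
    {"sequence": [5, 11, 33, 39, 117], "next_term": 123,
     "explanation": "Alternating between +6 and ×3.", "type": "alternating"},
    # Made-up row for illustration only.
    {"sequence": [2, 4, 6, 8, 10], "next_term": 12,
     "explanation": "Arithmetic progression with difference +2.", "type": "general"},
]
report = evaluate(rows, [123, 14], ["It alternates: + 6 then * 3.", "Doubles each time."])
print(report)
```

In this demo the first prediction and explanation both match while the second pair does not, so accuracy and rule consistency are each 0.5, with 1.0 for alternating and 0.0 for general.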
Attribution & License
Dataset by Ishan. If you use this in academic or industrial work, please link to the dataset page and credit the author.
- License: CC BY 4.0
- Dataset: Ishank/number-series-problems