We believe in contributing to the AI research community by sharing high-quality datasets. Below you can find datasets we've created and made available for research purposes.
All our open datasets are available on Hugging Face 🤗
Comprehensive evaluation dataset for Thai language models covering various tasks and domains.
📊 Model evaluation and benchmarking

Thai translation of OpenAI's HumanEval dataset for evaluating code generation capabilities in a Thai context.
📊 Code generation evaluation

Thai version of the American Invitational Mathematics Examination (AIME) 2024 problems for testing mathematical reasoning.
📊 Mathematical problem-solving evaluation

Collection of 500 mathematics problems in Thai for training and evaluating mathematical reasoning capabilities.
📊 Math problem-solving training and evaluation

Validation set of AI Mathematical Olympiad problems in Thai.
📊 Advanced mathematical reasoning evaluation

Supervised fine-tuning dataset for the Thai language, distilled from reasoning models.
📊 Fine-tuning language models with reasoning capabilities

Lightweight dataset for training and evaluating code generation in a Thai context.
📊 Code generation model training

Extensive collection of Thai handwritten text samples for OCR and handwriting recognition.
📊 Handwriting recognition, OCR training

Comprehensive collection of Thai legal documents optimized for Retrieval-Augmented Generation (RAG) systems.
📊 Legal AI systems, RAG applications

@dataset{iapp_datasets_2024,
author = {iApp Technology Research Team},
title = {Dataset Name},
year = {2024},
publisher = {iApp Technology},
url = {https://iapp.co.th/researches/datasets}
}

We welcome contributions to our datasets. If you have:
Please contact our research team.
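The citation template given earlier uses `Dataset Name` as a placeholder for the title. As a minimal sketch, assuming you want one entry per dataset you cite, the template can be filled in programmatically. The function name `make_bibtex` and the example title below are hypothetical and not part of any iApp tooling; the remaining field values are copied from the template.

```python
def make_bibtex(dataset_name: str, key: str = "iapp_datasets_2024") -> str:
    """Fill the citation template with a specific dataset name.

    `make_bibtex` is an illustrative helper, not an official utility.
    It does not escape special BibTeX characters in the name.
    """
    if not dataset_name.strip():
        raise ValueError("dataset_name must not be empty")

    # Field values other than the title are taken from the published template.
    fields = {
        "author": "iApp Technology Research Team",
        "title": dataset_name.strip(),
        "year": "2024",
        "publisher": "iApp Technology",
        "url": "https://iapp.co.th/researches/datasets",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@dataset{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # "Example Thai Dataset" is a placeholder title, not a real dataset name.
    print(make_bibtex("Example Thai Dataset"))
```

If you cite several datasets in one paper, pass a distinct `key` for each entry so the citation keys do not collide.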
We're actively working on releasing more datasets:
Different datasets come with different licenses. Please review the license terms for each dataset before use.
For commercial licensing inquiries, please contact us.
We're committed to advancing AI research in Thailand through open collaboration and data sharing.