dataset-finder
Use this skill when users need to search for datasets, download data files, or explore data repositories. Triggers include: requests to "find datasets", "search for data", "download dataset from Kaggle", "get data from Hugging Face", "find ML datasets", or mentions of data repositories like Kaggle, UCI ML Repository, Data.gov, or Hugging Face. Also use for previewing dataset statistics, generating data cards, or discovering datasets for machine learning projects. Requires OpenClawCLI installation from clawhub.ai.
Install via CLI (Recommended)
clawhub install openclaw/skills/skills/anisafifi/dataset-finder
What This Skill Does
The dataset-finder skill streamlines the discovery, acquisition, and analysis of data for machine learning and data science workflows within the OpenClaw ecosystem. It acts as an abstraction layer over major public data repositories, letting users work with Kaggle, Hugging Face, the UCI ML Repository, and Data.gov through a unified command-line interface. Beyond search and download, the skill generates metadata, dataset previews, and statistical summaries, so developers and researchers can vet the quality and structure of data files without loading massive datasets into memory.
Installation
First, install OpenClawCLI from clawhub.ai. With the CLI in place, install the skill:
clawhub install openclaw/skills/skills/anisafifi/dataset-finder
After the skill is registered, install the Python dependencies in your local environment:
pip install kaggle datasets pandas huggingface-hub requests beautifulsoup4
For Kaggle tasks, you must place your kaggle.json credentials in the appropriate directory (~/.kaggle/ or %USERPROFILE%\.kaggle\) to authorize API requests.
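Before running Kaggle searches, it can help to confirm that the credentials file is where the Kaggle client expects it. The following is a minimal sketch, not part of the skill itself: the function names `kaggle_credentials_path` and `check_kaggle_credentials` are hypothetical, and the check assumes the standard kaggle.json layout with `username` and `key` fields.

```python
import json
import os
import stat
from pathlib import Path


def kaggle_credentials_path(home=None):
    """Return the conventional kaggle.json location under a home directory.

    Path.home() resolves to ~ on Unix-like systems and %USERPROFILE% on Windows.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".kaggle" / "kaggle.json"


def check_kaggle_credentials(path):
    """Return a list of problems with the credentials file (empty if none found)."""
    path = Path(path)
    if not path.is_file():
        return [f"missing file: {path}"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"not valid JSON: {path}"]
    if not isinstance(data, dict):
        return [f"not a JSON object: {path}"]
    problems = []
    # Assumption: the file holds "username" and "key" entries.
    for field in ("username", "key"):
        if not data.get(field):
            problems.append(f"missing field: {field}")
    # On POSIX systems, flag files that group or other users can access.
    if os.name == "posix":
        mode = path.stat().st_mode
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("file is accessible to other users; consider chmod 600")
    return problems
```

Calling `check_kaggle_credentials(kaggle_credentials_path())` returns an empty list when the file looks usable, and a list of human-readable problems otherwise. It only inspects the local file and does not contact Kaggle, so an expired or revoked key would still pass.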
Use Cases
This skill suits data scientists who need to iterate quickly on machine learning projects by sourcing diverse datasets. It also helps researchers who manage local data libraries: it can list local files and generate standardized 'data cards', which document dataset lineage, schema, and usage. Whether you are performing exploratory data analysis (EDA) on a CSV file or searching for specialized NLP corpora on Hugging Face, dataset-finder bridges the gap between raw web sources and your local development environment.
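To make the idea of a data card concrete, here is a rough sketch of what one could contain for a local CSV file. The skill's actual card format is not documented on this page, so the function name `build_data_card` and every field name below are illustrative assumptions. The sketch uses only the Python standard library.

```python
import csv
import hashlib
from pathlib import Path


def build_data_card(csv_path, source="unknown", citation=""):
    """Build a minimal data card (a plain dict) for a local CSV file.

    The field names are illustrative, not the skill's real schema.
    """
    path = Path(csv_path)

    # Hash the file in 1 MiB chunks so large files are never fully in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    # Stream the rows to collect the header and a data-row count.
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)

    return {
        "name": path.name,                # file name, for identification
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),     # checksum, useful for lineage tracking
        "columns": header,                # schema: column names only
        "row_count": row_count,           # rows excluding the header
        "source": source,                 # where the data came from
        "citation": citation,             # how to cite it
    }
```

For example, `build_data_card("sales_data.csv", source="internal export")` returns a dict that can be serialized with `json.dumps` and stored next to the data file. The checksum lets a later reader confirm the card still describes the same bytes.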
Example Prompts
- "Find some datasets on Kaggle related to housing prices and download the most relevant one."
- "Can you search for sentiment analysis datasets on Hugging Face and show me the statistics for the first result?"
- "Create a data card for my local file 'sales_data.csv' so I can document the schema and citation information."
Tips & Limitations
- Environment: Always run inside a virtual environment to avoid conflicts with system-level packages.
- Authentication: Ensure your Kaggle and Hugging Face tokens are active. If downloads fail, verify your API keys are in the correct location.
- Performance: For large datasets, use the preview feature first to inspect the header and column types before performing a full download.
- Limitations: Note that external repository rate limits apply; repeated bulk downloading may trigger temporary blocks from the host providers.
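The preview tip above amounts to reading only the header and the first few rows. How the skill previews remote data is not described on this page, so the sketch below applies the same idea to a CSV that is already on disk. The names `preview_csv` and `_infer_type` are hypothetical, and the type inference is deliberately crude (int, float, or str, based on the sampled rows only).

```python
import csv
from itertools import islice


def _infer_type(values):
    """Guess a column type from sample strings: 'int', 'float', 'str', or 'empty'."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    # Try the narrowest type first; fall through to the next on failure.
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def preview_csv(path, n_rows=5):
    """Read only the header and the first n_rows data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        # islice stops after n_rows, so the rest of the file is never parsed.
        rows = list(islice(reader, n_rows))

    types = {}
    for i, col in enumerate(header):
        column_values = [row[i] for row in rows if i < len(row)]
        types[col] = _infer_type(column_values)
    return header, rows, types
```

Because the inferred types come from a small sample, a column that looks like `int` in the first five rows may still contain text further down. Treat the result as a hint for deciding whether a dataset is worth loading in full.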
Metadata
Paste this into your clawhub.json to enable this plugin.
{
  "plugins": {
    "official-anisafifi-dataset-finder": {
      "enabled": true,
      "auto_update": true
    }
  }
}
Tags (AI)
Flags: network-access, file-write, file-read, external-api, code-execution
Related Skills
qr-code-generator
Use this skill when users need to create QR codes for any purpose. Triggers include: requests to "generate QR code", "create QR", "make a QR code for", or mentions of encoding data into scannable codes. Supports URLs, text, WiFi credentials, vCards (contact information), email addresses, phone numbers, SMS, location coordinates, calendar events, and custom data. Can customize colors, add logos, generate bulk QR codes, and export in multiple formats (PNG, SVG, PDF). Requires OpenClawCLI installation from clawhub.ai.
academic-research-hub
Use this skill when users need to search academic papers, download research documents, extract citations, or gather scholarly information. Triggers include: requests to "find papers on", "search research about", "download academic articles", "get citations for", or any request involving academic databases like arXiv, PubMed, Semantic Scholar, or Google Scholar. Also use for literature reviews, bibliography generation, and research discovery. Requires OpenClawCLI installation from clawhub.ai.
web-search-hub
Use this skill when users need to search the web for information, news, images, or videos. Triggers include: requests to "search for", "find information about", "look up", "what's the latest on", or any request requiring current web content. Also use for research tasks, fact-checking, finding visual resources, or gathering recent news. Requires OpenClawCLI installation from clawhub.ai. Do NOT use when Claude's built-in web_search tool is more appropriate for simple queries.