Access pre-compiled datasets containing millions of GitHub repository records with stars, forks, contributors, and activity metrics. Our curated datasets include verified repositories across all programming languages and technology domains.
Get instant access to structured open source data without API rate limits or scraping infrastructure. Ideal for researchers, analysts, and businesses requiring large-scale developer ecosystem intelligence for strategic decision-making.
Warning: This dataset is available only to enterprise customers. Please contact us for more details.
GitHub Dataset Use Cases
Academic Software Engineering Research
Conduct research on open source development practices, collaboration patterns, and software evolution with comprehensive repository data. Access clean, structured datasets perfect for empirical software engineering studies.
Developer Tool Market Intelligence
Understand developer preferences, technology adoption trends, and framework popularity to guide product development. Use historical trends to forecast technology shifts and market opportunities.
Technology Investment Research
Analyze open source ecosystem health and technology momentum using repository metrics and contribution patterns. Identify promising technologies and assess developer community strength for investment decisions.
Developer Platform Development
Build code search engines, developer analytics tools, and trend analysis platforms using rich repository datasets. Enhance developer products with data-driven insights and ecosystem intelligence.
GitHub Dataset New Entries
The graph below contains real data based on our scraping operations. Latest update was an hour ago.
Advanced GitHub Web Scraper Available Now!
Get real-time GitHub data with custom processing to perfectly fit your data pipelines.
Access ready-to-use data instantly instead of waiting weeks or months to build your own data collection pipeline.
Structured & Clean Data
All datasets are thoroughly processed, normalized, and validated to ensure high-quality, consistent information.
Zero Maintenance
We handle all updates and data freshness, allowing you to focus on using the data rather than collecting it.
Cost Efficiency
Save thousands in development and infrastructure costs by leveraging our pre-built dataset instead of creating your own.
Disclaimer: Rebrowser is an independent data provider and is not affiliated with, endorsed by, or sponsored by GitHub. Any trademarks are the property of their respective owners. This dataset is compiled from publicly available information; we do not request or collect GitHub user credentials. By using this dataset, you agree to comply with GitHub's Terms of Service and all applicable laws and regulations. Images, logos, descriptions, and other materials included in this dataset remain the intellectual property of their respective owners and are provided solely for informational purposes. Rebrowser makes no warranties regarding the accuracy, completeness, or legality of the data and assumes no liability for how the data is used. You are solely responsible for ensuring that your use of this dataset, including any images or copyrighted materials, does not infringe on the rights of any third party.
This dataset provides comprehensive GitHub repository information including stars, forks, commits, contributors, languages, and issues. You can browse and filter the data by programming language, topic, license, and popularity, then export the results in multiple formats.
The dataset includes repository metrics like star growth, fork counts, and contribution activity over time. Analyze which projects gain traction fastest, identify emerging technologies, and understand developer engagement patterns.
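As a rough illustration of how star growth could be compared across projects, the sketch below ranks repositories by average stars gained per day. The snapshot layout, repository names, and numbers are made up for the example and are not the dataset's documented schema.

```python
from datetime import date

# Hypothetical star-count snapshots per repository: (snapshot date, total stars).
# Names, dates, and counts are invented for illustration only.
SNAPSHOTS = {
    "acme/fastlib": [(date(2024, 1, 1), 1000), (date(2024, 1, 31), 1600)],
    "acme/tinydb": [(date(2024, 1, 1), 100), (date(2024, 1, 11), 400)],
}


def stars_per_day(snapshots):
    """Average daily star gain between the earliest and latest snapshot."""
    ordered = sorted(snapshots)
    (start_day, start_stars), (end_day, end_stars) = ordered[0], ordered[-1]
    elapsed_days = (end_day - start_day).days
    if elapsed_days == 0:
        # A single snapshot (or same-day snapshots) gives no growth signal.
        return 0.0
    return (end_stars - start_stars) / elapsed_days


growth = {repo: stars_per_day(points) for repo, points in SNAPSHOTS.items()}
# Fastest-growing repositories first.
ranked = sorted(growth, key=growth.get, reverse=True)
print(ranked, growth)
```

Ranking by a daily rate rather than by total stars keeps small but fast-moving projects from being hidden behind large, established ones.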
Activity data shows commit frequency, issue response times, pull request acceptance rates, and contributor engagement. Study project maintenance patterns and community health, and distinguish actively maintained projects from abandoned ones.
The dataset includes detailed repository information such as description, topics, license, creation date, last update, primary language, file structure, and README content. Analyze metadata to understand project characteristics and categorization.
Track language usage across repositories, analyze trending frameworks, and monitor technology adoption rates. Study which languages dominate specific domains and identify emerging programming paradigms.
Yes, the dataset includes contributor profiles, contribution counts, follower networks, and activity patterns. Analyze developer communities, identify influential contributors, and understand collaboration networks.
Use GitHub data to compare star counts, fork rates, and community engagement across programming languages and frameworks. Analyze which ecosystems have the most active developer communities.
Repositories include license information, README quality indicators, and documentation completeness. Analyze open source licensing trends and study the relationship between documentation quality and project adoption.
Track new repositories created over time, filtered by language, topic, or organization. Monitor innovation trends, identify emerging technologies, and spot new projects in specific domains.
The dataset is refreshed regularly to capture new repositories, star growth, commit activity, and contributor changes. Historical data enables trend analysis and longitudinal studies of open source development.
Researchers and recruiters use the dataset to identify active developers in specific technologies, analyze contribution patterns, and understand skill distributions across the developer community.
Yes, you can filter by organization to analyze corporate open source strategies, study company-sponsored projects, and track how different organizations engage with the open source community.
The dataset includes dependency information showing which libraries and packages are most widely used. Study ecosystem interconnections, identify critical dependencies, and analyze supply chain relationships.
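One way to find widely used packages from dependency data is to count how many repositories depend on each one. The sketch below assumes a simple mapping from repository to its list of dependencies; that shape and the sample names are hypothetical, not the dataset's actual format.

```python
from collections import Counter

# Hypothetical mapping: repository -> packages it depends on.
# The structure and names are assumptions made for this example.
DEPENDENCIES = {
    "acme/api": ["requests", "pydantic"],
    "acme/cli": ["requests", "click"],
    "acme/etl": ["requests", "pydantic", "pandas"],
}


def most_depended_on(dep_map, top_n):
    """Return the top_n packages ranked by number of dependent repositories."""
    counts = Counter()
    for deps in dep_map.values():
        # set() ensures a package is counted once per repository,
        # even if it appears several times in that repository's list.
        counts.update(set(deps))
    return counts.most_common(top_n)


top_packages = most_depended_on(DEPENDENCIES, 2)
print(top_packages)
```

Packages near the top of such a ranking are candidates for closer supply chain review, since a problem in one of them would affect many dependents.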
Data can be exported in CSV, JSON, XLSX, Parquet, and NDJSON formats. Apply filters and select specific fields before exporting to get the precise dataset you need for your analysis or application.
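Once exported, an NDJSON file can be processed with a few lines of standard-library Python. The sketch below reads one JSON object per line, filters by language and star count, and keeps only selected fields. The field names (`full_name`, `language`, `stars`, `license`) and the sample records are assumptions for illustration, not the export's documented schema.

```python
import io
import json

# Hypothetical NDJSON export sample: one JSON object per line.
# Field names and values are invented for this example.
SAMPLE_NDJSON = """\
{"full_name": "acme/fastlib", "language": "Rust", "stars": 5200, "license": "MIT"}
{"full_name": "acme/webkit-ui", "language": "TypeScript", "stars": 830, "license": "Apache-2.0"}
{"full_name": "acme/tinydb", "language": "Rust", "stars": 120, "license": "MIT"}
"""


def load_ndjson(stream):
    """Yield one dict per non-empty line of an NDJSON text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_repos(records, language, min_stars, fields):
    """Keep records matching language and star threshold, projected to `fields`."""
    return [
        {name: rec.get(name) for name in fields}
        for rec in records
        if rec.get("language") == language and rec.get("stars", 0) >= min_stars
    ]


# io.StringIO stands in for an opened export file here.
rust_repos = select_repos(
    load_ndjson(io.StringIO(SAMPLE_NDJSON)),
    language="Rust",
    min_stars=1000,
    fields=("full_name", "stars"),
)
print(rust_repos)
```

Because `load_ndjson` is a generator, records are handled one line at a time, which helps when an export is too large to load into memory at once.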
Analysts use the dataset to study programming language trends, framework adoption rates, and emerging technologies. Track how developer interest shifts over time and predict future technology directions.