OSSInsight: Scalable GitHub Analysis

Bibliographic details
Published in: Proceedings of the VLDB Endowment, 2024-08, Vol. 17 (12), pp. 4321-4324
Main authors: Ghazal, Ahmad; Liang, Zhiyuan; Bains, Sunny; Maduri, Hanumath
Format: Article
Language: English
Online access: Full text
Description
Abstract: GitHub is a platform that hosts code, enables collaboration, and supports version control for a global community of over 100 million developers. Free tools are crucial for researching open-source software. Based on our research, we found that existing tools either lack real-time GitHub data processing or offer limited functionality. This demonstration presents OSSInsight, an open-source tool for researching and analyzing GitHub repositories. We first present the architecture of the tool, including its access to nearly 7 billion archived and real-time data records and how it is powered by TiDB. The demonstration shows how OSSInsight analyzes GitHub data along three dimensions: developers, repositories, and organizations. All of these analyses are based on generated SQL queries submitted to a TiDB database. TiDB has HTAP capabilities: it uses its row store for simple SQL queries and relies on its column store for more complex queries. Users can view and edit these SQL queries and also view their execution plans. Finally, OSSInsight provides an innovative OpenAI-based tool that performs data analysis from English text input and presents the results visually as charts and graphs.
ISSN:2150-8097
DOI:10.14778/3685800.3685865
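
Illustrative sketch (not part of the catalog record): the abstract describes analyses that are expressed as SQL queries, which users can inspect together with their execution plans. The Python snippet below gives a rough idea of what such a query might look like along the repository dimension. It rests on several assumptions: the table `github_events` and its columns are hypothetical, the sample rows are made up, and an in-memory SQLite database stands in for TiDB so that the example runs anywhere. Only the event type name `WatchEvent` (GitHub's event for starring a repository) is taken from GitHub itself.

```python
import sqlite3

# Assumption: a simplified, hypothetical events table. The real OSSInsight
# schema is not given in the abstract and may look quite different.
conn = sqlite3.connect(":memory:")  # SQLite used here only as a stand-in for TiDB
conn.execute(
    "CREATE TABLE github_events (repo_name TEXT, actor_login TEXT, type TEXT)"
)
conn.executemany(
    "INSERT INTO github_events VALUES (?, ?, ?)",
    [
        ("org/a", "alice", "WatchEvent"),  # made-up sample rows
        ("org/a", "bob", "WatchEvent"),
        ("org/a", "alice", "PushEvent"),
        ("org/b", "carol", "WatchEvent"),
    ],
)

# An aggregate over the whole table: count star events per repository.
STARS_PER_REPO = """
SELECT repo_name, COUNT(*) AS stars
FROM github_events
WHERE type = 'WatchEvent'
GROUP BY repo_name
ORDER BY stars DESC, repo_name
"""


def stars_per_repo(connection):
    """Return (repo_name, stars) rows, most-starred repository first."""
    return connection.execute(STARS_PER_REPO).fetchall()


def query_plan(connection):
    """Return SQLite's plan for the query (TiDB has its own EXPLAIN output)."""
    return connection.execute("EXPLAIN QUERY PLAN " + STARS_PER_REPO).fetchall()


print(stars_per_repo(conn))
for step in query_plan(conn):
    print(step)
```

An aggregate that scans and groups a large events table is the kind of query the abstract calls "more complex", which TiDB would serve from its column store, whereas a lookup of a single row by key would be a "simple" query for the row store. How the real system routes a given query is not specified in the abstract.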