Scaling Granite Code Models to 128K Context

Bibliographic Details
Published in: arXiv.org, 2024-07
Authors: Stallone, Matt; Saxena, Vaibhav; Karlinsky, Leonid; McGinn, Bridget; Bula, Tim; Mishra, Mayank; Meza Soria, Adriana; Zhang, Gaoyuan; Prasad, Aditya; Shen, Yikang; Surendran, Saptha; Guttula, Shanmukha; Patel, Hima; Selvam, Parameswaran; Dang, Xuan-Hong; Koyfman, Yan; Sood, Atin; Feris, Rogerio; Desai, Nirmit; Cox, David D.; Puri, Ruchir; Panda, Rameswar
Format: Article
Language: English
Online access: Full text
Description
Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of the Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data. We also release instruction-tuned models with long-context support, derived by further fine-tuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared with the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
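
The context-scaling recipe above centers on raising the RoPE base frequency during continual pretraining. The snippet below is a minimal sketch, not the authors' code, showing how the rotary-embedding base controls the rotation angles assigned to each position; the head dimension, position counts, and the larger base value (10M) are illustrative assumptions, not values taken from the paper.

```python
import torch

def rope_angles(dim: int, max_positions: int, theta: float = 10_000.0) -> torch.Tensor:
    """Return the (max_positions, dim // 2) matrix of rotary-embedding angles."""
    # Per-frequency inverse wavelengths; a larger base `theta` makes every
    # frequency rotate more slowly, stretching positional resolution so that
    # far-apart tokens remain distinguishable at long context lengths.
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    positions = torch.arange(max_positions, dtype=torch.float32)
    return torch.outer(positions, inv_freq)  # angle for each (position, frequency) pair

# Short-context setting versus a larger base of the kind used when continually
# pretraining for long contexts (values here are illustrative).
angles_short = rope_angles(dim=128, max_positions=4_096, theta=10_000.0)
angles_long = rope_angles(dim=128, max_positions=131_072, theta=10_000_000.0)
```

In this sketch, moving to the larger base shrinks every rotation angle, which is what lets the same rotary scheme cover a 128K-token window without the angles wrapping around as quickly as they would at the original base.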
ISSN: 2331-8422