PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

Bibliographic details
Main authors: He, Yanheng; Jin, Jiahe; Xia, Shijie; Su, Jiadi; Fan, Runze; Zou, Haoyang; Hu, Xiangkun; Liu, Pengfei
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Online access: Order full text
creator He, Yanheng
Jin, Jiahe
Xia, Shijie
Su, Jiadi
Fan, Runze
Zou, Haoyang
Hu, Xiangkun
Liu, Pengfei
description Imagine a world where AI can handle your work while you sleep - organizing your research materials, drafting a report, or creating a presentation you need for tomorrow. However, while current digital agents can perform simple tasks, they are far from capable of handling the complex real-world work that humans routinely perform. We present PC Agent, an AI system that demonstrates a crucial step toward this vision through human cognition transfer. Our key insight is that the path from executing simple "tasks" to handling complex "work" lies in efficiently capturing and learning from human cognitive processes during computer use. To validate this hypothesis, we introduce three key innovations: (1) PC Tracker, a lightweight infrastructure that efficiently collects high-quality human-computer interaction trajectories with complete cognitive context; (2) a two-stage cognition completion pipeline that transforms raw interaction data into rich cognitive trajectories by completing action semantics and thought processes; and (3) a multi-agent system combining a planning agent for decision-making with a grounding agent for robust visual grounding. Our preliminary experiments in PowerPoint presentation creation reveal that complex digital work capabilities can be achieved with a small amount of high-quality cognitive data - PC Agent, trained on just 133 cognitive trajectories, can handle sophisticated work scenarios involving up to 50 steps across multiple applications. This demonstrates the data efficiency of our approach, highlighting that the key to training capable digital agents lies in collecting human cognitive data. By open-sourcing our complete framework, including the data collection infrastructure and cognition completion methods, we aim to lower the barriers for the research community to develop truly capable digital agents.
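The abstract describes the multi-agent split only at a high level. Below is a minimal Python sketch, built entirely on our own assumptions, of how a planning agent and a grounding agent could divide the work: the planner emits a thought plus an action aimed at a natural-language target, and the grounder resolves that target to screen coordinates. All names (`Step`, `run_episode`, `toy_planner`, `toy_grounder`) are hypothetical and are not taken from the PC Agent code base; the 50-step cap only echoes the figure quoted in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One step of a hypothetical cognitive trajectory."""
    thought: str                    # planner's reasoning for this step
    action: str                     # e.g. "click", "type", or "done"
    target: Optional[str]           # natural-language description of a UI element
    text: Optional[str] = None      # text to enter for "type" actions
    coords: Optional[Tuple[int, int]] = None  # filled in by the grounding agent


# Hypothetical interfaces: the planner decides what to do next,
# the grounder decides where on the screen to do it.
Planner = Callable[[str, List[Step]], Step]
Grounder = Callable[[bytes, str], Optional[Tuple[int, int]]]


def run_episode(task: str, planner: Planner, grounder: Grounder,
                screenshot: Callable[[], bytes], max_steps: int = 50) -> List[Step]:
    """Alternate planning and grounding until the planner says "done" or the cap is hit."""
    history: List[Step] = []
    for _ in range(max_steps):
        step = planner(task, history)
        if step.action != "done" and step.target is not None:
            # Resolve the description to coordinates; None means grounding failed,
            # and the planner sees that in the history on its next call.
            step.coords = grounder(screenshot(), step.target)
        history.append(step)
        if step.action == "done":
            break
    return history


# Toy stand-ins so the sketch runs end to end (no real models involved).
def toy_planner(task: str, history: List[Step]) -> Step:
    if len(history) == 0:
        return Step("Open the slide editor first.", "click", "PowerPoint icon")
    if len(history) == 1:
        return Step("Add the title.", "type", "title placeholder", text="Quarterly Report")
    return Step("The slide is finished.", "done", None)


def toy_grounder(image: bytes, target: str) -> Optional[Tuple[int, int]]:
    # A fixed lookup table stands in for a visual grounding model.
    layout = {"PowerPoint icon": (40, 900), "title placeholder": (640, 200)}
    return layout.get(target)


trace = run_episode("Create a title slide", toy_planner, toy_grounder, lambda: b"")
for s in trace:
    print(s.action, s.target, s.coords)
```

Keeping grounding in a separate agent means the planner never has to output pixel coordinates; a failed lookup simply leaves `coords` empty in the history, where the planner can react to it on its next call.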
doi_str_mv 10.48550/arxiv.2412.17589
format Article
date 2024-12-23
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa free_for_read
collection arXiv Computer Science
arXiv.org
linktorsrc https://arxiv.org/abs/2412.17589 (View record in Cornell University)
backlink https://doi.org/10.48550/arXiv.2412.17589 (View paper in arXiv)
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2412.17589
language eng
recordid cdi_arxiv_primary_2412_17589
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
title PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T09%3A06%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PC%20Agent:%20While%20You%20Sleep,%20AI%20Works%20--%20A%20Cognitive%20Journey%20into%20Digital%20World&rft.au=He,%20Yanheng&rft.date=2024-12-23&rft_id=info:doi/10.48550/arxiv.2412.17589&rft_dat=%3Carxiv_GOX%3E2412_17589%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true