Intelligent Control of Robotic X-ray Devices using a Language-promptable Digital Twin

Natural language offers a convenient, flexible interface for controlling robotic C-arm X-ray systems, making advanced functionality and controls accessible. However, enabling language interfaces requires specialized AI models that interpret X-ray images to create a semantic representation for reasoning. The fixed outputs of such AI models limit the functionality of language controls. Incorporating flexible, language-aligned AI models prompted through language enables more versatile interfaces for diverse tasks and procedures. Using a language-aligned foundation model for X-ray image segmentation, our system continually updates a patient digital twin based on sparse reconstructions of desired anatomical structures. This supports autonomous capabilities such as visualization, patient-specific viewfinding, and automatic collimation from novel viewpoints, enabling commands like 'Focus in on the lower lumbar vertebrae.' In a cadaver study, users visualized, localized, and collimated structures across the torso using verbal commands, achieving 84% end-to-end success. Post hoc analysis of randomly oriented images showed our patient digital twin could localize 35 commonly requested structures to within 51.68 mm, enabling localization and isolation from arbitrary orientations. Our results demonstrate how intelligent robotic X-ray systems can incorporate physicians' expressed intent directly. While existing foundation models for intra-operative X-ray analysis exhibit failure modes, as they improve, they can facilitate highly flexible, intelligent robotic C-arms.
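The abstract outlines a closed loop: a verbal command is parsed, a language-promptable segmentation model isolates the named structure in the current X-ray, the 2D mask is lifted into a sparse 3D reconstruction that updates the patient digital twin, and the twin then supplies a localization target for viewfinding and collimation. The paper itself provides no code; the following is a minimal Python sketch of that loop under stated assumptions, where every name (PatientDigitalTwin, segment_with_prompt, backproject, execute_command) is hypothetical and the stub functions return dummy data in place of real model outputs.

```python
# Illustrative sketch only, not code from the paper. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class PatientDigitalTwin:
    # Maps anatomical structure names to sparse 3D point reconstructions (mm).
    structures: dict = field(default_factory=dict)

    def update(self, name, points):
        # Merge a new sparse reconstruction of one structure into the twin.
        self.structures.setdefault(name, []).extend(points)

    def localize(self, name):
        # Estimate a structure's position as the centroid of its points.
        pts = self.structures[name]
        return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))

def segment_with_prompt(image, prompt):
    # Stand-in for a language-aligned segmentation foundation model;
    # a real model would return a 2D mask for the prompted structure.
    return [(100, 120), (101, 121), (102, 119)]  # dummy pixel coordinates

def backproject(mask_pixels, camera_pose):
    # Stand-in for lifting 2D mask pixels to sparse 3D points, e.g. by
    # triangulating the same structure across several C-arm views.
    return [(float(u), float(v), 50.0) for u, v in mask_pixels]

def execute_command(command, image, camera_pose, twin):
    # One pass of the loop: parse intent, segment, reconstruct, update, localize.
    structure = command.removeprefix("Focus in on the ").rstrip(".")  # toy parser
    mask = segment_with_prompt(image, prompt=structure)
    twin.update(structure, backproject(mask, camera_pose))
    return twin.localize(structure)  # a real system would now move and collimate

twin = PatientDigitalTwin()
target = execute_command("Focus in on the lower lumbar vertebrae.", None, None, twin)
print(target)  # -> (101.0, 120.0, 50.0)
```

Because the twin accumulates points across commands and views, repeated calls refine the same structure's localization, which is consistent with the "continually updates" behavior the abstract describes.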

Bibliographic Details
Main authors: Killeen, Benjamin D; Suresh, Anushri; Gomez, Catalina; Inigo, Blanca; Bailey, Christopher; Unberath, Mathias
Format: Article
Language: English
Published: 2024-12-10
DOI: 10.48550/arxiv.2412.08020
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Human-Computer Interaction; Computer Science - Learning; Computer Science - Robotics
Online access: Order full text