Intelligent Control of Robotic X-ray Devices using a Language-promptable Digital Twin

Natural language offers a convenient, flexible interface for controlling robotic C-arm X-ray systems, making advanced functionality and controls accessible. However, enabling language interfaces requires specialized AI models that interpret X-ray images to create a semantic representation for reasoning. The fixed outputs of such AI models limit the functionality of language controls. Incorporating flexible, language-aligned AI models prompted through language enables more versatile interfaces for diverse tasks and procedures. Using a language-aligned foundation model for X-ray image segmentation, our system continually updates a patient digital twin based on sparse reconstructions of desired anatomical structures. This supports autonomous capabilities such as visualization, patient-specific viewfinding, and automatic collimation from novel viewpoints, enabling commands like 'Focus in on the lower lumbar vertebrae.' In a cadaver study, users visualized, localized, and collimated structures across the torso using verbal commands, achieving 84% end-to-end success. Post hoc analysis of randomly oriented images showed our patient digital twin could localize 35 commonly requested structures to within 51.68 mm, enabling localization and isolation from arbitrary orientations. Our results demonstrate how intelligent robotic X-ray systems can incorporate physicians' expressed intent directly. While existing foundation models for intra-operative X-ray analysis exhibit failure modes, as they improve, they can facilitate highly flexible, intelligent robotic C-arms.
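The abstract outlines a closed loop: a verbal command is parsed, a language-promptable segmentation model isolates the named structure in the current X-ray, the 2D mask is lifted into a sparse 3D reconstruction that updates the patient digital twin, and the twin then supplies a localization target for viewfinding and collimation. The paper itself provides no code; the following is a minimal Python sketch of that loop under stated assumptions, where every name (PatientDigitalTwin, segment_with_prompt, backproject, execute_command) is hypothetical and the stub functions return dummy data in place of real model outputs.

```python
# Illustrative sketch only, not code from the paper. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class PatientDigitalTwin:
    # Maps anatomical structure names to sparse 3D point reconstructions (mm).
    structures: dict = field(default_factory=dict)

    def update(self, name, points):
        # Merge a new sparse reconstruction of one structure into the twin.
        self.structures.setdefault(name, []).extend(points)

    def localize(self, name):
        # Estimate a structure's position as the centroid of its points.
        pts = self.structures[name]
        return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))

def segment_with_prompt(image, prompt):
    # Stand-in for a language-aligned segmentation foundation model;
    # a real model would return a 2D mask for the prompted structure.
    return [(100, 120), (101, 121), (102, 119)]  # dummy pixel coordinates

def backproject(mask_pixels, camera_pose):
    # Stand-in for lifting 2D mask pixels to sparse 3D points, e.g. by
    # triangulating the same structure across several C-arm views.
    return [(float(u), float(v), 50.0) for u, v in mask_pixels]

def execute_command(command, image, camera_pose, twin):
    # One pass of the loop: parse intent, segment, reconstruct, update, localize.
    structure = command.removeprefix("Focus in on the ").rstrip(".")  # toy parser
    mask = segment_with_prompt(image, prompt=structure)
    twin.update(structure, backproject(mask, camera_pose))
    return twin.localize(structure)  # a real system would now move and collimate

twin = PatientDigitalTwin()
target = execute_command("Focus in on the lower lumbar vertebrae.", None, None, twin)
print(target)  # -> (101.0, 120.0, 50.0)
```

Because the twin accumulates points across commands and views, repeated calls refine the same structure's localization, which is consistent with the "continually updates" behavior the abstract describes.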

Bibliographic Details
Main authors: Killeen, Benjamin D; Suresh, Anushri; Gomez, Catalina; Inigo, Blanca; Bailey, Christopher; Unberath, Mathias
Format: Article
Language: English
Published: 2024-12-10
DOI: 10.48550/arxiv.2412.08020
Source: arXiv.org
Subjects: Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Human-Computer Interaction; Computer Science - Learning; Computer Science - Robotics
Online access: Order full text