Intelligent Control of Robotic X-ray Devices using a Language-promptable Digital Twin
Natural language offers a convenient, flexible interface for controlling robotic C-arm X-ray systems, making advanced functionality and controls accessible. However, enabling language interfaces requires specialized AI models that interpret X-ray images to create a semantic representation for reasoning. The fixed outputs of such AI models limit the functionality of language controls. Incorporating flexible, language-aligned AI models prompted through language enables more versatile interfaces for diverse tasks and procedures. Using a language-aligned foundation model for X-ray image segmentation, our system continually updates a patient digital twin based on sparse reconstructions of desired anatomical structures. This supports autonomous capabilities such as visualization, patient-specific viewfinding, and automatic collimation from novel viewpoints, enabling commands such as 'Focus in on the lower lumbar vertebrae.' In a cadaver study, users visualized, localized, and collimated structures across the torso using verbal commands, achieving 84% end-to-end success. Post hoc analysis of randomly oriented images showed our patient digital twin could localize 35 commonly requested structures to within 51.68 mm, enabling localization and isolation from arbitrary orientations. Our results demonstrate how intelligent robotic X-ray systems can incorporate physicians' expressed intent directly. While existing foundation models for intra-operative X-ray analysis exhibit failure modes, as they improve, they can facilitate highly flexible, intelligent robotic C-arms.
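The abstract describes a command-to-action loop: a verbal command names a target structure, a language-aligned segmentation model is prompted with that name on the current X-ray, the resulting mask is backprojected along the known C-arm geometry to update a sparse patient digital twin, and the twin then supplies a 3D target for viewfinding and collimation. The following is a minimal sketch of that loop under stated assumptions; `segment_structure`, `DigitalTwin`, and all other names are hypothetical stand-ins rather than the authors' actual API, and the foundation model is stubbed out so the example runs.

```python
# Hedged sketch of the command -> segmentation -> digital-twin -> target loop.
# All names here are hypothetical; the real system's segmentation model is
# replaced by a stub so this file is self-contained and runnable.
import numpy as np


def segment_structure(image: np.ndarray, prompt: str) -> np.ndarray:
    """Stand-in for a language-aligned X-ray segmentation foundation model.

    A simple threshold is used so the sketch executes end to end."""
    return image > image.mean()


class DigitalTwin:
    """Sparse patient model: per-structure 3D point sets in world coordinates."""

    def __init__(self) -> None:
        self.structures: dict[str, np.ndarray] = {}

    def update(self, name: str, mask: np.ndarray, pose: np.ndarray) -> None:
        """Backproject a 2D mask along the C-arm pose into sparse 3D points.

        For brevity, pixels are lifted at a fixed depth; a real system would
        triangulate the structure across multiple calibrated views."""
        vs, us = np.nonzero(mask)
        rays = np.stack([us, vs, np.full_like(us, 600)], axis=1).astype(float)
        points = (pose[:3, :3] @ rays.T).T + pose[:3, 3]
        prev = self.structures.get(name, np.empty((0, 3)))
        self.structures[name] = np.vstack([prev, points])

    def localize(self, name: str) -> np.ndarray:
        """Centroid of the reconstructed structure, used for viewfinding."""
        return self.structures[name].mean(axis=0)


# One iteration: verbal command -> segmentation -> twin update -> a target
# point the robotic C-arm could center on and collimate around.
twin = DigitalTwin()
target = "lower lumbar vertebrae"                    # parsed from "Focus in on ..."
image = np.random.default_rng(0).random((128, 128))  # stand-in fluoroscopy frame
carm_pose = np.eye(4)                                # known from robot kinematics
mask = segment_structure(image, prompt=target)
twin.update(target, mask, carm_pose)
print("isocenter target (mm):", twin.localize(target))
```

Because the twin accumulates points across views, structures segmented from one orientation remain localizable from novel viewpoints, which is what enables the reported localization and collimation from arbitrary orientations.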
creator | Killeen, Benjamin D; Suresh, Anushri; Gomez, Catalina; Inigo, Blanca; Bailey, Christopher; Unberath, Mathias
format | Article |
creationdate | 2024-12-10
rights | http://creativecommons.org/licenses/by-nc-nd/4.0
identifier | DOI: 10.48550/arxiv.2412.08020 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Human-Computer Interaction; Computer Science - Learning; Computer Science - Robotics
title | Intelligent Control of Robotic X-ray Devices using a Language-promptable Digital Twin |
url | https://arxiv.org/abs/2412.08020