Robot-assisted ultrasound reconstruction for spine surgery: from bench-top to pre-clinical study

Robot-assisted ultrasound (rUS) systems have already been used to provide non-radiative three-dimensional (3D) reconstructions that form the basis for guiding spine surgical procedures. Despite promising studies on this technology, few studies offer insight into the robustness and generality of the approach by verifying performance across diverse testing scenarios. Therefore, this study aims to provide an assessment of an rUS system, with technical details from experiments ranging from the bench-top to a pre-clinical study. A semi-automatic control strategy was proposed to ensure continuous and smooth robotic scanning. Next, a U-Net-based segmentation approach was developed to automatically extract the anatomic features and derive a high-quality 3D US reconstruction. Experiments were conducted on synthetic phantoms and human cadavers to validate the proposed approach. The average scanning force was found to be 2.84±0.45 N on the synthetic phantoms and 5.64±1.10 N on the human cadavers. The anatomic features could be reliably reconstructed with a mean accuracy of 1.28±0.87 mm for the synthetic phantoms and 1.74±0.89 mm for the human cadavers. The results demonstrate the feasibility of the proposed system in a pre-clinical setting. This work is complementary to previous work, encouraging further exploration of the potential of this technology in in vivo studies.


Introduction
Back pain is one of the most prevalent symptoms in the worldwide population and one of the top drivers of health costs [1]. Back pain can be caused by structural spinal disorders, which are often treated by spinal stabilization. Before the condition becomes too severe, spine surgery, such as pedicle screw placement (PSP), can be considered to alleviate the pain and recover neurological functionality. The surgical procedure usually involves inserting pedicle screws and subsequently introducing a metal connecting rod through the pedicle screws to stabilize the corresponding vertebral bodies. Despite being commonly performed, PSP remains a challenging intervention due to the limited visibility of essential and critical anatomical structures during the surgery [2]. As a result, surgeons must develop considerable technical skill and a correct mental image of the anatomical features in order to position the instruments precisely.
Computer-assisted medical imaging techniques (e.g., fluoroscopy) are increasingly being employed for intraoperative visualization and navigation in PSP [3]. By registering the preoperative anatomy to the intraoperative scene, such medical imaging techniques could help surgeons to understand the location and the nature of invisible vertebral features [4]. However, using fluoroscopy intraoperatively exposes the patients and clinicians to ionizing radiation, which may also delay surgical intervention.
A potential non-ionizing alternative is two-dimensional (2D) ultrasound (US). With knowledge of the location of the US images, the 2D information can be reconstructed into a three-dimensional (3D) representation of the scene. Furthermore, 3D US navigation is not limited by volume size or anatomy shape [5]. In a previous study, Ottacher et al. developed a free-hand US system to localize 3D reconstructed vertebrae, reaching a 0.8 ± 0.6 mm positioning accuracy [6]. In later work, Chen et al. introduced a portable free-hand US system whose reconstruction showed a 1-mm error on average [7]. While the demonstrated accuracy is quite good, manual scanning remains a lengthy and tedious procedure. The data collection requires considerable expertise and involves careful palpation to guarantee good acoustic coupling and image quality. The handheld approach may also lead to distorted anatomies, e.g., when different pressure levels are applied during different scanning phases.
To tackle the aforementioned problems of free-hand US systems, researchers have investigated robot-assisted US systems. Such systems open up possibilities for automatic scanning [8][9][10]. Robotic approaches could potentially increase motion stability, tracking accuracy, and repeatability. Victorova et al. implemented a US-based robotic system for scoliosis assessment [11]. This system could control the force applied in the normal direction between 2 and 5 N, while following a pre-computed scanning movement based on a manually predefined trajectory. Zhang et al. utilized a flexible ultrasound scanning system (FUSS) to automatically image the human spine [12]. The probe was fixed on the robotic arm with a flexible fixture. The surgeon manually plotted a coarse scanning path on a phantom. Then, the automatic scanning was conducted with a constant 8 N force. However, the system was only validated on a synthetic phantom with a flat silicon surface.
There is growing interest in the field toward robot-assisted US systems for spine surgery. As far as the authors are aware, however, only a few studies focus on clinical translation, and challenges remain before current systems can be applied clinically. On the one hand, the bone intensity in US images differs noticeably between the commonly employed water [6] or agar-agar [13] phantoms and actual human tissue: the surrounding muscle and fat attenuate the ultrasonic reflection far more than the phantom materials with their low acoustic impedance. On the other hand, robotic scanning is more difficult along the complex back contour and deformable skin than in the simplified phantom setting, where the surface is typically planar and homogeneous [14]. Scanning requires maintaining a constant force against the spine while remaining compatible with various clinical scenarios.
It is therefore crucial to provide an assessment, with technical details, of an rUS system from the bench-top to a pre-clinical study. This work thus complements previous work to obtain better insight into the potential of employing robotic approaches in a clinical setting. The contributions of the paper are: a) the development of an rUS system that employs a hybrid controller for scanning and deep learning (DL)-based image segmentation for reconstruction; b) a demonstration of the feasibility of robotic US reconstruction on both synthetic phantoms and human cadavers; and c) a detailed analysis for future researchers, reflecting on the technical impact on clinical use.

Methods
The proposed system contains three building blocks as depicted in Fig. 1 and explained in the following.

Ultrasound calibration
To track the US images in the robot base frame, a robotic US calibration needs to be conducted. Both temporal and spatial calibrations need to be performed. A setup consisting of a US probe, a robot arm, and a custom-designed Z phantom is employed.
The temporal calibration compensates for the temporal offset between the measured poses of the robot end effector and the corresponding poses from the recorded US images. The US probe is commanded to translate 20 mm at 2 mm/s along the vertical axis (i.e., the US probe Z-axis) away from and toward the phantom bottom five times. The temporal offset is then calculated by comparing the timestamps of the peak value of the detected phantom pose in the robot end effector frame and in the US image frame. Subsequently, spatial calibration is performed with a Z phantom, as described in [15]. The purpose is to derive the homogeneous transformation matrix ${}^{EE}_{US}T$ from the US image frame to the robot end effector frame, as well as the scaling matrix $T_s$ from pixels to metric distance.
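The timestamp comparison above can be illustrated with a cross-correlation between the two resampled 1-D position signals. This is a minimal sketch under stated assumptions, not the authors' implementation: the `estimate_temporal_offset` helper and the synthetic sinusoidal motion are hypothetical, and both signals are assumed to be resampled to a common rate beforehand.

```python
import numpy as np

def estimate_temporal_offset(robot_z, image_z, dt):
    """Estimate the lag (in seconds) between two synchronously resampled
    1-D signals: the probe Z-position from robot kinematics and the
    phantom depth detected in the US images.

    A positive result means the image signal lags the robot signal.
    """
    # Remove the mean so the correlation is not dominated by static offsets.
    a = robot_z - np.mean(robot_z)
    b = image_z - np.mean(image_z)
    # Full cross-correlation; the index of the peak gives the lag in samples.
    corr = np.correlate(b, a, mode="full")
    lag_samples = np.argmax(corr) - (len(a) - 1)
    return lag_samples * dt

# Synthetic check: a 2 Hz sinusoidal motion sampled at 100 Hz, with the
# image stream delayed by 50 ms relative to the robot stream.
dt = 0.01
t = np.arange(0.0, 5.0, dt)
robot_z = np.sin(2 * np.pi * 2.0 * t)
image_z = np.sin(2 * np.pi * 2.0 * (t - 0.05))
offset = estimate_temporal_offset(robot_z, image_z, dt)  # ~0.05 s
```

In practice the periodic probe motion makes the correlation peak well defined, and the recovered offset is then used to shift the image timestamps before spatial calibration.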
The calibration procedures are repeated three times to ensure correctness. The calibration must be executed again if the US probe is repositioned or the image recording rate is changed.

Path generation and robotic scanning
In free-hand scanning, the sonographer maneuvers the US probe over the target area to ensure good acoustic coupling. To mimic this free-hand scanning, a path planning and hybrid control strategy is described below.

Path generation
Without employing an external camera, an approach is proposed that benefits from the hands-on operation capability of the robot to mimic free-hand operation. The scanning path is obtained from S-shaped motion paths that are 'taught' by the operator.
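Conceptually, such a taught path is a serpentine raster over the scanning area. As a hedged illustration (the actual system records operator-taught waypoints; the `serpentine_path` helper and its parameters are hypothetical), a dense S-shaped path over a rectangular area could be generated as:

```python
import numpy as np

def serpentine_path(width_mm, length_mm, line_spacing_mm, step_mm):
    """Generate an S-shaped (serpentine) raster of XY waypoints covering a
    width x length rectangle, alternating sweep direction on each pass."""
    xs = np.arange(0.0, width_mm + 1e-9, step_mm)
    ys = np.arange(0.0, length_mm + 1e-9, line_spacing_mm)
    waypoints = []
    for i, y in enumerate(ys):
        row = xs if i % 2 == 0 else xs[::-1]  # reverse every other pass
        waypoints.extend((x, y) for x in row)
    return np.asarray(waypoints)

# Example: the ~50 mm x 100 mm area used for the 3D-printed phantom.
path = serpentine_path(50.0, 100.0, line_spacing_mm=10.0, step_mm=1.0)
```

Each waypoint would then serve as the reference pose for the hybrid controller described next, which adapts the out-of-plane motion to the skin surface.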

Robot control
To guarantee patient safety, an admittance-based hybrid control scheme is proposed that follows the predefined path with a proper contact force, as illustrated in Fig. 3. The desired pose of the US probe $p_d$ is computed as
$$p_d = p_{ref} + \Delta p,$$
where $p_{ref}$ is the reference pose on the planned path and the pose deviation $\Delta p = [\Delta\mathbf{p}, \Delta\boldsymbol{\theta}]$, with $\Delta\mathbf{p} \in \mathbb{R}^3$ and $\Delta\boldsymbol{\theta} \in \mathbb{R}^3$ the resulting position and orientation deflection of the probe, is calculated from the measured wrench. The proposed algorithm establishes an integral-based controller between the pose deflection and the force error along the probe Z-axis, as described in [11]:
$$\Delta\dot{p}_z = c_z\,(f_z - f_d),$$
where $\Delta p_z$ is the pose deviation along the probe Z-axis, $f_z$ and $f_d$ are the measured and desired contact forces, and $c_z$ is the compliance coefficient. Choosing a high value for $c_z$ increases the tracking speed of the robot's end effector with respect to surface fluctuations over the scanning path; here, $c_z$ is fine-tuned to 1.5 mm/(s N). This integral-based control leads to a zero steady-state force error, i.e., the contact force is regulated to a constant value. For the other axes, a mass-damper-spring relation, namely admittance control, is established to convert the environmental interaction force into a deviation from the reference pose [16]:
$$m\,\Delta\ddot{\mathbf{p}} + b\,\Delta\dot{\mathbf{p}} + k\,\Delta\mathbf{p} = \mathbf{f}, \qquad \mathbf{I}\,\Delta\ddot{\boldsymbol{\theta}} + \mathbf{V}\,\Delta\dot{\boldsymbol{\theta}} + \mathbf{C}\,\Delta\boldsymbol{\theta} = \boldsymbol{\tau},$$
where $\Delta\dot{\mathbf{p}}, \Delta\dot{\boldsymbol{\theta}} \in \mathbb{R}^3$ and $\Delta\ddot{\mathbf{p}}, \Delta\ddot{\boldsymbol{\theta}} \in \mathbb{R}^3$ are the first and second derivatives of the position and orientation deviations, and $m$, $b$, $k$, $\mathbf{I}$, $\mathbf{V}$, and $\mathbf{C}$ correspond to the mass, damper, spring, moment of inertia, viscosity, and angular stiffness. The environmental force $\mathbf{f}$ (with torque $\boldsymbol{\tau}$) includes the contact force and the scanning friction between the US probe and the phantom.
Higher stiffness values are preferred to limit the steady-state error. However, a higher stiffness also decreases the actuation speed, such that the probe cannot quickly follow fluctuations of the skin surface. Thus, a stiffness value similar to that of the experimental phantom is chosen; during US scanning, $k$ and $\mathbf{C}$ are set to 1 mm/N and 0.3 rad/Nm, respectively. The other second-order dynamic model parameters are selected to obtain a 20 Hz bandwidth that cancels out measurement noise in the F/T sensor.
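The hybrid scheme above can be sketched as a discrete-time update. This is a minimal illustration, not the authors' implementation: the `hybrid_control_step` helper, the explicit-Euler discretization, the sign convention (probe +Z pointing away from the tissue), and the lateral-axis gains are assumptions; only the compliance coefficient of 1.5 mm/(s N) comes from the text.

```python
import numpy as np

def hybrid_control_step(dp, dp_dot, f_meas, f_des, dt,
                        c_z=1.5e-3, m=1.0, b=20.0, k=1000.0):
    """One control step of the hybrid scheme (illustrative discretization).

    dp, dp_dot : pose deviation and its velocity [m], [m/s];
                 index 0 = probe Z-axis, index 1 = one lateral axis
    f_meas     : measured contact forces on those axes [N]
    f_des      : desired contact force along the probe Z-axis [N]
    c_z        : compliance coefficient [m/(s N)]
    m, b, k    : mass, damper, spring of the admittance relation
    """
    dp = dp.copy(); dp_dot = dp_dot.copy()
    # Z-axis: integral force control -> zero steady-state force error.
    dp[0] += c_z * (f_meas[0] - f_des) * dt
    # Lateral axis: m*ddp + b*dp_dot + k*dp = f  (explicit Euler step).
    ddp = (f_meas[1] - b * dp_dot[1] - k * dp[1]) / m
    dp_dot[1] += ddp * dt
    dp[1] += dp_dot[1] * dt
    return dp, dp_dot

# Regulating toward a 3 N contact force from rest: the Z deviation keeps
# integrating the force error until the tissue pushes back with 3 N.
dp = np.zeros(2); dp_dot = np.zeros(2)
dp, dp_dot = hybrid_control_step(dp, dp_dot, np.array([1.0, 0.5]), 3.0, dt=0.002)
```

Run inside the control loop, the Z-axis term keeps pressing until the force error vanishes, while the lateral mass-damper-spring dynamics let friction forces deflect the probe compliantly around the reference path.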
To execute the desired motions, the expression-graph-based Task Specification Language (eTaSL) [17] is used to send the desired joint angles $q_d$ to the controller. The applied joint torques are then computed internally by the KUKA controller.

Image segmentation and 3D reconstruction
Image segmentation is an essential step to guarantee accurate reconstruction. As the US reflection from bone surfaces is influenced by surrounding soft tissues, traditional image processing can hardly segment the images properly. The U-Net [18] architecture is suitable for medical image segmentation with a limited training dataset. It also shows superior performance compared to traditional image processing. Therefore, it is employed to improve segmentation efficiency and accuracy.
The processing pipeline is illustrated in Fig. 4. Before segmentation, the images are pre-processed. First, raw US images are converted to grayscale and cropped to 480×480 pixels. Second, a Gaussian filter is applied to suppress speckle. The pre-processed images are then sent to the U-Net, which segments the bone contours from the surrounding tissues. A standard U-Net implementation with five square-shaped convolutional layers is employed, and standard values and common practices are followed for setting the network's hyperparameters. Each layer consists of two convolution-ReLU modules and undergoes bias regularization using the L1-norm. The segmented images are then processed with thresholding and morphological operators, and the contours are extracted with a Canny edge detector. All contour pixels $c_{US} = [u, v, 0, 1]^T$ are converted to points $c_R = [x, y, z, 1]^T$ in the robot base frame by
$$c_R = {}^{R}_{EE}T \; {}^{EE}_{US}T \; T_s \; c_{US},$$
where the spatial relationship ${}^{R}_{EE}T$ from the end effector to the robot base is derived from the robot's forward kinematics. Subsequently, radius outlier removal is applied to the generated point cloud, discarding points based on the distance to their neighbors. Finally, surface meshes are rendered and visualized in a custom-designed graphical user interface (GUI).

Experimental setup

Figure 5 shows the experimental setup, consisting of a US imaging system (Sonosite, FUJIFILM, USA) and a lightweight robotic arm (KUKA LBR Med 7, Augsburg, Germany). The US system uses a 7.5 MHz linear probe. A frame grabber (Epiphan Systems Inc., Canada) is employed to capture the US images at 50 Hz. A custom-designed US probe holder is mounted onto the robot end effector by a fast tool changer (G-SHW063-2UE, GRIP GmbH, Germany). A six-DoF F/T sensor (Nano25, ATI Industrial Automation Inc.) assembled at the US probe measures the interaction force and torque between the US probe and the contacted tissues.
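The contour-to-point-cloud conversion in the segmentation pipeline can be sketched as a chain of homogeneous transforms. This is an illustrative sketch, not the authors' code: the `pixel_to_robot` helper, the per-axis scale factors, and the toy transforms are assumptions.

```python
import numpy as np

def pixel_to_robot(contour_px, T_R_EE, T_EE_US, sx, sy):
    """Map US contour pixels [u, v] to 3-D points in the robot base frame.

    contour_px : (N, 2) pixel coordinates from the segmented image
    T_R_EE     : 4x4 end-effector pose in the robot base (forward kinematics)
    T_EE_US    : 4x4 image-to-end-effector transform (spatial calibration)
    sx, sy     : pixel-to-millimetre scale factors (the scaling matrix T_s)
    """
    n = contour_px.shape[0]
    # Homogeneous image-frame points [u*sx, v*sy, 0, 1]^T, one per column.
    c_us = np.ones((4, n))
    c_us[0] = contour_px[:, 0] * sx
    c_us[1] = contour_px[:, 1] * sy
    c_us[2] = 0.0
    # Chain the calibration and forward-kinematics transforms.
    return (T_R_EE @ T_EE_US @ c_us)[:3].T  # (N, 3) points in the base frame

# Toy example: identity calibration and a pure 100 mm Z translation.
T_R_EE = np.eye(4); T_R_EE[2, 3] = 100.0
points = pixel_to_robot(np.array([[10.0, 20.0]]), T_R_EE, np.eye(4), 0.1, 0.1)
```

Accumulating these per-frame point sets over the whole scan yields the raw cloud on which the outlier removal and meshing steps operate.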
A PC workstation (Intel i7 CPU @ 2.6 GHz, 64 GB RAM) is used for data acquisition and processing. The Robot Operating System (ROS) and the Open Robot Control Software (Orocos, version 2.9.0) serve as middleware for the robot control implementation. An NVIDIA Quadro P2000 GPU is used for image processing and computation.

Synthetic phantoms and human cadavers
Verification experiments were carried out on both synthetic phantoms and human cadavers. An in-house 3D-printed phantom and a commercial lumbar phantom (Model 034, CIRS, USA) were employed for experimental validation in a laboratory setting. Both models are shown in Fig. 6. The 3D-printed model was produced from polylactic acid and immersed in agar-agar to simulate the echographic response of soft tissue. The commercial CIRS phantom contains additional features representing spinal disks, skin, and soft tissue to mimic the human anatomy. Two human cadavers (coccyx to T4 torsos) were then used to assess the clinical potential of the proposed system. The body mass index (BMI) of the two cadaver specimens was 19.08 and 30.41, respectively. Neither cadaver presented previous spine pathologies or spine surgeries.

Evaluation of image segmentation
The bone contrast in US images from synthetic phantoms and human cadavers differs because of different US attenuation rates. In this work, two models were therefore trained with the same network architecture: one for the synthetic phantoms and one for the human cadavers. Since segmentation performance is strongly influenced by dataset size, data augmentation was applied to extend the training dataset; the employed techniques include random rotation up to ±5° and left-right flipping. The pre-processed images were then fed into the network. The phantom model used a dataset of 750 images (2250 after augmentation), recorded manually from the 3D-printed phantom and the CIRS phantom before the experiment. The cadaver model was trained on 3000 augmented images acquired from the lumbar area of the human cadavers. Both models were trained for 10 epochs with a learning rate of 1 × 10^-4 and separately validated on 150 images, divided into three groups of 50 images each. The US images used for network training and validation were not used for reconstruction. The US images were labeled by the first author and verified with clinicians for correctness. The average intersection over union (IoU) was computed to verify the segmentation performance; the IoU is defined as the area of the intersection of the predicted and labeled regions divided by the area of their union. The accuracy (ACC) and F1-score (F1) were also computed as evaluation metrics, as described in [19]. The automatically segmented images were compared with the corresponding manually annotated images, which served as ground truth. Figure 6 shows examples of the automatically segmented images on the phantoms and human cadavers, and Table 1 summarizes the segmentation results.
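The three evaluation metrics above are straightforward to compute from binary masks. As a minimal sketch (the `seg_metrics` helper and the toy masks are illustrative, not the authors' evaluation code):

```python
import numpy as np

def seg_metrics(pred, gt):
    """IoU, pixel accuracy, and F1-score for binary segmentation masks."""
    pred = pred.astype(bool); gt = gt.astype(bool)
    tp = np.sum(pred & gt)     # correctly segmented bone pixels
    fp = np.sum(pred & ~gt)    # over-segmented pixels
    fn = np.sum(~pred & gt)    # missed bone pixels
    tn = np.sum(~pred & ~gt)   # correctly rejected background
    iou = tp / (tp + fp + fn)
    acc = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, acc, f1

# Toy 2x2 example: one bone pixel hit, one missed, one over-segmented.
pred = np.array([[1, 1], [0, 0]])
gt   = np.array([[1, 0], [1, 0]])
iou, acc, f1 = seg_metrics(pred, gt)  # iou = 1/3, acc = 0.5, f1 = 0.5
```

The toy example also illustrates why IoU is the strictest of the three: both error types count against the single intersection term.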

Validation on synthetic phantoms
To cover all the anatomic features, the scanning area was set to approximately 50 mm × 100 mm for the 3D-printed phantom and 250 mm × 100 mm for the CIRS phantom. This area ensured that the reconstruction could cover several lumbar vertebra replicas at a time. The target contact force was set to 3 N according to the material stiffness. The automatic scans were performed three times for each phantom, with force and torque recorded in real time. The spine was segmented as a rigid body and converted into 3D triangular surface models (STL) using medically certified, commercially available segmentation software (Materialise Mimics Innovation Suite, version 19.0, Materialise NV, Leuven, Belgium). The STL models generated from the CT data through Mimics are considered the ground truth.
To validate the 3D reconstruction, the 3D representation error, as described in [20], is used to quantitatively assess the quality of the reconstructed anatomic features. To this end, a two-stage registration is conducted. First, a point-to-point coarse registration is applied, in which the US reconstruction is roughly aligned with the preoperative CT model using manually identified landmarks on the anatomic structures. In the second stage, the iterative closest point (ICP) algorithm [21] automatically refines the registration between the point clouds. The algorithm searches for the nearest point in the CT model for each point of the reconstructed model and computes the corresponding nearest-neighbor distance. The 3D representation error is then the root mean square of the distances between the points in the US reconstruction and the closest points in the CT model.
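After registration, the error metric reduces to a nearest-neighbor root-mean-square distance. The following is a brute-force sketch for small clouds (the `representation_error_rms` helper and the toy points are illustrative; a KD-tree would be used for realistic cloud sizes):

```python
import numpy as np

def representation_error_rms(recon_pts, ct_pts):
    """RMS of nearest-neighbour distances from the US reconstruction to the
    CT ground-truth model (brute force, O(N*M) pairwise distances)."""
    # Pairwise distance matrix (N_recon x N_ct) ...
    d = np.linalg.norm(recon_pts[:, None, :] - ct_pts[None, :, :], axis=2)
    # ... then the closest CT point per reconstructed point.
    nn = d.min(axis=1)
    return np.sqrt(np.mean(nn ** 2))

# Toy check: two reconstructed points, each 1 mm from its closest CT point.
ct = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
recon = np.array([[1.0, 0.0, 0.0], [10.0, 1.0, 0.0]])
err = representation_error_rms(recon, ct)  # 1.0 mm
```

Note that the distance is directed (reconstruction to CT), so missing regions in the reconstruction are not penalized; this matches the point-to-model formulation described above.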
The scanning time varies with the size of the scanning area. Scanning takes on average 66 seconds on the 3D-printed phantom and, owing to the larger scanning area, 442 seconds on the CIRS phantom. Figure 7 shows the contact force along the probe Z-axis. The average contact force along the Z-axis is 3.63 N over the three tests on the 3D-printed phantom, with deviations ranging from 0.28 to 0.32 N. An average contact force of 2.73 ± 0.46 N was measured on the CIRS phantom. Table 2 summarizes the reconstruction results on the synthetic phantoms. The mean 3D representation error is within 1.10 mm for the 3D-printed phantom, while it rises to 1.57 mm for the CIRS phantom. Figure 8 presents the 3D reconstructions with the distribution of the 3D representation error for the different measurement points (measurements in millimeters).

Validation on human cadavers
The two human cadavers were separately placed in the prone position and fixed on two wooden boards. The scanning area was roughly 200 mm × 100 mm, which made it possible to cover all vertebrae from L1 to L5. The desired scanning force on the cadavers was set to 5 N to ensure good acoustic coupling, as described in clinical practice [22]. The procedure was repeated three times on each cadaver. For evaluation, CT scans were conducted and segmented by a technical expert, and the 3D ground-truth models were generated in the same fashion as described in Sect. 3.4. Table 3 summarizes the scanning performance of the human cadaver experiments. Scanning takes on average 227 seconds per cadaver. The measured force along the probe Z-axis ranges from 5.37 to 5.73 N, with a standard deviation from 0.68 to 1.44 N. The contact force is 5.55 ± 1.23 N on the first cadaver (C1) and 5.74 ± 0.92 N on the second cadaver (C2). The amplitude changed when the US probe moved along the rounded parts of the path, as shown in Fig. 9B. The force along the X-axis reaches up to 4.00 N, while the force along the Y-axis stays around 0 N. From Fig. 10, the contours of the transverse processes and facet joints can be clearly observed. Table 4 summarizes the reconstruction accuracy for the two human cadavers. The mean 3D representation error lies within 2.00 mm, while the standard deviation is below 1.00 mm for all six trials. The reconstruction error ranged from 1.58 to 1.65 mm on cadaver C1 and from 1.78 to 1.93 mm on cadaver C2. The results demonstrate the capability of the proposed approach to deliver high-quality segmentation and reconstruction performance.

Discussion
This work provides an analysis of a robot-assisted ultrasound system that is set up to reconstruct the lumbar spine and could potentially be used for guidance during spine surgery. The system combines a DL-based image segmentation with a hybrid robotic scanning algorithm. Experiments are carried out on synthetic phantoms and human cadavers to validate the system's capability. This paper provides a detailed analysis for future researchers and reflects the technical impact on clinical use.
For image segmentation, the average IoU is 0.73 for the synthetic phantoms and 0.65 for the human cadavers. These results demonstrate that the U-Net can provide segmented bone contours for 3D reconstruction. The IoU and F1 scores on the synthetic phantoms outperform those on the cadavers, since the bone contour is easier to distinguish from the surrounding soft tissue on synthetic phantoms. The labeling of cadaveric images is also more time-consuming. While not the focus of this work, other approaches, e.g., unsupervised learning [23], could be explored to further reduce the labeling work and boost segmentation precision. Moreover, the segmentation is promising but still needs to be investigated with a larger dataset. In this work, the same cadavers were used for training and validation; ideally, to avoid group leakage, separate subjects should be used for training and validation, as done by Alsinan et al. [24].

The proposed system automatically computes a dense scanning path and executes the scan while ensuring a safe contact force. Compared to previous free-hand scanning [25], the duration of an automatic scan over a similar area was reduced from 12 to 5 min. The average contact force was 3.63 ± 0.30 N and 2.73 ± 0.46 N for the 3D-printed phantom and the CIRS phantom, respectively. For the cadaveric testing, the measured force was 5.64 ± 1.10 N, reflecting a more pronounced variation in surface height. The force deviations on the synthetic phantoms were lower than on the cadavers because of their smoother and thinner skin. The results demonstrate that the system can guarantee a constant force with small variations, avoiding soft tissue deformation and bone displacement, which may improve the reconstruction capability. However, an oscillation of the scanning force was noticed during scanning, as the estimated stiffness might not be optimal and human tissue is not homogeneous.
Since surgeons base their decisions on their understanding of the anatomic features, the overall 3D representation accuracy is essential for evaluating the potential of the reconstructed models. The proposed rUS system yielded a 3D representation error within 1.10 ± 0.66 mm for the 3D-printed phantom and 1.57 ± 1.07 mm for the CIRS phantom. The reconstruction of the 3D-printed phantom outperformed that of the CIRS phantom, since there is less influence from surrounding soft tissue and the height variations are less prominent in the 3D-printed phantom. Although promising results with synthetic phantoms were obtained, further steps are necessary to investigate the clinical use of the rUS system. The cadaver experiments provide a promising outlook for such future studies. The reconstruction error on the human cadavers ranged from 1.59 to 1.93 mm, with a standard deviation within 0.95 mm. The reconstruction accuracy differs between the cadavers due to their different BMIs. Other 3D ultrasound systems have also been evaluated for specific applications. Ottacher et al. evaluated free-hand US reconstruction with 1.1 ± 1.1 mm accuracy on 3D-printed phantoms [6]. Ioannou et al. studied surface area measurements of an in vitro fetal phantom [26], with a median percent error ranging from 2.0 to 3.4%. Compared to these previous reconstruction systems, our proposed approach provides, however, only preliminary bench-top and pre-clinical results, leaving the need for further research.
The US reconstruction could provide anatomic features and intraoperative navigation for surgeons. On the one hand, the preoperative CT model and surgical plan could be registered into the robot frame to assist the subsequent procedure [27]. On the other hand, the surgeons could also identify landmarks and make an intraoperative surgical plan based on the reconstructed posterior surface of the spine [28].
In view of technical integration, the scope of this work is the implementation of an rUS system with detailed specifications and the quantitative validation of the proposed approach. In view of clinical integration, this study aims to provide a feasibility assessment in a pre-clinical setting, with special attention to detailing the experimental conditions in the laboratory and on the cadavers. Overall, the proposed system performs stable scanning and accurate reconstruction on both synthetic phantoms and human cadavers. This work helps to bridge, while not fully closing, the gap toward use in clinical practice.
Nevertheless, there are some shortcomings in the current approach. First, the presented approach is not fully automatic. Second, physiologic aspects need to be taken into account; the incorporation of respiratory motion will need to be tackled next (e.g., by sensor fusion with depth/force sensing). Third, tissue stiffness and deformation in vivo may differ from the cadaveric setting. All these aspects need to be addressed before in vivo validation can be considered.
Funding This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under grant agreement no. 101016985 (FAROS) and Flemish Research Foundation (FWO) under grant agreement no. G0A1420N (Radar-spine) and no. 1S36322N (Harmony).

Conflict of interest
The authors have no conflicts of interest to declare.
Ethics approval Ethical approval BASEC no. 202101196 was granted by the Cantonal Ethical Committee of the Canton of Zurich, Switzerland.
Informed consent This study does not involve human participants.