# Dual-Phase Tapped-Delay-Line Time-to-Digital Converter With On-the-Fly Calibration Implemented in 40 nm FPGA

Jun Yeon Won, Student Member, IEEE, Sun Il Kwon, Member, IEEE, Hyun Suk Yoon, Guen Bae Ko, Student Member, IEEE, Jeong-Whan Son, Student Member, IEEE, and Jae Sung Lee

Abstract—This paper describes two novel time-to-digital converter (TDC) architectures. The first is a dual-phase tapped-delay-line (TDL) TDC architecture that minimizes the clock skew problem, which causes highly nonlinear TDC characteristics. The second is a pipelined on-the-fly calibration architecture that continuously compensates for the nonlinearity and calibrates the fine times using the most up-to-date bin widths without additional dead time. The two architectures were combined and implemented in a single Virtex-6 device (ML605, Xilinx) for time interval measurement. The standard uncertainty for the time intervals from 0 to 20 ns was less than 12.83 ps-RMS (root mean square). The resolution (i.e., the least significant bit, LSB) of the TDC was approximately 10 ps at room temperature. The differential nonlinearity (DNL) values were [-1.0, 1.91] and [-1.0, 1.88] LSB and the integral nonlinearity (INL) values were [-2.20, 2.60] and [-1.63, 3.93] LSB for the two different TDLs that constitute one TDC channel. During temperature drift from 10 to 50°C, the TDC with on-the-fly calibration maintained the standard uncertainty of 11.03 ps-RMS.

*Index Terms*—Clock distribution network, field-programmable gate array (FPGA), multi-phase clock, on-the-fly calibration, time measurement, time-of-flight (TOF), time-of-flight positron emission tomography (TOF PET), time-to-digital converter (TDC), Virtex-6.

## I. INTRODUCTION

The measurement of timestamps that correspond to physical events with high resolution is important in many nuclear science experiments and applications, particularly for time-of-flight (TOF) detectors. For example, a TOF mass spectrometer resolves the mass of a charged particle based on arrival time measurements [1]. A time-of-propagation detector was designed in combination with the Cherenkov technique to identify particles more precisely [2], [3]. In TOF positron emission tomography (TOF PET) with sub-ns coincidence resolving time, the arrival time differences of the photon pairs improve lesion detectability while simultaneously reducing scan times and/or radiation doses [4]–[8].

J. Y. Won, H. S. Yoon, G. B. Ko, and J.-W. Son are with the Department of Nuclear Medicine and Biomedical Sciences, Seoul National University, Seoul 110-744, Korea.

S. I. Kwon is with the Department of Nuclear Medicine, Seoul National University, Seoul 110-744, Korea.

J. S. Lee is with the Department of Nuclear Medicine, Biomedical Sciences and Institute of Radiation Medicine, Seoul National University, Seoul 110-744, Korea (e-mail: jaes@snu.ac.kr).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TBCAS.2015.2389227

Precise time measurements for TOF detectors can be conducted using a time-to-digital converter (TDC). In the early days, the classical methods of time stretching and time-to-amplitude conversion combined with use of an analog-to-digital converter (ADC) were widely used for time measurements. With the subsequent development of integrated circuit technologies, fully digital TDCs based on application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs) have been widely used because they are less sensitive to environmental fluctuations and offer high conversion rates [9].

Among the available fully digital TDCs, the FPGA-based TDC has the advantages of low development costs and relatively fast development times. Most FPGA-based TDC architectures use tapped delay lines (TDLs) [10]–[13] or Vernier delay lines (VDLs) [14]–[16]. These architectures use a delay line as a time interpolator. The propagation delay of an element (in the TDL case) or the difference between delays (in the VDL case) determines the resolution, and the uniformity of the delays determines the TDC nonlinearity. Continuing improvement of the FPGA manufacturing process has therefore led to finer TDC resolution, because the delay cells propagate faster, and thus to lower standard uncertainty. Research on TDL TDCs using Virtex-4 (90 nm process), Virtex-5 (65 nm process), and Virtex-6 (40 nm process) FPGA devices achieved TDC resolutions (least significant bit, LSB) of 50 ps, 17 ps, and 10 ps, respectively, and the standard uncertainty values of the time interval measurements were 25 ps, 24.2 ps, and 19.6 ps-RMS, respectively [10]–[12].

However, unlike ASIC-based TDCs, most FPGA-based TDCs suffer from high nonlinearity because of innate delay differences. Additionally, operating voltage and temperature drift cause bin widths of both ASIC- and FPGA-based TDCs to be unstable.

One approach that mitigates this nonlinearity is shortening of the delay line using a fast reference clock [12] or a multi-phase clock [13]. A TDL-based TDC implemented in a Virtex-6 used 165 taps, with a resolution of 10 ps, along with a fast reference clock at 600 MHz [12]. A two-stage interpolator consisting of a four-phase clock and a single TDL can reduce the minimum TDL delay to be as short as one-fourth of the clock period [13].

1932-4545 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received June 16, 2014; revised October 10, 2014; accepted December 28, 2014. Date of publication March 10, 2015; date of current version February 22, 2016. This work was supported by grants from the National Research Foundation of Korea (NRF) funded by the Korean Ministry of Science, ICT and Future Planning (Grant NRF-2014M3C7034000), and the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Korea (Grant HI14C1135). This paper was recommended by Associate Editor D. Ham.

TABLE I TDC COMPARISON TABLE

| Reference  | Method | Chip, process    | LSB (ps) | Standard uncertainty (ps) | DNL (LSB)         | INL (LSB)         | Calibration frequency | Calibration LUT generator |
|------------|--------|------------------|----------|---------------------------|-------------------|-------------------|-----------------------|---------------------------|
| [10]       | TDL    | Virtex-4, 90 nm  | 50       | 25                        | [-0.4, 1.4]       | [-1.3, 1.7]       | Discontinuous         | FPGA                      |
| [11]       | TDL    | Virtex-5, 65 nm  | 17       | 24.2                      | [-1, 3.55]        | [-2.99, 2.58]     | Discontinuous         | FPGA                      |
| [12]       | TDL    | Virtex-6, 40 nm  | 10       | 19.6                      | [-1, 1.5]         | [-2.25, 1.61]     | Discontinuous         | PC                        |
| [13]       | TDL    | Spartan-3, 90 nm | 45       | < 70                      | -0.9 <sup>a</sup> | -2.1 <sup>a</sup> | Discontinuous         | FPGA                      |
| [14], [16] | VDL    | pASIC 1, 0.65 μm | 200      | -                         | [-0.47, 0.44]     | [-0.2, 1.28]      | Discontinuous         | PC                        |
| [15]       | VDL    | pASIC 2, 0.65 μm | 100      | 70                        | 1.94 <sup>b</sup> | 2.08 <sup>b</sup> | Discontinuous         | PC                        |
| [18]       | -      | -                | -        | -                         | -                 | -                 | Semi-continuous       | FPGA                      |
| This work  | TDL    | Virtex-6, 40 nm  | 10       | 12.8                      | [-1, 1.91]        | [-2.20, 3.93]     | Continuous            | FPGA                      |

TDL = tapped delay line, VDL = Vernier delay line. The standard uncertainty is evaluated using the maximum standard deviation of the time interval measurements. <sup>a</sup>extreme value, <sup>b</sup>maximum value. The discontinuous calibration updates the calibration LUT intermittently. The semi-continuous calibration renews bin width information in succession, but sequential LUT update takes multiple clock cycles for a newly booked code to be integrated into the calibration LUT. The continuous calibration updates both bin width information and the calibration LUT in just a few clock cycles without dead time penalty.

Another approach involves identifying the nonlinear characteristics of the TDC and compensating for this nonlinearity. The general calibration method used is bin-by-bin estimation [10]–[18]. Bin-by-bin estimation is based on a statistical method called a 'code density test'. When random hits that are uniformly distributed in time are asserted into the TDC, the number of hits collected in a single bin is proportional to the width of that bin. However, many bin calibration processes are microprocessor-aided [12], [14]–[16] or require processing dead time to obtain the calibration look-up table (LUT) [13]–[16]. An automatic and semi-continuous calibration method that uses random hits as a calibration source and generates the calibration LUT sequentially using an internal RAM block and an accumulator has the advantage of no service dead time when updating the LUT [18]. However, a sequential LUT update can take multiple clock cycles for newly booked bin width information to be integrated into the calibration LUT.

In Table I, the characteristics of this work are compared with those of other FPGA-based TDCs.

In this paper, we propose two main TDC architectures. The first is a dual-phase TDL TDC architecture that considers the clock distribution network. The second is a novel pipelined on-the-fly calibration method using a prefix adder. The dual-phase TDL TDC consists of a duty cycle estimator and two parallel TDLs. The duty cycle estimator measures the duty cycle generated by a mixed-mode clock manager (MMCM). The two TDLs cover the different halves of the clock period to interpolate the sub-clock-period time. This architecture enables implementation of the TDL TDC with a moderate reference clock while minimizing the clock skew problem of the delay line. The novel pipelined on-the-fly calibration method calibrates the fine codes while the TDC measures the arrival times. The pipelined and parallel prefix adder generates the calibration LUT using the most up-to-date bin width information in just a few clock cycles without any additional dead time.

The considerations with regard to the clock distribution network and the dual-phase TDL TDC architecture will be described in Section II. In Section III, the pipelined on-the-fly calibration method using a prefix adder will be proposed. Evaluation of the TDC performance, including the standard uncertainty for various time intervals, the differential nonlinearity (DNL), and the integral nonlinearity (INL), will be described in Section IV. Results showing that the TDC implemented with on-the-fly bin calibration maintains its performance during temperature drift will also be given. Additionally, the TOF measurements conducted with the developed TDC will be given in Section IV. In Section V, our conclusions will be presented.

## II. DESIGN

In this section, we discuss considerations with respect to the clock distribution network of a Virtex-6 evaluation board (ML605, Xilinx), particularly the clock regions and the clock skew, along with the TDL TDC design constraints, the architecture of the dual-phase TDL TDC, and its operating principle (i.e., TDL selection and arrival time calculation).

### A. Considerations for the Clock Distribution Network

At present, most FPGA-based TDL TDCs use a carry chain as a delay line [10]–[13]. The propagation states along the TDL are sampled by the flip-flops, yielding a corresponding thermometer code (e.g., 11100000...).

Ideal conditions in the TDL TDC would require all delay elements to yield the same delay and the flip-flops to sample the propagation states simultaneously. It is impossible to meet these ideal conditions with an FPGA-based TDC, and thus FPGA-based TDCs suffer from high nonlinearity.

One approach to mitigate this nonlinearity is to minimize the clock skew between the sampling flip-flops. In the case of the Virtex-6 FPGA, using sampling flip-flops in the same clock region reduces the clock skew. As shown in Fig. 1(a), the ML605 contains 12 clock regions (from clock region X0Y0 to clock region X1Y5), and each clock region is 40 configurable logic blocks (CLBs) high [19]. Because the clock skew between adjacent CLBs is 0–2 ps for CLBs within a single clock region, but is more than 100 ps for CLBs that are located in different clock regions, as shown in Fig. 1(b) and (c), the clock region crossing of the TDL can be a source of nonlinearity [11], [12], [19]. For quantitative analysis of the effect of clock skew on the TDL TDC, we placed a 960-bin delay line across six clock regions (from clock region X0Y0 to X0Y5) along the y-axis using 240 CLBs (four bins per CLB; from CLB X68Y0 to CLB X68Y239) and drove the sampling flip-flops with a global clock, as shown in Fig. 1(a). Random hits generated by detecting gamma rays emitted from a <sup>22</sup>Na radioactive point source were used as a random source for a code density test. The experimental setup is discussed in detail in Section IV-A. The total number of valid events was 1,033,480.

Fig. 1. Clock distribution network of an ML605 device. (a) The ML605 device has a total of 12 clock regions and each clock region is 40 CLBs high (modified from [19]). The global clock buffer (center triangle) can drive all clock regions. A 960-bin TDL was placed across six clock regions to analyze the effect of clock skew on the TDL. (b) Clock delay for CLBs located along the y-axis. (c) Clock skew between adjacent CLBs located along the y-axis. Note that clock region crossing introduces clock skew of more than 100 ps.

As shown in Fig. 2, the numbers of events corresponding to the last bin of a long-clock-delay region (i.e., the 159th bin of clock region X0Y0 and the 319th bin of clock region X0Y1) are much higher than those corresponding to other bins. Additionally, invalid thermometer codes, including a bubble pattern like "propagated, unpropagated, propagated, unpropagated" (e.g., ... 11001110...), were observed in the vicinity of the last bin of a short delay region (i.e., the 639th bin of clock region X0Y3 and the 799th bin of clock region X0Y4), meaning that no valid fine codes were yielded from the 636th bin to the 641st bin and from the 793rd bin to the 803rd bin. These two phenomena introduce the nonlinearity; the former introduces the high positive DNL, and the latter introduces the continuous negative DNL.

As shown in Fig. 3(a), in the case where the TDL crosses from the long- to the short-clock-delay region, there is a period equal in length to the clock skew ($T_{\rm skew}$) during which the flip-flops in the short-clock-delay region have already sampled the propagation states, but those in the long-clock-delay region have not. If a transition propagates through the short-clock-delay region during this period, then the propagation states stored in the flip-flops in the short-clock-delay region are "unpropagated" sequences (i.e., 0000...), even though the transition has already propagated. A few moments later, those in the long-clock-delay region sample the propagation states, in which the last bin of the long-clock-delay region is "propagated" (i.e., ... 1111). Therefore, the last bin of the long-clock-delay region, such as the 159th bin and the 319th bin, corresponds to many more events than the other bins.

Fig. 2. Code density test results for a 960-bin TDL (along the y-axis) implemented on the ML605 device.

As shown in Fig. 3(b), in the case where the TDL crosses from the short- to the long-clock-delay region, the flip-flops of the short-clock-delay region near the boundary can sample the valid propagation pattern of "*propagated*, *unpropagated*" (e.g., ... 1100). However, a few moments (i.e., $T_{\rm skew}$) later, those in the long-clock-delay region also sample the propagation pattern of "propagated, unpropagated" (e.g., 1100...), because the transition continues to propagate along the TDL. Therefore, the invalid thermometer pattern of "propagated, unpropagated, propagated, unpropagated" (e.g., ... 11001100...) appears in the vicinity of the boundary between the short- and the long-clock-delay region, such as at the 639th and 799th bins.
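The distinction between a valid thermometer code and a bubble pattern can be illustrated with a short sketch. It is not the fine code encoder used in this work; the function name `encode_thermometer` and the string representation of the sampled states are assumptions made only for illustration.

```python
from typing import Optional


def encode_thermometer(code: str) -> Optional[int]:
    """Return the number of propagated bins, or None for an invalid code.

    `code` is a string of '1' (propagated) and '0' (unpropagated) states,
    ordered from the first tap of the delay line to the last tap.
    This representation is hypothetical and chosen for readability.
    """
    ones = code.count("1")
    # A valid thermometer code is a run of 1s followed only by 0s.
    if code != "1" * ones + "0" * (len(code) - ones):
        return None  # bubble pattern such as ...11001100...
    return ones


print(encode_thermometer("11100000"))  # valid code
print(encode_thermometer("11001100"))  # bubble pattern near a region boundary
```

A code rejected in this way yields no fine code, which is how the bubble patterns near the 639th and 799th bins translate into runs of empty bins in the code density test.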

In the middle of the ML605 die, there is no significant clock skew between the different clock regions, and thus there are no large sources of positive and negative DNLs. As shown in Fig. 2, bins located near the boundary of the middle clock region, such as the 479th bin and the 480th bin, do not introduce high nonlinearity.

Fig. 3. Timing diagram in the case of clock region crossing. The global clock buffer is located near the short-clock-delay region. (a) Crossing from the long- to the short-clock-delay region. (b) Crossing from the short- to the long-clock-delay region.

Fig. 4. TDC transfer function when considering the clock skew. The dotted line indicates the ideal TDC transfer function and the solid line indicates the practical TDC transfer function. (a) Clock region crossing from the long- to the short-clock-delay region. (b) Clock region crossing from the short- to the long-clock-delay region.

Fig. 4 shows the TDC transfer function, in which $t_{\rm p}$, $t_{\rm s}$, and $T_{\rm LSB}$ correspond to the propagation time along the delay line, the relative sampling time, and the average propagation time of the delay element, respectively. Early sampling shifts the transfer function of the short-clock-delay region to the right by the time $T_{\rm skew}$, and thus introduces the high positive DNL, as shown in Fig. 4(a). On the other hand, late sampling shifts the transfer function of the long-clock-delay region to the left by the time $T_{\rm skew}$, causing ambiguity in the encoding of the corresponding fine code. Because a $T_{\rm skew}$ longer than $T_{\rm LSB}$ introduces either high positive DNL or continuous negative DNL, a careful design that considers the clock distribution network is essential for implementation of a low-nonlinearity TDL TDC.

### B. Constraints for TDL TDC Design

Most TDL TDC architectures consist of a coarse counter and a fine time interpolator [10]–[13]. The coarse counter, which is driven by a reference clock, measures the arrival time with the resolution of the clock period, and the fine time (i.e., the sub-clock-period time) is interpolated by the fine time interpolator. To measure the arrival time without the interpolation loss, the fine time dynamic range should be longer than the coarse counter resolution.

In the case of the Virtex-6 device, it is difficult to implement a TDL TDC within a single clock region. In the ML605 hardware, the maximum clocking frequency with low clock jitter is 600 MHz (clock period of 1.67 ns) and each carry chain in a single clock region is 160 bins (40 CLBs) high. If we consider that the average bin width is 10 ps, then the maximum fine time dynamic range using a single TDL is 1.6 ns; this is shorter than the clock period of 1.67 ns, and there is an interpolation loss as high as 70 ps. A minimum of 167 bins are required at the 600 MHz reference clock, and it is therefore difficult to avoid the clock region crossing problem that leads to the high positive or continuous negative DNL, except in the middle of the ML605 die, as stated above.

Fig. 5. Architecture of a dual-phase TDL TDC and pipelined on-the-fly calibrator.

### C. Dual-Phase TDL TDC Architecture

For implementation of a TDL TDC within a single clock region using a moderate clock frequency, we propose the dual-phase TDL TDC.

As shown in Fig. 5, a TDC channel consists of a coarse counter and a fine time interpolator. The fine time interpolator uses two TDLs; $TDL_0$ is driven by a 400 MHz reference clock ($CLK_0$) and $TDL_{180}$ is driven by a 180° out-of-phase clock ($CLK_{180}$). A built-in 200 MHz differential oscillator (SiT9102, SiTime) provides the input clock to the MMCM that generates the high-performance clocks of $CLK_0$ and $CLK_{180}$. Each TDL consists of 128 bins using 32 CLBs and two TDLs are located in parallel within the same clock region. When a hit is asserted into a TDC, the hit is split and fed into the two TDLs almost simultaneously. The flip-flops of $TDL_0$ and $TDL_{180}$ sample the propagation states at the rising edges of $CLK_0$ and $CLK_{180}$, respectively, and thus two different 128-bit thermometer codes are obtained for a single event. A set of multiplexers selects one of the propagation states from $TDL_0$ or $TDL_{180}$ depending on the state of $CLK_0$. The clock state detector immediately stores the logic state of $CLK_0$ (logical high or logical low) when the hit arrives; this one-bit code, $S_{TDL}$, serves to select the TDL to interpolate the fine time. The principle of TDL selection will be discussed in detail in the next subsection (Section II-D). The fine code encoder converts a selected 128-bit thermometer code into a 7-bit binary fine code. The on-the-fly bin calibrator consists of a duty cycle estimator, a bin width estimator, a prefix adder, and a calibrated fine time calculator. The details of these modules will be discussed in Section III.

The coarse counter driven by  $CLK_0$  generates the coarse time with 2.5 ns resolution. Therefore, in this architecture, the two 128-bin delay lines driven by  $CLK_0$  and  $CLK_{180}$  provide the same effective dynamic range for the fine time as that of a 256-bin delay line without clock region crossing. The 10-ps delay bins and the effective 256-bin delay line allow a dynamic range of 2,560 ps that covers the single clock period of 2.5 ns.
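The dynamic range constraint of Section II-B and the way the dual-phase arrangement satisfies it can be checked with a few lines of arithmetic. The sketch below uses the figures quoted in the text (160 bins per clock region, 128 bins per TDL, 10 ps average bin width); the function names are illustrative only.

```python
def fine_dynamic_range_ps(bins_per_tdl: int, n_tdls: int, bin_width_ps: float) -> float:
    """Total fine time range covered by n_tdls delay lines of equal length."""
    return bins_per_tdl * n_tdls * bin_width_ps


def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


# Single TDL filling one clock region at the 600 MHz reference clock.
single = fine_dynamic_range_ps(160, 1, 10.0)
print(single, clock_period_ps(600.0), single >= clock_period_ps(600.0))

# Two 128-bin TDLs at the 400 MHz reference clock.
dual = fine_dynamic_range_ps(128, 2, 10.0)
print(dual, clock_period_ps(400.0), dual >= clock_period_ps(400.0))
```

The first comparison fails (1,600 ps against a period of about 1,667 ps), while the second holds (2,560 ps against 2,500 ps), which is the margin the dual-phase architecture relies on.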

### D. Principles of TDL Selection and Arrival Time Calculation

Measurement of the sub-clock-period time is performed using the duty cycle estimator and two TDLs. The duty cycle estimator measures the time during which $CLK_0$ is at a logical high within one clock period. This time is the estimated duty cycle, $T_d$, as shown in Fig. 6.

In the principle of TDL selection, two cases should be considered. In the case where the hit arrives at a TDC when $CLK_0$ is at a logical low, as shown in Fig. 6(a), the flip-flops of $TDL_0$ and $TDL_{180}$ are in the states sampled at $L_1$ and $L_2$, respectively. Only the flip-flops of $TDL_0$ have the valid propagation states required to interpolate the fine time $t_{f0}$. The coarse count is obtained at the rising edge of $CLK_0$ immediately after the hit arrives, and the corresponding coarse count N yields the coarse time $N \times T_0$. The arrival time $t_A$ is therefore derived from $N \times T_0 - t_{f0}$. In the case where the hit arrives at a TDC when $CLK_0$ is at a logical high, as shown in Fig. 6(b), the flip-flops of $TDL_0$ and $TDL_{180}$ are in the states sampled at $L_3$ and $L_4$, respectively. In this case, the sampled propagation states of $TDL_0$ may be invalid because $t_{f0}$ can be longer than the total delay time of a single TDL ($T_{\rm P}$), as shown in Fig. 6(b). Note that the dynamic range of a single TDL is insufficient to meet the constraint for the TDL TDC in the given implementation environment, particularly in the FPGA, as discussed in the previous section (Section II-B). However, the sampled propagation states of $TDL_{180}$ can be valid and provide the fine time $t_{f180}$, as shown in Fig. 6(b). Additionally, $t_{f180}$ is always shorter than $t_{f0}$, which means that it has a lower accumulated interpolation error. Therefore, $t_{f180}$ is used when $CLK_0$ is at a logical high. The fine time is then derived as the sum of $t_{f180}$ and the period during which $CLK_0$ is at a logical low ($T_0 - T_d$). As in the previous case, the coarse count is obtained at the rising edge of $CLK_0$ immediately after the hit arrives, and the corresponding coarse count N yields the coarse time $N \times T_0$. The arrival time $t_A$ is therefore derived from $N \times T_0 - (t_{f180} + T_0 - T_d)$.

Fig. 6. Principle of TDL selection and arrival time calculation. The timing diagram and the sampled states of two TDLs are shown. The filled states indicate propagated states and the blank states indicate the unpropagated states. (a) The case where a hit is asserted when $CLK_0$ is at a logical low. (b) The case where a hit is asserted when $CLK_0$ is at a logical high.
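The two cases of the arrival time calculation can be summarized in a short sketch. The function name, argument names, and the default duty cycle of 1,250 ps (an ideal 50% duty cycle at 400 MHz) are assumptions for illustration; in the described design $T_d$ is estimated at run time rather than fixed.

```python
def arrival_time_ps(coarse_count: int, s_tdl: int, fine_time_ps: float,
                    t0_ps: float = 2500.0, td_ps: float = 1250.0) -> float:
    """Arrival time t_A from the coarse count and the selected fine time.

    s_tdl = 0: the hit arrived while CLK_0 was low, fine_time_ps is t_f0.
    s_tdl = 1: the hit arrived while CLK_0 was high, fine_time_ps is t_f180.
    td_ps is the time CLK_0 spends at logical high (assumed value here).
    """
    if s_tdl == 0:
        fine = fine_time_ps
    else:
        # Add the low phase of CLK_0 that follows the CLK_180 sampling edge.
        fine = fine_time_ps + t0_ps - td_ps
    return coarse_count * t0_ps - fine


print(arrival_time_ps(4, 0, 300.0))  # 4 * 2500 - 300
print(arrival_time_ps(4, 1, 300.0))  # 4 * 2500 - (300 + 2500 - 1250)
```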

## III. PIPELINED ON-THE-FLY CALIBRATION

Both the FPGA core voltage and the ambient temperature are closely associated with the overall TDC performance. In the case of the ML605, the core voltage is stabilized by a power module (PTD08A020WAD, Texas Instruments). However, ambient temperature drift can degrade the TDC performance. Therefore, real-time calibration is essential to maintain the TDC performance. Continuous calibration LUT update can compensate for not only the innate delay differences but also fluctuation in nonlinear characteristics of TDC due to the environmental drift.

Here, we propose the pipelined on-the-fly calibration method using the prefix adder. The calibration is conducted in two parallel steps.

The first step is the bin width identification based on the code density test; it is implemented using a fixed-depth FIFO (first in, first out) memory and a set of binary counters serving as a bin width estimator. Each bin corresponds to one counter, the accumulated value of which is proportional to the width of that bin. As shown in Fig. 7, when the hit arrives at the TDC, the TDC yields the fine code. The new fine code is then enqueued into the FIFO while the value of the corresponding counter is increased by one. Then, the oldest stored fine code is dequeued while the value of the corresponding counter is decreased by one. Using this process, the FIFO stores the most recent fine codes and the counters hold up-to-date bin width information. Therefore, with a D-depth FIFO, an N-count bin has a bin width w equal to $(N/D) \times T_0$ at a given reference clock period $T_0$. For example, if the depth of the FIFO is 20,000 and the clock period is 2,500 ps, then the bin width of a 256-count bin is $(256/20{,}000) \times 2{,}500$ ps = 32 ps. However, note that the bin width identification guarantees the calibration accuracy only when a sufficient number of fine codes are booked in the FIFO.
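The sliding-window behavior of the FIFO and counters can be modeled in software as follows. This is a behavioral sketch only, not the FPGA implementation; the class name `BinWidthEstimator` and its methods are hypothetical, and the usage reproduces the worked example from the text (depth 20,000, period 2,500 ps, 256 counts).

```python
from collections import deque


class BinWidthEstimator:
    """Software model of a fixed-depth FIFO with one counter per fine code."""

    def __init__(self, n_bins: int, depth: int, t0_ps: float) -> None:
        self.counts = [0] * n_bins
        self.fifo = deque()
        self.depth = depth
        self.t0_ps = t0_ps

    def push(self, fine_code: int) -> None:
        # Enqueue the new code and increment its counter.
        self.fifo.append(fine_code)
        self.counts[fine_code] += 1
        # Once the FIFO is full, dequeue the oldest code and decrement.
        if len(self.fifo) > self.depth:
            oldest = self.fifo.popleft()
            self.counts[oldest] -= 1

    def width_ps(self, fine_code: int) -> float:
        # w = (N / D) * T0, meaningful once the FIFO holds `depth` codes.
        return self.counts[fine_code] * self.t0_ps / self.depth


est = BinWidthEstimator(n_bins=256, depth=20_000, t0_ps=2500.0)
for _ in range(256):
    est.push(5)
for _ in range(20_000 - 256):
    est.push(0)
print(est.width_ps(5))  # 32.0
```

Because every push past the fill point removes exactly one old code, the counters always sum to the FIFO depth, so the estimated widths always sum to $T_0$.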

The second step is the calibration LUT update, along with the continuous bin width identification. As shown in Fig. 8, the pipelined and parallel prefix adder integrates bin widths, stored in a bin width estimator, to generate the calibration LUT. This process takes  $2 \times \log_2(\text{number of bins})-1$  clock cycles. The fine time  $t_f$  is then calibrated to the center of the TDC bins (1).

$$t_{\rm f_i} = \frac{w_{\rm i}}{2} + \sum_{\rm j=0}^{\rm i-1} w_{\rm j} = \sum_{\rm j=0}^{\rm i} w_{\rm j} - \frac{w_{\rm i}}{2} \tag{1}$$

In (1),  $\sum_{j=0}^{i} w_j$  is obtained using the prefix adder and  $w_i/2$  can be obtained using the one-bit right-shift operator.
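A software model of the LUT generation may make (1) and the stage count concrete. The sketch below implements a Brent-Kung style prefix sum for a power-of-two number of bins and then applies the half-bin correction of (1). It runs sequentially, whereas the hardware evaluates each level in parallel, and the function names are illustrative assumptions rather than names from the design.

```python
def brent_kung_prefix_sum(values):
    """Inclusive prefix sum with the Brent-Kung up-sweep/down-sweep pattern."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    x = list(values)
    # Up-sweep (reduction tree): log2(n) levels.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
        d *= 2
    # Down-sweep (distribution tree): log2(n) - 1 levels.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            x[i] += x[i - d]
        d //= 2
    return x


def prefix_levels(n_bins: int) -> int:
    """Number of adder levels, 2 * log2(n_bins) - 1, for a power-of-two n_bins."""
    return 2 * (n_bins.bit_length() - 1) - 1


def calibration_lut(widths_ps):
    """Calibrated fine time at the center of each bin, following (1)."""
    prefix = brent_kung_prefix_sum(widths_ps)
    return [p - w / 2 for p, w in zip(prefix, widths_ps)]


print(calibration_lut([10.0, 12.0, 8.0, 10.0]))  # [5.0, 16.0, 26.0, 35.0]
print(prefix_levels(128))  # 13
```

Within each level the additions read and write disjoint entries, which is what allows one level to map onto one pipeline stage in hardware.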

This novel calibration method was applied to our dual-phase TDL TDC. As shown in Fig. 5, a TDC channel has a bin calibrator that consists of a fixed-depth FIFO, two sets of 128 binary counters used to store the bin width information of $TDL_0$ and $TDL_{180}$, and one binary counter used for the duty cycle estimator. When a hit arrives at the TDC, the fine code encoder yields the 7-bit fine code and the clock state detector yields the one-bit code $S_{TDL}$ to indicate the TDL that has been selected to interpolate the fine time. These new codes are concatenated as an 8-bit code and are enqueued into the FIFO while the oldest code is dequeued. As stated above, these 8-bit codes correspond to two sets of 128 binary counters and provide the up-to-date bin width information. Additionally, $S_{TDL}$ is used to estimate the duty cycle: a code whose $S_{TDL}$ is at a logical high increases the value of the duty cycle estimator when it is enqueued and reduces it when it is dequeued. Using this process, the duty cycle estimator provides up-to-date duty cycle information. Using the bin width information, two prefix adders can then generate the fine time, i.e., either $t_{f0}$ or $t_{f180}$. The calibrated fine time calculator yields the calibrated fine time of either $t_{f0}$ when $S_{TDL}$ is at a logical low or $t_{f180} + T_0 - T_d$ when $S_{TDL}$ is at a logical high.



Fig. 7. Architecture for pipelined on-the-fly bin calibration. Bin width identification uses a FIFO memory and a bin width estimator (i.e., a set of binary counters). The prefix adder integrates bin width information and yields the calibration LUT. Among results of the prefix adder, the calibrated fine time corresponding to a new fine code is then selected.



Fig. 8. Example of pipelined and parallel prefix adder (Brent-Kung adder). The prefix adder updates all the calibration LUT simultaneously.

The main advantage of pipelined on-the-fly calibration is that the measured data are used to renew the calibration LUT immediately without incurring any additional dead time penalty. The other notable advantage of the process appears through the use of the dual-phase TDL TDC architecture; the duty cycle can be estimated more precisely than the other bin widths. At a given depth D in the FIFO, the average count of the duty cycle estimator is expected to be 'D/2' and is much larger than 'D/number of bins', which is the average count of the bin width estimator. Statistically, because the relative uncertainty of an N-count measurement is $1/\sqrt{N}$, $T_{\rm d}$ estimation is more precise than bin width estimation. Precise estimation of the duty cycle using a simple binary counter can improve the standard uncertainty.
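The size of this difference can be estimated numerically. The sketch below assumes a FIFO depth of 20,000 (the value used in the earlier worked example, not necessarily the implemented depth) and 256 fine codes per channel (two 128-bin TDLs); both figures are assumptions for illustration.

```python
import math


def relative_uncertainty(count: float) -> float:
    """Relative statistical uncertainty 1/sqrt(N) of an N-count measurement."""
    return 1.0 / math.sqrt(count)


depth = 20_000   # assumed FIFO depth D
n_bins = 256     # assumed number of fine codes (2 x 128 bins)

duty_unc = relative_uncertainty(depth / 2)      # duty cycle counter, about D/2 counts
bin_unc = relative_uncertainty(depth / n_bins)  # average bin counter, D/256 counts

print(duty_unc)            # 0.01
print(round(bin_unc, 3))   # 0.113
```

Under these assumptions the duty cycle estimate is roughly an order of magnitude more precise, in relative terms, than the estimate of an average bin width.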

## IV. MEASUREMENTS

### A. Experimental Setup

We implemented a two-channel dual-phase TDL TDC in an ML605 device for the time interval measurements; each TDC channel generated timestamps for physical events, and the relative time intervals were calculated. The random hits were generated using a <sup>22</sup>Na point source, a scintillation detector, and auxiliary electronics. The <sup>22</sup>Na point source emits radiation with a uniform time distribution. The scintillation detector, which consists of a Hamamatsu R9800 photomultiplier (PMT) combined with a LYSO scintillation crystal, converts a gamma event into an electrical signal. The auxiliary electronics, which contain a timing discriminator (N840, CAEN), a fan-in/fan-out unit (N401, CAEN), and translator units (N89, CAEN) in order, convert an electrical signal to two copies of FPGA-compatible digital hits. Before two hits were asserted to two TDC channels, respectively, a dual delay unit (N108A, CAEN) was added to provide the known time intervals between two hits. The ML605 device was located in a temperature-controlled box when conducting the code density test and temperature drift test.

### B. Time Interval Measurements

The time intervals from 0 to 20 ns in steps of 0.5 ns were measured by two TDC channels 'with calibration' and 'without calibration'. In the 'without calibration' measurements, we assumed that all bins had the same bin width equal to $T_{\rm LSB}$. For each time interval, we collected 51,200 samples. The same time intervals were also measured using a 10 GSa/s oscilloscope (DSO9064A, Agilent) as a gold standard. We evaluated the 'difference in means' measured by the TDCs and the oscilloscope. Additionally, we characterized the 'standard uncertainty' using the maximum standard deviation of the time interval measurements.

Fig. 9. Time histograms of the various time intervals, both with and without calibration. The mean and the standard deviation of the measurements were obtained by applying a Gaussian fit to the measurements. The mean and the standard deviation measured by the TDC with calibration and the mean measured by the oscilloscope are noted on the time histogram. (a) Both hits arrive within one reference clock period. (b) The second hit arrives one clock cycle later. (c) The second hit arrives multiple clock cycles later.

Fig. 10. (a) Measured time intervals by the TDC and the oscilloscope. (b) Standard uncertainty for the various time intervals both with and without calibration.

Fig. 9 shows the time histograms for both hits arriving within one reference clock period [Fig. 9(a)], the second hit arriving one clock cycle later [Fig. 9(b)], and the second hit arriving multiple clock cycles later [Fig. 9(c)]. We compensated for the propagation delay differences between the two TDC channels induced by the external devices and cables. Fig. 10 shows the measured time intervals [Fig. 10(a)] and the standard uncertainty for time intervals both with and without calibration [Fig. 10(b)]. The maximum differences in means were 9.5 ps with calibration and 19.2 ps without calibration. The standard uncertainties were 12.83 ps-RMS with calibration and 21.99 ps-RMS without calibration. The pipelined on-the-fly calibration thus improved the standard uncertainty.

### C. Differential Nonlinearity and Integral Nonlinearity

Following the code density test, we measured the individual bin widths $w_i$ and $T_{LSB}$. We used 102,400 samples obtained at 20°C. The DNL was calculated as follows:

$$DNL_{i} = \frac{w_{i} - T_{LSB}}{T_{LSB}}.$$
(2)

To characterize the INL, we used the end-point INL, which was calculated as follows:

$$INL_{i} = \sum_{k=0}^{i} DNL_{k}.$$
 (3)
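As an illustration of (2) and (3), the sketch below derives bin widths, DNL, and INL from a code density histogram. It is a minimal example under stated assumptions: the function name `dnl_inl`, the parameter `covered_time_ps` (the time span covered by the histogrammed bins), and the definition of $T_{LSB}$ as the mean bin width are choices made here for illustration, and the hit counts are made up.

```python
def dnl_inl(counts, covered_time_ps):
    """Compute bin widths, T_LSB, DNL (2), and end-point INL (3).

    counts          -- hits per bin from a code density test (uniform hits)
    covered_time_ps -- time span covered by all bins, in ps (assumed known)
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("code density histogram is empty")
    # With uniformly distributed hits, a bin's width is proportional
    # to the fraction of hits that fall into it.
    widths = [covered_time_ps * c / total for c in counts]
    # Assumption: T_LSB is taken as the mean bin width.
    t_lsb = sum(widths) / len(widths)
    dnl = [(w - t_lsb) / t_lsb for w in widths]
    inl = []
    running = 0.0
    for d in dnl:  # INL_i is the running sum of DNL_0 .. DNL_i
        running += d
        inl.append(running)
    return widths, t_lsb, dnl, inl


# Made-up histogram: five bins covering 50 ps, one of them never hit.
widths, t_lsb, dnl, inl = dnl_inl([10, 30, 0, 40, 20], 50.0)
print(widths)  # [5.0, 15.0, 0.0, 20.0, 10.0]
print(dnl)     # [-0.5, 0.5, -1.0, 1.0, 0.0]
print(inl)     # [-0.5, 0.0, -1.0, 0.0, 0.0]
```

A bin that never receives a hit has zero width and therefore a DNL of exactly -1 LSB, which is the lower bound of any DNL range computed this way.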

TDL<sub>0</sub> and TDL<sub>180</sub> were characterized separately. As shown in Fig. 11, the DNLs of TDL<sub>0</sub> and TDL<sub>180</sub> were [-1.0, 1.91] and [-1.0, 1.88] LSB, respectively. The INLs of TDL<sub>0</sub> and TDL<sub>180</sub> were [-2.20, 2.60] and [-1.63, 3.93] LSB, respectively, as shown in Fig. 12. $T_{\rm LSB}$ was 10.08 ps. The positive and negative DNL values alternate because of the unbalanced propagation delay of the carry primitive; the odd-numbered bins were wider than the even-numbered bins, except for the bins at the FPGA horizontal clock spine. The horizontal clock spine, which crosses the delay line perpendicularly, increases the carry interconnection delay between carry primitives. In the simulator timing parameters, the carry interconnection delay is 23 ps where the chain crosses the horizontal clock spine and 0 ps elsewhere. This interconnection delay is added to every fourth bin. Therefore, the bin at the clock spine becomes wider by the interconnection delay. However, because the innate delay of every fourth bin is shorter than that of the other odd-numbered bins, the DNL introduced by the horizontal clock spine was not significant; the maximum DNL obtained at the horizontal clock spine was 1.59 LSB.

Fig. 12. (a) INL of TDL<sub>0</sub>. (b) INL of TDL<sub>180</sub>.

Fig. 13. Temperature dependence of the bin widths.

The dual-phase TDL TDC architecture allowed us to implement the TDL TDC while minimizing the clock skew problem, and thus there was no high positive DNL greater than +2 LSB and no continuous negative DNL (i.e., no successive missing bins).

### D. Temperature Drift Test

To verify the pipelined on-the-fly calibration, we conducted code density tests and time interval measurements under varying ambient temperatures. We changed the ambient temperature from 10 to 50°C in steps of 5°C and obtained 51,200 samples for each measurement. For two-channel operation, the FPGA die temperature was higher than the ambient temperature by 8 to 9°C. The core voltage was stabilized but decreased as the temperature increased, with values of 1.013 V at 10°C, 1.011 V at 20°C, 1.009 V at 30°C, 1.006 V at 40°C, and 1.004 V at 50°C.

In the code density tests, the bin-width tendency was almost unchanged, as shown in Fig. 13. However, $T_{\rm LSB}$ increased with increasing ambient temperature, from 9.92 ps at 10°C to 10.08 ps at 20°C, 10.08 ps at 30°C, 10.25 ps at 40°C, and 10.33 ps at 50°C. The reduced electron mobility and core voltage result in lower propagation speeds along the carry chain, and therefore $T_{\rm LSB}$ becomes longer. The duty cycle was also stable, with values of 1.230 ns at 10°C, 1.230 ns at 20°C, 1.227 ns at 30°C, 1.227 ns at 40°C, and 1.228 ns at 50°C. The increase in $T_{\rm LSB}$ may increase the quantization noise. However, this degradation was negligible because $T_{\rm LSB}$ increased only slightly over a wide temperature range. On the other hand, when $T_{\rm LSB}$ decreases, the fine time dynamic range also shrinks. Therefore, the constraints for the TDL TDC should be considered. In our case, two 128-tap TDLs were sufficient to cover a single reference clock period even at 10°C.

The time intervals, fixed to be zero, were measured in three different conditions. The first condition was with the active on-the-fly calibrator ('real-time calibration'). In this condition, the bin calibrator continuously compensates for the nonlinearity. The second and third conditions involved the on-the-fly calibrator being disabled ('non-real-time calibration'). Under these conditions, the bin calibrator generated the calibration LUT at specific temperatures (10 and 50°C in the second and third sets of measurements, respectively) and did not update the calibration LUT, even though the temperature changed. This calibrator operating mode conversion was performed by disabling the binary counters that serve as the duty estimator and the bin width estimator.

Although the duty cycle and the bin width tendency remained almost consistent regardless of the temperature, only the TDC with 'real-time calibration' maintained the standard uncertainty, as shown in Fig. 14. Fig. 15 shows that the standard uncertainty for the time intervals measured with 'real-time calibration' was less than 11.03 ps-RMS. In contrast, the standard uncertainty of the other measurements, which were conducted with 'non-real-time calibration', increased up to 43.83 ps-RMS.

Therefore, real-time calibration is essential to maintain TDC performance during temperature drift. Additionally, because the parallel prefix adder can generate the calibration LUT based on all the most up-to-date bin widths, on-the-fly calibration based on the parallel prefix adder can compensate for rapid drift.

Fig. 14. Time histograms with real-time calibration and non-real-time calibration. (a) In the second set of measurements with non-real-time calibration, the time intervals were measured at 50°C and the calibration LUT was obtained at 10°C. (b) In the third set of measurements with non-real-time calibration, the time intervals were measured at 10°C and the calibration LUT was obtained at 50°C.

Fig. 15. Standard uncertainty for the time interval under varying ambient temperatures. Only the TDC with real-time calibration maintained the standard uncertainty.
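The role of the prefix sum in building the calibration LUT can be sketched in software. This is an illustrative model only, not the authors' hardware design: the function name `build_calibration_lut`, the parameter `covered_time_ps`, and the bin-center convention (each code maps to the middle of its bin) are assumptions made here. `itertools.accumulate` computes the sums sequentially, whereas a parallel prefix adder would produce the same sums with logarithmic depth.

```python
from itertools import accumulate


def build_calibration_lut(bin_counts, covered_time_ps):
    """Map each fine code to a calibrated fine time in ps.

    bin_counts      -- latest hit counts per bin (e.g. from the FIFO window)
    covered_time_ps -- time span covered by all bins, in ps (assumed known)
    """
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("no hits recorded yet")
    # Bin widths estimated from the most recent hit statistics.
    widths = [covered_time_ps * c / total for c in bin_counts]
    # Inclusive prefix sum gives the upper time edge of every bin.
    upper_edges = list(accumulate(widths))
    # Assumed convention: report the center of the bin.
    return [edge - w / 2.0 for edge, w in zip(upper_edges, widths)]


# Made-up counts: five bins covering 50 ps.
lut = build_calibration_lut([10, 30, 0, 40, 20], 50.0)
print(lut)  # [2.5, 12.5, 20.0, 30.0, 45.0]
```

Rebuilding the LUT from fresh counts whenever the statistics window advances is what lets the fine times follow a change in bin widths, such as the temperature-induced increase in $T_{\rm LSB}$ reported above.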

### E. Application to TOF PET Detectors

The developed FPGA-based TDC was applied to the TOF measurement in a prototype TOF PET detector, which consists of a Hamamatsu H10966A-100 PMT and a 15 $\times$ 15 array of L<sub>0.95</sub>GSO (3 mm $\times$ 3 mm $\times$ 20 mm) scintillation crystals. The reference detector for the coincidence timing measurement was an R9800 PMT coupled with a single LYSO (4 mm $\times$ 4 mm $\times$ 10 mm) crystal, which has a single timing resolution of 255 ps. Trigger signals were generated through a leading edge discriminator by comparing the last dynode signal of the PMT with the predetermined threshold voltage, and were fed into the FPGA-based TDC to measure the differences in event arrival times.

The coincidence resolving time (CRT) of two detector modules, estimated by fitting the time difference distribution to the Gaussian function, was 350 ps (full-width at half maximum). Thus the CRT between two H10966A-100 PMT based PET detectors is estimated to be 340 ps.
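The step from 350 ps to 340 ps is not spelled out in the text. The sketch below shows one common way to obtain such an estimate, assuming independent Gaussian timing contributions that add in quadrature; this is an assumption made here, not a formula stated by the authors, and the function name `crt_of_identical_pair` is hypothetical. Under that assumption the arithmetic reproduces the quoted value.

```python
import math


def crt_of_identical_pair(measured_crt_ps, reference_single_ps):
    """Estimate the CRT of two identical detectors under test.

    measured_crt_ps     -- CRT measured against the reference detector (FWHM)
    reference_single_ps -- single timing resolution of the reference (FWHM)
    Assumes independent contributions that add in quadrature.
    """
    single_sq = measured_crt_ps ** 2 - reference_single_ps ** 2
    if single_sq < 0:
        raise ValueError("reference resolution exceeds the measured CRT")
    single_ps = math.sqrt(single_sq)   # detector under test alone
    return math.sqrt(2.0) * single_ps  # two such detectors in coincidence


crt_ps = crt_of_identical_pair(350.0, 255.0)
print(f"{crt_ps:.1f} ps")  # about 339 ps, i.e. roughly 340 ps
```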

TOF PET with a CRT of 340 ps can improve the image signal-to-noise ratio by a factor of 2.62 compared with non-TOF PET for a patient with a 35-cm effective diameter [6], and can thus lead to better lesion contrast clinically [5]. In other clinical aspects, TOF PET can also reduce radiation doses to patients and/or scan times, enhancing patient comfort.
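The factor of 2.62 is consistent with the widely used estimate in which the TOF signal-to-noise gain is $\sqrt{D/\Delta x}$, where $D$ is the object diameter and $\Delta x = c \cdot \mathrm{CRT}/2$ is the position uncertainty along the line of response. The sketch below applies that estimate; the formula is an assumption about how the figure was obtained, and the function name `tof_snr_gain` is hypothetical.

```python
import math

SPEED_OF_LIGHT_CM_PER_PS = 0.0299792458  # c expressed in cm per picosecond


def tof_snr_gain(object_diameter_cm, crt_fwhm_ps):
    """Estimated SNR gain of TOF over non-TOF PET, sqrt(D / dx)."""
    # Position uncertainty along the line of response: dx = c * CRT / 2.
    position_uncertainty_cm = SPEED_OF_LIGHT_CM_PER_PS * crt_fwhm_ps / 2.0
    return math.sqrt(object_diameter_cm / position_uncertainty_cm)


gain = tof_snr_gain(35.0, 340.0)
print(f"{gain:.2f}")  # 2.62
```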

## V. CONCLUSION

In this paper, we proposed two main TDC architectures.

The first architecture was the dual-phase TDL TDC architecture. We analyzed the effects of clock skew on a TDL TDC through quantitative analysis (the code density test) and the TDC transfer function. A sampling time difference between the adjacent sampling flip-flops that is longer than $T_{\rm LSB}$ introduces high positive DNL or continuous negative DNL. The dual-phase TDL TDC architecture allowed implementation of a TDL TDC in an ML605 device while minimizing the TDL's clock skew problem. Additionally, this architecture enabled a moderate clock frequency for the sampling flip-flops, which relaxed the timing constraints. Therefore, the dual-phase TDL TDC architecture can be applied to devices in which the clock skew and the clock frequency impose hard constraints on the TDL.

The second architecture was the pipelined on-the-fly calibration architecture. In this architecture, the most up-to-date duty cycle and each of the bin widths are estimated while the TDC measures the arrival times. Additionally, the pipelined and parallel prefix adder generates the calibration LUT by integrating all the most up-to-date bin widths while incurring no additional processing dead time. The results of the temperature drift test showed that the TDC with real-time calibration maintained its standard uncertainty during temperature drift. This fast bin calibration can be applied not only to FPGA-based TDCs but also to ASIC-based TDCs.

A synergistic effect appeared when the two architectures were combined. The on-the-fly calibration involves a trade-off between calibration accuracy and resource usage. The use of a dual-phase clock allows measurement of the duty cycle with a single binary counter. This estimated duty cycle reduces statistical uncertainty in the fine time interpolation procedure. Using this combined architecture, we achieved a maximum standard uncertainty of 9.07 ps-RMS for a single TDC channel.
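The single-channel figure is numerically consistent with the 12.83 ps-RMS time interval uncertainty reported in Section IV-B, under the assumption (made here for illustration, not stated explicitly in the text) that the two channels contribute equally and independently to an interval measurement:

$$\sigma_{\rm channel} = \frac{\sigma_{\rm interval}}{\sqrt{2}} = \frac{12.83\ \mathrm{ps}}{\sqrt{2}} \approx 9.07\ \mathrm{ps}.$$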

## References

- [1] E. Scapparone, "The time-of-flight detector of the ALICE experiment," *J. Phys. G. Nucl. Part.*, vol. 34, no. 8, pp. S725–S728, Aug. 2007.
- [2] P. Schönmeier, D. Branford, M. Düren, M. Ehrenfried, W. Eyrich, K. Föhl, M. Hoek, R. Kaiser, A. Lehmann, S. Lu, O. Merle, B. Seitz, G. Schepers, R. Schmidt, and C. Schwarz, "Disc DIRC endcap detector for PANDA@FAIR," *Nucl. Instrum. Methods Phys. Res. A*, vol. 595, no. 1, pp. 108–111, Sep. 2008.
- [3] R. Gao, R. Cardinale, L. C. Garcia, T. Keri, T. Gys, N. Harnew, J. Fopma, R. Forty, C. Frei, and D. Piedigrossi, "Development of precision time-of-flight electronics for LHCb TORCH," *J. Instrum.*, vol. 9, no. C02025, pp. 1–5, Feb. 2014.
- [4] J. S. Karp, S. Surti, M. E. Daube-Witherspoon, and G. Muehllehner, "Benefit of time-of-flight in PET: experimental and clinical results," *J. Nucl. Med.*, vol. 49, no. 3, pp. 462–470, Mar. 2008.
- [5] D. J. Kadrmas, M. E. Casey, M. Conti, B. W. Jakoby, C. Lois, and D. W. Townsend, "Impact of time-of-flight on PET tumor detection," *J. Nucl. Med.*, vol. 50, no. 8, pp. 1315–1323, Aug. 2009.
- [6] M. Ito, J. P. Lee, and J. S. Lee, "Timing performance study of new fast PMTs with LYSO for time-of-flight PET," *IEEE Trans. Nucl. Sci.*, vol. 60, no. 1, pp. 30–37, Feb. 2013.
- [7] J. P. Lee, M. Ito, and J. S. Lee, "Evaluation of a fast photomultiplier tube for time-of-flight PET," *Biomed. Eng. Lett.*, vol. 1, no. 3, pp. 174–179, Aug. 2011.
- [8] M. Ito, S. J. Hong, and J. S. Lee, "Positron emission tomography (PET) detectors with depth-of-interaction (DOI) capability," *Biomed. Eng. Lett.*, vol. 1, no. 2, pp. 70–81, May 2011.
- [9] J. Kalisz, "Review of methods for time interval measurements with picosecond resolution," *Metrologia*, vol. 41, no. 1, pp. 17–32, Feb. 2004.
- [10] J. Wang, S. Liu, Q. Shen, H. Li, and Q. An, "A fully fledged TDC implemented in field-programmable gate arrays," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 2, pp. 446–450, Apr. 2010.
- [11] C. Favi and E. Charbon, "A 17 ps time-to-digital converter implemented in 65 nm FPGA technology," *Proc. Int. Symp. Field-Programmable Gate Arrays*, pp. 113–120, 2009.
- [12] M. W. Fishburn, L. H. Menninga, C. Favi, and E. Charbon, "A 19.6 ps, FPGA-based TDC with multiple channels for open source applications," *IEEE Trans. Nucl. Sci.*, vol. 60, no. 3, pp. 2203–2208, Jun. 2013.
- [13] R. Szplet, J. Kalisz, and Z. Jachna, "A 45 ps time digitizer with a two-phase clock and dual-edge two-stage interpolation in a field programmable gate array device," *Meas. Sci. Technol.*, vol. 20, no. 2, pp. 1–11, Feb. 2009.
- [14] J. Kalisz, R. Szplet, J. Pasierbinski, and A. Poniecki, "Field-programmable-gate-array-based time-to-digital converter with 200-ps resolution," *IEEE Trans. Instrum. Meas.*, vol. 46, no. 1, pp. 51–55, Feb. 1997.
- [15] R. Szplet, J. Kalisz, and R. Szymanowski, "Interpolating time counter with 100 ps resolution on a single FPGA device," *IEEE Trans. Instrum. Meas.*, vol. 49, no. 4, pp. 879–883, Aug. 2000.
- [16] R. Pelka, J. Kalisz, and R. Szplet, "Nonlinearity correction of the integrated time-to-digital converter with direct coding," *IEEE Trans. Instrum. Meas.*, vol. 46, no. 2, pp. 449–453, Apr. 1997.
- [17] D. Tyndall, B. R. Rae, D. D. Li, J. Arlt, A. Johnston, J. A. Richardson, and R. K. Henderson, "A high-throughput time-resolved mini-silicon photomultiplier with embedded fluorescence lifetime estimation in 0.13 μm CMOS," *IEEE Trans. Biomed. Circuits Syst.*, vol. 6, no. 6, pp. 562–570, Dec. 2012.
- [18] J. Wu, "Several key issues on implementing delay line based TDCs using FPGAs," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 3, pp. 1543–1548, Jun. 2010.
- [19] Virtex-6 FPGA clocking resources user guide, UG362 (v2.5), Xilinx Inc., San Jose, CA, USA, Jan. 2014 [Online]. Available: http://www.xilinx.com/support/documentation/user\_guides/ug362.pdf.



**Jun Yeon Won** (S'13) received the B.S. degree (summa cum laude) in electrical and computer engineering from Seoul National University, Seoul, South Korea, in 2013.

Currently, he is working toward the Ph.D. degree in biomedical sciences at Seoul National University. Since 2013, he has been a Research Scientist in the Department of Biomedical Sciences, Seoul National University. His research interests include the development of digital electronics for radiation detectors and PET.

Mr. Won's awards and honors include the Best Oral Presentation (Korea-Japan Joint Meeting on Medical Physics) and the Best Oral Presentation (Korean Society of Medical Physics) in 2014.



**Sun Il Kwon** (S'07–M'14) was born in Daegu, South Korea. He received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2002, and the M.S. and Ph.D. degrees in the interdisciplinary program of radiation applied life science from Seoul National University, Seoul, South Korea, in 2013.

From 2007 to 2013, he was a Research Scientist in the Department of Nuclear Medicine and Medical Research Center, Seoul National University Hospital, Seoul, South Korea. Since 2013, he has been a Postdoctoral Scholar in the Department of Biomedical Engineering, University of California, Davis, Davis, CA, USA. His research interests include novel gamma-ray detector development and system design for medical imaging technologies, especially positron emission tomography. He holds six patents.

Mr. Kwon was a recipient of the International Atomic Energy Agency Fellowship Award in 2010, IEEE Nuclear and Plasma Sciences Society Seoul Young Investigator's Award in 2010, and the Korea Research Foundation Brain Korea 21 Best Paper Award in 2011.



**Hyun Suk Yoon** received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, and the Ph.D. degree in biomedical sciences from Seoul National University, Seoul, South Korea, in 2008 and 2015, respectively.

Since 2008, he has been a Research Scientist in the Functional and Molecular Imaging System Laboratory in the Department of Nuclear Medicine at Seoul National University Hospital, Seoul, South Korea. His research interests include the development of electronics, detectors, and data acquisition systems for PET.

Mr. Yoon's awards and honors include the Young Investigator's Award (Korean Society of Nuclear Medicine), the Best Oral Presentation (Korean Society of Medical and Biomedical Engineering), and the Travel Award (Society of Nuclear Medicine).



**Guen Bae Ko** (S'11) received the B.S. degree in electrical engineering from Seoul National University, Seoul, South Korea, in 2010.

Currently, he is working toward the Ph.D. degree in biomedical sciences at Seoul National University College of Medicine, Seoul, South Korea. Since 2013, he has been a Research Scientist in the Institute of Radiation Medicine, Medical Research Center, Seoul National University College of Medicine. His research interests include the development of high performance PET and PET/MRI systems using the silicon photo-sensor, fundamental study of photon counting photo-sensors, and development of electronics for radiation detectors.

Mr. Ko was a recipient of the Society of Nuclear Medicine Computer and Instrumentation Young Investigator Award (honorable mention) in 2013.



**Jeong-Whan Son** (S'13) was born in Seoul, South Korea, in 1991. He received the B.S. degree in electrical and computer engineering from Seoul National University, Seoul, South Korea, in 2012.

Currently, he is working toward the Ph.D. degree in biomedical sciences at Seoul National University, Seoul, South Korea. Since 2012, he has been a Research Scientist in the Biomedical Sciences Department, Seoul National University, Seoul, South Korea. His research interests include the development of analog circuits for PET detectors and PET systems.

Mr. Son's awards and honors include the Young Investigator Award (Korea-Japan Joint Meeting on Medical Physics) in 2014.



**Jae Sung Lee** received the B.S. degree in electrical engineering and the Ph.D. degree in biomedical engineering from Seoul National University (SNU), Seoul, South Korea, in 1996 and 2001, respectively.

From 2001 to 2005, he worked as a Postdoctoral Fellow of Radiology at Johns Hopkins University (JHU), Baltimore, MD, USA. In 2005, he joined the SNU College of Medicine, where he is currently a Professor of Nuclear Medicine and Biomedical Sciences. His early academic achievements are mainly related to PET/SPECT imaging studies for understanding the energetics and hemodynamics in the brain and heart. The most notable achievement of his group since the foundation of his own lab at SNU is the development of PET systems based on a novel photo-sensor, the silicon photomultiplier (SiPM). He has authored seven book chapters and more than 200 papers in peer-reviewed journals.

Dr. Lee serves as an editorial and advisory board member for several international scientific journals. He has served as the General Secretary of the IEEE NPSS Seoul Chapter for the last six years and was the MIC Program Chair of the 2013 NSS/MIC/RTSD meeting held in Seoul, South Korea. He has received multiple research awards from various scientific societies.