Dynamic time warping (DTW) is a time
series alignment algorithm developed originally for
speech recognition(1). It aims at aligning two
sequences of feature vectors by warping the time
axis iteratively until an optimal match (according to a suitable metrics) between the two sequences is found.
Consider two sequences of
feature vectors:
The two sequences can be
arranged on the sides of a grid, with one on the
top and the other up the left hand side. Both
sequences start on the bottom left of the
grid.
Inside each cell a distance measure can be
placed, comparing the corresponding elements of the
two sequences. To find the best match or alignment
between these two sequences one need to find a path
through the grid which minimizes the total distance
between them. The procedure for computing this
overall distance involves finding all possible
routes through the grid and for each one compute the
overall distance. The overall distance is the
minimum of the sum of the distances between the
individual elements on the path divided by the
sum of the weighting function. The weighting
function is used to normalise for the path length.
It is apparent that
for any considerably long sequences the number of
possible paths through the grid will be very
large. The major optimisations or constraints of the
DTW algorithm arise from the observations on the
nature of acceptable paths through the
grid:
- Monotonic condition:
the path will not turn back on itself, both the i and j indexes either stay the same or increase, they never decrease.
- Continuity condition:
the path advances one step at a time. Both i and
j can only increase by at most 1 on each step
along the path.
- Boundary condition: the path starts at the bottom left and ends at the top right.
- Warping window condition: a good path is unlikely to wander very far from the diagonal.
The distance that the path is allowed to wander is the window width.
- Slope constraint condition: The path should not be too steep or too shallow. This prevents
short sequences matching too long ones. The condition is expressed as a ratio p/q where p is the number of
steps allowed in the same (horizontal or vertical) direction. After p steps in the same direction is not
allowed to step further in the same direction before stepping at least q time in the diagonal direction.
The foregoing constraints allow to restrict the moves that can be made from any point in the
path and so limit the number of paths that need to be considered. The power of the DTW algorithm
is in the fact that instead finding all possible routes through the grid which satisfy the
above conditions, the DTW algorithm works by keeping track of the cost of the best path to
each point in the grid. During the calculation process of the DTW grid it is not known which path
is minimum overall distance path, but this can be traced back when the end point is reached.
A detailed technical presentation of the DTW algorithm can be
downloaded from here DTW algorithm.
(1)
Sakoe,H. and Chiba, S. Dynamic
programming algorithm optimization for spoken word
recognition. IEEE Trans. on Acoust., Speech,
and Signal Process., ASSP 26, 43-49
(1978).
|