
Pk metric

June 3, 2013 ⁄ General
# Beeferman's Pk text segmentation evaluation metric

def pk(ref, hyp, k=None, boundary='1'):
    """
    Compute the Pk metric for a pair of segmentations. A segmentation
    is any sequence over a vocabulary of two items (e.g. "0", "1"),
    where the specified boundary value is used to mark the edge of a
    segmentation.

    >>> s1 = "00000010000000001000000"
    >>> s2 = "00000001000000010000000"
    >>> s3 = "00010000000000000001000"
    >>> pk(s1, s1, 3)
    0.0
    >>> pk(s1, s2, 3)
    0.095238...
    >>> pk(s2, s3, 3)
    0.190476...

    :param ref: the reference segmentation
    :type ref: str or list
    :param hyp: the segmentation to evaluate
    :type hyp: str or list
    :param k: window size; if None, set to half of the average reference segment length
    :type k: int
    :param boundary: boundary value
    :type boundary: str or int or bool
    :rtype: float
    """

    if k is None:
        k = int(round(len(ref) / (ref.count(boundary) * 2.)))
    
    n_considered_seg = len(ref) - k + 1
    n_same_ref = 0.0
    n_false_alarm = 0.0
    n_miss = 0.0

    for i in range(n_considered_seg):
        bsame_ref_seg = False
        bsame_hyp_seg = False

        if boundary not in ref[(i+1):(i+k)]:
            n_same_ref += 1.0
            bsame_ref_seg = True
        if boundary not in hyp[(i+1):(i+k)]:
            bsame_hyp_seg = True
        
        if bsame_hyp_seg and not bsame_ref_seg:
            n_miss += 1
        if bsame_ref_seg and not bsame_hyp_seg:
            n_false_alarm += 1

    prob_same_ref = n_same_ref / n_considered_seg
    prob_diff_ref = 1 - prob_same_ref
    prob_miss = n_miss / n_considered_seg
    prob_false_alarm = n_false_alarm / n_considered_seg

    return prob_miss * prob_diff_ref + prob_false_alarm * prob_same_ref
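
To make the window logic above easier to follow, here is a minimal sketch that reuses the same slicing as `pk` but returns the intermediate decisions instead of the final score. The helper names `default_window_size` and `classify_windows` are hypothetical and are not part of the original code; they only mirror the expressions used in the function above.

```python
def default_window_size(ref, boundary="1"):
    # Same expression pk uses when k is None: sequence length divided by
    # twice the number of boundary marks, rounded to the nearest integer.
    # Assumes ref contains at least one boundary (otherwise this divides by zero).
    return int(round(len(ref) / (ref.count(boundary) * 2.0)))


def classify_windows(ref, hyp, k, boundary="1"):
    # Label every window the way pk counts it:
    #   "miss"        -> reference has a boundary in the window, hypothesis does not
    #   "false_alarm" -> hypothesis has a boundary in the window, reference does not
    #   "agree"       -> both agree (boundary in both, or in neither)
    labels = []
    for i in range(len(ref) - k + 1):
        ref_same = boundary not in ref[i + 1:i + k]
        hyp_same = boundary not in hyp[i + 1:i + k]
        if hyp_same and not ref_same:
            labels.append("miss")
        elif ref_same and not hyp_same:
            labels.append("false_alarm")
        else:
            labels.append("agree")
    return labels


if __name__ == "__main__":
    s1 = "00000010000000001000000"
    print(default_window_size(s1))            # 23 / (2 * 2) = 5.75, rounds to 6
    print(classify_windows("0100", "0010", 2))
```

With `k=2` each window inspects a single position, so shifting one boundary by one place yields one miss, one false alarm, and one agreeing window.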
