Using Recursion When Requesting a Remote URL

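Common function used by both tests
/**
 * Reset $GLOBALS['recursion_sum'] to 0 before every call to requestData. Do not forget this.
 * If a program calls requestData several times without resetting it, the counter keeps the value left by the previous call, so a later call may never recurse (the condition empty($str) && $GLOBALS['recursion_sum'] < $recursion_count no longer holds).
 *
 * Even when file_get_contents does not return FALSE, i.e. the request itself succeeded, the returned data can still be empty, for example when the network on the client or server side is busy. Empty data is useless, so requestData calls itself recursively until the data is non-empty or the recursion limit is reached. If the result is still empty after that, the URL most likely really does return an empty body.
 *
 * @param string $url URL to request
 * @param int $try_count number of retries when the request fails
 * @param int $recursion_count maximum number of recursive calls when the request succeeds but the result is empty; without this limit the recursion could go on forever
 * @version 2.0 2010-1-23 caihuafeng <caihuafeng@gmail.com>
 */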
function requestData($url, $try_count = 100, $recursion_count = 10) {
    if (empty($url)) return false;
   
    $cnt = 0;
    while ($cnt < $try_count && ($str = @file_get_contents($url, false)) === FALSE) {
        //do_check_mysql($cnt);
        //writeLog(sprintf("%s spider url: |%s| try %d times", __FUNCTION__, $url, $cnt + 1));
        echo "cnt: $cnt<br />\n";
        sleep(1);
        $cnt++;
    }
   
    //writeLog(sprintf("%s download file from url: |%s| %s!", __FUNCTION__, $url, strlen($str) ? 'succeed!' : 'failed!'));
    echo sprintf("%s download file from url: |%s| %s!", __FUNCTION__, $url, strlen($str) ? 'succeed!' : 'failed!');
   
    if (empty($str) && $GLOBALS['recursion_sum'] < $recursion_count) {
        $GLOBALS['recursion_sum']++;
        //writeLog(sprintf("%s recursion index %d fetch content is empty, so recursion invoke itself!", __FUNCTION__, $GLOBALS['recursion_sum']));
        echo sprintf("%s recursion index %d fetch content is empty, so recursion invoke itself!<br />\n", __FUNCTION__, $GLOBALS['recursion_sum']);
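        sleep(5); // wait before recursing so the server load has time to drop
        $str = requestData($url, $try_count, $recursion_count); // keep the result of the recursive call instead of discarding it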
    }
   
    return $str;
}
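
The reset requirement described in the docblock comes from keeping the counter in a global. One way to avoid it is to pass the remaining retry budget as an argument. The following is only a minimal sketch under stated assumptions, not the author's function: the name fetchNonEmpty, the $fetcher hook (so the logic can be exercised without a network) and the $waitSeconds parameter are all hypothetical, and closures require PHP 5.3 or later. It also treats only false and '' as empty, whereas empty($str) in the original also matches the string "0".

```php
<?php
/**
 * Hypothetical variant of requestData: the retry budget travels with the
 * call, so there is no global counter to reset between calls.
 *
 * @param string   $url         URL to request
 * @param int      $retriesLeft how many more times to retry on an empty result
 * @param callable $fetcher     assumed hook; defaults to file_get_contents
 * @param int      $waitSeconds pause before each retry
 */
function fetchNonEmpty($url, $retriesLeft = 10, $fetcher = null, $waitSeconds = 5) {
    if (empty($url)) {
        return false;
    }
    if ($fetcher === null) {
        $fetcher = function ($u) {
            return @file_get_contents($u);
        };
    }

    $str = call_user_func($fetcher, $url);

    // Both false (request failed) and '' (empty body) trigger another attempt.
    if (($str === false || $str === '') && $retriesLeft > 0) {
        if ($waitSeconds > 0) {
            sleep($waitSeconds);
        }
        return fetchNonEmpty($url, $retriesLeft - 1, $fetcher, $waitSeconds);
    }

    return $str;
}
```

Because every top-level call starts with its own $retriesLeft, two consecutive calls behave the same way without any setup in between.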

1. Test 1
$url = 'http://localhost/research/empty.php';
$GLOBALS['recursion_sum'] = 0;
requestData($url, 10, 10);

echo "<hr>\n";

$GLOBALS['recursion_sum'] = 0;
requestData($url, 10, 10);

Output:
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 1 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 2 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 3 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 4 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 5 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 6 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 7 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 8 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 9 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 10 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!
--------------------------------------------------------------------------------
 requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 1 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 2 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 3 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 4 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 5 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 6 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 7 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 8 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 9 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 10 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!

2. Test 2
Now change the test code as follows: comment out the line $GLOBALS['recursion_sum'] = 0; before the second call to requestData.
$url = 'http://localhost/research/empty.php';
$GLOBALS['recursion_sum'] = 0;
requestData($url, 10, 10);

echo "<hr>\n";

//$GLOBALS['recursion_sum'] = 0;
requestData($url, 10, 10);

Output:
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 1 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 2 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 3 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 4 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 5 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 6 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 7 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 8 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 9 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!requestData recursion index 10 fetch content is empty, so recursion invoke itself!
requestData download file from url: |http://localhost/research/empty.php| failed!!
--------------------------------------------------------------------------------
 requestData download file from url: |http://localhost/research/empty.php| failed!!

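The output shows that the second call to requestData did not recurse, because $GLOBALS['recursion_sum'] was not reset to 0 before the call. It was still 10 from the first run, so the condition $GLOBALS['recursion_sum'] < $recursion_count was already false.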
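
The counter behaviour seen in both tests can be reproduced without any network access. The sketch below is an illustration only: simulateEmptyRequest is a hypothetical stand-in for requestData that behaves as if every response were empty, so the global counter alone decides whether another recursive call happens. It returns how many recursive calls were made.

```php
<?php
// Hypothetical stand-in for requestData: the response is treated as always
// empty, so only $GLOBALS['recursion_sum'] controls the recursion.
function simulateEmptyRequest($recursion_count) {
    $recursions = 0;
    if ($GLOBALS['recursion_sum'] < $recursion_count) {
        $GLOBALS['recursion_sum']++;
        $recursions = 1 + simulateEmptyRequest($recursion_count);
    }
    return $recursions;
}

$GLOBALS['recursion_sum'] = 0;
$first = simulateEmptyRequest(10);   // counter climbs from 0 to 10

$second = simulateEmptyRequest(10);  // counter is still 10, so no recursion

$GLOBALS['recursion_sum'] = 0;
$third = simulateEmptyRequest(10);   // after a reset, recursion happens again

echo "$first $second $third\n";      // prints "10 0 10"
```

The middle value corresponds to the second call in Test 2, and the last value to the second call in Test 1.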
 


