How to remove hidden characters in PHP

username

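I have the following piece of code, which reads text files from a directory. I use a list of stopwords, and after removing the stopwords from these files and printing each remaining word with its position, an extra blank entry shows up wherever a stopword used to be in the document.

For example, a document that reads

Department of Computer Science // the document

When I loop through the document after removing the stopword 'of', the output is:

department(0) (1) computer(2) science(3) // the output

But the blank should not be there.

Here is the code: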

<?php
$directory = "archive/";
$dir = opendir($directory);
while (($file = readdir($dir)) !== false) {
  $filename = $directory . $file;
  $type = filetype($filename);
  if ($type == 'file') {
    $contents = file_get_contents($filename);
    $texts = preg_replace('/\s+/', ' ',  $contents);
    $texts = preg_replace('/[^A-Za-z0-9\-\n ]/', '', $texts);
    $text = explode(" ", $texts);
    $text = array_map('strtolower', $text);
    $stopwords = array("a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "he", "in", "it","i","is", "its", "of", "on", "that", "the", "to","was", "were", "will", "with", "or", " ");
    $text = (array_diff($text,$stopwords));
    echo "<br><br>";
    $total_count = count($text);
    $b = -1;
   foreach ($text as $a=>$v)
   {
     $b++;
     echo $text[$b]. "(" .$b. ")" ." ";
   } 
 } 
}
closedir($dir); 
?>
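Giacomo1968

I am not really 100% sure about the final output of the string positions, so I am assuming you placed them there for reference only. This test code, which uses a regex with preg_replace, seems to work well.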

header('Content-Type: text/plain; charset=utf-8');

// Set test content array.
$contents_array = array();
$contents_array[] = "Department of Computer Science // A document";
$contents_array[] = "Department of Economics // A document";

// Set the stopwords.
$stopwords = array("a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "he", "in", "it","i","is", "its", "of", "on", "that", "the", "to","was", "were", "will", "with", "or");

// Set a regex based on the stopwords; \b on both sides so only whole words match.
$regex = '/\b(?:' . implode('|', $stopwords) . ')\b/i';

foreach ($contents_array as $contents) {

  // Remove the stopwords.
  $contents = preg_replace($regex, '', $contents);

  // Clear out the extra whitespace; anything 2 spaces or more in a row.
  $contents = preg_replace('/\s{2,}/', ' ', $contents);

  // Echo contents.
  echo $contents . "\n";

}

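The output comes out cleaned up and formatted like this:

Department Computer Science // document

Department Economics // document

So to integrate this into your code, you should do the following. Note how I moved $stopwords and $regex outside of the while loop, since it makes no sense to reset those values on every iteration of the while loop. Set them once outside the loop, and let the code inside the loop focus only on what needs to happen per file: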
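One thing worth checking with any stopword regex is whether each word is anchored with \b on both sides. The sketch below compares a pattern anchored only at the end of each stopword with one anchored at both ends. The shortened stopword list, the sample sentence, and the variable names $end_only, $both_sides, $loose and $strict are made up for illustration.

```php
<?php
$stopwords = array("a", "an", "of", "the");   // shortened list for the example

// Anchored only at the end of each stopword.
$end_only = '/(' . implode('\b|', $stopwords) . '\b)/i';

// Anchored on both sides, so only whole words match.
$both_sides = '/\b(?:' . implode('|', $stopwords) . ')\b/i';

$sample = "The data of a sofa";   // hypothetical sample text

// Remove the stopwords, collapse runs of whitespace, trim the ends.
$loose  = trim(preg_replace('/\s{2,}/', ' ', preg_replace($end_only, '', $sample)));
$strict = trim(preg_replace('/\s{2,}/', ' ', preg_replace($both_sides, '', $sample)));

echo $loose . "\n";    // dat sof
echo $strict . "\n";   // data sofa
```

With the end-only anchor, the trailing "a" of "data" and "sofa" is treated as the stopword "a" and stripped. With both anchors, those words survive intact.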

<?php
$directory = "archive/";
$dir = opendir($directory);

// Set the stopwords.
$stopwords = array("a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "he", "in", "it","i","is", "its", "of", "on", "that", "the", "to","was", "were", "will", "with", "or");

// Set a regex based on the stopwords; \b on both sides so only whole words match.
$regex = '/\b(?:' . implode('|', $stopwords) . ')\b/i';

while (($file = readdir($dir)) !== false) {
  $filename = $directory . $file;
  $type = filetype($filename);
  if ($type == 'file') {

    // Get the contents of the filename.
    $contents = file_get_contents($filename);

    // Remove the stopwords.
    $contents = preg_replace($regex, '', $contents);

    // Clear out the extra whitespace; anything 2 spaces or more in a row.
    $contents = preg_replace('/\s{2,}/', ' ', $contents);

    // Echo contents.
    echo $contents;

 } 
}
closedir($dir); 
?>
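
If the position numbers do matter and you would rather keep the array-based approach from the question, the blank most likely comes from array_diff() preserving the original keys: once 'of' is removed, key 1 no longer exists, so reading the array with a separate counter hits a missing index. A minimal sketch, assuming you want the positions renumbered after the stopwords are gone (the sample text and the shortened stopword list are placeholders):

```php
<?php
// Hypothetical sample text standing in for one file's contents.
$contents  = "Department of Computer Science";
$stopwords = array("a", "an", "of", "the");   // shortened list for the example

// Split on runs of whitespace and drop empty tokens.
$words = preg_split('/\s+/', strtolower($contents), -1, PREG_SPLIT_NO_EMPTY);

// array_diff() keeps the original keys: 0 => department, 2 => computer, 3 => science.
$filtered = array_diff($words, $stopwords);

// array_values() renumbers the keys to 0, 1, 2 so no position is left empty.
$filtered = array_values($filtered);

$parts = array();
foreach ($filtered as $position => $word) {
    $parts[] = $word . "(" . $position . ")";
}
$output = implode(" ", $parts);

echo $output . "\n";   // department(0) computer(1) science(2)
```

If you want to report each word's position in the original document instead, skip the array_values() call and print the key that foreach hands you; the positions would then be 0, 2 and 3.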
