Finding and Correcting Misspelled Words with Pspell
Published: 2019-05-11


Everyone has made a spelling mistake in a Google search: "alternitive music", for example. When that happens, you may have noticed Google trying to help by displaying "Did you mean alternative music?". If your site has a search function, pointing out likely misspellings when a search returns no results, or too few, is a very useful feature, especially when a visitor's spelling mistake could cost you a sale. Fortunately, PHP's Pspell module can check the spelling of a word and suggest replacements from its default dictionary (you can also create a custom dictionary).

To begin, we need to check if Pspell is installed:
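Calling a Pspell function such as pspell_config_create('en') on a server without the extension stops the script with an "undefined function" error. A gentler check is sketched below, assuming only core PHP functions; the helper name has_pspell() is made up for this illustration and is not part of Pspell.

```php
<?php
// Hypothetical helper (not part of Pspell): true when the extension is usable.
function has_pspell(): bool
{
    // extension_loaded() and function_exists() are core PHP functions.
    return extension_loaded('pspell') && function_exists('pspell_config_create');
}

if (has_pspell()) {
    echo "Pspell is installed.\n";
} else {
    echo "Pspell is not installed.\n";
}
```

The same guard can be placed in front of any of the later examples so that a missing extension produces a message rather than a fatal error.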


If you get an error, Pspell is not installed. On Linux systems, follow your distribution's instructions for installing the PHP Pspell extension to solve the problem.

Use the default dictionary


Here is a small function to help you understand how Pspell works:


    // ...
    foreach ($string as $key => $value) {
        if (!pspell_check($dictionary, $value)) {
            $suggestion = pspell_suggest($dictionary, $value);

            // Suggestions are case sensitive. Grab the first one.
            if (!empty($suggestion) && strtolower($suggestion[0]) != strtolower($value)) {
                $string[$key] = $suggestion[0];
                $replacement_suggest = true;
            }
        }
    }

    if ($replacement_suggest) {
        // We have a suggestion, so we return the corrected string.
        return implode(' ', $string);
    } else {
        return null;
    }
}

To use this function, simply pass it a string:
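Since the function listing is cut off at its start, here is a minimal, self-contained sketch of the whole function together with a call to it. The name suggest_spelling() is an assumption (the listing does not show the function's name). The setup lines follow the configuration steps the article walks through later. Splitting on whitespace with preg_split() is a small substitution for a plain explode().

```php
<?php
// Hypothetical name; returns a corrected string, or null when nothing changed.
function suggest_spelling(string $text): ?string
{
    if (!function_exists('pspell_new_config')) {
        return null; // Pspell extension not available
    }

    // Dictionary setup: English, skip short words, fast suggestion mode.
    $config_dic = pspell_config_create('en');
    pspell_config_ignore($config_dic, 3);
    pspell_config_mode($config_dic, PSPELL_FAST);
    $dictionary = pspell_new_config($config_dic);
    if ($dictionary === false) {
        return null; // no English word list installed
    }

    // Replace commas with spaces, then split the text into words.
    $words = preg_split('/\s+/', trim(str_replace(',', ' ', $text)), -1, PREG_SPLIT_NO_EMPTY);

    $replacement_suggest = false;
    foreach ($words as $key => $value) {
        if (!pspell_check($dictionary, $value)) {
            $suggestion = pspell_suggest($dictionary, $value);
            // Use the first suggestion unless it only differs by letter case.
            if (!empty($suggestion) && strtolower($suggestion[0]) != strtolower($value)) {
                $words[$key] = $suggestion[0];
                $replacement_suggest = true;
            }
        }
    }

    return $replacement_suggest ? implode(' ', $words) : null;
}

// In a real search form this string would come from user input.
$suggestion_spell = suggest_spelling('here is my mispellid word');
if ($suggestion_spell !== null) {
    echo "Try with this spelling: $suggestion_spell\n";
}
```

Returning null for "no correction" keeps the calling code simple: the hint is only printed when at least one word was replaced.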

If the string you submit to Pspell is "here is my mispellid word", the previous script will return: "Try with this spelling: here is my misspelled word". However, Pspell is no miracle worker, especially if you automatically use the first suggested alternative! For best results, you can use all the suggestions offered by Pspell. The following script returns twenty suggestions for the word "lappin":
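A minimal sketch of such a script, under stated assumptions: the helper name all_suggestions() is hypothetical, and the number of suggestions you actually get depends on the installed word list and the mode, so it will not necessarily be twenty.

```php
<?php
// Hypothetical helper: returns every suggestion Pspell offers for one word.
function all_suggestions(string $word): array
{
    if (!function_exists('pspell_new_config')) {
        return []; // Pspell extension not available
    }

    $config_dic = pspell_config_create('en');
    pspell_config_mode($config_dic, PSPELL_FAST);
    $dictionary = pspell_new_config($config_dic);
    if ($dictionary === false) {
        return []; // no English word list installed
    }

    // pspell_suggest() returns an array of candidates, most likely first.
    $suggestions = pspell_suggest($dictionary, $word);
    return $suggestions === false ? [] : $suggestions;
}

foreach (all_suggestions('lappin') as $suggestion) {
    echo "Maybe: $suggestion\n";
}
```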


To initialize Pspell you must configure a dictionary. To do this, create a configuration handle, set some options on it, then use that configuration to create a second handle for the actual dictionary. If this sounds a bit complicated, do not worry: the code rarely changes and you can usually copy it from another script. Here, however, we will go through it step by step. This is the code that configures the dictionary:

    // Create a configuration for the English dictionary
    $config_dic = pspell_config_create('en');

    // Ignore words of 3 characters or fewer
    pspell_config_ignore($config_dic, 3);

    // Set the suggestion mode
    pspell_config_mode($config_dic, PSPELL_FAST);

$config_dic is the initial template that controls the options for your dictionary. You must set all the options on $config_dic, then use it to create the dictionary. pspell_config_create() creates a configuration for an English dictionary (en). To use English and specify that you prefer American spelling, pass 'en' as the first parameter and 'american' as the second. pspell_config_ignore() tells the dictionary to ignore all words of 3 letters or fewer. Finally, pspell_config_mode() tells Pspell which operating mode to use:

• PSPELL_FAST is a quick method that returns the fewest suggestions.
• PSPELL_NORMAL returns an average number of suggestions at normal speed.
• PSPELL_BAD_SPELLERS is the slow mode: it returns the most suggestions, although the spell check takes longer to perform.

We could use other configuration options (to add, for example, a custom dictionary, as we shall see later), but as this is a quick check, we will simply create the dictionary with this line:

$dictionary = pspell_new_config($config_dic);
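Putting the configuration steps together with the American-spelling option mentioned earlier gives something like the sketch below. The helper name american_dictionary() is hypothetical, and the two test words are only an illustration; what pspell_check() reports for them depends on the word lists installed on your server.

```php
<?php
// Hypothetical helper: builds an American-English dictionary, or returns null
// when the Pspell extension or the word list is unavailable.
function american_dictionary()
{
    if (!function_exists('pspell_new_config')) {
        return null;
    }

    // 'en' selects the language, 'american' the spelling variant.
    $config_dic = pspell_config_create('en', 'american');
    pspell_config_ignore($config_dic, 3);
    pspell_config_mode($config_dic, PSPELL_FAST);

    $dictionary = pspell_new_config($config_dic);
    return $dictionary === false ? null : $dictionary;
}

$dictionary = american_dictionary();
if ($dictionary !== null) {
    // Compare how two spellings of the same word are judged.
    var_dump(pspell_check($dictionary, 'color'));
    var_dump(pspell_check($dictionary, 'colour'));
}
```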

From this point you can use the dictionary in two ways:

  1. pspell_check($dictionary, "word") returns true if "word" is in the dictionary.
  2. pspell_suggest($dictionary, "word") returns an array of suggested words for "word" (the first element of this array is the most likely candidate).

The number of words obtained varies, but you get more with PSPELL_BAD_SPELLERS and fewer with PSPELL_FAST.
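To see how the mode affects the number of suggestions on your own server, a small comparison such as the following may help. It is a sketch, not a benchmark: the helper name suggestion_counts() is hypothetical, and the three constants used are the modes documented for pspell_config_mode() in the PHP manual (PSPELL_FAST, PSPELL_NORMAL, PSPELL_BAD_SPELLERS).

```php
<?php
// Hypothetical helper: counts the suggestions returned for $word in each mode.
function suggestion_counts(string $word): array
{
    $counts = [];
    if (!function_exists('pspell_new_config')) {
        return $counts; // Pspell extension not available
    }

    $modes = [
        'PSPELL_FAST'         => PSPELL_FAST,
        'PSPELL_NORMAL'       => PSPELL_NORMAL,
        'PSPELL_BAD_SPELLERS' => PSPELL_BAD_SPELLERS,
    ];

    foreach ($modes as $name => $mode) {
        $config_dic = pspell_config_create('en');
        pspell_config_mode($config_dic, $mode);
        $dictionary = pspell_new_config($config_dic);
        if ($dictionary === false) {
            continue; // no English word list installed
        }

        // A correctly spelled word needs no suggestions at all.
        if (pspell_check($dictionary, $word)) {
            $counts[$name] = 0;
            continue;
        }

        $suggestions = pspell_suggest($dictionary, $word);
        $counts[$name] = $suggestions === false ? 0 : count($suggestions);
    }

    return $counts;
}

print_r(suggestion_counts('lappin'));
```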

Now that the dictionary is ready, we split the string passed as a parameter into an array of words: 'here my sentence' becomes an array of three elements, "here", "my", and "sentence". Because Pspell does not like commas, we remove them before exploding the string. Then we check the spelling of each word using the default dictionary. If the word has more than three characters, verification takes place, and in case of a misspelling we perform the following operations:

  1. We ask Pspell to provide an array of suggestions for correction.
  2. We take the most likely suggestion (the first element of the array $suggestion) and replace the misspelled word with it.
  3. We set the $replacement_suggest flag to true so that at the end of the processing loop, we know that we have found a spelling mistake somewhere in $string.

At the end of the loop, if there were spelling corrections, we rebuild the string from the elements of the corrected array and return it. Otherwise, the function returns null to indicate that it has not detected any misspelling.
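The splitting and rejoining that surround these steps can be tried without Pspell at all. The sketch below uses a hypothetical helper, split_words(). It replaces commas with spaces rather than deleting them, an assumption made so that "a,b" does not collapse into one word.

```php
<?php
// Hypothetical helper: turns a sentence into an array of words, ignoring commas.
function split_words(string $text): array
{
    // Commas become spaces, then the text is split on runs of whitespace.
    return preg_split('/\s+/', trim(str_replace(',', ' ', $text)), -1, PREG_SPLIT_NO_EMPTY);
}

$words = split_words('here, my sentence');
print_r($words);                   // three elements: here, my, sentence

// After any corrections, the array is joined back into one string.
echo implode(' ', $words), "\n";   // here my sentence
```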

Add a custom dictionary to Pspell


If a word is not in the default dictionary, you can easily add it. However, you can also create a custom dictionary to be used alongside the default one. Create a directory on your site where PHP has write permission, and initialize the new dictionary in it. To create a new dictionary file called perso.pws in the directory path of your server, use the following script:
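A minimal sketch of such a script follows. The helper name open_with_personal() is hypothetical, and the file location is an assumption: the system temporary directory is used only because it is normally writable, so substitute the writable directory you created on your own site.

```php
<?php
// Hypothetical helper: opens the default English dictionary together with a
// personal word list stored in $personal_file. Returns null on failure.
function open_with_personal(string $personal_file)
{
    if (!function_exists('pspell_config_personal')) {
        return null; // Pspell extension not available
    }

    $config_dic = pspell_config_create('en');

    // Attach the personal word list to the configuration before
    // the dictionary itself is created.
    pspell_config_personal($config_dic, $personal_file);
    pspell_config_ignore($config_dic, 3);
    pspell_config_mode($config_dic, PSPELL_FAST);

    $dic = pspell_new_config($config_dic);
    return $dic === false ? null : $dic;
}

// Placeholder location; replace it with a directory PHP can write to.
$dic = open_with_personal(sys_get_temp_dir() . '/perso.pws');
echo $dic === null ? "Dictionary not available.\n" : "Dictionary ready.\n";
```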

This is the same script as in the previous section, with one essential addition: the call to pspell_config_personal() initializes a personal dictionary file. If this file does not already exist, Pspell creates a new one for you. You can add as many words as you want to this dictionary by using the following function:

`pspell_add_to_personal($dic, "word");`

As long as you have not saved the dictionary, words are added to it temporarily. Therefore, after inserting the words you want, add this line to the end of the script:


pspell_save_wordlist($dic);

Then call pspell_config_personal() as above in the demo script and your new dictionary will be ready.
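The add-then-save sequence can be wrapped in one function, sketched here under stated assumptions: add_personal_words() is a hypothetical name, the example words are arbitrary, and the temporary directory again stands in for your own writable directory.

```php
<?php
// Hypothetical helper: adds $words to the personal word list in
// $personal_file and saves it. Returns true when the list was saved.
function add_personal_words(string $personal_file, array $words): bool
{
    if (!function_exists('pspell_add_to_personal')) {
        return false; // Pspell extension not available
    }

    $config_dic = pspell_config_create('en');
    pspell_config_personal($config_dic, $personal_file);
    $dic = pspell_new_config($config_dic);
    if ($dic === false) {
        return false; // dictionary could not be opened
    }

    foreach ($words as $word) {
        // Words added here are only kept in memory until the list is saved.
        pspell_add_to_personal($dic, $word);
    }

    // Write the personal word list to disk.
    return pspell_save_wordlist($dic);
}

$saved = add_personal_words(sys_get_temp_dir() . '/perso.pws', ['Pspell', 'perso']);
echo $saved ? "Word list saved.\n" : "Word list not saved.\n";
```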


Conclusion

Pspell can help your conversion rate by giving your visitors a way to notice and automatically correct their typos. It can enhance search, forum submissions, and the general linguistic accuracy of a website with user-submitted content. If you'd like to take a deeper look at Pspell, or have implemented it in an interesting manner, let us know in the comments below!

Reposted from: http://zargb.baihongyu.com/
