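I am using an image processing API that reads the text on an image. From the string it returns, I want to extract English (dictionary) words and common first and last names. In other words, the text I need is in the string, but the result also contains junk (non-words) that I need to filter out. What is the best way to do this? I have looked into NSLinguisticTagger, but I am not 100% sure it does what I want. Are there any other suggestions?
Would a regex help me here? I am not sure how to write a pattern that matches only words.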
Here are two sample strings I am trying to extract words/names from:
(1) "Pumper im CasSICI 1111 Cassu With Andrew Webster PUMPE im CasSICI 1111 Cassu With Andrew Webster"
// I need to extract: "Pumper With Andrew Webster"
(2) "in the powerful Hazelwood High trilogy SHARON M DRAPER000kFORGEDBY FIRESWINNER SHARE M in the powerful Hazelwood High trilogy SHARON M DRAPER000k FORGED BY FIRE S WINNER"
// I need to extract: "Sharon Hazelwood High Draper in the powerful trilogy Forged by Winner"
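I threw this class together; it is a mix of real code and pseudocode. I would create a singleton class for the first and last names. See the comments in the code for details. It does not cover everything, but it should solve most of your problem.
Update: cleanUpString
Tweaked that method to use a switch statement.
Update 2: added this to catch the junk that UITextChecker lets through:
return UIReferenceLibraryViewController.dictionaryHasDefinition(forTerm: self)
Wherever you get your OCR text from, you can use it like this: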
let stringParser = StringParser()
let cleanedUpText = stringParser.cleanUpString(yourOCRText)
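On the regex part of the question: a pattern cannot tell a real word from gibberish, but it can cheaply discard tokens that contain digits or symbols before any dictionary lookup runs. The following is only a minimal sketch under that assumption. `lettersOnlyTokens(in:)` is a hypothetical helper that is not part of the class in this answer, and the ASCII-only pattern is an assumption that would reject names such as "O'Neil".

```swift
import Foundation

// Hypothetical helper (not part of the original answer): keeps only the
// space-separated tokens that consist entirely of ASCII letters.
func lettersOnlyTokens(in text: String) -> [String] {
    return text
        .split(separator: " ")   // empty tokens from repeated spaces are dropped
        .map(String.init)
        .filter { $0.range(of: "^[A-Za-z]+$", options: .regularExpression) != nil }
}

let sample = "Pumper im CasSICI 1111 Cassu With Andrew Webster"
let tokens = lettersOnlyTokens(in: sample)
// "1111" is removed, but "CasSICI" and "Cassu" pass the pattern,
// so a dictionary or name check is still needed afterwards.
```

Passing the joined tokens to cleanUpString afterwards keeps the dictionary lookups for the tokens that at least look like words.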
Here is the class:
import UIKit // needed for UITextChecker and UIReferenceLibraryViewController
import Foundation

class StringParser: NSObject {
    // TODO: You'll need to create a singleton class for your first and last names
    // https://krakendev.io/blog/the-right-way-to-write-a-singleton

    func cleanUpString(_ inputString: String) -> String {
        // Split the input on spaces so that each token becomes its own string
        let inputStringArray = inputString.split(separator: " ").map(String.init)
        var outputArray = [String]()
        for word in inputStringArray {
            // Keep the word if it satisfies any of the desired conditions
            switch word {
            case _ where word.isRealWord():
                outputArray.append(word)
            case _ where word.isFirstName():
                outputArray.append(word.capitalized)
            case _ where word.isLastName():
                outputArray.append(word.capitalized)
            default:
                break // not a word or a known name: drop it
            }
        }
        // Join the cleaned-up words back into a single string
        return outputArray.joined(separator: " ")
    }
}
extension String {
    func isFirstName() -> Bool {
        let firstNameArray = ["Andrew", "Sharon"] // FIXME: this should be your singleton
        return firstNameArray.contains(self.capitalized)
    }

    func isLastName() -> Bool {
        let lastNameArray = ["Webster", "Hazelwood"] // FIXME: this should be your singleton
        return lastNameArray.contains(self.capitalized)
    }

    func isRealWord() -> Bool {
        // adapted from https://www.hackingwithswift.com/example-code/uikit/how-to-check-a-string-is-spelled-correctly-using-uitextchecker
        let checker = UITextChecker()
        let range = NSRange(location: 0, length: self.utf16.count)
        let misspelledRange = checker.rangeOfMisspelledWord(in: self, range: range, startingAt: 0, wrap: false, language: "en")
        if misspelledRange.location == NSNotFound {
            // cleans up what UITextChecker misses
            return UIReferenceLibraryViewController.dictionaryHasDefinition(forTerm: self) // returns true if there's a definition for it
        }
        return false
    }
}
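The TODO and FIXME comments above leave the name singleton to you. One possible shape is sketched below. `NameStore` and its members are hypothetical names, and the four entries are just the placeholders from the arrays above; a real list of common names would have to be loaded from your own data.

```swift
import Foundation

// Hypothetical name store, not part of the original answer.
final class NameStore {
    static let shared = NameStore()

    // Placeholder data taken from the arrays above; replace with a real name list.
    let firstNames: Set<String> = ["Andrew", "Sharon"]
    let lastNames: Set<String> = ["Webster", "Hazelwood"]

    private init() {} // prevents creating a second instance

    func isFirstName(_ word: String) -> Bool {
        // capitalized turns OCR output such as "SHARON" into "Sharon" before the lookup
        return firstNames.contains(word.capitalized)
    }

    func isLastName(_ word: String) -> Bool {
        return lastNames.contains(word.capitalized)
    }
}
```

With something like this in place, isFirstName() in the String extension could simply return NameStore.shared.isFirstName(self), and likewise for isLastName(). A Set is used here instead of an array so that each lookup stays fast once the lists grow to thousands of names.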