Convert a text value in SQL Server from UTF-8 to ISO 8859-1

Bob

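I have a column in SQL Server with the SQL_Latin1_General_CP1_CI_AS collation that contains UTF-8 encoded text. How can I convert the text and save it in ISO 8859-1 encoding? I would like to do this in a query on SQL Server. Any tips?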

Example of a stored value: OlÃ¡. Gostei do jogo. Quando "baixei" atÃ© achei que nÃ£o iria curtir muito

Anthony Faull

I have written a function to repair UTF-8 text that is stored in a varchar field.

To check the repaired values, you can use it like this:

CREATE TABLE #Table1 (Column1 varchar(max))

INSERT #Table1
VALUES ('OlÃ¡. Gostei do jogo. Quando "baixei" atÃ© achei que nÃ£o iria curtir muito')

SELECT *, NewColumn1 = dbo.DecodeUTF8String(Column1)
FROM #Table1
WHERE Column1 <> dbo.DecodeUTF8String(Column1)

Output:

Column1
-------------------------------
OlÃ¡. Gostei do jogo. Quando "baixei" atÃ© achei que nÃ£o iria curtir muito

NewColumn1
-------------------------------
Olá. Gostei do jogo. Quando "baixei" até achei que não iria curtir muito
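
To save the repaired text rather than only inspect it, the same function can drive an UPDATE. This is a minimal sketch, assuming the #Table1 temp table from the example above and a column collation that uses code page 1252 (as SQL_Latin1_General_CP1_CI_AS does), which covers the printable ISO 8859-1 characters. Any decoded character outside that code page would be lost when the nvarchar result is written back to the varchar column.

```sql
-- Assumes dbo.DecodeUTF8String (defined below) has already been created.
-- Overwrite only the rows whose stored bytes decode to something different.
UPDATE #Table1
SET Column1 = dbo.DecodeUTF8String(Column1)
WHERE Column1 <> dbo.DecodeUTF8String(Column1);
```

The function does not check whether its input is really UTF-8, so running it on text that is already correctly encoded may corrupt it. Treat this as a one-time repair and try it on a copy of the data first.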

Code:

CREATE FUNCTION dbo.DecodeUTF8String (@value varchar(max))
RETURNS nvarchar(max)
AS
BEGIN
    -- Transforms a UTF-8 encoded varchar string into Unicode
    -- By Anthony Faull 2014-07-31
    DECLARE @result nvarchar(max);

    -- If ASCII or null there's no work to do
    IF (@value IS NULL
        OR @value NOT LIKE '%[^ -~]%' COLLATE Latin1_General_BIN
    )
        RETURN @value;

    -- Generate all integers from 1 to the length of string
    WITH e0(n) AS (SELECT TOP(POWER(2,POWER(2,0))) NULL FROM (VALUES (NULL),(NULL)) e(n))
        , e1(n) AS (SELECT TOP(POWER(2,POWER(2,1))) NULL FROM e0 CROSS JOIN e0 e)
        , e2(n) AS (SELECT TOP(POWER(2,POWER(2,2))) NULL FROM e1 CROSS JOIN e1 e)
        , e3(n) AS (SELECT TOP(POWER(2,POWER(2,3))) NULL FROM e2 CROSS JOIN e2 e)
        , e4(n) AS (SELECT TOP(POWER(2,POWER(2,4))) NULL FROM e3 CROSS JOIN e3 e)
        , e5(n) AS (SELECT TOP(POWER(2.,POWER(2,5)-1)-1) NULL FROM e4 CROSS JOIN e4 e)
    , numbers(position) AS
    (
        SELECT TOP(DATALENGTH(@value)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM e5
    )
    -- UTF-8 Algorithm (http://en.wikipedia.org/wiki/UTF-8)
    -- For each octet, count the high-order one bits, and extract the data bits.
    , octets AS
    (
        SELECT position, highorderones, partialcodepoint
        FROM numbers a
        -- Split UTF8 string into rows of one octet each.
        CROSS APPLY (SELECT octet = ASCII(SUBSTRING(@value, position, 1))) b
        -- Count the number of leading one bits
        CROSS APPLY (SELECT highorderones = 8 - FLOOR(LOG( ~CONVERT(tinyint, octet) * 2 + 1)/LOG(2))) c
        CROSS APPLY (SELECT databits = 7 - highorderones) d
        CROSS APPLY (SELECT partialcodepoint = octet % POWER(2, databits)) e
    )
    -- Compute the Unicode codepoint for each sequence of 1 to 4 bytes
    , codepoints AS
    (
        SELECT position, codepoint
        FROM
        (
            -- Get the starting octet for each sequence (i.e. exclude the continuation bytes)
            SELECT position, highorderones, partialcodepoint
            FROM octets
            WHERE highorderones <> 1
        ) lead
        CROSS APPLY (SELECT sequencelength = CASE WHEN highorderones in (1,2,3,4) THEN highorderones ELSE 1 END) b
        CROSS APPLY (SELECT endposition = position + sequencelength - 1) c
        CROSS APPLY
        (
            -- Compute the codepoint of a single UTF-8 sequence
            SELECT codepoint = SUM(POWER(2, shiftleft) * partialcodepoint)
            FROM octets
            CROSS APPLY (SELECT shiftleft = 6 * (endposition - position)) b
            WHERE position BETWEEN lead.position AND endposition
        ) d
    )
    -- Concatenate the codepoints into a Unicode string
    SELECT @result = CONVERT(xml,
        (
            SELECT NCHAR(codepoint)
            FROM codepoints
            ORDER BY position
            FOR XML PATH('')
        )).value('.', 'nvarchar(max)');

    RETURN @result;
END
GO
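
The bit arithmetic in the octets step can be traced on a single character. The sketch below reuses the function's own expressions on the two bytes that encode 'á' in UTF-8 (0xC3 0xA1). The inline VALUES list and the alias v are illustrative only and are not part of the function.

```sql
-- Trace the per-octet arithmetic for the UTF-8 bytes of 'á'.
SELECT octet, highorderones, databits, partialcodepoint
FROM (VALUES (195), (161)) v(octet)   -- 0xC3 (lead byte), 0xA1 (continuation byte)
-- Count the leading one bits: invert the octet, then measure its bit length
CROSS APPLY (SELECT highorderones = 8 - FLOOR(LOG(~CONVERT(tinyint, octet) * 2 + 1) / LOG(2))) c
-- One zero bit separates the marker bits from the data bits
CROSS APPLY (SELECT databits = 7 - highorderones) d
-- Keep only the data bits
CROSS APPLY (SELECT partialcodepoint = octet % POWER(2, databits)) e;
```

The lead byte 195 has two leading one bits, so it starts a two-byte sequence and contributes 195 % 32 = 3. The continuation byte 161 has one leading one bit and contributes 161 % 64 = 33. The code point is 3 * 64 + 33 = 225, which is U+00E1 ('á'). Doubling the inverted octet and adding 1 keeps the LOG argument odd and at least 1, which appears intended to avoid LOG(0) for octet 255 and to keep FLOOR away from exact powers of two.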
