我将如何在 Python 中创建另一个 int() 函数以便我理解它？

debugcn 发表于 Dev

玫瑰湖

我需要知道我将如何在 Python 中重新创建 int() 函数的一个版本，以便我可以完全理解它并创建我自己的版本，其中包含超过基数 36 的多个基数。我可以从十进制转换为我自己的base（base 54，已更改）就好了，但我需要弄清楚如何将 base 格式的字符串转换为整数（base 10）。

基本上，我想知道如何从我的基数（我称之为基数 54）到基数 10。我不需要具体细节，因为如果我有一个例子，我可以自己解决。不幸的是，我在 int() 函数上找不到任何东西，尽管我知道它必须在某个地方，因为 Python 是开源的。

这是我能找到的最接近它的方法，但它不提供函数本身的源代码。Python int() 测试。

如果你能帮忙，谢谢。如果没有，那么感谢您阅读本文，我猜？

maor10

你不会喜欢这个答案，但 int(num, base) 是在 C 中定义的（它是一个内置的）

我四处寻找，找到了：

https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Objects/longobject.c

/* Parses an int from a bytestring. Leading and trailing whitespace will be
 * ignored.
 *
 * If successful, a PyLong object will be returned and 'pend' will be pointing
 * to the first unused byte unless it's NULL.
 *
 * If unsuccessful, NULL will be returned.
 */
PyObject *
PyLong_FromString(const char *str, char **pend, int base)
{
    int sign = 1, error_if_nonzero = 0;
    const char *start, *orig_str = str;
    PyLongObject *z = NULL;
    PyObject *strobj;
    Py_ssize_t slen;

    if ((base != 0 && base < 2) || base > 36) {
        PyErr_SetString(PyExc_ValueError,
                        "int() arg 2 must be >= 2 and <= 36");
        return NULL;
    }
    while (*str != '\0' && Py_ISSPACE(Py_CHARMASK(*str))) {
        str++;
    }
    if (*str == '+') {
        ++str;
    }
    else if (*str == '-') {
        ++str;
        sign = -1;
    }
    if (base == 0) {
        if (str[0] != '0') {
            base = 10;
        }
        else if (str[1] == 'x' || str[1] == 'X') {
            base = 16;
        }
        else if (str[1] == 'o' || str[1] == 'O') {
            base = 8;
        }
        else if (str[1] == 'b' || str[1] == 'B') {
            base = 2;
        }
        else {
            /* "old" (C-style) octal literal, now invalid.
               it might still be zero though */
            error_if_nonzero = 1;
            base = 10;
        }
    }
    if (str[0] == '0' &&
        ((base == 16 && (str[1] == 'x' || str[1] == 'X')) ||
         (base == 8  && (str[1] == 'o' || str[1] == 'O')) ||
         (base == 2  && (str[1] == 'b' || str[1] == 'B')))) {
        str += 2;
        /* One underscore allowed here. */
        if (*str == '_') {
            ++str;
        }
    }
    if (str[0] == '_') {
        /* May not start with underscores. */
        goto onError;
    }

    start = str;
    if ((base & (base - 1)) == 0) {
        int res = long_from_binary_base(&str, base, &z);
        if (res < 0) {
            /* Syntax error. */
            goto onError;
        }
    }
    else {
/***
Binary bases can be converted in time linear in the number of digits, because
Python's representation base is binary.  Other bases (including decimal!) use
the simple quadratic-time algorithm below, complicated by some speed tricks.
First some math:  the largest integer that can be expressed in N base-B digits
is B**N-1.  Consequently, if we have an N-digit input in base B, the worst-
case number of Python digits needed to hold it is the smallest integer n s.t.
    BASE**n-1 >= B**N-1  [or, adding 1 to both sides]
    BASE**n >= B**N      [taking logs to base BASE]
    n >= log(B**N)/log(BASE) = N * log(B)/log(BASE)
The static array log_base_BASE[base] == log(base)/log(BASE) so we can compute
this quickly.  A Python int with that much space is reserved near the start,
and the result is computed into it.
The input string is actually treated as being in base base**i (i.e., i digits
are processed at a time), where two more static arrays hold:
    convwidth_base[base] = the largest integer i such that base**i <= BASE
    convmultmax_base[base] = base ** convwidth_base[base]
The first of these is the largest i such that i consecutive input digits
must fit in a single Python digit.  The second is effectively the input
base we're really using.
Viewing the input as a sequence <c0, c1, ..., c_n-1> of digits in base
convmultmax_base[base], the result is "simply"
   (((c0*B + c1)*B + c2)*B + c3)*B + ... ))) + c_n-1
where B = convmultmax_base[base].
Error analysis:  as above, the number of Python digits `n` needed is worst-
case
    n >= N * log(B)/log(BASE)
where `N` is the number of input digits in base `B`.  This is computed via
    size_z = (Py_ssize_t)((scan - str) * log_base_BASE[base]) + 1;
below.  Two numeric concerns are how much space this can waste, and whether
the computed result can be too small.  To be concrete, assume BASE = 2**15,
which is the default (and it's unlikely anyone changes that).
Waste isn't a problem:  provided the first input digit isn't 0, the difference
between the worst-case input with N digits and the smallest input with N
digits is about a factor of B, but B is small compared to BASE so at most
one allocated Python digit can remain unused on that count.  If
N*log(B)/log(BASE) is mathematically an exact integer, then truncating that
and adding 1 returns a result 1 larger than necessary.  However, that can't
happen:  whenever B is a power of 2, long_from_binary_base() is called
instead, and it's impossible for B**i to be an integer power of 2**15 when
B is not a power of 2 (i.e., it's impossible for N*log(B)/log(BASE) to be
an exact integer when B is not a power of 2, since B**i has a prime factor
other than 2 in that case, but (2**15)**j's only prime factor is 2).
The computed result can be too small if the true value of N*log(B)/log(BASE)
is a little bit larger than an exact integer, but due to roundoff errors (in
computing log(B), log(BASE), their quotient, and/or multiplying that by N)
yields a numeric result a little less than that integer.  Unfortunately, "how
close can a transcendental function get to an integer over some range?"
questions are generally theoretically intractable.  Computer analysis via
continued fractions is practical:  expand log(B)/log(BASE) via continued
fractions, giving a sequence i/j of "the best" rational approximations.  Then
j*log(B)/log(BASE) is approximately equal to (the integer) i.  This shows that
we can get very close to being in trouble, but very rarely.  For example,
76573 is a denominator in one of the continued-fraction approximations to
log(10)/log(2**15), and indeed:
    >>> log(10)/log(2**15)*76573
    16958.000000654003
is very close to an integer.  If we were working with IEEE single-precision,
rounding errors could kill us.  Finding worst cases in IEEE double-precision
requires better-than-double-precision log() functions, and Tim didn't bother.
Instead the code checks to see whether the allocated space is enough as each
new Python digit is added, and copies the whole thing to a larger int if not.
This should happen extremely rarely, and in fact I don't have a test case
that triggers it(!).  Instead the code was tested by artificially allocating
just 1 digit at the start, so that the copying code was exercised for every
digit beyond the first.
***/
        twodigits c;           /* current input character */
        Py_ssize_t size_z;
        Py_ssize_t digits = 0;
        int i;
        int convwidth;
        twodigits convmultmax, convmult;
        digit *pz, *pzstop;
        const char *scan, *lastdigit;
        char prev = 0;

        static double log_base_BASE[37] = {0.0e0,};
        static int convwidth_base[37] = {0,};
        static twodigits convmultmax_base[37] = {0,};

        if (log_base_BASE[base] == 0.0) {
            twodigits convmax = base;
            int i = 1;

            log_base_BASE[base] = (log((double)base) /
                                   log((double)PyLong_BASE));
            for (;;) {
                twodigits next = convmax * base;
                if (next > PyLong_BASE) {
                    break;
                }
                convmax = next;
                ++i;
            }
            convmultmax_base[base] = convmax;
            assert(i > 0);
            convwidth_base[base] = i;
        }

        /* Find length of the string of numeric characters. */
        scan = str;
        lastdigit = str;

        while (_PyLong_DigitValue[Py_CHARMASK(*scan)] < base || *scan == '_') {
            if (*scan == '_') {
                if (prev == '_') {
                    /* Only one underscore allowed. */
                    str = lastdigit + 1;
                    goto onError;
                }
            }
            else {
                ++digits;
                lastdigit = scan;
            }
            prev = *scan;
            ++scan;
        }
        if (prev == '_') {
            /* Trailing underscore not allowed. */
            /* Set error pointer to first underscore. */
            str = lastdigit + 1;
            goto onError;
        }

        /* Create an int object that can contain the largest possible
         * integer with this base and length.  Note that there's no
         * need to initialize z->ob_digit -- no slot is read up before
         * being stored into.
         */
        double fsize_z = (double)digits * log_base_BASE[base] + 1.0;
        if (fsize_z > (double)MAX_LONG_DIGITS) {
            /* The same exception as in _PyLong_New(). */
            PyErr_SetString(PyExc_OverflowError,
                            "too many digits in integer");
            return NULL;
        }
        size_z = (Py_ssize_t)fsize_z;
        /* Uncomment next line to test exceedingly rare copy code */
        /* size_z = 1; */
        assert(size_z > 0);
        z = _PyLong_New(size_z);
        if (z == NULL) {
            return NULL;
        }
        Py_SIZE(z) = 0;

        /* `convwidth` consecutive input digits are treated as a single
         * digit in base `convmultmax`.
         */
        convwidth = convwidth_base[base];
        convmultmax = convmultmax_base[base];

        /* Work ;-) */
        while (str < scan) {
            if (*str == '_') {
                str++;
                continue;
            }
            /* grab up to convwidth digits from the input string */
            c = (digit)_PyLong_DigitValue[Py_CHARMASK(*str++)];
            for (i = 1; i < convwidth && str != scan; ++str) {
                if (*str == '_') {
                    continue;
                }
                i++;
                c = (twodigits)(c *  base +
                                (int)_PyLong_DigitValue[Py_CHARMASK(*str)]);
                assert(c < PyLong_BASE);
            }

            convmult = convmultmax;
            /* Calculate the shift only if we couldn't get
             * convwidth digits.
             */
            if (i != convwidth) {
                convmult = base;
                for ( ; i > 1; --i) {
                    convmult *= base;
                }
            }

            /* Multiply z by convmult, and add c. */
            pz = z->ob_digit;
            pzstop = pz + Py_SIZE(z);
            for (; pz < pzstop; ++pz) {
                c += (twodigits)*pz * convmult;
                *pz = (digit)(c & PyLong_MASK);
                c >>= PyLong_SHIFT;
            }
            /* carry off the current end? */
            if (c) {
                assert(c < PyLong_BASE);
                if (Py_SIZE(z) < size_z) {
                    *pz = (digit)c;
                    ++Py_SIZE(z);
                }
                else {
                    PyLongObject *tmp;
                    /* Extremely rare.  Get more space. */
                    assert(Py_SIZE(z) == size_z);
                    tmp = _PyLong_New(size_z + 1);
                    if (tmp == NULL) {
                        Py_DECREF(z);
                        return NULL;
                    }
                    memcpy(tmp->ob_digit,
                           z->ob_digit,
                           sizeof(digit) * size_z);
                    Py_DECREF(z);
                    z = tmp;
                    z->ob_digit[size_z] = (digit)c;
                    ++size_z;
                }
            }
        }
    }
    if (z == NULL) {
        return NULL;
    }
    if (error_if_nonzero) {
        /* reset the base to 0, else the exception message
           doesn't make too much sense */
        base = 0;
        if (Py_SIZE(z) != 0) {
            goto onError;
        }
        /* there might still be other problems, therefore base
           remains zero here for the same reason */
    }
    if (str == start) {
        goto onError;
    }
    if (sign < 0) {
        Py_SIZE(z) = -(Py_SIZE(z));
    }
    while (*str && Py_ISSPACE(Py_CHARMASK(*str))) {
        str++;
    }
    if (*str != '\0') {
        goto onError;
    }
    long_normalize(z);
    z = maybe_small_long(z);
    if (z == NULL) {
        return NULL;
    }
    if (pend != NULL) {
        *pend = (char *)str;
    }
    return (PyObject *) z;

  onError:
    if (pend != NULL) {
        *pend = (char *)str;
    }
    Py_XDECREF(z);
    slen = strlen(orig_str) < 200 ? strlen(orig_str) : 200;
    strobj = PyUnicode_FromStringAndSize(orig_str, slen);
    if (strobj == NULL) {
        return NULL;
    }
    PyErr_Format(PyExc_ValueError,
                 "invalid literal for int() with base %d: %.200R",
                 base, strobj);
    Py_DECREF(strobj);
    return NULL;
}

如果你想在 C 中定义它，继续尝试使用它 - 如果不是，你将不得不自己编写它

本文收集自互联网，转载请注明来源。

如有侵权，请联系[email protected] 删除。

编辑于2021-07-21

我来说两句

0条评论

登录后参与评论

来自分类Dev

Related 相关文章

文章

我将如何在 Python 中创建另一个 int() 函数以便我理解它？

我将如何在 Python 中创建另一个 int() 函数以便我理解它？

我将如何在Rust中编写此C函数？

如何获得在另一个函数中调用的函数以返回值？PYTHON

如何在python中从另一个类调用函数？

一个int如何等于另一个返回long int的函数？

我想将一个函数的返回值作为python中另一个函数的参数传递

我如何遍历Python中另一个数组内部的数组长度？

在我的python脚本中调用另一个python函数时遇到importerror

Python：如何在另一个函数中包含一个函数？

如何在一个函数中创建setTimeOut并在另一个函数中销毁它？

如果我在一个函数中创建了小部件，如何使用Python Tkinter在另一个函数中访问它们？

如何在python中的另一个函数中使用一个函数

如何在Python中的类中对另一个函数使用一个函数

我将如何在Python 3.3.2中编写该程序？

如何在python中从另一个类调用函数

我如何正确使用函数调用。我要做2批python脚本。一个在输入中，另一个在输出中

如何在python中的另一个def函数中运行一个def函数？

如何在另一个函数python中访问返回类值

我应该如何在模块中导入另一个python文件

我将如何从一个类到另一个类访问一个int变量？（Java）

在一个文件中创建一个函数并从另一个文件访问它 - Python

我不知道如何在另一个函数中调用先前创建的函数？

我如何在 python tkinter 中不点击按钮的情况下进入另一个页面

我将如何在 python 中创建一个（异步/线程/任务）背景队列？

如何在 Python 类的另一个函数中关闭文件？

如何创建一个函数以供稍后在另一个函数中重用

我将如何使用一个变量的内容来创建另一个变量的名称？

我如何在 Python 的另一个 while 循环中正确地创建一个 while 循环？

如何在另一个函数的for循环中使用函数到python中

如何在另一个python函数中执行一个python文件？