Why do Pandas integer `dtypes` not behave the same on Unix and Windows?

Posted on debugcn (Dev)

Steven C. Howell

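While checking the `dtypes` of the columns in a Pandas `DataFrame`, I realized that integer columns have the data type `np.int64`. What surprised me is that on Unix this compares equal to `int`, but on Windows it does not. Why doesn't it behave the same way? Is there a way to create the DataFrame so that a comparison using `df.dtypes == int` gives the same result on both platforms?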

Here is some sample code to illustrate.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '1.0.1'

In [4]: np.__version__
Out[4]: '1.18.1'

In [5]: data = pd.DataFrame({'col_1': range(5), 'col_2': np.linspace(0, 1, 5)})

In [6]: data.dtypes
Out[6]: 
col_1      int64
col_2    float64
dtype: object

In [7]: data.dtypes == float
Out[7]: 
col_1    False
col_2     True
dtype: bool

All of that produces the same result on Windows and Unix, but comparing the dtypes to `int` on Windows gives

In [8]: data.dtypes == int
Out[8]: 
col_1    False
col_2    False
dtype: bool

and on Unix gives

In [8]: data.dtypes == int
Out[8]:
col_1     True
col_2    False
dtype: bool

I tried specifying the data types. This works on Unix, where I can pass the data types by adding `dtype=(int, float)`.

In [9]: data = pd.DataFrame({'col_1': range(5), 'col_2': np.linspace(0, 1, 5)}, dtype=(int, float))

In [10]: data.dtypes
Out[10]:
col_1      int64
col_2    float64
dtype: object

But on Windows this code raises a `ValueError`:

In [10]: data = pd.DataFrame({'col_1': range(5), 'col_2': np.linspace(0, 1, 5)}, dtype=(int, float))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-284f0f12d3b6> in <module>
----> 1 data = pd.DataFrame({'col_1': range(5), 'col_2': np.linspace(0, 1, 5)}, dtype=(int, float))

~\Miniconda3\envs\pandas_test\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
    423             data = {}
    424         if dtype is not None:
--> 425             dtype = self._validate_dtype(dtype)
    426
    427         if isinstance(data, DataFrame):

~\Miniconda3\envs\pandas_test\lib\site-packages\pandas\core\generic.py in _validate_dtype(self, dtype)
    257
    258         if dtype is not None:
--> 259             dtype = pandas_dtype(dtype)
    260
    261             # a compound dtype

~\Miniconda3\envs\pandas_test\lib\site-packages\pandas\core\dtypes\common.py in pandas_dtype(dtype)
   1872     # raise a consistent TypeError if failed
   1873     try:
-> 1874         npdtype = np.dtype(dtype)
   1875     except SyntaxError:
   1876         # np.dtype uses `eval` which can raise SyntaxError

ValueError: mismatch in size of old and new data-descriptor

Steven C. Howell

For a platform-independent way to test for any kind of integer or floating-point dtype, you can use the following.

In [10]: [np.issubdtype(dtype, np.integer) for dtype in data.dtypes]
Out[10]: [True, False]

In [11]: [np.issubdtype(dtype, np.floating) for dtype in data.dtypes]
Out[11]: [False, True]
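The same kind-based check can also be written with helpers from pandas itself. The following is a minimal sketch, assuming the `data` frame from the question; `pandas.api.types.is_integer_dtype`, `is_float_dtype`, and `DataFrame.select_dtypes` are used here as an alternative to `np.issubdtype`, and are not part of the original answer.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Same frame as in the question.
data = pd.DataFrame({'col_1': range(5), 'col_2': np.linspace(0, 1, 5)})

# Test each column's dtype by kind (integer / float), not by exact width,
# so the answer does not depend on what `int` means on this platform.
int_cols = [col for col, dtype in data.dtypes.items() if is_integer_dtype(dtype)]
float_cols = [col for col, dtype in data.dtypes.items() if is_float_dtype(dtype)]
print(int_cols)    # ['col_1']
print(float_cols)  # ['col_2']

# select_dtypes accepts abstract NumPy types such as np.integer,
# which keeps integer columns of any width.
print(list(data.select_dtypes(include=[np.integer]).columns))
```

Because these checks look at the kind of the dtype, a column stored as `int32` and a column stored as `int64` are both reported as integer columns.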

The main issue behind the Windows/Unix difference is that, with the versions shown here, `int` used as a dtype amounts to `np.int64` on Unix and `np.int32` on Windows.
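To see which fixed-width type `int` resolves to on a given machine, it can help to ask NumPy directly. This is a small sketch that is not part of the original answer; the variable names are illustrative, and the printed result depends on the platform and NumPy version, so no output is shown.

```python
import numpy as np

# np.dtype(int) is the dtype NumPy substitutes for the Python type `int`.
default_int = np.dtype(int)
print(f'np.dtype(int) -> {default_int} ({default_int.itemsize} bytes)')

# Comparing a dtype with `int` converts `int` to that default dtype first,
# so exactly one of these two comparisons is True on any given platform.
matches_int64 = np.dtype(np.int64) == int
matches_int32 = np.dtype(np.int32) == int
print(f'int64 == int: {matches_int64}')
print(f'int32 == int: {matches_int32}')

# `float` always resolves to float64, which is why `dtypes == float` agrees everywhere.
print(f'float64 == float: {np.dtype(np.float64) == float}')
```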

This code demonstrates the difference in behavior.

import numpy as np
import pandas as pd

print(f'numpy version: {np.__version__}')
print(f'pandas version: {pd.__version__}')

data = pd.DataFrame({
    'col_i': range(5),
    'col_f': np.linspace(0, 1, 5),
})
data['col_i32'] = data.col_i.astype(np.int32)
data['col_i64'] = data.col_i.astype(np.int64)
data['col_f32'] = data.col_i.astype(np.float32)
data['col_f64'] = data.col_i.astype(np.float64)
print(f'\ndata.dtypes: \n{data.dtypes}')
print(f'\ndata.dtypes == int: \n{data.dtypes == int}')
print(f'\ndata.dtypes == float: \n{data.dtypes == float}')

The output on Windows is as follows.

numpy version: 1.18.1
pandas version: 1.0.1

data.dtypes:
col_i        int64
col_f      float64
col_i32      int32
col_i64      int64
col_f32    float32
col_f64    float64
dtype: object

data.dtypes == int:
col_i      False
col_f      False
col_i32     True  # the np.int32 column
col_i64    False
col_f32    False
col_f64    False
dtype: bool

data.dtypes == float:
col_i      False
col_f       True
col_i32    False
col_i64    False
col_f32    False
col_f64     True
dtype: bool

Here is the output on Unix.

numpy version: 1.18.1
pandas version: 1.0.1

data.dtypes:
col_i        int64
col_f      float64
col_i32      int32
col_i64      int64
col_f32    float32
col_f64    float64
dtype: object

data.dtypes == int:
col_i       True  # a np.int64 column
col_f      False
col_i32    False
col_i64     True  # a np.int64 column
col_f32    False
col_f64    False
dtype: bool

data.dtypes == float:
col_i      False
col_f       True
col_i32    False
col_i64    False
col_f32    False
col_f64     True
dtype: bool

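The reason specifying the `dtype` fails on Windows is that the types must have the same memory size. The pandas documentation states "Only a single dtype is allowed." However, that is clearly not the whole story, because either of the following works on both Windows and Unix.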

data = pd.DataFrame({'col_i32': range(5), 'col_f32': np.linspace(0, 1, 5)}, dtype=(np.int32, np.float32))

data = pd.DataFrame({'col_i32': range(5), 'col_f32': np.linspace(0, 1, 5)}, dtype=(np.int64, np.float64))

I think what it really means is "Only a single [data size] is allowed."
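The size requirement can be reproduced without pandas, since the traceback in the question ends in `np.dtype(dtype)`. The sketch below assumes that NumPy treats a two-element tuple of types as its `(base_dtype, new_dtype)` form, which requires both entries to have the same item size; explicit widths are used so that it behaves the same on every platform.

```python
import numpy as np

# Equal item sizes (8 and 8 bytes): NumPy accepts the tuple.
same_size = np.dtype((np.int64, np.float64))
print(same_size, same_size.itemsize)

# Different item sizes (4 and 8 bytes): NumPy rejects the tuple.
# This is the combination that dtype=(int, float) produced on Windows.
try:
    np.dtype((np.int32, np.float64))
    raised = False
except ValueError as err:
    raised = True
    print(f'ValueError: {err}')
```

Note that under this reading the tuple is not a per-column specification, so it is probably not the right tool for giving each column its own type.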

The error from specifying `dtype=(int, float)` goes back to the underlying issue shown above: Windows casts `int` as `np.int32` and `float` as `np.float64`, while Unix casts `int` as `np.int64` and `float` as `np.float64`. Pandas requires the same memory size for both, which holds on Unix but not on Windows.
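One way to get the same comparison result on both platforms is to fix each column's width explicitly and compare against that fixed-width type instead of `int`. This is a minimal sketch, not something taken from the original answer; it assumes the frame from the question and uses `DataFrame.astype` with a column-to-dtype mapping.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'col_1': range(5), 'col_2': np.linspace(0, 1, 5)})

# Give each column an explicit fixed-width dtype, one entry per column.
data = data.astype({'col_1': np.int64, 'col_2': np.float64})

# Compare against the fixed-width types rather than the Python types,
# so the result does not depend on what `int` maps to on this machine.
print(data.dtypes == np.int64)
print(data.dtypes == np.float64)
```

The trade-off is that the width is now hard-coded, so a column stored as `int32` would no longer match; the kind-based checks are a better fit when any integer width should count.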


Edited on 2021-04-5
