I am using SAS's Python API and uploaded a table with:
s.upload("./data/hmeq.csv", casout=dict(name=tbl_name, replace=True))
I can view the table's details with s.tableinfo().
§ TableInfo
Name Rows Columns IndexedColumns Encoding CreateTimeFormatted ModTimeFormatted AccessTimeFormatted JavaCharSet CreateTime ... Repeated View MultiPart SourceName SourceCaslib Compressed Creator Modifier SourceModTimeFormatted SourceModTime
0 HMEQ 5960 13 0 utf-8 2020-02-10T16:48:02-05:00 2020-02-10T16:48:02-05:00 2020-02-10T21:10:34-05:00 UTF8 1.896990e+09 ... 0 0 0 0 aforoo 2020-02-10T16:48:02-05:00 1.896990e+09
1 rows × 23 columns
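However, I cannot access any of the table's values in Python. For example, suppose I want the number of rows and columns as Python scalars. I know a SAS table can be put into a pandas table with pd.DataFrame, but that does not work for this table, and I get: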
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __init__(self, data, index, columns, dtype, copy)
346 dtype=dtype, copy=copy)
347 elif isinstance(data, dict):
--> 348 mgr = self._init_dict(data, index, columns, dtype=dtype)
349 elif isinstance(data, ma.MaskedArray):
350 import numpy.ma.mrecords as mrecords
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _init_dict(self, data, index, columns, dtype)
457 arrays = [data[k] for k in keys]
458
--> 459 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
460
461 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
7354 # figure out the index, if necessary
7355 if index is None:
-> 7356 index = extract_index(arrays)
7357
7358 # don't force copy because getting jammed in an ndarray anyway
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in extract_index(data)
7391
7392 if not indexes and not raw_lengths:
-> 7393 raise ValueError('If using all scalar values, you must pass'
7394 ' an index')
7395
ValueError: If using all scalar values, you must pass an index
I have the same problem with any other casout table in SAS. Any help or comments are appreciated.
I found the solution below, and it works fine. For example, here I use the dataSciencePilot.exploreData action, whose results can be retrieved as follows:
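import pandas as pd

# Write the action's output to the CAS table 'out1', replacing any existing copy
casout = dict(name='out1', replace=True)
s.dataSciencePilot.exploreData(table=tbl_name, target='bad', casout=casout)
# Fetch the output table; the result is dict-like, so select its 'Fetch' entry
fetch_opts = dict(maxrows=100000000, to=1000000)
df = s.fetch(table='out1', **fetch_opts)['Fetch']
features = pd.DataFrame(df)
type(features)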
This returns pandas.core.frame.DataFrame.
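The same key-indexing idea seems to apply to the s.tableinfo() case from the question. The traceback goes through the `isinstance(data, dict)` branch, which suggests pd.DataFrame was handed the whole dict-like result rather than the table stored under its 'TableInfo' key. The sketch below is a minimal illustration under that assumption. It uses a plain dict and a plain pandas DataFrame as stand-ins for the real result object so that it runs without a CAS server; the stand-in `results` and its three columns are made up from the output shown above and are not the library's actual objects.

```python
import pandas as pd

# Stand-in for the value returned by s.tableinfo(): a dict-like result keyed
# by result-table name. This is an assumption for illustration; the real
# object comes from the SAS client, not from a hand-built dict.
results = {
    "TableInfo": pd.DataFrame(
        {"Name": ["HMEQ"], "Rows": [5960], "Columns": [13]}
    )
}

# Select the result table by key first, as the answer does with ['Fetch'].
info = results["TableInfo"]

# Read single cells and convert them to plain Python ints.
n_rows = int(info.loc[0, "Rows"])
n_cols = int(info.loc[0, "Columns"])

print(n_rows, n_cols)  # prints: 5960 13
```

With a live session, `info` would presumably come from s.tableinfo()['TableInfo']. If the caslib holds several tables, row 0 may not be the one you want, so filtering on the Name column before reading the cells is safer.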