MATLAB data file to pandas DataFrame

Posted by Ramon Martinez in Dev

Ramon Martinez

Is there a standard way to convert MATLAB .mat files (data in MATLAB's format) into a pandas DataFrame?

I know a workaround is possible using scipy.io, but I would like to know whether there is a more direct way.

Destrif

I found two ways: scipy or mat4py.

mat4py

Load data from a MAT-file

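The loadmat function loads all variables stored in the MAT-file into a simple Python data structure, using only Python's dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.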

Example: load a MAT-file into a Python data structure:

from mat4py import loadmat
data = loadmat('datafile.mat')

Source:

https://pypi.python.org/pypi/mat4py/0.1.0
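Since mat4py returns only dicts and lists, the result can usually be passed to pandas with little extra work. The following is a minimal sketch, not part of the mat4py documentation. The `data` dict is a hand-written stand-in for what `loadmat('datafile.mat')` might return, and the variable names `time`, `voltage`, and `label` are invented. It also assumes that each column arrives as a flat list; depending on how the arrays were stored in MATLAB, they may come back as nested lists and need flattening first.

```python
import pandas as pd

# Hypothetical stand-in for `data = loadmat('datafile.mat')` from mat4py.
# The variable names and values are invented for illustration.
data = {
    "time": [0.0, 0.5, 1.0],
    "voltage": [1.2, 1.5, 1.1],
    "label": "run-1",  # scalar metadata, not a column
}

# Keep only list-valued entries whose length matches the number of rows.
n_rows = len(data["time"])
columns = {
    name: values
    for name, values in data.items()
    if isinstance(values, list) and len(values) == n_rows
}

df = pd.DataFrame(columns)
print(df)
```

Filtering by length is one simple way to separate data columns from metadata; a real file may need a different rule.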

SciPy:

Example:

import numpy as np
from scipy.io import loadmat  # this is the SciPy module that loads mat-files
from datetime import datetime
import pandas as pd

mat = loadmat('measured_data.mat')  # load mat-file
mdata = mat['measuredData']  # variable in mat file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"
# * SciPy reads in structures as structured NumPy arrays of dtype object
# * The size of the array is the size of the structure array, not the number
#   of elements in any particular field. The shape defaults to 2-dimensional.
# * For convenience make a dictionary of the data using the names from dtypes
# * Since the structure has only one element, but is 2-D, index it at [0, 0]
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
# Reconstruct the columns of the data table from just the time series
# Use the number of intervals to test if a field is a column or metadata
columns = [n for n, v in ndata.items() if v.size == ndata['numIntervals']]  # .items() on Python 3
# now make a data frame, setting the time stamps as the index
df = pd.DataFrame(np.concatenate([ndata[c] for c in columns], axis=1),
                  index=[datetime(*ts) for ts in ndata['timestamps']],
                  columns=columns)
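The snippet above depends on a file, measured_data.mat, that is not included here. To try the same approach end to end, the sketch below first writes a small struct with `scipy.io.savemat` and then reads it back. The field names `power` and `temperature` and all values are invented, and it assumes the timestamps are stored as rows of `[year, month, day, hour, minute, second]`. Each timestamp component is cast to `int` because `datetime` rejects floats, which is how MATLAB stores numbers by default.

```python
import os
import tempfile
from datetime import datetime

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat

# Made-up stand-in for the contents of 'measured_data.mat'.
measured = {
    "numIntervals": 3,
    "timestamps": np.array([[2020, 1, 1, 0, 0, 0],
                            [2020, 1, 1, 0, 15, 0],
                            [2020, 1, 1, 0, 30, 0]]),
    "power": np.array([[1.0], [2.0], [3.0]]),           # 3x1 column vector
    "temperature": np.array([[20.5], [21.0], [21.5]]),  # 3x1 column vector
}
path = os.path.join(tempfile.mkdtemp(), "measured_data.mat")
savemat(path, {"measuredData": measured})  # a nested dict is saved as a struct

# The struct comes back as a 1x1 structured array; unpack each field.
mdata = loadmat(path)["measuredData"]
fields = {name: mdata[name][0, 0] for name in mdata.dtype.names}
n = int(np.squeeze(fields["numIntervals"]))

# Fields with exactly n elements are treated as data columns.
columns = [name for name, value in fields.items() if value.size == n]
index = [datetime(*(int(x) for x in ts)) for ts in fields["timestamps"]]
df = pd.DataFrame(np.concatenate([fields[c] for c in columns], axis=1),
                  index=index, columns=columns)
print(df)
```

Storing each series as a column vector matters here: concatenating along `axis=1` only yields one DataFrame column per field when every field has shape `(n, 1)`.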