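I need to read data from a wiki markup page and store it as a table structure. I am trying to figure out how to correctly parse the following markup syntax into some table data structure in C#.
Here is an example table: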
|| Owner || Action || Status || Comments ||
| Bill | Fix the lobby | In Progress | This is easy |
| Joe | Fix the bathroom | In Progress | Plumbing \\
\\
Electric \\
\\
Painting \\
\\
\\ |
| Scott | Fix the roof | Complete | This is expensive |
Here is how it comes in directly:
|| Owner|| Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|
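As you can see, the header row uses "||" as its separator, data rows use "|", and a single cell can span several lines via "\\".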
I tried reading it line by line and then concatenating the lines that had "\\" between them, but that felt a bit hacky.
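For reference, a minimal sketch of that line-joining attempt. It assumes the multi-line form of the sample, where a continued line ends in "\\"; the names LineJoiner and JoinContinuedLines are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LineJoiner
{
    // Merges physical lines into logical rows. Assumption: a line that ends
    // in "\\" (ignoring trailing whitespace) continues on the next line.
    public static List<string> JoinContinuedLines(IEnumerable<string> lines)
    {
        var rows = new List<string>();
        var current = new StringBuilder();
        foreach (var line in lines)
        {
            current.Append(line);
            if (line.TrimEnd().EndsWith(@"\\", StringComparison.Ordinal))
            {
                current.Append(' ');
                continue; // the row continues on the next physical line
            }
            rows.Add(current.ToString());
            current.Clear();
        }
        if (current.Length > 0)
            rows.Add(current.ToString()); // input ended in the middle of a row
        return rows;
    }
}
```

Applied to the example table above, this would yield four logical lines: the header and one line per owner. It does not help with the single-line form, where no line breaks are left to join on.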
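I also tried reading it in as one full string, parsing by "||" first, and then reading on until I hit the same number of "|" before moving on to the next row. This seems to work, but it feels like there might be a more elegant way using regular expressions or something similar.
Can anyone suggest the correct way to parse this data?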
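Since the input format you edited in differs substantially from the one you posted before, I have largely replaced my previous answer. The new format leads to a somewhat different solution.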
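Since there are no longer line breaks after a row, the only way to determine where a row ends is to require every row to have the same number of columns as the table header. That holds at least if you do not want to rely on a potentially fragile whitespace convention that happens to be present in the single example string provided (i.e. the row separator being the only "|" string without whitespace). Your question, at least, does not state this as the specification of the row delimiter.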
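The column-count rule boils down to simple arithmetic, sketched below. This is only an illustration under assumptions: the names RowBoundary and CountRows are made up, and the row text is assumed to start and end with "|" with rows separated by "| |", as in the sample string. Under that assumption each row contributes its cells plus one separator piece when the text is split on "|", and one extra piece is left over at the end.

```csharp
using System;

public static class RowBoundary
{
    // Returns the number of rows implied by splitting the row text on '|',
    // or -1 if the pieces cannot be divided into complete rows.
    public static int CountRows(string rowText, int headerColumns)
    {
        int pieces = rowText.Split('|').Length;
        int perRow = headerColumns + 1; // the cells plus one separator piece
        return pieces % perRow == 1 ? (pieces - 1) / perRow : -1;
    }
}
```

For the sample string, the text after the header contains 15 "|" characters and therefore splits into 16 pieces; with 4 header columns that gives (16 - 1) / 5 = 3 rows.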
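The "parser" below provides at least the validity checks that can be derived from your format specification and example string, and it also allows tables without rows. The comments describe the basic steps.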
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class TableParser
{
    const StringSplitOptions SplitOpts = StringSplitOptions.None;
    const string RowColSep = "|";
    static readonly string[] HeaderColSplit = { "||" };
    static readonly string[] RowColSplit = { RowColSep };
    static readonly string[] MLColSplit = { @"\\" };

    public class TableRow
    {
        // One entry per column; each entry holds the cell's lines (split on "\\").
        public List<string[]> Cells;
    }

    public class Table
    {
        public string[] Header;
        public TableRow[] Rows;
    }

    public static Table Parse(string text)
    {
        // Isolate the header columns and rows remainder.
        var headerSplit = text.Split(HeaderColSplit, SplitOpts);
        Ensure(headerSplit.Length > 1, "At least 1 header column is required in the input");

        // Need to check whether there are any rows.
        var hasRows = headerSplit.Last().IndexOf(RowColSep) >= 0;
        var header = headerSplit.Skip(1)
            .Take(headerSplit.Length - (hasRows ? 2 : 1))
            .Select(c => c.Trim())
            .ToArray();

        if (!hasRows) // If no rows for this table, we are done.
            return new Table() { Header = header, Rows = new TableRow[0] };

        // Get all row columns from the remainder.
        var rowsCols = headerSplit.Last().Split(RowColSplit, SplitOpts);

        // Require the same number of columns for a row as for the header.
        Ensure((rowsCols.Length % (header.Length + 1)) == 1,
            "The number of row columns does not match the number of header columns");
        var rows = new TableRow[(rowsCols.Length - 1) / (header.Length + 1)];

        // Fill rows by sequentially taking # header column cells
        for (int ri = 0, start = 1; ri < rows.Length; ri++, start += header.Length + 1)
        {
            rows[ri] = new TableRow() {
                Cells = rowsCols.Skip(start).Take(header.Length)
                    .Select(c => c.Split(MLColSplit, SplitOpts).Select(p => p.Trim()).ToArray())
                    .ToList()
            };
        }

        return new Table { Header = header, Rows = rows };
    }

    private static void Ensure(bool check, string errorMsg)
    {
        if (!check)
            throw new InvalidDataException(errorMsg);
    }
}
When used like this:
public static void Main(params string[] args)
{
    var wikiLine = @"|| Owner|| Action || Status || Comments || | Bill\\ | fix the lobby |In Progress | This is eary| | Joe\\ |fix the bathroom\\ | In progress| plumbing \\Electric \\Painting \\ \\ | | Scott \\ | fix the roof \\ | Complete | this is expensive|";
    var table = TableParser.Parse(wikiLine);

    Console.WriteLine(string.Join(", ", table.Header));
    foreach (var r in table.Rows)
        Console.WriteLine(string.Join(", ", r.Cells.Select(c => string.Join(Environment.NewLine + "\t# ", c))));
}
It produces output in which "\t# " marks a line break caused by a \\ in the input.
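Note that a trailing \\ in a cell, as in " Bill\\ ", produces an empty last part when the cell is split. The following sketch isolates the per-cell split and adds an optional filter for such blank parts. The class name CellLines and the dropEmpty parameter are hypothetical additions for illustration, not part of the parser above.

```csharp
using System;
using System.Linq;

public static class CellLines
{
    static readonly string[] LineBreak = { @"\\" };

    // Splits one cell on the wiki line-break marker and trims each part.
    // With dropEmpty = true, blank parts (e.g. from a trailing "\\") are discarded.
    public static string[] Split(string cell, bool dropEmpty)
    {
        var parts = cell.Split(LineBreak, StringSplitOptions.None).Select(p => p.Trim());
        return (dropEmpty ? parts.Where(p => p.Length > 0) : parts).ToArray();
    }
}
```

For the cell " plumbing \\Electric \\Painting \\ \\ " this gives five parts without the filter (the last two are empty) and three parts with it. Whether blank lines inside a cell are meaningful depends on your data, so the filter is left optional.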