我正在尝试使用我认为有些不完整的表中的数据,而且我无法弄清楚如何解决该问题或如何开始构建问题以查看我尝试完成的任务是否甚至可以使用SQL 。这是我正在使用的数据的假设表示(我以CSV格式输入数据,因为此文本字段不支持表格格式):
Date,Time,Traveler,Source,Destination,Travel Status
9/20/2014,1:00pm,James,Station A,Station B,Scheduled
9/20/2014,1:10pm,James,Station A,Station B,Traveling
9/20/2014,1:40pm,James,,Station B,Arrived
9/20/2014,1:00pm,Ann,Station B,Station A,Scheduled
9/20/2014,1:10pm,Ann,Station B,Station A,Traveling
9/20/2014,1:40pm,Ann,,Station A,Arrived
9/20/2014,1:00pm,Karl,Station A,Station B,Scheduled
9/20/2014,1:10pm,Karl,Station A,Station B,Traveling
9/20/2014,1:40pm,Karl,,Station B,Arrived
9/20/2014,1:00pm,Joyce,Station B,Station A,Scheduled
9/20/2014,1:10pm,Joyce,Station B,Station A,Traveling
9/20/2014,1:40pm,Joyce,,Station A,Arrived
9/20/2014,1:00pm,Kelly,Station B,Station B,Scheduled
9/20/2014,1:10pm,Kelly,Station B,Station B,Traveling
9/20/2014,1:40pm,Kelly,,Station B,Arrived
9/20/2014,1:00pm,Sam,Station A,Station A,Scheduled
9/20/2014,1:10pm,Sam,Station A,Station A,Traveling
9/20/2014,1:40pm,Sam,,Station A,Arrived
我试图计算到达的“类型”,例如A-> A类型的到达,B-> B类型的到达以及A-> B和B-> A的到达。
如果数据是这样的:
Date,Time,Traveler,Source,Destination,Travel Status
9/20/2014,1:00pm,James,Station A,Station B,Scheduled
9/20/2014,1:10pm,James,Station A,Station B,Traveling
9/20/2014,1:40pm,James,Station A,Station B,Arrived
9/20/2014,1:00pm,Ann,Station B,Station A,Scheduled
9/20/2014,1:10pm,Ann,Station B,Station A,Traveling
9/20/2014,1:40pm,Ann,Station B,Station A,Arrived
这个简单的查询将针对每种到达类型(即对于A-> B类型)完成此操作:
SELECT COUNT(*) FROM TRAVEL_TBL WHERE
Travel Status = 'Arrived' AND Source = 'Station A'
AND Destination = 'Station B';
但是,由于包含“已到达”条目的记录中缺少“源”字段,如何执行查询以查找计数?我猜唯一的办法是,按顺序对每个旅行者按时间顺序比较每个记录,并跟踪谁安排了一次旅行的时间以及他们是否到达,并增加此基础的计数。是否可以使用SQL来实现?还是只能用Java或PHP或任何宿主语言来编写应用程序来完成逻辑?
与MS SQL 2012+配合使用的一种解决方案是使用LAG()函数访问先前的行:
SELECT COUNT(*) AS "Count A-B"
FROM (
SELECT
Date, Time, Traveler,
CASE
WHEN Source IS NULL THEN LAG(Source,1) OVER (PARTITION BY Date, Traveler ORDER BY Date)
ELSE Source
END AS Source,
Destination,
[Travel Status]
from TRAVEL_TBL) derived_table
WHERE [Travel Status] = 'Arrived' AND Source = 'Station A' AND Destination = 'Station B';
或者在带有自连接的cte中使用ROW_NUMBER()(这是大多数主要数据库中应该使用的功能)的更通用版本:
;WITH cte AS (
SELECT
Date, Time, Traveler,
ROW_NUMBER() OVER (ORDER BY Traveler, Date, Time) rn,
Source,
Destination,
[Travel Status]
FROM TRAVEL_TBL
)
SELECT COUNT(*) AS "Count A-B"
FROM (
SELECT
c.Date, c.Time, c.Traveler,
CASE
WHEN c.Source IS NULL THEN c2.source
ELSE c.Source
END AS Source,
c.Destination,
c.[Travel Status]
FROM cte c
LEFT JOIN cte c2 ON c.rn = c2.rn+1
) derived_table
WHERE [Travel Status] = 'Arrived' AND Source = 'Station A' AND Destination = 'Station B';
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句