From: Alan Mead
Subject: Re: String variables combining files
Date: Thu, 26 Mar 2015 06:10:36 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
On 3/26/2015 3:59 AM, ftr wrote:

> So this means that the programs that produce the CSV files produce output with different string variable width?

ftr,

It sounds like you don't run into this problem, so maybe this discussion isn't relevant for you. But to repeat the reasons why this change is a good idea: (1) it would still be EASIER to have PSPP deal with this problem automatically, rather than forcing me to deal with it; and (2) it would be a simple way to create another point distinguishing PSPP as superior to SPSS.

I have given some thought to why SPSS has this limitation. One possibility is that it's simply an old limitation due to some original hardware or software issues. I speculate below that at the time of SPSS's inception, string data was neither particularly common nor important, and that variable lengths would have been rare. Also, it could be due to performance issues, but if so I'm sure it would be faster for PSPP to resolve this issue than for me to do so manually; I assume that fixing this issue wouldn't generally slow down merge/join files?

I cannot imagine a situation where having this restriction on matching string length would be a feature. But if PSPP solves the problem by truncating longer strings, then some data would be lost, and sometimes that will be unacceptable, so it would be good to issue a warning or force people to turn on this feature. If the solution can be to change the final string length to the longest encountered string length (and, I assume, therefore truncate no data), then I cannot see a problem arising from this feature.

I also speculate that this problem is far more of an issue today than when SPSS was first created, because string data is easier (sometimes more natural) to collect today. SPSS would have originally (i.e., circa 1970) been fed punch cards, and most string data would have been generated either by the researcher (like a coding) or by something like a scantron or a scantron-like response grid. I'm sure someone had participants respond by writing something in, but it would have been keyed into the computer at a fixed width. Using a physical storage medium (cards) would have discouraged strings unless they were necessary and encouraged researchers to use the shortest possible length.

Compare that to now: my web-based surveys often have variable-length strings like email, useragent and other string-based meta-data, and often the survey includes fill-in-the-blank or short-answer questions. Often I get datasets where responses are strings, rather than numeric codes (e.g., "male" and "female"). Even if they are the same data (e.g., email), it would be natural for these variables to have different lengths across different surveys. I don't foresee these conditions changing.

-Alan

-- 
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

+815.588.3846 (Office)
+267.334.4143 (Mobile)

http://www.alanmead.org

Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat