Re: String variables combining files

pspp-users

From:	Alan Mead
Subject:	Re: String variables combining files
Date:	Thu, 26 Mar 2015 06:10:36 -0500
User-agent:	Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

On 3/26/2015 3:59 AM, ftr wrote:

So this means that the programs that produce the CSV files produce output with different string variable widths?
Is this due to the programs or to the people who use them?

In general, when you import text files you fix the variable width in the DATA LIST.
Or you use GET DATA/TYPE=
http://www.gnu.org/software/pspp/manual/pspp.html#GET-DATA

And why don't you set FORMAT on each of the separate files before you integrate them?

When I worked on a project that sounds similar to yours, we ran serious pre-fieldwork training for the local data producers. It succeeded in making the local projects aware of what was at stake (motivation). It got the local heads to check the consistency of the data before sending it (data control), something we could not do ourselves because we had no direct access to the local projects, the local heads knew them better, and it would have cost us too much. And it ensured that data were sent in a coherent format and on time.

Maybe you have to train your local people?

Just some ideas for local problem solving. I am happy that we have volunteers doing the programming work, so we should not burden them with work that we can do on our side.

ftr,

It sounds like you don't run into this problem, so maybe this discussion isn't relevant for you.

But to repeat the reasons why this change is a good idea: (1) it would still be EASIER to have PSPP deal with this problem automatically, rather than forcing me to deal with it; and (2) it would be a simple way to create another point distinguishing PSPP as superior to SPSS.

I have given some thought to why SPSS has this limitation. One possibility is that it's simply an old limitation due to some original hardware or software issues. I speculate below that at the time of SPSS's inception, string data was neither common nor particularly important, and that varying string lengths would have been rare. Also, it could be due to performance issues, but if so I'm sure it would be faster for PSPP to resolve this issue than for me to do so manually; I assume that fixing this issue wouldn't generally slow down merging or joining files?

I cannot imagine a situation where having this restriction on matching string length would be a feature. But if PSPP solves the problem by truncating longer strings, then some data would be lost, and sometimes that will be unacceptable, so it would be good to issue a warning or to require people to turn this feature on explicitly. If the solution can be to change the final string length to the longest declared string length (and, I assume, therefore truncate no data), then I cannot see a problem arising from this feature.
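
To make the two options concrete, here is a minimal sketch in Python. It is not PSPP code and says nothing about how PSPP works internally; the function name, the "policy" argument and the sample data are all made up for illustration. Each input file contributes one string variable with a declared width, and the merged variable is either widened to the largest declared width or cut to the width of the first file.

```python
def merge_string_columns(files, policy="widen"):
    """Stack one fixed-width string variable taken from several files.

    files:  list of (declared_width, values) pairs, one pair per input file
            (a hypothetical stand-in for the files being combined).
    policy: "widen"    -> use the largest declared width, so nothing is cut;
            "truncate" -> use the first file's width and cut longer values.
    Returns (merged_width, merged_values, number_of_truncated_values).
    """
    if policy == "widen":
        width = max(declared for declared, _ in files)
    elif policy == "truncate":
        width = files[0][0]
    else:
        raise ValueError("policy must be 'widen' or 'truncate'")

    merged = []
    truncated = 0
    for _, values in files:
        for value in values:
            text = value.rstrip()          # ignore fixed-width padding
            if len(text) > width:
                truncated += 1             # this value loses characters
            merged.append(text[:width].ljust(width))
    return width, merged, truncated


# Two surveys that store different things at different widths.
survey_a = (6, ["male", "female"])
survey_b = (20, ["ann@example.org"])

print(merge_string_columns([survey_a, survey_b], "widen"))
print(merge_string_columns([survey_a, survey_b], "truncate"))
```

With "widen" the truncation count is always zero, which is why I cannot see that variant causing harm. With "truncate" the count is exactly what a warning would need to report.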

I also speculate that this problem is far more of an issue today than when SPSS was first created, because string data is easier (sometimes more natural) to collect today. SPSS would have originally (i.e., circa 1970) been fed punch cards, and most string data would have been generated either by the researcher (like a coding) or by something like a Scantron or a Scantron-like response grid. I'm sure someone had participants respond by writing something in, but it would have been keyed into the computer in a fixed-width field. Using a physical storage medium (cards) would have discouraged strings unless they were necessary and encouraged researchers to use the shortest possible length. Compare that to now: my web-based surveys often have variable-length strings like email, user agent and other string-based metadata, and often the survey includes fill-in-the-blank or short-answer questions. Often I get datasets where responses are strings rather than numeric codes (e.g., "male" and "female"). Even if they are the same data (e.g., email), it would be natural for these variables to have different lengths across different surveys. I don't foresee these conditions changing.

-Alan

-- 

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

+815.588.3846 (Office)
+267.334.4143 (Mobile)

http://www.alanmead.org

Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat
