![]() |
![]() |
Search DSS site Finding Data Search DSS data (searches selected DSS data, ICPSR, and Roper) Using Data About Us |
Reshape Datastream DataThis is an example of reshaping a data file created by Datastream. I downloaded stock price of some sample companies for 2007 from Datastream and opened it in Excel. As it is, it is not in a Stata friendly format. I have the company names (Name), ticker (Code), and Currency going across, then date and price in columns. To analyze this data in Stata, I need to convert it so that it will have three columns as variables - date, company name, ticker, and stock price (see the picture on the bottom of the page).
Here is my strategy.
I. Save top two rows for labeling1. Prepare the codes(a) It is easier to do the first step in Excel. I add a row of sequential number and call it id. I then delete the third row, and remove the extra symbols from the ticker. Now the file looks like below.
(b) Select cells A1 through F3, copy, then paste this into a new worksheet. In pasting, select paste special from edit. Transpose the data by selecting the transpose box in the paste special dialog box.
The result looks like this.
If this code sheet is a large file, you can save it as a .csv file in Excel and import it into Stata. Here the file is very small, so I copy and paste it into Stata's data editor. Once it was put into Stata, I save it as code.dta. sort id save code.dta
2. Clean the rest of the file for reshaping and mergingNow back to the Datastream data. The id numbers will be used for merging later on, and for reshaping, it helps to have variable names. I will call the values of columns B through F as price1 - price5. The numbers attached to the "price" should correspond to the id numbers created earlier. Easy way to do this is to delete the values from the Name and the Code row, enter "price" in row below the numbers, then put an equation =B1&C1 in row below. It will combine the word "price" and the number "1". You can then copy and paste for the rest of the cells. The resulting values will be the top row as shown in the picture. Once you copy and paste special, selecting values, then you can delete the top two rows without affecting the result row. I then save the file as Datastream.csv. Now the file looks like this.
II. Reshape the fileIn Stata, I "insheeted" the data file, then reshaped it into a long format. insheet using Datastream.csv, c reshape long price, i(date) j(id)
Now the resulting file looks like this:
III. Add labelsNow I add the labels for those ids by merging the Datastream.dta and the code.dta. sort id merge id using code.dta You may check the result of the merge by tabulating the _merge variable. If all the records merged together, it should be 3.
Now it's ready for analysis. To convert the string date variable to date format, see Stata date format page. |