Princeton University Data and Statistical Services Princeton University Library

Search DSS site
(does NOT search data archives)


Finding Data Search DSS data
(searches selected DSS data, ICPSR, and Roper)


Using Data About Us

Reshape Datastream Data

This is an example of reshaping a data file created by Datastream. I downloaded stock price of some sample companies for 2007 from Datastream and opened it in Excel. As it is, it is not in a Stata friendly format. I have the company names (Name), ticker (Code), and Currency going across, then date and price in columns. To analyze this data in Stata, I need to convert it so that it will have three columns as variables - date, company name, ticker, and stock price (see the picture on the bottom of the page).

Here is my strategy.

  • I. Save the top two rows to use as a label.
  • II. Reshape the rows 4 and below into a long format.
  • III. Add the label.

I. Save top two rows for labeling

1. Prepare the codes

(a) It is easier to do the first step in Excel. I add a row of sequential number and call it id. I then delete the third row, and remove the extra symbols from the ticker. Now the file looks like below.

(b) Select cells A1 through F3, copy, then paste this into a new worksheet. In pasting, select paste special from edit. Transpose the data by selecting the transpose box in the paste special dialog box.

The result looks like this.

If this code sheet is a large file, you can save it as a .csv file in Excel and import it into Stata. Here the file is very small, so I copy and paste it into Stata's data editor. Once it was put into Stata, I save it as code.dta.

	sort id
	save code.dta

2. Clean the rest of the file for reshaping and merging

Now back to the Datastream data. The id numbers will be used for merging later on, and for reshaping, it helps to have variable names. I will call the values of columns B through F as price1 - price5. The numbers attached to the "price" should correspond to the id numbers created earlier. Easy way to do this is to delete the values from the Name and the Code row, enter "price" in row below the numbers, then put an equation =B1&C1 in row below. It will combine the word "price" and the number "1". You can then copy and paste for the rest of the cells. The resulting values will be the top row as shown in the picture. Once you copy and paste special, selecting values, then you can delete the top two rows without affecting the result row. I then save the file as Datastream.csv. Now the file looks like this.

II. Reshape the file

In Stata, I "insheeted" the data file, then reshaped it into a long format.

	insheet using Datastream.csv, c
	reshape long price, i(date) j(id)

Now the resulting file looks like this:

III. Add labels

Now I add the labels for those ids by merging the Datastream.dta and the code.dta.

	sort id
	merge id using code.dta

You may check the result of the merge by tabulating the _merge variable. If all the records merged together, it should be 3.

Now it's ready for analysis. To convert the string date variable to date format, see Stata date format page.

This page was last updated on March 20, 2008