You will conduct a statistical study on a topic of your choice using Structural Equation Modeling. This task will require you to write a project proposal, acquire and analyze relevant data, present your results in poster form, and hand in a complete written manuscript that reports all of the details of your study and its findings. Your project must involve fitting a Structural Equation Model, but you do not need to have any latent variables; therefore, it may technically be a Path Analysis. The project is an opportunity to show off what you’ve learned about fitting SEMs, creating causal diagrams, and communicating your processes. Ideally, you would create a manuscript that would be ready, or near ready, for submission to a journal for publication.

Group Formation

You may work in groups of up to three people, of your choosing. You may also choose to work on your own, but a solo project will be relatively difficult to execute.


You should pose a problem that you find interesting and which may be addressed (at least in part) through the analysis of data using Structural Equation Modeling. Problems that can be readily addressed with SEM usually involve understanding the processes by which a set of variables affect each other in a causal system. Questions about how variables affect each other over time, or how observations are linked together in larger units or blocks, can also be addressed with SEM.

The ShowNTell papers can give you a sense of the variety of questions that are answerable with SEM.

It might help to start by proposing a phenomenon/focal variable that you want to better understand, then sketch out a theoretical system of causes and effects. You might not be able to find data for all of these variables, but you will have a better sense of where you might look for data. In this sketch, you might also think about what indicators could comprise each variable if you were to specify it as a latent variable as opposed to a measured variable. Next, identify the population from which you would want a sample, and think about how you will obtain relevant data. The sketch of your model is the set of a priori hypotheses that you will test.

Most of you will decide on your own phenomenon and acquire data from the Internet; some may wish to analyze data that someone else (e.g., a professor or office at Smith, or a magazine or newspaper that has published data) has collected for another purpose.

General Rules

You may discuss your project with other students, but each of you will have a different topic, so there is a limit to how much you can help each other. You will also consult other sources for information about the theoretical and substantive issues of your problem, and you should properly cite these sources in your report. Feel free to consult with me about statistical questions.


Please see the syllabus for due dates.


All deliverables described above must be delivered electronically via Moodle by 11:55pm (five minutes before midnight) on the dates in the schedule. Only one person from the group should submit the group’s product for each checkpoint.


Places to Find Data

Finding the right data to answer your particular question is part of your responsibility for this assignment. Public data sets are available from hundreds of different websites, on virtually any topic. You might not be able to find the exact data that you want, but you should be able to find data that is relevant to your topic. You may also want to refine your research question so that it can be more clearly addressed by the data that you found. But be creative! Go find the data that you want!

Below is a list of places to get started, but this list should be considered grossly non-exhaustive:

Keep the following in mind as you select your topic and dataset:

  • You need to have enough data to make meaningful inferences. There is no magic number of individuals required for all projects, but aim for at least 200 cases.
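One way to check a candidate dataset against the 200-case guideline is to count the rows that have a value for every variable in your model, since rows with missing values may be dropped by some estimation approaches. The following is a minimal sketch in Python using only the standard library. The inline data, the column names (`income`, `hours`, `gpa`), and the missing-value codes are all hypothetical and should be replaced with your own.

```python
import csv
import io

# Hypothetical data for illustration; in practice, open your own CSV file.
SAMPLE_CSV = """id,income,hours,gpa
1,52000,10,3.4
2,,12,3.1
3,61000,8,NA
4,47000,15,3.8
"""

# Assumed missing-value codes; adjust these to match your data.
MISSING = {"", "NA"}

def count_complete_cases(rows, variables):
    """Count rows that have a non-missing value for every listed variable."""
    return sum(
        all(row[v].strip() not in MISSING for v in variables)
        for row in rows
    )

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
n_complete = count_complete_cases(rows, ["income", "hours", "gpa"])
print(f"{n_complete} complete cases out of {len(rows)}")

# The 200-case target comes from the guideline above.
if n_complete < 200:
    print("Fewer than 200 complete cases; consider finding more data.")
```

Counting complete cases is only one conservative way to read the guideline; how missing values are actually handled depends on the software and estimator you use.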


Your initial proposal should be an R Markdown file (*.Rmd) containing the following content:

  1. Group Members: List the members of your group

  2. Title: Your title

  3. Purpose: Describe the general topic/phenomenon you want to study, as well as some focused questions that you hope to answer and general hypotheses that you intend to assess.

  4. Data: Describe the data that you plan to use, with specifications of where it can be found (URL) and a short description. Eventually, you will probably want to combine data from multiple sources into one file, but for now you can simply list multiple sources if you have them. Your data does not need to be completely cleaned by this stage.

  5. Population: Specify what the observational units are (i.e., the rows of the data frame), and describe the larger population/phenomenon to which you’ll try to generalize.

  6. Model: Include a diagram of the initial model you would like to fit. You might not have every variable you need to test this model, but ideally, you would have most of them. What are the endogenous variables? What are the exogenous variables? Which variables are latent and which are measured? What are all of the indicators for the latent variables? Define each variable and describe how it is measured if you already have it in your data. For categorical variables, list the possible categories; for quantitative variables, specify the units of measurement. You may want to add more variables later on or change your model; that is expected.
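To help answer the endogenous/exogenous questions in the Model item, it may be useful to write your diagram down as a list of arrows and sort the variables mechanically. The sketch below, in Python with no external libraries, treats any variable with at least one incoming arrow as endogenous and any variable with only outgoing arrows as exogenous. The variable names and paths are invented for illustration and are not a recommended model.

```python
# Hypothetical path model: each pair (cause, effect) is one hypothesized arrow.
# All variable names are invented for illustration only.
paths = [
    ("family_income", "study_hours"),
    ("family_income", "gpa"),
    ("study_hours", "gpa"),
    ("sleep_quality", "study_hours"),
]

def classify_variables(paths):
    """Split variables into exogenous (no incoming arrows) and endogenous."""
    causes = {cause for cause, _ in paths}
    effects = {effect for _, effect in paths}
    endogenous = sorted(effects)           # at least one incoming arrow
    exogenous = sorted(causes - effects)   # only outgoing arrows
    return exogenous, endogenous

exogenous, endogenous = classify_variables(paths)
print("Exogenous:", exogenous)
print("Endogenous:", endogenous)
```

Here `study_hours` is classified as endogenous even though it also causes `gpa`, because it receives arrows from other variables in the model. The same list of arrows can double as a checklist of the a priori hypotheses you intend to test.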