Christopher Dishop
/
Recent content on Christopher Dishop
Tue, 15 Jan 2019 00:00:00 +0000

Recommended Reading
/rec_reading/
Quantifying Life: A Symbiosis of Computation, Mathematics, and Biology
Dmitry A. Kondrashov
Click here for link
How We Know What Isn’t So: The Fallibility of Human Reason in Everyday Life
Thomas Gilovich
Click here for link
Complexity: A Guided Tour
Melanie Mitchell
Click here for link
The Drunkard’s Walk: How Randomness Rules Our Lives
Leonard Mlodinow
Click here for link
The Order Of Time
Carlo Rovelli
Click here for link

Turning Unequal Dates into Days
/computational_notes/uneven_time/
Tue, 15 Jan 2019 00:00:00 +0000

Longitudinal data of a group or team often have missing days. For example, only Bob reports a stress score on January 3rd even though Joe and Sam are also part of the sample.
##    id       date stress
## 1 bob 2019-01-01      4
## 2 joe 2019-01-01      5
## 3 sam 2019-01-01      6
## 4 bob 2019-01-02      6
## 5 joe 2019-01-02      5
## 6 bob 2019-01-03      4
## 7 bob 2019-01-04      5
## 8 joe 2019-01-04      6
## 9 sam 2019-01-04      7

We want to create an additional column called “day” and use integers rather than dates to make plotting easier/prettier.
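A minimal base-R sketch of the idea (the object name `cd` and the reconstruction of the sample are mine, not necessarily the post's approach): subtract the earliest date so each calendar date becomes an integer day index, even when some person-days are missing.

```r
# Hypothetical reconstruction of the sample data shown above
cd <- data.frame(
  id = c('bob', 'joe', 'sam', 'bob', 'joe', 'bob', 'bob', 'joe', 'sam'),
  date = as.Date(c('2019-01-01', '2019-01-01', '2019-01-01',
                   '2019-01-02', '2019-01-02', '2019-01-03',
                   '2019-01-04', '2019-01-04', '2019-01-04')),
  stress = c(4, 5, 6, 6, 5, 4, 5, 6, 7)
)

# Subtracting the earliest date gives a difftime in days; coerce to
# integer and shift so the first day is 1 rather than 0
cd$day <- as.integer(cd$date - min(cd$date)) + 1
```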
Generating Time in a Data Frame
/computational_notes/time_index/
Mon, 14 Jan 2019 00:00:00 +0000

There are two code variations I use to generate time indexes. If I need time cycles
##   id    score time
## 1  a 15.62265    1
## 2  b 21.73583    2
## 3  c 13.21569    3
## 4  a 13.96619    1
## 5  b 16.67478    2
## 6  c 18.14021    3
## 7  a 22.91621    1
## 8  b 15.42282    2
## 9  c 24.37991    3

then I use a sequence command.
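The two variations might look like this in base R (a sketch on my part, not the post's exact code): `rep()` for a cycling index like the one printed above, and `seq()` (or `1:n`) for a strictly increasing one.

```r
set.seed(1)
ids <- rep(c('a', 'b', 'c'), times = 3)

# Cycling index: 1, 2, 3 repeated for every block of ids
df_cycle <- data.frame(id = ids,
                       score = rnorm(9, 18, 4),
                       time = rep(1:3, times = 3))

# Sequential index: one integer per row
df_seq <- data.frame(id = ids,
                     score = rnorm(9, 18, 4),
                     time = seq_len(9))
```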
Only Store Successful Output - A Counter Placement Issue
/computational_notes/successful_output/
Sun, 13 Jan 2019 00:00:00 +0000

Sometimes I store every result in my initialized vector/matrix.
Here is the data.
##   people    values day
## 1   john  8.125192   1
## 2  teddy 10.624786   1
## 3  clare  9.755946   1
## 4   john  8.320525   2
## 5  teddy  8.758530   2
## 6   john  9.597217   3
## 7  teddy 10.947977   3
## 8  clare  9.416608   3

Now the code. I want to find the days where I have responses from John, Teddy, and Clare (as you can tell, I only have responses from all three of them on days 1 and 3).
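One way to store only the successful results, sketched in base R (data recreated from the printout above): the counter advances only inside the `if`, so the output vector has no gaps for skipped days.

```r
cd <- data.frame(
  people = c('john', 'teddy', 'clare', 'john', 'teddy',
             'john', 'teddy', 'clare'),
  day = c(1, 1, 1, 2, 2, 3, 3, 3)
)

complete_days <- numeric(0)
count <- 0
for(d in unique(cd$day)){
  # Only store (and only advance the counter) when all three responded
  if(sum(cd$day == d) == 3){
    count <- count + 1
    complete_days[count] <- d
  }
}
complete_days
```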
Row Labels Needed to Spread
/computational_notes/row_labels_spread/
Fri, 11 Jan 2019 00:00:00 +0000

No explanation for this set of notes, just a few reminders when spreading and gathering.
##   b_partial b_wo_partial se_partial se_wo_partial
## 1         1            4          6             3
## 2         2            5          7             2
## 3         3            6          8             1

We want the columns to be “model,” “result,” and “value.”
Here is my incorrect attempt.
cd_try <- cd_try %>%
  gather(b_partial, b_wo_partial, key = 'model', value = 'b1')
cd_try

##   se_partial se_wo_partial        model b1
## 1          6             3    b_partial  1
## 2          7             2    b_partial  2
## 3          8             1    b_partial  3
## 4          6             3 b_wo_partial  4
## 5          7             2 b_wo_partial  5
## 6          8             1 b_wo_partial  6

cd_try <- cd_try %>%
  gather(se_partial, se_wo_partial, key = 'se_model', value = 'sd')
cd_try # not evaluated because it won't work

Instead, I need to gather everything at the same time, split, and then spread.
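A base-R sketch of that strategy (the post itself uses tidyr verbs; the column handling here is my own): stack all four columns at once, then split each name at its first underscore into a result label and a model label.

```r
cd_try <- data.frame(
  b_partial = 1:3, b_wo_partial = 4:6,
  se_partial = 6:8, se_wo_partial = 3:1
)

# Gather everything at once into name/value pairs
long <- data.frame(
  name = rep(names(cd_try), each = nrow(cd_try)),
  value = unlist(cd_try, use.names = FALSE)
)

# Split "b_partial" into result = "b" and model = "partial"
long$result <- sub('_.*', '', long$name)
long$model <- sub('^[^_]*_', '', long$name)
long <- long[, c('model', 'result', 'value')]
```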
Reveal Hidden NA's in Longitudinal Data
/computational_notes/reveal_na/
Thu, 10 Jan 2019 00:00:00 +0000

Longitudinal data sets often have hidden NAs when they are in long form. For example, in the data set below Zoe is missing on days 2 and 4, but it isn’t obvious because there are no explicit NAs within the data.
##    time   id q1 q2
## 1     1  Jac  4  3
## 2     1 Jess  5  2
## 3     1  Zoe  3  4
## 4     2  Jac  6  1
## 5     2 Jess  7  2
## 6     3  Jac  5  3
## 7     3 Jess  4  4
## 8     3  Zoe  3  2
## 9     4  Jac  4  3
## 10    4 Jess  5  4

Usually I recommend cleaning within the tidyverse package, but in this case I prefer reshape.
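The post prefers reshape; here is a base-R sketch of the same idea (my reconstruction of the data) using `expand.grid()` and `merge()`: build every time-id combination, then merge, so the absent rows surface as NA.

```r
cd <- data.frame(
  time = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  id = c('Jac', 'Jess', 'Zoe', 'Jac', 'Jess',
         'Jac', 'Jess', 'Zoe', 'Jac', 'Jess'),
  q1 = c(4, 5, 3, 6, 7, 5, 4, 3, 4, 5),
  q2 = c(3, 2, 4, 1, 2, 3, 4, 2, 3, 4)
)

# Every time-id combination, merged back in; absent rows become NA
full_grid <- expand.grid(time = unique(cd$time), id = unique(cd$id))
revealed <- merge(full_grid, cd, by = c('time', 'id'), all.x = TRUE)
```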
Column Names As Parameters with GGplot2
/computational_notes/ggplot_column_parameters/
Sun, 06 Jan 2019 00:00:00 +0000

Another example of using column names as parameters with quo, this time within ggplot2. A snippet of the data:
##   day   id stress performance
## 1   1 Josh      9          18
## 2   2 Josh      5           7
## 3   3 Josh      6           7
## 4   4 Josh      5           6
## 5   5 Josh      4          11
## 6   6 Josh      4          15

Let’s say we want to plot each person’s stress over time: three time-series trajectories.
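A sketch of how that might look (the function name `plot_var`, the extra ids, and the simulated values are my assumptions, not the post's): `enquo()` captures the bare column name and `!!` unquotes it inside `aes()`.

```r
library(ggplot2)
library(rlang)

set.seed(5)
df <- data.frame(
  day = rep(1:6, times = 3),
  id = rep(c('Josh', 'Ana', 'Sam'), each = 6),
  stress = sample(3:9, 18, replace = TRUE)
)

# plot_var() is a hypothetical wrapper: yvar arrives as a bare column name
plot_var <- function(data, yvar){
  yvar <- enquo(yvar)
  ggplot(data, aes(x = day, y = !!yvar, group = id, color = id)) +
    geom_point() +
    geom_line()
}

p <- plot_var(df, stress)
```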
The Premature Covariate
/computational_notes/premature_covariate/
Sun, 06 Jan 2019 00:00:00 +0000

A replication of Patricia Cohen’s wonderful “problem of the premature covariate” (chapter 2 in Collins & Horn, 1991). Here is a simple version of the problem. Imagine that we want to know the influence of a life event, like meeting a friend, on happiness. We conduct a study where we measure people’s happiness at time one, wait two weeks, and then measure their happiness again along with whether or not they met a friend since we last observed them.

Mutating Scale Items with NA
/computational_notes/mutate_na/
Sat, 05 Jan 2019 00:00:00 +0000

Creating item totals with a data set containing NAs is surprisingly difficult. Here is the data.
library(tidyverse)

cd <- data.frame(
  "q1" = c(1, 2, NA),
  "q2" = c(2, 2, 2),
  "q3" = c(NA, NA, 2),
  "id" = c('201', '202', '203')
)
cd

##   q1 q2 q3  id
## 1  1  2 NA 201
## 2  2  2 NA 202
## 3 NA  2  2 203

Mutating directly over columns with NA does not work.
cd %>% mutate(cohesion = q1 + q2 + q3)

##   q1 q2 q3  id cohesion
## 1  1  2 NA 201       NA
## 2  2  2 NA 202       NA
## 3 NA  2  2 203       NA

Filtering removes the data we are interested in.
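One common fix, sketched in base R (the post may resolve it differently): `rowSums(..., na.rm = TRUE)` sums whatever items are present instead of propagating the NA.

```r
cd <- data.frame(
  q1 = c(1, 2, NA),
  q2 = c(2, 2, 2),
  q3 = c(NA, NA, 2),
  id = c('201', '202', '203')
)

# Sum the non-missing items in each row
cd$cohesion <- rowSums(cd[, c('q1', 'q2', 'q3')], na.rm = TRUE)
```

Note that this treats a missing item as zero in the total; averaging with `rowMeans(..., na.rm = TRUE)` is an alternative when items should be weighted equally.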
Be Careful With Characters and Matrices
/computational_notes/matrix_characters/
Tue, 01 Jan 2019 00:00:00 +0000

If you fill a matrix cell with a character, R will convert the entire matrix into character values…so be careful = )
time <- c(1:4)
numbers <- c(1:4)
characters <- c('a', 'b', 'c', 'd')
count <- 0

df_mat <- matrix(, ncol = 3, nrow = length(time))

for(i in 1:length(time)){
  count <- count + 1
  df_mat[count, 1] <- time[i]
  df_mat[count, 2] <- numbers[i]
  df_mat[count, 3] <- characters[i]
}
df_mat

##      [,1] [,2] [,3]
## [1,] "1"  "1"  "a"
## [2,] "2"  "2"  "b"
## [3,] "3"  "3"  "c"
## [4,] "4"  "4"  "d"

Notice that all cells are now characters.
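If mixed types are needed, a data frame avoids the coercion because each column keeps its own type (a base-R sketch, not from the post):

```r
time <- 1:4
numbers <- 1:4
characters <- c('a', 'b', 'c', 'd')

# Each column keeps its own type; no coercion to character
df_mixed <- data.frame(time = time,
                       numbers = numbers,
                       characters = characters,
                       stringsAsFactors = FALSE)
```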
More on Column Names as Parameters
/computational_notes/col_names_parameters_quo/
Thu, 06 Sep 2018 00:00:00 +0000

Use quo or enquo when you want to include column names as parameters in a function. For example, a function like the following would not work:
bad_function <- function(data, col_name){
  newdf <- data %>%
    mutate('adjusted_column' = col_name + 1)
  return(newdf)
}

bad_function(df, column_i_care_about)

because column_i_care_about isn’t specified in a form that mutate can work with.
Examples

The data are contained in df1.
df1 <- data.frame(
  a = c(1, 2, NA),
  b = c(NA, 3, 4)
)
df1

##    a  b
## 1  1 NA
## 2  2  3
## 3 NA  4

The function: take the column specified by the parameter and add one to every value.
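Sketching the working version with dplyr (my reconstruction, using a stand-in name `good_function`; the post's exact function may differ): `enquo()` captures the column name and `!!` splices it into `mutate()`.

```r
library(dplyr)

df1 <- data.frame(a = c(1, 2, NA), b = c(NA, 3, 4))

# good_function is a stand-in name: add one to whichever column is passed
good_function <- function(data, col_name){
  col_name <- enquo(col_name)
  data %>% mutate(adjusted_column = !!col_name + 1)
}

out <- good_function(df1, a)
```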
Monte Carlo Approximation
/computational_notes/mc_approximation/
Sun, 12 Aug 2018 00:00:00 +0000

Monte Carlo helps us understand processes that we can describe but don’t yet have analytic solutions for. Here are two examples: the birthday problem and the tasting tea problem.
Birthday Problem

If you are standing in a room with 25 other people, what is the probability that at least two people share the same birthday? This question has a mathematical solution, but if we don’t know it we can use Monte Carlo to help.
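A quick Monte Carlo sketch for the 26-person room (the simulation count and seed are my choices): sample birthdays with replacement and count how often any date repeats.

```r
set.seed(7)
n_people <- 26   # you plus 25 others
n_sims <- 10000

# TRUE whenever at least two simulated birthdays coincide
shared <- replicate(n_sims, {
  bdays <- sample(1:365, n_people, replace = TRUE)
  any(duplicated(bdays))
})

prob <- mean(shared)   # close to the analytic answer of about 0.598
```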
Longitudinal Plotting
/computational_notes/longitudinal_plotting/
Wed, 04 Jul 2018 00:00:00 +0000

A few random notes about plotting, describing, and thinking about trajectories.
Plotting Trajectories

Imagine we record “affect” (\(Y\)) for five people over 20 time points. ggplot2 produces poor longitudinal trajectories if you only specify time and affect as variables:
library(ggplot2)
library(tidyverse)

plot1 <- ggplot(df1, aes(x = time, y = affect)) +
  geom_point() +
  geom_line()
plot1

Instead, specify “id” either as the grouping variable:
plot2 <- ggplot(df1, aes(x = time, y = affect, group = id)) +
  geom_point() +
  geom_line()
plot2

or as a color.

Column Names As Parameters
/computational_notes/column_name_parameters/
Sat, 02 Jun 2018 00:00:00 +0000

I always forget how to use column names as function parameters, so here is an example.
Function with no column name parameters

Function:
Select columns
Replace the Jimmy and James ‘v_1’ values with 99
library(tidyverse)

dish <- data.frame(
  'person' = c('jimmy', 'james', 'johnny'),
  'v_1' = c(rnorm(3, 0, 1)),
  'v_2' = c(rnorm(3, 10, 5)),
  'v_3' = c(rnorm(3, 50, 10)),
  'v_4' = c(rnorm(3, 25, 15))
)

mini <- dish %>% select(person, v_1, v_2)
mini[mini$person == 'jimmy', 2] <- 99
mini[mini$person == 'james', 2] <- 99

The original data:

Spline Modeling
/computational_notes/spline/
Sat, 05 May 2018 00:00:00 +0000

A few spline models (also known as piecewise models). As in previous posts, ‘affect’ is the name given to values of \(y\) throughout.
1) Growth and Even More Growth

A model that captures a process that increases initially and then increases at an even greater rate once it reaches time point 5. The data generating process:
\[\begin{equation} y_{it} = \begin{cases} 4 + 0.3t + error_{t}, & \text{if time < 5}\\ 8 + 0.

Latent Growth Curves
/computational_notes/latent_growth/
Sun, 15 Apr 2018 00:00:00 +0000

Latent Growth Curves

I will progress through three models: linear, quadratic growth, and latent basis. In every example I use a sample of 400, 6 time points, and ‘affect’ as the variable of interest.
Don’t forget that multiplying by time, \(0.6t\), is different from describing over time, \(0.6_t\).

1) Linear

The data generating process:
\[\begin{equation} y_{it} = 4 - 0.6t + e_{t} \end{equation}\]
library(tidyverse)
library(ggplot2)
library(MASS)

N <- 400
time <- 6
intercept <- 4
linear_growth <- -0.6
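A sketch of how the linear data generating process above could be simulated in long format (the error SD of 1 and the layout are my assumptions):

```r
set.seed(8)
N <- 400
time_points <- 6

# y_it = 4 - 0.6 t + e_t for every person-by-time combination
df <- expand.grid(id = 1:N, t = 1:time_points)
df$affect <- 4 - 0.6 * df$t + rnorm(nrow(df), 0, 1)
```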
Social Trait Development Computational Model
/computational_notes/social_trait_comp_model/
Fri, 30 Mar 2018 00:00:00 +0000

I built the following simple computational model for an individual differences class in the spring of 2018 to demonstrate how to incorporate explanatory elements of trait development into a computational framework. This model assumes that an individual’s trait development depends on 1) the environment and 2) interactions with others inside and outside of the individual’s social group. Moreover, the model assumes traits are somewhat stable and exhibit self-similarity across time. The main properties I am trying to capture, therefore, include:

Numerical Integration and Optimization
/computational_notes/integration_optimization/
Fri, 16 Feb 2018 00:00:00 +0000

Integration

Trapezoid Rule
To find the area under a curve we can generate a sequence of trapezoids that follow the rules of the curve (i.e., the data generating function for the curve) along the \(x\)-axis and then add all of the trapezoids together. To create a trapezoid we use the following equation:
let \(w\) equal the width of the trapezoid (along the \(x\)-axis), then
Area = (\(w/2\)) * (\(f(x_i)\) + \(f(x_{i+1})\)) for a single trapezoid.
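A small sketch applying the rule to \(f(x) = x^2\) on \([0, 1]\) (my example function, not the post's; the true area is 1/3):

```r
f <- function(x) x^2
n <- 1000
x <- seq(0, 1, length.out = n + 1)
w <- x[2] - x[1]   # trapezoid width

# Sum (w/2) * (f(x_i) + f(x_{i+1})) across all n trapezoids
area <- sum((w / 2) * (f(x[-(n + 1)]) + f(x[-1])))
```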
Random Walks
/computational_notes/random_walks/
Thu, 11 Jan 2018 00:00:00 +0000

Some random walk fun. I use 400 steps in each example.
One-Dimensional Random Walk

A random walk using a recursive equation.
# Empty vector to store the walk
rw_1 <- numeric(400)

# Initial value
rw_1[1] <- 7

# The Random Walk equation in a for-loop
for(i in 2:400){
  rw_1[i] <- 1*rw_1[i - 1] + rnorm(1, 0, 2)
}
plot(rw_1)

A random walk using R’s “cumsum” command. Here, I will generate a vector of randomly selected 1’s and -1’s.
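That cumsum version might be sketched like this (the seed is mine): each step is +1 or -1, and `cumsum()` accumulates the steps into positions.

```r
set.seed(9)

# 400 random steps of +1 or -1
steps <- sample(c(-1, 1), 400, replace = TRUE)

# Running total of the steps = position of the walk at each time
rw_2 <- cumsum(steps)
plot(rw_2)
```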
Combining CSV Files
/computational_notes/load_csv/
Wed, 03 Jan 2018 00:00:00 +0000

A couple of quick pieces of code to assist any time I need to work with many CSV files.
Into List

This first code chunk loads all of the CSV files in a folder, makes each into a data frame, and stores each separately in a list.
setwd("enter path")

# A character vector of every file name
files <- Sys.glob("*.csv")

# A list of all CSV files in the respective folder as data frames
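A self-contained version of the same pattern (it writes two toy CSVs to a temporary folder so the sketch runs anywhere, rather than relying on `setwd()`):

```r
# Toy CSV files standing in for real data
dir <- file.path(tempdir(), 'csv_demo')
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(dir, 'a.csv'), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(dir, 'b.csv'), row.names = FALSE)

# Load every CSV in the folder into a named list of data frames
files <- Sys.glob(file.path(dir, '*.csv'))
csv_list <- lapply(files, read.csv)
names(csv_list) <- basename(files)
```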
Formatting Qualtrics Responses
/computational_notes/formatting_qualtrics/
Tue, 02 Jan 2018 00:00:00 +0000

Here is a quick piece of code to create numeric response scores when data are read in as strings (e.g., “Strongly Agree, Agree, Neutral”).
library(tidyverse)
library(dplyr)
library(plyr)

df <- read.csv("path")

labels_to_values1 <- function(x){
  mapvalues(x,
            from = c("Strongly Agree", "Agree", "Slightly Agree",
                     "Slightly Disagree", "Disagree", "Strongly Disagree"),
            to = c(6, 5, 4, 3, 2, 1))
}

recode_df <- df %>%
  select(column_to_modify1, column_to_modify2, column_to_modify3, etc) %>%
  apply(2, FUN = labels_to_values1) %>%
  data.frame()

Note that R will throw warnings if all of the response options are not used, but the code will still work.

Why Detecting Interactions is Easier in the Lab
/computational_notes/interactions_fve/
Wed, 15 Nov 2017 00:00:00 +0000

A fun simulation by McClelland and Judd (1993) in Psychological Bulletin that demonstrates why detecting interactions outside the lab (i.e., in field studies) is difficult. In experiments, scores on the independent variables are located at the extremes of their respective distributions because we manipulate conditions. The distribution of scores across all of the independent variables in field studies, conversely, is typically assumed to be normal. By creating “extreme groups” in experiments, therefore, it becomes easier to detect interactions.

Workforce Dynamics
/computational_notes/role_dynamics/
Tue, 22 Aug 2017 00:00:00 +0000

We can model the states of a system by applying a transition matrix to values represented in an initial distribution and repeating it until we reach an equilibrium.
Suppose we want to model how job roles in a given company change over time. Let us assume the following:
There are three (hierarchical) positions in the company:
Analyst
Project Coordinator
Manager
30 new workers enter the company each year, and they all begin as analysts.
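The assumptions above can be sketched with a transition matrix (every probability here is made up for illustration; rows are the current role, columns next year's role, and row sums below 1 mean some people leave the company):

```r
# Illustrative transition probabilities, not from the post
P <- matrix(c(0.6, 0.3, 0.0,
              0.0, 0.7, 0.2,
              0.0, 0.0, 0.8),
            nrow = 3, byrow = TRUE)

roles <- c('analyst', 'coordinator', 'manager')
dimnames(P) <- list(roles, roles)

# Start empty; each year apply P, then add the 30 new analysts
state <- c(0, 0, 0)
for(year in 1:200){
  state <- as.numeric(state %*% P) + c(30, 0, 0)
}
state   # settles at an equilibrium head count per role
```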
Convert Text File
/computational_notes/convert_text/
Sun, 09 Apr 2017 00:00:00 +0000

A quick piece of code that reads a text file, changes something, saves a new text file, and iterates that process for every text file in that folder.
setwd("path to the text files")
library(readr)

all_files = Sys.glob("*.txt")

for(i in 1:length(all_files)){
  data = all_files[i]
  mystring = read_file(paste(data))
  new_data = gsub("old piece of text", "new piece of text", mystring)
  write_file(new_data, path = paste("something", i, ".txt", sep = ""))
}

Bo\(^2\)m =)

Art With Monte Carlo
/computational_notes/art_montecarlo/
Sat, 18 Mar 2017 00:00:00 +0000

I like to think of Monte Carlo as a counting method. If a condition is satisfied we make a note (e.g., 1), and if the condition is not satisfied we make a different note (e.g., 0). We then iterate and evaluate the pattern of 1’s and 0’s to learn about our process. Art can be described in a similar way: if a condition is satisfied we use a color, and if a condition is not satisfied we use a different color.
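As a tiny example of the counting idea (my own illustration, not the post's image): each random point either satisfies the inside-the-circle condition (note a 1, one color) or not (note a 0, another color), and the pattern of 1's estimates an area.

```r
set.seed(10)
n <- 5000
x <- runif(n)
y <- runif(n)

# 1 if the point lands inside the quarter circle, 0 otherwise
inside <- as.integer(x^2 + y^2 <= 1)

# The proportion of 1's estimates pi / 4
pi_est <- 4 * mean(inside)
```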
The Binomial Effect Size Display
/computational_notes/besd/
Sun, 01 Jan 2017 00:00:00 +0000

Effect sizes provide information about the magnitude of an effect. Unfortunately, they can be difficult to interpret or appear “small” to anyone unfamiliar with the typical effect sizes in a given research field. Rosenthal and Rubin (1992) provide an intuitive effect size, called the Binomial Effect Size Display, that captures the change in success rate due to a treatment.
The calculation is simple:
Treatment BESD = 0.50 + (r / 2)
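Worked out for r = 0.30 (the control-group counterpart, 0.50 - r/2, is the standard other half of Rosenthal and Rubin's display):

```r
r <- 0.30

# Success rates implied by a correlation of 0.30
treatment_success <- 0.50 + r / 2   # 0.65
control_success <- 0.50 - r / 2     # 0.35
```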