Visualizing the Transit Map of the Spread of an Infectious Disease


--Thiebaut 09:29, 25 January 2016 (EST)


Kyra Gan's Independent Study Page

--Kyra Gan 18:14, 17 April 2016 (EDT)


Publication

--Kyra Gan 20:49, 30 November 2016 (EST)

Gan, J., and D. Thiebaut, Data Visualization of Agent-Based Modeling of Virus Spread, INFOCOMP 2017, June 25-29, 2017, Venice, Italy.

Presentation

--Kyra Gan 15:36, 1 February 2017 (EST)

Gan, J. (2017, April). Data Visualization of Agent-Based Modeling of Virus Spread. Poster Accepted at National Conference on Undergraduate Research 2017, Memphis, TN.

Poster

Kyra poster.png


Here is the link to the PDF version of the poster.


Poster-related data and programs can be found here.


Abstract

--Kyra Gan 15:34, 18 April 2016 (EDT)
In this independent special study, I wrote a Java program that simulates how an infectious virus spreads on a closed campus by tracing each individual student. The campus is Smith College, and the Registrar's office provided the individual hourly schedules of all 2,624 students for a full semester. Using this program, I experimented with various infection probabilities (both deterministic and stochastic), incubation lengths, lengths of the contagious state, and simulation periods, in days. The simulation program also makes it possible to experiment with vaccination, quarantine, or both, to slow down the spread of the virus. If stochastic probability is used, it is possible to modify the noise of the normal distribution used to decide whether a student gets infected by a contagious student (noise amplifier). Scenarios where a portion of the student population is vaccinated are possible, and the percentage of vaccinated students is programmable. Similarly, the program allows one to set the probability of a vaccinated student being a virus carrier. The program also allows one to define a quarantine scenario, where classes larger than some threshold are cancelled. It is also possible to set up a quarantine for a given student residence. The program allows one to pick the identity of the first infected student, or to pick this person randomly. All these parameters are easily accessible in a Java file containing constants.
After a simulation run, a CSV file is saved. This file is transformed into a DOT file; DOT is the language used by the GraphViz package. The DOT file is then read by the Twopi visualization program, included in the GraphViz package. I wrote another program, called ToDot, that gives the user the freedom to choose among three tree styles. It is possible to control the scale with which tree edges are drawn, where the edge length is proportional to a number of days (edge discount factor). The colors of the nodes and edges are also programmable. Once the DOT file is saved in the workspace, the following command is used at the terminal prompt to generate a tree.

twopi -o image.png -T png image.dot


where image should be replaced with the actual file name.
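
To make the CSV-to-DOT step concrete, here is a minimal sketch in Java. It is not the ToDot program itself: the class name CsvToDotSketch, the method toDot, and the two-column row layout "sourceId,recipientId" are assumptions made for illustration, since the actual CSV layout is not described on this page.

```java
import java.util.List;

public class CsvToDotSketch {
    // Assumption: each row is "sourceId,recipientId" (hypothetical layout).
    static String toDot(List<String> rows) {
        StringBuilder dot = new StringBuilder("digraph infection {\n");
        for (String row : rows) {
            String[] fields = row.trim().split(",");
            if (fields.length < 2) {
                continue; // skip blank or malformed rows
            }
            dot.append("  ").append(fields[0].trim())
               .append(" -> ").append(fields[1].trim()).append(";\n");
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Made-up student IDs, only to show the shape of the output.
        System.out.print(toDot(List.of("82,17", "82,240", "17,903")));
    }
}
```

Saving the printed text as image.dot would give a file that the twopi command above can read.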

Problem Statement


Introduction


Very little is known about the way a virus, such as Ebola or measles, spreads in a closed population. Some preliminary work has been done to model the growth of the infected population. See Lujun Jian's page on the independent study she conducted on this problem. Jian's study concentrates on running many (50 or more) simulations of a population of 2400 students, assuming that 1 of them is infected at the beginning of the semester. Through interactions happening in classes and in houses, this student starts infecting other students, who then propagate the virus. The simulation uses the SIR model, and shows the various rates of growth of the infected population as a function of the probability of infection, p, as illustrated below. Some of Lujun's software is also available on her github repository.

DifferentPofInfection.png


Dominique Thiebaut enhanced the data visualization part of the graph by creating a similar population-growth graph that is interactive. The graph uses R and Shiny, which is an R package specialized in generating interactive data visualizations. A video of the Shiny app along with its code can be found here, and is replicated below, for completeness.


Open Problem


The purpose of the current research is to take the data generated by the Java simulator written by Lujun Jian (available here) and stored in a database, and to use it to show the transit map of the spread of the virus. Examples of transit maps can be found in this Wired article. The article shows one possible example of a transit map (shown below), but others might be possible, or more informative.

TransitMapWired04012015.png


Questions of Interest


--Thiebaut 15:00, 1 February 2016 (EST) (Added --Thiebaut 14:32, 1 February 2016 (EST))
Information gathered from informal conversations with Profs. Sarah Moore and Rob Dorit.

  • Quantity often of interest when viruses spread: contact tracing. Contact tracing is finding everyone who comes in direct contact with a sick patient.
  • Can the model be used to see the effect of a quarantine? If the simulation uses real data, we can figure out how a quarantine on some of the houses may affect the spread.
  • Can the simulation help spot super infectors, that is, people who infect a much larger number of people than others?
  • Simulation, as opposed to a stochastic model, would allow one to generate historical data. Such data can be used to generate a tree of contacts. It would be interesting to see the variation of the tree for slow vs. fast infectivity.
  • Information of interest:
    • distribution of the number of people infected by a given person
    • what percentage of a population needs to be immunized to prevent a virus from spreading?

Research Plan


BookOfTrees.png
  • Become familiar with Lujun's software. Run it again. Improve it and, if possible, make it faster.
  • Research the literature to see what data visualization choices have been made to represent transit maps. Study the new book by Manuel Lima, The Book of Trees, and locate possible examples.
  • Implement several tree visualizations of the transit map from one experiment. R is probably the language to adopt. Several packages are of interest: igraph and Gephi.
  • Update this wiki with
  • A more complete description of the problem at hand
  • A comprehensive list of resources (papers, graphs, software packages) relating to this problem.
  • One or more static tree visualizations of the transit map for a disease spreading on campus.
  • A dynamic visualization of a transit map using Shiny.
  • A poster ready for presentation



Registration Data



This section is only visible to computers located at Smith College


Main Results

Matlab tree 1

  • The highest node represents the first infected student, the second-level nodes represent the direct recipients who have a relatively large number of recipients, and so on. The lowest level of nodes represents those who do not have direct recipients.
  • Notice that this tree is not time stamped; it only shows the source-recipient relationship.
Treewholecampus0.1.jpg
  • This is the tree that I generated with MATLAB (see specific code on weekly log page).
  • Simulation parameters: whole campus population, incubation period=2 days, contagious period=6 days, deterministic probability, no vaccination, no quarantine, probability of infection=0.1.

Matlab tree 2

Exampletree1V2.jpg
  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, no vaccination, no quarantine, probability of infection=0.01.
  • In the second graph, we first sort the students by recipient size, then plot the number of each student's direct recipients.
  • In the third graph, we plot the number of each student's direct recipients without sorting.
  • In the fourth graph, we first count the number of students whose direct recipient size equals n, then scale the height of each bin to get a discrete probability distribution.
  • In the fifth graph, we first calculate the probability that a random student has a direct recipient size >=n, then plot it on a log-log scale.
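
The quantities plotted in the fourth and fifth graphs can be sketched in Java as follows. The plots above were produced with MATLAB, so this is only an illustration of the computation, not the original code. The class name RecipientStatsSketch and its method names are hypothetical, and the input is assumed to be a list of (source, recipient) pairs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecipientStatsSketch {
    // Count the direct recipients of every student that appears in the tree.
    static Map<Integer, Integer> directRecipients(List<int[]> edges) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] edge : edges) {
            counts.merge(edge[0], 1, Integer::sum); // the source gains one recipient
            counts.putIfAbsent(edge[1], 0);         // the recipient may infect nobody
        }
        return counts;
    }

    // pmf[n] = fraction of students with exactly n direct recipients (fourth graph).
    static double[] pmf(Map<Integer, Integer> counts) {
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        double[] p = new double[max + 1];
        for (int c : counts.values()) {
            p[c] += 1.0 / counts.size();
        }
        return p;
    }

    // tail[n] = probability that a random student has >= n direct recipients (fifth graph).
    static double[] tail(double[] pmf) {
        double[] t = new double[pmf.length];
        double sum = 0.0;
        for (int n = pmf.length - 1; n >= 0; n--) {
            sum += pmf[n];
            t[n] = sum;
        }
        return t;
    }

    public static void main(String[] args) {
        // Made-up edges: 82 infects 1 and 2, and 1 infects 3.
        List<int[]> edges = List.of(new int[]{82, 1}, new int[]{82, 2}, new int[]{1, 3});
        double[] t = tail(pmf(directRecipients(edges)));
        for (int n = 0; n < t.length; n++) {
            System.out.println("P(recipients >= " + n + ") = " + t[n]);
        }
    }
}
```

Plotting tail[n] against n on log-log axes gives the kind of curve described for the fifth graph; n=0 has to be left out of such a plot because log(0) is undefined.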

Twopi trees

--Kyra Gan 18:56, 17 April 2016 (EDT)
In this section, I use Twopi, which is an engine from the GraphViz package, to generate radial trees.

Tree1

Twopiv1.png
  • Simulation parameters: whole campus population, incubation period=2 days, contagious period=6 days, deterministic probability, no vaccination, no quarantine, probability of infection = 0.01.


  • This is the first tree that I generated with Twopi.
  • Notice that this tree is also not time stamped; it only shows the source-recipient relationship. In addition, the sizes of the nodes are not adjusted to the number of their direct recipients.

Tree2 Radius Ratio

  • In this tree, we are testing the functionality of my program by setting the infection probability to 1.
  • The length of each edge is adjusted to the number of days, and the radius of each node is proportional to the number of the node's direct recipients (radius ratio).

FirstInKey82RadiusRatioProb1DE.png

  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, no vaccination, no quarantine, probability of infection=1.


  • I pick the first infected student to be 82 (so we have less randomness and thus can compare with the trees below).
  • The maroon bold line is the time scale; if we look closely at the original file (which can be found on the weekly log page), we can see the days annotated in letters.
  • I use invisible nodes to scale the length of the edges.
  • During the experiment, I discovered that Twopi does not plot trees in png format when the number of nodes exceeds some limit.
  • To handle this problem, I let the length of one edge represent multiple days (i.e. I create a constant called edge discount factor). Though I lose some time information this way, I can now generate a png tree over a long simulation period.
  • I also discovered that we can still use Twopi to generate svg images when the number of nodes is large.
  • A user who would like to generate the svg version of the file only needs to set the edge discount factor to 1 and run the command (where image should be replaced by the actual file name)
twopi -o image.svg -T svg image.dot
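
The invisible-node technique and the edge discount factor can be sketched in Java as follows. This is an illustration under assumptions, not the code of the ToDot class: the class name EdgeChainSketch, the method chain, the hidden-node naming scheme, and the choice to round up when dividing the days by the edge discount factor are all hypothetical. The idea relies on Twopi placing nodes on concentric rings according to their distance from the root, so each extra hidden node pushes the recipient one ring further out.

```java
public class EdgeChainSketch {
    // Emit DOT lines for one infection edge that should span `days` days.
    // One hop in the drawing stands for `edgeDiscountFactor` days.
    static String chain(int source, int recipient, int days, int edgeDiscountFactor) {
        // Assumption: partial hops are rounded up, and every edge has at least one hop.
        int hops = Math.max(1, (int) Math.ceil((double) days / edgeDiscountFactor));
        StringBuilder dot = new StringBuilder();
        String previous = String.valueOf(source);
        for (int i = 1; i < hops; i++) {
            String hidden = "h_" + source + "_" + recipient + "_" + i;
            // The hidden node is not drawn, but it still occupies a ring in the layout.
            dot.append("  ").append(hidden).append(" [style=invis, shape=point];\n");
            dot.append("  ").append(previous).append(" -> ").append(hidden)
               .append(" [arrowhead=none];\n");
            previous = hidden;
        }
        dot.append("  ").append(previous).append(" -> ").append(recipient).append(";\n");
        return dot.toString();
    }

    public static void main(String[] args) {
        // A 6-day infection edge drawn with 3 days per hop: one hidden node.
        System.out.print(chain(82, 17, 6, 3));
    }
}
```

With an edge discount factor of 1, the same 6-day edge would need five hidden nodes, which shows how quickly the node count grows over a long simulation period.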

Tree3 Area

FirstInKey82AreaProb1DE.png

  • Same as the previous tree but with a different rule for the node size.


  • This is one of the final versions of the trees that I generated
  • The length of the edges is adjusted to the number of days
  • I pick the first infected student to be 82
  • The areas of the nodes are proportional to the number of the nodes' direct recipients.
  • Since it is sometimes hard to differentiate the sizes of the nodes, I also color the nodes as follows:
  • the lightest yellow marks a node that has 1-5 recipients.
  • bright yellow marks a node that has 6-10 recipients.
  • gold marks a node that has more than 10 recipients.
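
The color rule above amounts to a small lookup, sketched here in Java. The class and method names are hypothetical, the GraphViz color names lightyellow, yellow and gold are assumed to stand for the three shades, and the color for a node with no recipients is an assumption because the list above does not mention that case.

```java
public class NodeColorSketch {
    // Map a node's number of direct recipients to a GraphViz color name.
    static String fillColor(int recipients) {
        if (recipients > 10) {
            return "gold";        // more than 10 recipients
        }
        if (recipients >= 6) {
            return "yellow";      // 6-10 recipients
        }
        if (recipients >= 1) {
            return "lightyellow"; // 1-5 recipients
        }
        return "white";           // assumption: no color is given for 0 recipients
    }

    public static void main(String[] args) {
        for (int n : new int[]{0, 3, 8, 25}) {
            System.out.println(n + " recipients -> " + fillColor(n));
        }
    }
}
```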

Tree4 Radius Log

FirstInKey82RadiusLogProb1DE.png

  • Same as the previous two trees but with a different rule for the node size.


  • This is one of the final versions of the trees that I generated
  • The length of the edges is adjusted to the number of days
  • I pick the first infected student to be 82
  • The radii of the nodes grow logarithmically with the number of the nodes' direct recipients (radius=log(1+sizeOfRecipients)*0.3).
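
To compare the three node-size rules side by side, here is a small Java sketch. Only the logarithmic formula is taken from this page. The use of the natural logarithm, the reuse of the 0.3 scale for the other two rules, and the reading of the Area rule as "area proportional to the number of recipients" (hence radius proportional to its square root) are assumptions; the class and method names are hypothetical.

```java
public class NodeSizeSketch {
    // 0.3 comes from the log formula on this page; reusing it elsewhere is an assumption.
    static final double SCALE = 0.3;

    // Radius Ratio: radius grows linearly with the number of direct recipients.
    static double radiusRatio(int recipients) {
        return SCALE * recipients;
    }

    // Area (assumed reading): area grows linearly, so radius grows like a square root.
    static double radiusArea(int recipients) {
        return SCALE * Math.sqrt(recipients);
    }

    // Radius Log: radius = log(1 + sizeOfRecipients) * 0.3, natural log assumed.
    static double radiusLog(int recipients) {
        return Math.log(1 + recipients) * SCALE;
    }

    public static void main(String[] args) {
        for (int n : new int[]{1, 5, 20}) {
            System.out.printf("n=%d ratio=%.3f area=%.3f log=%.3f%n",
                    n, radiusRatio(n), radiusArea(n), radiusLog(n));
        }
    }
}
```

For nodes with many recipients the linear rule produces by far the largest circles and the logarithmic rule the smallest, which is why large nodes are less likely to hide edges in the Area and Radius Log trees.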


Tree5 Radius Ratio, infection prob=0.01

FirstInKey82RadiusRatioProb0.01DE.png

  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, no vaccination, no quarantine, probability of infection=0.01.
  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • This is the Radius Ratio version of the tree.
  • see more versions of this tree on the weekly log page


Tree6 Radius Ratio, infection prob=0.01, stochastic probability with volatility=1

FirstInKey82RadiusRatioProb0.01SE.png

  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, stochastic probability, no vaccination, no quarantine, probability of infection=0.01.


  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • This is the Radius Ratio version of the tree.
  • see more versions on the weekly log page
  • I implemented the most basic stochastic model: I randomize the probability of a given student getting infected in a given place at a given time by adding a random variable that has mean 0 and standard deviation sigma. I do not allow the probability to be negative (if the probability becomes negative, nothing happens -> waiting time (or reject)).
  • From this result we can see that with the stochastic model, we can push the whole campus to be infected
  • In general, we expect the stochastic trees generated with the same simulation parameters to have much more volatile shapes (within the group)
  • see more experiments with stochastic on the weekly log page
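
The stochastic rule described above can be sketched in Java as follows. This is not the code of the simulator: the class name StochasticProbabilitySketch and the method noisyProbability are hypothetical, treating a negative draw as probability 0 is one reading of "nothing happens", and capping the result at 1 is an added assumption.

```java
import java.util.Random;

public class StochasticProbabilitySketch {
    // Perturb the base infection probability p with zero-mean Gaussian noise
    // of standard deviation sigma.
    static double noisyProbability(double p, double sigma, Random rng) {
        double noisy = p + sigma * rng.nextGaussian();
        if (noisy < 0.0) {
            return 0.0; // negative draw: treated here as "nothing happens"
        }
        return Math.min(noisy, 1.0); // assumption: a probability is capped at 1
    }

    public static void main(String[] args) {
        Random rng = new Random(82); // fixed seed so that a run can be repeated
        for (int i = 0; i < 5; i++) {
            System.out.println(noisyProbability(0.01, 1.0, rng));
        }
    }
}
```

With a base probability of 0.01 and sigma=1, the noise dominates the base value and about half of the draws are rejected as negative, while the other half are much larger than 0.01, which is consistent with the whole campus becoming infected in Tree6.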


Tree7 Area, infection prob=0.01, vaccination=50% of the population, vaccineContagiousProb=0

FirstInKey82AreaProb0.01DEVaccine.png

  • The time scale for the graph above is 3 days per tick.
TimeScaleTree7.png


  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, vaccination = 50% of the population, after vaccination, we assume the person has 0 probability to be a virus carrier, no quarantine, probability of infection=0.01.


  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • Notice that in this case, although we used the Area version of the tree, the nodes do not have different shades. This is because I reserved the color green to represent a vaccinated student who is also a virus carrier.
  • The code is set up in such a way that once vaccination is turned on, the different shades of yellow are gone.
  • We can see that with 50 percent of the population vaccinated, the total number of infected students has gone down dramatically.

Tree8 Area, infection prob=0.01, vaccination=50% of the population, vaccineContagiousProb=0.001

FirstInKey82AreaProb0.01DEVaccineV2.png

  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, vaccination = 50% of the population, after vaccination, we assume the person has 0.001 probability to be a virus carrier, no quarantine, probability of infection=0.01.


  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • This is the Area version of the tree.
  • We can see that with 50 percent of the population vaccinated, and a 0.001 probability that a vaccinated person is a virus carrier, the total number of infected students has gone down compared with Tree7. Here are my reasons:
  • I pick the students to be infected randomly, while keeping the total number of students infected in a given place at a given time fixed (this number is determined by functions in the probability class; that is, I pre-calculate the number of students to be infected, then pick recipients and assign sources randomly). Thus we expect each simulation (path) to be different.
  • I set up the ToDot class so that a student who is vaccinated but is a virus carrier shows up in green on the graph; here we do not observe any green nodes, which means that no vaccinated student became a virus carrier.
  • Furthermore, when we run the simulation with infection probability=0.001, we frequently get only 1 infected person (the source), which can also be verified with Lujun's previous project. This explains why with VaccineContagiousProbability=0.001 we observe fewer students getting infected (i.e. a probability of 0.001 is sometimes too low to have any effect, and this is only a single simulation).
  • For the logistics of the code:
  • I first generate the vaccinated ID list and store it in the class containing constants before running any other simulations
  • I then partition students into a healthy ID group and a healthy vaccinated ID group when creating the place hash table
  • see more details of the code on the program source page
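
As an illustration of the two vaccination steps, here is a minimal Java sketch. It is not the code on the program source page: the class name VaccinationSketch, the methods pickVaccinated and isCarrier, and the choice to draw the vaccinated students uniformly at random are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class VaccinationSketch {
    // Pick a given fraction of the student IDs to be vaccinated (assumed uniform at random).
    static Set<Integer> pickVaccinated(List<Integer> studentIds, double fraction, Random rng) {
        List<Integer> shuffled = new ArrayList<>(studentIds);
        Collections.shuffle(shuffled, rng);
        int count = (int) Math.round(fraction * shuffled.size());
        return new HashSet<>(shuffled.subList(0, count));
    }

    // Decide whether an exposed vaccinated student becomes a virus carrier.
    static boolean isCarrier(double vaccineContagiousProb, Random rng) {
        return rng.nextDouble() < vaccineContagiousProb;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= 2624; id++) {
            ids.add(id); // 2,624 students, numbered 1..2624 for illustration
        }
        Set<Integer> vaccinated = pickVaccinated(ids, 0.5, new Random(82));
        System.out.println("vaccinated students: " + vaccinated.size());
    }
}
```

Students outside the returned set form the healthy group, and students inside it form the healthy vaccinated group when the place hash table is built.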

Tree9 Area, infection prob=0.01, Quarantine=true (for more than 30 students in class), QuarantineHouse=false

FirstInKey82AreaProb0.01DEQ.png

  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, no vaccination, classes with sizes larger than or equal to 30 are stopped (but students can still walk around the house, eat meals and do homework with friends), probability of infection=0.01.


  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • This is the Area version of the tree.
  • We can see that, compared with the case where 50 percent of the population is vaccinated, the total number of infected students has gone up. Here are my reasons:
  • When the house size is larger than 100 (which is very likely the case for 82's house), 82 is going to infect some students at the beginning, and as more students inside the house get infected, still more people in the house will get infected.
  • Then in a class of size 20, if there are 10 contagious students, the infection probability becomes 0.1 and one healthy student will get infected; when that student goes back to her house, she is going to infect more people
  • If we pick someone who has a smaller house, we should expect far fewer people to get infected in the quarantine case (with QuarantineHouse=false)
  • For the logistics of the code:
  • The quarantine part is implemented in the updatePlaceHashTableGenerateInfections method under WrapperStatusV5 class.
  • see the specifics on the weekly log page and program source page
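
The class-size rule can be sketched in Java as follows. This is not the updatePlaceHashTableGenerateInfections method: the class name QuarantineSketch and its methods are hypothetical. The class sizes in the example are the ones listed for student 82 in the "Student 82's schedule" section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QuarantineSketch {
    // A class is cancelled when its size is larger than or equal to the threshold.
    static boolean isCancelled(int classSize, int threshold) {
        return classSize >= threshold;
    }

    // Keep only the classes that still meet under the quarantine.
    static Map<String, Integer> openClasses(Map<String, Integer> classSizes, int threshold) {
        Map<String, Integer> open = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : classSizes.entrySet()) {
            if (!isCancelled(entry.getValue(), threshold)) {
                open.put(entry.getKey(), entry.getValue());
            }
        }
        return open;
    }

    public static void main(String[] args) {
        Map<String, Integer> classes = new LinkedHashMap<>();
        classes.put("AINSWO 304", 20);
        classes.put("SEELYE 202", 9);
        classes.put("AINSWO 151", 23);
        classes.put("BOAT HOUSE", 5);
        classes.put("BURTON 101", 28);
        classes.put("BURTON 209", 21);
        System.out.println(openClasses(classes, 30).keySet());
    }
}
```

All six of student 82's classes have fewer than 30 students, so none of them is cancelled at this threshold, and 82 keeps meeting classmates in addition to housemates.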

Tree10 Area, infection prob=0.01, Quarantine=true, QuarantineHouse=true




FirstInKey82AreaProb0.01DEQH.png


  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, no vaccination, classes are stopped and houses are quarantined when their sizes are larger than or equal to 30, probability of infection=0.01.


  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • This is the Area version of the tree.
  • We can see that 82 is the only student that got infected, and this matches our expectation.
  • If 82 lives in a house that has fewer than 30 people, then the probability of any student getting infected in her house will be 0.01, the number of people infected will be Math.round(x)=0 for some number x<0.3, and thus the virus will never spread.
  • If 82 lives in a house that has more than 30 people, then she will not be able to move around the house and infect other people.
  • Since this is a deterministic probability model, we will not see surprises.
  • For the logistics of the code:
  • same as above
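
The arithmetic behind these statements can be sketched in Java. The formula below is inferred from the worked examples on this page (10 contagious students at p=0.01 give an infection probability of 0.1, and a value below 0.3 rounds to 0 infections); it is an assumption, not the code of the probability class, and the class and method names are hypothetical.

```java
public class DeterministicCountSketch {
    // Assumed deterministic rule: the infection probability in a place scales with
    // the number of contagious students there, and the number of new infections
    // is the rounded expected value over the healthy students present.
    static long newInfections(int healthy, int contagious, double p) {
        double placeProbability = Math.min(1.0, contagious * p);
        return Math.round(healthy * placeProbability);
    }

    public static void main(String[] args) {
        // House with fewer than 30 people: 29 healthy housemates, 1 contagious student.
        System.out.println(newInfections(29, 1, 0.01));  // 0.29 rounds to 0
        // House number 1 with 140 students: 139 healthy housemates.
        System.out.println(newInfections(139, 1, 0.01)); // 1.39 rounds to 1
        // Class of 20 with 10 contagious students and 10 healthy ones.
        System.out.println(newInfections(10, 10, 0.01)); // 10 * 0.1 = 1
    }
}
```

Under this reading, a single contagious student cannot start an outbreak in any group with fewer than 51 healthy members at p=0.01, because 0.01 times the group size stays below 0.5 and rounds to 0.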

Tree11 Radius Ratio, infection prob=0.005

FirstInKey82RRProb0.01DEQ.png

  • Simulation parameters: whole campus population, incubation period=6 days, contagious period=8 days, deterministic probability, no vaccination, no quarantine; probability of infection=0.005.


  • The length of the edges is adjusted to the number of days
  • The first infected student is 82
  • This is the Radius Ratio version of the tree.
  • We observe that only half of the people got infected in this simulation.
  • Note that since some nodes are so big that they cover part of the edges, it is hard to read accurately where these edges are coming from. If we change the style of the nodes to area, we might get a more readable graph.
  • For the logistics of the code:
  • see the specifics on the weekly log page and program source page

Time performance

  • Usually one simulation completes within seconds; see more time performance records on the weekly log page
  • The running time of the ToDot class depends on the size of the output (but it still takes only seconds to generate the dot file)
  • Generating Twopi trees can be time-consuming, but one can expect the program to complete within 20s.

Student 82's schedule

  • Here is student 82's class schedule of the week:
  • 82 lives in house number 1, which has a student population size 140
  • Wednesday 14:40 - 16:30, AINSWO 304 (this class contains 20 students, and 3 students are from house number 1)
  • Tu Th 15:00 - 16:30, SEELYE 202 (this class contains 9 students, and 2 students are from house number 1)
  • Monday 14:40 - 16:30, AINSWO 151 (this class contains 23 students, and 2 students are from house number 1)
  • M W 13:10 - 14:30, BOAT HOUSE (this class contains 5 students, and 82 is the only one who comes from house number 1)
  • M W F 9:00 - 9:50, BURTON 101 (this class contains 28 students, and 2 students are from house number 1)
  • M W F 10:00 - 10:50, BURTON 209 (this class contains 21 students, and 3 students are from house number 1)
  • Here is the code that generates the above statistics (the data file can be found somewhere on the weekly log page):
campusData.r
  • In addition, from Monday to Sunday, 82 has breakfast with everyone in the house from 8:00-8:30, lunch from 12:20-12:50, and dinner from 17:30-18:30, and does homework with everyone in the house from 19:00-21:00.

Some synthesized trees just for fun

--Kyra Gan 22:01, 17 April 2016 (EDT)

  • Here is what we will get if we put together all the radius ratio trees from week 11 (see the weekly log page for more info)
FirstInKey82RadiusRatiosynthesize(small).png


  • Here is what we will get if we put together all the Area trees from week 11 (see the weekly log page for more info)
FirstInKey82AreaProb0.01synthesize(small).png

Meetings and To-Do Lists


Go to this page for the TO-DO list

Go to Weekly Log Page for more information

Go to this page for the weekly log page

How to run the program

Go to this Program Source Page
Go to this page for early stage program testing