Subhalo orbits in the Via Lactea II simulation

Dr. Michael Kuhlen (UC Berkeley)


The Via Lactea II (that's Latin'ish for "Milky Way") is a cosmological N-body simulation following the formation and evolution of the dark matter halo of galaxy like our own Milky Way.

The simulation consists of just over 1 billion N-body particles, tracing the dark matter density field in a statistical realization of the matter distribution in a ~100 million lightyear region of our universe. During the simulation the gravitational interactions between all particles are calculated, and the resulting forces are used to update the positions and velocities of each particle. As a result the simulation is able to follow the dynamical evolution of the dark matter density field over the full age of our universe. This kind of simulation is very expensive to run. We used the highly optimized parallel N-body code PKDGRAV2, which utilizes a Fast Multipole Method to (approximately, but with well controlled errors) solve the gravitational N-body problem in O(N) time. Nevertheless the simulation required the use of a supercomputer. It was run back in 2008 on ~3000 cores of the Jaguar supercomputer "Jaguar") at Oak Ridge National Lab. In total it cost about 1 million cores hours to complete, and produced ~20 TB of raw data (400 output snapshots of 50 GB each).

In the movie that I have embedded below, you can watch the evolution and formation of this one halo of interest. (It's better in HD on vimeo.) Only the central 2.4 million lightyears are shown, and only 1/100th of all (high resolution) particles are actually plotted. The visualization is shown in proper coordinates (as opposed to ones comoving with the expansion of the universe, which are actually used in the calculation), and thus you initially see the material expanding outwards with the "Hubble flow", before the mutual gravitational attraction of the overdensity overwhelms the expansion. As you can see, the evolution process is hierarchical: small object collapse first, and merge together to form ever larger objects, until at the end of the simulation a single halo of about the right mass to host our Milky Way galaxy has formed.

The remarkable result of this calculation (and others like it) is that the merging process appears to be incomplete. The building blocks are not fully disrupted during the merging, and many, many small self-bound "subhalos" remain and continue to orbit within the larger host halo. The orbits of these subunits is the topic of this notebook.

In [1]:
from IPython.display import VimeoVideo

In order to identify (sub)halos in the simulation, we ran the 6DFOF halo finder on a subset of 27 of all 400 outputs. This halo finder implements the Friends-Of-Friends of algorithm, but links particles both by proximity in configuration space as well as in velocity space. As a result it is able to identify the dense and tightly bound cores of subhalos, and even in regions where they have low density contrast (e.g. at pericenter passage). Centering on each of the subhalo cores we then determined spherically averaged density profiles, which were used to obtain estimates of the total mass ("tidal mass") of each subhalo.

Starting with the 20,000 most massive subhalos at z=4.56 (when the universe was 1.4 Gyr old), I linked subhalos to their descendants in the next simulation snapshot. This linking was done by identifying the subhalo with the largest degree of overlap in linked particles: this means that we get most-massive-progenitor tracks. If one halo merges with another, the track of the less massive one ends.

Each track is stored in an HDF5 file, which looks like this:

$ h5ls tracks.hdf5
age_of_universe          Dataset {361}
dumpid                   Dataset {361}
host_halo                Soft Link {/track19982}
redshift                 Dataset {361}
track00000               Group
track00001               Group
track00002               Group
track00003               Group
track19998               Group
track19999               Group

The contents of each track????? Group are:

$ h5ls tracks.hdf5/track00000
ID                       Dataset {361}
Mt                       Dataset {361}
RVmax                    Dataset {361}
Rt                       Dataset {361}
Vmax                     Dataset {361}
pos_com_code             Dataset {361, 3}
pos_com_rel              Dataset {361, 3}
pos_denmax_code          Dataset {361, 3}
pos_denmax_rel           Dataset {361, 3}
vel_com_code             Dataset {361, 3}
vel_com_rel              Dataset {361, 3}

Each dataset (ID, Mt, etc.) has 361 elements, corresponding to simulation outputs from z=4.56 to z=0 (today).

In a separate routine I have read in all this data and pre-processed it a bit. In particular I have determined the extrema of distance from the host halo center, giving a list of peri- and apocenters for each track, and the infall times and masses, i.e. the time at which they first fell into the host halo (if they did) and the mass they had at that time. Next I filtered the list to only include tracks that (a) did in fact cross over into the host halo and (b) have had at least one pericenter passage, meaning that we reject all halo tracks that aren't actual subhalo orbits. In total there are 8,534 such orbiting tracks, which I have stored in a list of dictionaries and saved to a python cPickle.

In [2]:
import cPickle
f = open('orbits.p','rb')
orbits = cPickle.load(f)

Each element of orbits is a dictionary, with the following keys.

In [3]:

Some of these fields are scalars, others arrays:

In [4]:
print orbits[0]['z_infall']
print orbits[0]['Nperi']
print orbits[0]['Dperi']  ## the units here are proper kpc.
print orbits[0]['cpos'].shape
[ 13.04389787   5.08774786  20.1829438 ]
(181, 3)

Here are some statistics of these orbits:

In [5]:
### Number of orbits
Norbits = len(orbits)
print "Number of tracks with at least one pericenter (i.e. orbits) = %d" % Norbits

### Number of pericenters
Nperi = np.array( map(lambda x: x['Nperi'], orbits))
print "Mean number of pericenter passages per orbiting track = %.3f" % Nperi.mean()

### Number of orbits with more than X pericenter passages
hist, bins = np.histogram(Nperi, bins=max(Nperi)-1)
cumhist = (Norbits - np.hstack((0,hist)).cumsum())
print "Number of tracks with more than 1, 5, 10 pericenter passages = %d, %d, %d" % tuple(cumhist[[1,5,10]])
### Number of orbits with more than X (Dperi < 10 kpc) pericenter passage
this_orbits = filter(lambda x: min(x['Dperi']) < 10.0, orbits)
this_Norbits = len(this_orbits)
this_Nperi = np.array( map(lambda x: x['Nperi'], this_orbits))
print "Number of tracks with at least one (D_peri < 10 kpc) pericenter passage = %d" % len(this_orbits)
del this_orbits    
this_hist, this_bins = np.histogram(this_Nperi, bins=max(this_Nperi)-1)
this_cumhist = (this_Norbits - np.hstack((0,this_hist)).cumsum())
print "Number of tracks with more than 1, 5, 10 (D_peri < 10 kpc) pericenter passages = %d, %d, %d" % tuple(this_cumhist[[1,5,10]])
Number of tracks with at least one pericenter (i.e. orbits) = 8534
Mean number of pericenter passages per orbiting track = 5.544
Number of tracks with more than 1, 5, 10 pericenter passages = 7725, 3299, 972
Number of tracks with at least one (D_peri < 10 kpc) pericenter passage = 3300
Number of tracks with more than 1, 5, 10 (D_peri < 10 kpc) pericenter passages = 3235, 2054, 686

Here is a scatter plot of the minimum pericenter distance (deepest penetration) vs. the infall redshift. The size and color of the symbols indicate the total mass lost by tidal stripping: the maximum mass the halo ever had minus the mass it has at z=0.

In [6]:
import matplotlib.pyplot as plt
from matplotlib import colors
import matplotlib.colorbar as cb

xvar = np.array( [o['z_infall'] for o in orbits] )
yvar = np.array( [np.min(o['Dperi']) for o in orbits] )
cvar = np.array( [(np.nanmax(o['Mt']) - o['Mt'][-1]) for o in orbits] ) ## mass lost

fig = plt.figure(1, figsize=(12.0,10.0))
ax = plt.axes([0.125,0.125,0.70,0.8])

ind = (xvar >= 0.0) & (yvar > 0.0) & (cvar > 0.0)
xvar = xvar[ind]
yvar = yvar[ind]
cvar = cvar[ind]

ind = cvar.argsort()
xvar = xvar[ind]
yvar = yvar[ind]
cvar = cvar[ind]

cmap = 'jet'
norm = colors.LogNorm(vmin=1e6, vmax=5e10)
size = norm(cvar) * 0.98 + 0.02
np.clip(size, 0.02, 1.0, out=size)
size *= 800.0
plt.scatter(xvar, yvar, c=cvar, cmap=cmap, norm=norm, marker='o', s=size, alpha=0.5)#, edgecolors='none')

plt.xlabel(r'$\rm infall \, redshift$', size=18)
plt.ylabel(r'$\rm min(D_{peri}) \; [kpc]$', size=18)
for t in ax.xaxis.get_major_ticks():
for t in ax.yaxis.get_major_ticks():
cax = plt.axes([0.9,0.125,0.025,0.8])
ticks = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
tickn = [r'$10^{6}$',r'$10^{7}$',r'$10^{8}$',r'$10^{9}$',r'$10^{10}$']
cbar = cb.ColorbarBase(cax, cmap=cmap, norm=norm, ticks=ticks)
tickloc = cax.yaxis.get_ticklocs()
Nticks = len(ticks)
ticksize = 500.0 * ( norm(ticks) * 0.98 + 0.02 )
p = cax.scatter(np.zeros(Nticks)-1.2,tickloc, c=ticks, cmap=cmap, norm=norm, marker='o', s=ticksize, alpha=0.5)
for t in cax.get_yticklabels():

cax.text(0.5,1.02,r'$\rm mass \, lost \, [M_\odot]$', size=18, ha='center', va='bottom')

A couple of trends are visible: (1) The most mass is lost by halos that penetrate deepest. (2) A large fraction of the stripped mass comes from halos that fell in around z=2, roughly coincident with the last major merger the host halo experienced.

Next let's visualize some of these orbits in 3D, by looking at an animation of the orbits of 20 massive subhalos with more than 5 pericenter passages.

[The inspiration for this animation came from a couple of blog posts at the excellent Pythonic Perambulations. My implementation very closely follows what Jake presented here and here.]

In [7]:
## Select orbits with at least 10 pericenter passages.
this_orbits = [o for o in orbits if o['Nperi'] >= 5]

## Sort by infall mass, and select top 20.
this_orbits.sort(key=lambda x: x['Mt_infall'], reverse=True)
this_orbits = this_orbits[:20]
In [8]:
print 'Infall masses of these subhalos:'
print('\n'.join('  %2i: %.3e Msun' % (i+1,o['Mt_infall']) for i,o in enumerate(this_orbits)))
Infall masses of these subhalos:
   1: 2.735e+10 Msun
   2: 1.068e+10 Msun
   3: 9.476e+09 Msun
   4: 8.842e+09 Msun
   5: 8.257e+09 Msun
   6: 7.874e+09 Msun
   7: 7.075e+09 Msun
   8: 5.759e+09 Msun
   9: 5.170e+09 Msun
  10: 4.055e+09 Msun
  11: 3.961e+09 Msun
  12: 3.677e+09 Msun
  13: 3.669e+09 Msun
  14: 3.506e+09 Msun
  15: 3.137e+09 Msun
  16: 2.695e+09 Msun
  17: 2.669e+09 Msun
  18: 2.494e+09 Msun
  19: 2.017e+09 Msun
  20: 1.889e+09 Msun

Now we define some helper functions to assist with embedding the animation in this notebook. (Again, this is straight from here.)

In [9]:
from tempfile import NamedTemporaryFile

VIDEO_TAG = """<video controls>
 <source src="data:video/x-m4v;base64,{0}" type="video/mp4">
 Your browser does not support the video tag.

def anim_to_html(anim):
    if not hasattr(anim, '_encoded_video'):
        with NamedTemporaryFile(suffix='.mp4') as f:
  , fps=20, extra_args=['-vcodec', 'libx264'])
            video = open(, "rb").read()
        anim._encoded_video = video.encode("base64")
    return VIDEO_TAG.format(anim._encoded_video)

from IPython.display import HTML
def display_animation(anim):
    return HTML(anim_to_html(anim))

Next we calculate the animation. It is automatically converted (using ffmpeg on my machine) to an mp4 suitable for HTML5 embedding.

In [10]:
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import animation

Norbits = len(this_orbits)
Nsteps = this_orbits[0]['cpos'].shape[0]

# Set up figure & 3D axis for animation
fig = plt.figure(figsize=(10,8))
ax = fig.add_axes([0, 0, 1, 1], projection='3d')

# choose a different color for each trajectory
colors =, 1, Norbits))

# set up lines and points
lines = sum([ax.plot([], [], [], '-', c=c, alpha=0.5)
             for c in colors], [])
pts = sum([ax.plot([], [], [], 'o', c=c)
           for c in colors], [])

# prepare the axes limits
ax.set_xlim((-100, 100))
ax.set_ylim((-100, 100))
ax.set_zlim((-100, 100))

# set point-of-view: specified by (altitude degrees, azimuth degrees)
ax.view_init(30, 0)

# initialization function: plot the background of each frame
def init():
    for line, pt in zip(lines, pts):
        line.set_data([], [])

        pt.set_data([], [])
    return lines + pts

# animation function.  This will be called sequentially with the frame number
def animate(i):
    for line, pt, so in zip(lines, pts, this_orbits[:Norbits]):
        x, y, z = so['cpos'][:i].T
        line.set_data(x, y)

        pt.set_data(x[-1:], y[-1:])

    ax.view_init(30, 0.3 * i)
    return lines + pts

# instantiate the animator.
anim = animation.FuncAnimation(fig, animate, init_func=init,
                               repeat=False, frames=Nsteps+200, interval=30, blit=True)

# call our new function to display the animation