Building a Job Landscape from Directional Transition Data

Manifold Learning and Its Applications: Papers from the AAAI Fall Symposium (FS-10-06)
Dominique Perrault-Joncas and Marina Meilă
Department of Statistics, University of Washington
Marc Scott
Department of Humanities and Social Sciences, New York University
Abstract
The analysis of career paths suffers from a lack of exploratory tools and dynamic models, due in part to the inherent high dimensionality of the problem. Paths may be understood as directed traversals through a graph whose nodes consist of “job types”, which we define as industry and occupation pairs. We want to develop tools to understand and detect high-level features of both the labor market and the workers moving through it, that is, career dynamics. To do this, we map the discrete space of jobs into a d-dimensional continuous space; proximity between jobs will mean that they are “close” to each other in a non-negligible subset of career paths. This embedding allows one to visualize the job landscape. Moreover, we can map individual career paths or groups of career paths to this space, extract features of their collective structure, and construct statistical tests comparing groups by means of this mapping.
1 Introduction
At the origin of this work is an analysis of career mobility using data from the National Longitudinal Survey of Youth (NLSY), a study that followed several thousand men and women in the U.S. from 1979 until recently. The participants were aged 14-21 at the beginning of the study, and constitute a representative sample of the U.S. population. The NLSY is often used to better understand the forces and factors that influence a person’s career path in its early stages. Thus, the data contains each individual’s work history from age 20 to 36, with job status recorded quarterly, for a total of 64 job tokens per individual. We call this a career.
One immediately recognizes a fundamental challenge with data of this nature. Jobs, a set of nominal states described in the next section, have no natural ordering or structure, and the number of states is potentially very large. Thus, comparison of individuals’ career paths is limited to methods for comparing sequences over discrete alphabets. See (Abbott 1995) and (Durbin et al. 1998).
Our methodology utilizes information in the transitions between job types to construct a Euclidean space in which our fundamental unit, the job type, will reside. This embedding method is derived from the WCut algorithm of (Meilă and Pentney 2007), a spectral algorithm used for clustering in directed graphs. We show that this embedding is not only robust to sampling fluctuations, but that some of the coordinates are related to significant demographic variables like gender, wages, and time/age.
In doing this, our goal is to provide a robust instrument for visualizing data on careers that will allow social scientists to uncover trajectory differences between different demographic groups, or groups with different levels of education. For example, do all workers, regardless of education, begin their careers fairly similarly? What distinguishes the careers of economic winners and losers, in terms of the timing and structure of their traversals? The job embedding will allow scientists to answer questions such as these.
In the following sections, we present the data used for the study (Section 2), we show how we built the jobs landscape (Section 3) and we examine the landscape’s robustness to sampling noise by using the bootstrap method (Section 4). We then use the jobs landscape to visualize various characteristics of the job market (Section 5), and the evolution in time of individuals with varying education levels (Section 6). We also compare our embedding to other embedding methods from the literature (Section 7). A discussion (Section 8) concludes the paper.
2 The Career Data
As noted in section 1, when we refer to a career, we mean
a sequence of job tokens of the form (occupation, industry). We use 25 unique industry and 20 unique occupation
codes from the larger set of 3-digit 1970 Census Classification codes. The details of the jobs aggregation as well as
other data cleaning operations are described in (Scott forthcoming). All career sequences have length 64 (quarterly job
state over 16 years). In the 16-year age span studied, approximately 450 unique industry and occupation pairings (hereafter referred to as IxOs) occur. The sample size is 7,816
individual careers. We reweight the sample so that it is consistent with the demographics of the original baseline sample. The weights reflecting this are called the population
weights.
In the next section, we build our embedding based only
on the discrete career data described above. After the embedding is constructed, we examine the mapping of various
available demographic data into this manifold.
Demographic variables used in this study include self-reported race/ethnicity (Black, Hispanic, Non-Black/Non-Hispanic), sex, age and education. While education changes over time, we simplify its inclusion by dichotomizing workers into those who complete at least a two-year degree by age 24 and those who do not. Age 24 was chosen because it is a year or two past the traditional timeline for completing undergraduate education. Workers who have not completed a two-year degree by this point face very different prospects in the labor market. Wages are hourly and inflation-adjusted to reflect 2008 dollars.
Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
3 The job landscape: embedding the career data in d-dimensional space
Our first goal is to map the jobs to a d-dimensional space, in a way that renders closest the jobs between which frequent transitions are observed. In order to do so, we compute the affinity matrix $A$ from the original data, where $A_{ij}$ represents the number of times a transition from job i to job j is observed. Here the workers’ population weights are used in weighting each transition so that the resulting transition matrix is consistent with the demographics of the original baseline sample. This asymmetric affinity matrix, $A_{ij} \neq A_{ji}$, will be used to produce the embedding.
It is worth noting explicitly that our map creation process collapses information about the timing of the transitions between jobs; as such, transitions that occur early in individuals’ careers are indistinguishable from transitions that occur later. While it may be interesting to allow the job market to change over time, or even to examine early transitions separately from later transitions, we do not take this approach in our paper.
3.1 Preprocessing the affinity matrix
In preliminary embedding experiments, we observed that embeddings suffer from outliers: jobs that are rarely observed. To improve the embedding, we removed all jobs observed fewer than 5 times (according to population weights), reasoning that so few observations prevent us from making any significant observations about these jobs. We verified experimentally that the embedding of the high and medium frequency jobs is not sensitive to the exact cutoff value used to eliminate outliers.
The second operation performed on the affinity matrix before embedding was to remove the diagonal by setting $A_{ii}$ to zero. The diagonal element $A_{ii}$ counts the transitions of i into itself (i.e. the times job i was not changed for another job during a quarter). These counts dominate the data, A’s diagonal totalling about 8 times more than its off-diagonal elements. The elements $A_{ii}$ tell us very little about the job in relation to others, and also surprisingly little about the job itself. On average, workers remain at their jobs for about two years regardless of the particular job. Hence, the embedding of the data will only consider the transitions between jobs, and not how long one stays in the same job before transitioning. This is akin to considering the data in the framework of a Semi-Markov process where only the Markov component is of interest.
The third step is to remove the data that represents time spent unemployed, in school, in vocational training, or simply not looking for work. This was done for two reasons: (1) these tokens can dominate the transition count matrix even though they provide limited (or non-content) information; (2) these states are often seasonal and hence contain little information about career progression.
To avoid losing too much information by removing the transitions to and from the removed states, it was decided to use a step-over approach. Specifically, if we observe the sequence j → x → x . . . x → i where x is a state to remove, the transition j → i will be recorded. This retains the continuity in the sequence and recognizes the fact that j and i are related (through x).
Once low-frequency jobs and non-content-specific states are removed, we are left with n = 356 jobs out of the original total of 457.
3.2 Embedding by the Weighted Cuts method
We now have an n × n affinity matrix $A$ that contains transition counts between the n = 356 retained tokens, obtained as described in the previous section.
Mapping the tokens to a d-dimensional space is done via the Weighted Cut (WCut) algorithm introduced by (Meilă and Pentney 2007), which we briefly describe here. The input data is the matrix of affinities $A$ and a user-specified dimension d. For the purposes of visualization, we will use d = 3 or d = 4; for other purposes, such as computing statistics from the embedding, we extend to d = 10. It is worth noting that since the WCut algorithm is based on eigenvalue decomposition, selecting the number of dimensions is not a critical operation. Indeed, every dimension is computed independently of the others, so adding or removing dimensions does not affect the rest of the embedding.
Algorithm WCut
1. Input $A$, $d$.
2. Calculate the degrees $D_i = \sum_j A_{ji}$ and form the diagonal matrix $D = \mathrm{diag}\{D_i\}$.
3. Calculate the d largest eigenvalues $\lambda_{1:d}$ of the matrix $D^{-1/2}\,\frac{A + A^T}{2}\,D^{-1/2}$ and the corresponding eigenvectors $y^{(1)} \ldots y^{(d)}$. Let $Y = [y^{(1)} \ldots y^{(d)}]$.
4. Compute $X = D^{-1/2} Y$, $X = [x^{(1)} \ldots x^{(d)}]$.
5. Map every job $i = 1, 2, \ldots, n$ to the d-dimensional point $(x_i^{(1)} \ldots x_i^{(d)})$.
Figure 1: The WCut embedding algorithm.
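As an illustration only, the step-over counting of Section 3.1 and steps 2 to 5 of Figure 1 can be sketched in a few lines of NumPy. The function names `step_over_counts` and `wcut_embedding`, the toy careers, and the weights are assumptions made for this sketch and are not taken from the study. The rare-job cutoff is omitted, and self-transitions are simply never counted, which has the same effect as zeroing the diagonal.

```python
import numpy as np

def step_over_counts(careers, weights, removed):
    """Weighted transition counts A[i, j] (job i -> job j), skipping removed states."""
    jobs = sorted({t for career in careers for t in career if t not in removed})
    index = {job: k for k, job in enumerate(jobs)}
    A = np.zeros((len(jobs), len(jobs)))
    for career, w in zip(careers, weights):
        # Dropping removed states turns j -> x ... x -> i into j -> i (step-over).
        kept = [index[t] for t in career if t not in removed]
        for a, b in zip(kept[:-1], kept[1:]):
            if a != b:  # staying in the same job is not counted (zero diagonal)
                A[a, b] += w
    return A, jobs

def wcut_embedding(A, d):
    """Steps 2-5 of Figure 1: the d largest eigenvalues and the (n, d) coordinates."""
    deg = A.sum(axis=0)  # D_i = sum_j A_ji, weighted number of transitions into i
    if np.any(deg <= 0):
        raise ValueError("every job needs at least one incoming transition")
    inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} (A + A^T)/2 D^{-1/2}, a symmetric matrix
    M = inv_sqrt[:, None] * (0.5 * (A + A.T)) * inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:d]     # indices of the d largest eigenvalues
    Y = evecs[:, top]
    X = inv_sqrt[:, None] * Y             # X = D^{-1/2} Y
    return evals[top], X

# Toy data: "a", "b", "c" stand for job tokens, "U" for a removed state.
careers = [["a", "U", "b", "b", "c"], ["b", "c", "a", "a"], ["c", "a", "U", "U", "b"]]
weights = [1.0, 2.0, 1.0]  # placeholder population weights
A, jobs = step_over_counts(careers, weights, removed={"U"})
evals, X = wcut_embedding(A, d=2)
```

Row i of `X` is the embedded position of `jobs[i]`. Because each column of `X` comes from a separate eigenvector, asking for a larger `d` leaves the leading coordinates unchanged up to the sign ambiguity of eigenvectors.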
The idea behind this method is derived from clustering.
If one wanted to separate the data into d clusters, a “good”
clustering would satisfy two conditions: (1) put tokens with
high affinity in the same cluster, and (2) keep the cluster
sizes balanced. In (Meilă and Pentney 2007) it is proved that
both conditions can be optimized by mapping the data into
the principal subspace of a symmetric matrix obtained from
A (which is asymmetric in general). Thus, the WCut embedding method is reminiscent of Principal Components Analysis (PCA), and could loosely be thought of as PCA for directed graph data.
Although our goal is not clustering but embedding, this
method will be satisfactory because it will pull together in
space tokens that have high affinity. Meanwhile, the “balance” requirement will attempt to spread the tokens about
evenly, instead of collapsing them all together.
In (Meilă and Pentney 2007) the user is allowed to
choose positive token weights by which the balancing will
be judged. For our study, we chose the weight of job i to
be equal to the number of transitions into i, that is $\sum_j A_{ji}$.
Other possible alternatives are to have equal weights, or to
have weights equal to the row sums of A, which represent
the number of transitions out of i. Our choice is preferred
because the resulting embedding has fewer outliers (practically none) and is more robust. In addition, the frequency
of transitions into a job can measure the relative desirability
of that job, just like in other domains the number of links
to a page, or the number of times a paper is cited measure
its authority and therefore its “weight” more accurately than
the outgoing links.
In the career landscape, there are jobs that function as
sources (i.e. have many outgoing transitions) and, as such,
are more typical of the early career stage. Examples include
many clerical jobs, retail sales, and food services. When
these jobs serve only as a source, we would not want our
configuration of the career landscape to reflect them to any
great extent. However, some of the same jobs that are common in the early career are also common to low-wage careers; to the extent that workers are making transitions into
such jobs, we want them reflected in the embedding through
the weights (long-term careers in retail and waiter/waitress
jobs are potential low-wage examples).
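To make the source versus sink distinction concrete, one could compare each job's outgoing and incoming transition totals. The sketch below is a hypothetical illustration: the function name `net_outflow` and the toy matrix are assumptions, not part of the study. It returns half the difference between row sums and column sums of the affinity matrix, which equals $\frac{1}{2}(A - A^T)\mathbf{1}$.

```python
import numpy as np

def net_outflow(A):
    """Half the difference between outgoing and incoming transitions for each job."""
    A = np.asarray(A, dtype=float)
    out_deg = A.sum(axis=1)  # row sums: transitions out of each job
    in_deg = A.sum(axis=0)   # column sums: transitions into each job
    return 0.5 * (out_deg - in_deg)

# Toy affinity matrix: job 0 mostly sends workers on, job 2 mostly receives them.
A = np.array([[0.0, 5.0, 4.0],
              [1.0, 0.0, 3.0],
              [0.0, 1.0, 0.0]])
flow = net_outflow(A)
```

Positive entries mark source-like jobs and negative entries mark sink-like jobs. The entries always sum to zero, since every transition leaves one job and enters another.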
3.3 Perturbation analysis and the first coordinate
The WCut being an extension of the Multiway Normalized Cut (MNCut) algorithm (Shi and Malik 2000; Meilă and Shi 2001), the MNCut can serve as a good starting point to build some intuition about the WCut. One interesting approach is to focus on how the two embeddings differ. Doing this gives some insight into how WCut embeds the asymmetric information of the graph, which is absent in the purely symmetric MNCut. This is an important aspect of embedding the job market data, as this asymmetry is the only information that pertains to the natural career progression.
To determine how the WCut differs from the MNCut, we consider the directed graph represented by the affinity matrix $A$ as a perturbation of the undirected graph $(A + A^t)/2$. That is, we decompose $A$ into its hermitian $H(A) = (A + A^t)/2$ and anti-hermitian $AH(A) = (A - A^t)/2$ components and assume that the anti-hermitian component is a small perturbation to the undirected graph. To make this assumption explicit, we define $A_\varepsilon = H(A) + \varepsilon AH(A)$ and we consider how the WCut embeds this anti-hermitian component, i.e. the directional perturbation of the undirected graph described by $H(A_\varepsilon) \equiv H(A)$.
Going back to the definition of the WCut algorithm (Figure 1), we are interested in the eigenvalues and eigenvectors of $H(B_\varepsilon)$:
$$H(B_\varepsilon) = D_\varepsilon^{-1/2}\,(D_\varepsilon - H(A))\,D_\varepsilon^{-1/2}, \tag{1}$$
with $D_\varepsilon = \mathrm{diag}(A_\varepsilon \mathbf{1})$.
The perturbed outdegree matrix $D_\varepsilon$ can be decomposed as
$$D_\varepsilon = \mathrm{diag}(H(A)\mathbf{1}) + \varepsilon\,\mathrm{diag}(AH(A)\mathbf{1}) \equiv S + \varepsilon C, \tag{2}$$
where both $S$ and $C$ are diagonal. This means that we can apply a Taylor series to the diagonal elements of $D_\varepsilon$ to obtain $D_\varepsilon^{-1/2}$:
$$D_\varepsilon^{-1/2} = S^{-1/2}\left(I - \frac{\varepsilon}{2}\,S^{-1} C\right) + o(\varepsilon). \tag{3}$$
Substituting (2) and (3) into (1) gives:
$$H(B_\varepsilon) = H(B_0) + \varepsilon\left[\frac{S^{-1}C}{2}\,H(B_0) + H(B_0)\,\frac{S^{-1}C}{2} + S^{-1}C\right] + o(\varepsilon) \equiv H^{(0)} + \varepsilon H^{(1)} + o(\varepsilon). \tag{4}$$
With this definition, both $H^{(0)}$ and its perturbation $H^{(1)}$ are hermitian. This means that we can use regular perturbation theory in obtaining the first order effect of the directional perturbation $\varepsilon AH(A)$.
We assume that we know the eigenvectors of the MNCut for the undirected graph $H(A)$:
$$H^{(0)} y_i^{(0)} = \lambda_i^{(0)} y_i^{(0)}, \quad i = 1, \ldots, n, \tag{5}$$
where the eigenvectors are orthonormal, $y_i^{(0)t} y_j^{(0)} = \delta_{i,j}$. Meanwhile, the eigenproblem $H(B_\varepsilon) y_i = \lambda_i y_i$ is assumed to have the expansion:
$$\lambda_i = \lambda_i^{(0)} + \varepsilon \lambda_i^{(1)} + o(\varepsilon), \qquad y_i = y_i^{(0)} + \varepsilon y_i^{(1)} + o(\varepsilon).$$
Expanding $y_i^{(1)}$ in terms of the $y_j^{(0)}$’s gives the standard first order perturbation:
$$y_i^{(1)} = \sum_{j \neq i} \frac{y_j^{(0)t} H^{(1)} y_i^{(0)}}{\lambda_i^{(0)} - \lambda_j^{(0)}}\; y_j^{(0)}, \tag{6}$$
$$\lambda_i^{(1)} = y_i^{(0)t} H^{(1)} y_i^{(0)}. \tag{7}$$
Though an interesting exercise, expressing the directional perturbation of the graph in terms of the eigenvectors of the MNCut embedding has not provided any substantial insight so far. To extract meaning from (6) and (7), we need to appeal to the alternative interpretation of the MNCut, specifically the Markov chain with transition matrix $P = S^{-1} H(A)$. The eigenproblem $P x_i = \gamma_i x_i$ for the transition matrix is equivalent to (5) through $x_i^{(0)} = S^{-1/2} y_i^{(0)}$ and $\gamma_i^{(0)} = 1 - \lambda_i^{(0)}$.
The largest eigenvalue $\gamma_1^{(0)} = 1$ contains no information about the graph since $P\mathbf{1} = \mathbf{1}$ by virtue of $P$ being a stochastic matrix. So any information contained in $x_1 = \mathbf{1} + \varepsilon x_1^{(1)} + o(\varepsilon)$ can only come from the directional perturbation to the graph, making the first coordinate of the embedding particularly relevant here.
To assess what directional information $x_1$, and hence $y_1$, contains, it is worth taking a closer look at (6). Specifically, the interesting question is which 0th order eigenvectors will most contribute to the first order perturbation. In other words, for which $y_j^{(0)}$ is the coefficient
$$\frac{y_j^{(0)t} H^{(1)} y_1^{(0)}}{-\lambda_j^{(0)}}, \quad j \neq 1 \tag{8}$$
going to be large? Obviously, a smaller $\lambda_j^{(0)}$ implies a larger coefficient, but this is not saying much beyond the fact that smaller $\lambda_j^{(0)}$’s are generally associated with the important eigenvectors. What is interesting is to determine when the numerator is large.
From (4) and the fact that the $y_j^{(0)}$’s are eigenvectors of $H(B_0)$, the numerator of (8) takes the form:
$$y_j^{(0)t} H^{(1)} y_1^{(0)} = \left(\frac{\lambda_j^{(0)}}{2} + 1\right) x_j^{(0)t} C \mathbf{1}. \tag{9}$$
The term of interest is $x_j^{(0)t} C \mathbf{1} = x_j^{(0)t} (AH(A)\mathbf{1})$. This term will be largest if $x_j^{(0)}$ is parallel to $AH(A)\mathbf{1}$, meaning that the first order perturbation to $x_1$ will favor eigenvectors $x_j^{(0)}$ which are closely aligned with $AH(A)\mathbf{1}$. This vector, by definition, corresponds to the out-degrees minus in-degrees of each job of the graph divided by two. In other words, the first order perturbation of $x_1$ will tend to align itself with the net divergence/flow of each job. As such, the directionality of the graph is partially embedded in the first eigenvector $x_1$ in that it separates nodes with different divergence, such as source vs. sink, while grouping nodes with similar divergence.
In the context of the job market, this is borne out by survey data. Although the first coordinate shows more structure than simply the first order perturbation described above, it does separate jobs according to whether they have net positive or negative divergence quite well. This interpretation of the first eigenvector is especially interesting here in that directional information, including divergence, is mainly temporal, i.e. along the natural career progression. Hence, the first eigenvector of the WCut embedding is highly correlated with time.
4 Validating the embedding
We applied the algorithm WCut to the 356 × 356 matrix $A$ obtained by the preprocessing described in Section 3.1. The resulting embedding is presented in Figure 2 in three dimensions.
At first glance, one notes that the token distribution in space is relatively even, and that there are virtually no outliers in the 3 dimensions plotted. While these qualities by themselves do not guarantee success, they are necessary for having an informative set of coordinates.
Figure 2: Embedding for coordinates $x^{(2)}$, $x^{(3)}$, $x^{(4)}$. The color map for this embedding corresponds to the job frequency, with red for low frequency, green for medium frequency and blue for high frequency.
Also, somewhat remarkably, no fragmentation or clustering is visible; the job landscape appears continuous. In the figure, the tokens are colored by frequency, making it visible that the embedding is not stratified by this feature. Both these characteristics suggest that the geometry obtained represents collective properties of the set of jobs, and is not overwhelmingly dependent on a small subset of high frequency jobs.
Before proceeding to draw conclusions from the mapping, we need to validate that the embedding represents genuine, albeit as yet unknown, features of the population and that it is not an artifact of the algorithm employed.
4.1 Stability analysis by Bootstrap
For this purpose we bootstrap the data to obtain new embeddings. We then compare these new embeddings with the original embedding by using Procrustes alignment and then computing the covariance of the coordinates for each job.
The Bootstrap confirms that the embedding is stable. For clarity, we plot the Bootstrap covariance ellipsoids (which would correspond to 68% confidence regions for normally distributed data) for the high frequency jobs only. Here, we define as high frequency those jobs that appear in 95% of our B = 1000 Bootstrap samples. These covariances are shown in Figure 3, while all the low frequency token locations are marked by gray dots.
Note that while the displays are in three dimensions, the embeddings and Procrustes alignments were performed in d = 5 dimensions.
The figure demonstrates that for jobs that are well represented in the sample, their relative locations are extremely stable. This effect is not just a predictable consequence of the concentration of frequency estimates. The mapping from the counts $A_{ij}$ to the locations $X$ is highly non-linear; in particular, it involves division by the counts $D_j$. To have stability of the Bootstrap estimates, one needs the Jacobian of the mapping to be stable as well. This is what our Figure 3 demonstrates.
Figure 3: Job embedding with covariances estimated by Bootstrap, dimensions $x^{(2)}$ to $x^{(4)}$. Covariances of high frequency jobs are shown along with a color map representing mean gender (with females depicted in red and males in blue). Low frequency jobs are shown as grey dots.
5 Visualizing the jobs landscape: meaning of the coordinates
Next, we considered how the embedding relates to demographic variables such as gender, salary, and education. We find that certain coordinates are strongly associated with these variables. Specifically, gender is related to $x^{(2)}$, while salary is related to $x^{(1)}$ and, to a lesser extent, $x^{(3)}$ and $x^{(4)}$. As for race/ethnicity, it does not seem to be associated with any coordinate, at least among the first 10 considered so far. This remains true even when the data is resampled so that the racial/ethnic groups are present in the same proportion. For this reason, race/ethnicity will not be considered further in this paper.
There is another variable of interest, which is time. For obvious reasons, this variable is correlated with salary, and is associated with $x^{(1)}$ and $x^{(4)}$. In light of the perturbation analysis of the WCut algorithm, this is not surprising. Indeed, time (and to some extent salary) is the obvious source of asymmetry in the job market. As the first coordinate’s deviation from a constant vector is controlled by this asymmetry (specifically, the first eigenvector tends to separate points according to whether they are in-degree or out-degree dominant), it is clear why the first coordinate orders tokens based on their average temporal position in individuals’ careers.
Figures 4, 6, and 7 show the embeddings that correspond to the demographic variables gender, wages and time. For gender, the color axis corresponds to the proportion of females that have held that token over the course of the study. For salary, the color axis corresponds to the mean salary in cents for the specific job (note: these are inflation-adjusted 2008 dollars). Finally, for the time variable, the color axis represents the mean time, in years, at which the token has been observed in the study. The plots also weight the area of each token in proportion to its frequency in the study so as to display its relative importance.
For gender, we also show a “pair” plot of each coordinate along with the same color axis as the three-dimensional plot in Figure 5. This gives a more detailed picture of the embedding for the first 4 coordinates.
Figure 4: Gender and the embedding (coordinates $x^{(1)}$, $x^{(2)}$, $x^{(4)}$). Each token color represents its proportion of females, with 100% female in red and 100% male in blue (the male/female proportion in the survey is 0.48 vs 0.52). A token’s size is proportional to its frequency relative to other tokens.
As a confirmation of which coordinate is associated with a given demographic variable, the regression results are shown in Table 1. This table shows the variation of the linear component of the regression model over the range of each coordinate (for the coordinates retained using BIC as a model selection criterion). As such, higher coefficients account for a larger linear change of a demographic variable over the range of the embedding.
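One plausible reading of the “variation of the linear component over the range of each coordinate” is the fitted slope multiplied by the spread (maximum minus minimum) of that coordinate. The sketch below illustrates this reading for a continuous variable with ordinary least squares. The function name `linear_variation` and the synthetic data are assumptions made for illustration; the BIC-based coordinate selection and the logistic models used for categorical variables are not reproduced.

```python
import numpy as np

def linear_variation(X, y):
    """Slope of each coordinate times that coordinate's range, from an OLS fit."""
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])           # intercept plus coordinates
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # least-squares coefficients
    coord_range = X.max(axis=0) - X.min(axis=0)
    return beta[1:] * coord_range                       # drop the intercept

# Synthetic stand-in for embedding coordinates and a continuous variable.
rng = np.random.default_rng(0)
X = rng.uniform(-0.01, 0.01, size=(200, 3))
y = 3.0 + 50.0 * X[:, 0] - 20.0 * X[:, 2]               # noiseless linear response
variation = linear_variation(X, y)
```

Scaling by the range makes the entries comparable across coordinates whose numerical scales differ, which is presumably why the table reports variations rather than raw slopes.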
Figure 5: Gender and the embedding, first four coordinates. Each token color represents its proportion of females, with 100% female in red and 100% male in blue.
Figure 6: Wages and the embedding (coordinates $x^{(1)}$, $x^{(2)}$, $x^{(4)}$). Each token’s color represents the average wage, on a logarithmic scale (base 10). A token’s size is proportional to its frequency in the population.
6 Evolution in time
Another point of interest is to understand how groups with different education levels evolve in time. Education is tied to mobility, but the interplay between the amount of education, the type of work and gender is not fully understood. Under our fairly granular IxO scheme, are less-educated workers doing a qualitatively different type of work, or are they just on the lower end of the pay scale and perhaps skill set?
We define four types of workers, classified by education at age 24 (two-year degree being the separating point) and gender, to examine how each group moves through the embedding. We know that time and salary are most strongly associated with the first coordinate, and that the second coordinate separates IxOs by gender. Hence, we use coordinates $x^{(1)}$ and $x^{(2)}$ to study these four groups.
To get an idea of how each group evolves, we tracked the center of mass (mean) of each group in the embedding at successive time periods. We find significant differences in behavior between the four groups for $x^{(1)}$ and $x^{(2)}$. Figure 8 shows the evolution of each group along $x^{(1)}$ and $x^{(2)}$, while Figure 9 shows their evolution with time along $x^{(1)}$ only. For coordinate one, the less educated workers move through time (as indicated by the decreasing level of that coordinate) more slowly; this is suggestive of making less “typical” progress, such as shedding entry-level positions. The second coordinate, which showed some relationship to gender, now suggests something more subtle: the genders are well separated within education groups, so the type of work that forms the career for these workers is fairly distinct and organized around the information in this coordinate.
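The center-of-mass tracking described above can be sketched as follows. This is a minimal illustration under stated assumptions: careers are stored as integer job indices with -1 marking quarters without an embedded job, the means are unweighted (population weights are ignored), and the function name `group_trajectories` and the toy data are hypothetical.

```python
import numpy as np

def group_trajectories(X, careers, groups):
    """Mean embedding position of each group at each time period.

    X       : (n_jobs, d) embedding coordinates
    careers : (n_workers, T) job indices, -1 where no embedded job is held
    groups  : (n_workers,) group label of each worker
    """
    careers = np.asarray(careers)
    groups = np.asarray(groups)
    n_periods = careers.shape[1]
    trajectories = {}
    for g in np.unique(groups):
        rows = careers[groups == g]
        traj = np.full((n_periods, X.shape[1]), np.nan)
        for t in range(n_periods):
            held = rows[:, t]
            held = held[held >= 0]        # ignore quarters with no embedded job
            if held.size:
                traj[t] = X[held].mean(axis=0)
        trajectories[g.item()] = traj     # plain Python key for the group label
    return trajectories

# Toy example: three jobs in two dimensions, three workers, three quarters.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
careers = [[0, 1, 1], [0, 0, 2], [2, -1, 1]]
groups = ["a", "a", "b"]
traj = group_trajectories(X, careers, groups)
```

Each entry of `traj` is a T × d array, so plotting one column against time, or two columns against each other, gives curves of the kind the text describes for the four education-by-gender groups.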
7
−5
are: (Pentney and Meilă 2005) a MNC UT embedding for complex eigenvectors, (Zhou, Schölkopf, and
Hofmann 2005) which constructs the symmetric matrix
−1/2
−1/2
−1/2
−1/2 T
AD0
)(Di
AD0
) then applies
Az = (Di
MNC UT to this matrix, and the directed Laplacian of (Andersen, Chung, and Lang 2007).
We applied the preprocessing in section 3 to all embeddings.1 Each embedding was aligned with the WC UT embedding using coordinate axes 2 to 7 by the Procrustes
transformation. Coordinate 1 is constant in all but the
WC UT embedding and was omitted. The Procrustes distortion, quoted for each embedding, represents the proportion of the variance not explained by the alignment and is a
measure of agreement with the WC UT (0 being perfect).
The embeddings by MNC UT and Diffusion map are similar to the WC UT embedding, with the Diffusion map exhibiting the effect of the rescaling of the axes by the λs
(distortions of 0.01 and 0.04 respectively). This is exactly
what we expect, given that the asymmetry in A is weak
(i.e. (A − AT )/2 is small).2 Interestingly, the directed
embedding of (Pentney and Meilă 2005) and of (Zhou,
Schölkopf, and Hofmann 2005) are also relatively close to
the WC UT embedding (distortions 0.11 and 0.31). However, the directed Laplacian embedding is very different (distortion 0.97). This mapping is neither smooth nor informative, having much of the data collapsed in the origin, and the
rest as outliers at different distances. In addition, we also
performed correlation tests of all the demographic variables
on all the coordinates from these embeddings. The correla-
Related work
We compared our embedding with other graph embedding
methods from the literature. Most graph embedding methods assume that the graph is symmetric. In order to use
them, we symmetrize the matrix A by As = (A + AT )/2.
The best known of these are the MNC UT /Laplacian eigenmap method, (Shi and Malik 2000; Meilă and Shi 2001),
(Belkin and Niyogi 2002) and the Diffusion map (Lafon and
Lee 2006). The WC UT embedding is identical to the MNC UT embedding if A is symmetric. The Diffusion map differs from the MNC UT by rescaling the eigenvectors with
the corresponding λ to a positive power. In our embedding
this power is 1.
Embedding methods that work for directed graphs
1
We also obtained embeddings by these methods which omit
some of the preprocessing steps, but the results were uniformly
worse in terms of outliers.
2
We omit the plots for lack of space.
41
(1)
13
x
x(2)
x(3)
x(4)
x(5)
x(6)
x(7)
x(8)
x(9)
x(10)
12
−3
x 10
11
10
10
9
5
x(4)
8
7
0
−10
6
−5
Gender
1.49
-6.28
1.22
1.04
1.00
-0.58
1.13
1.22
-3.29
-0.68
log10 Wages
-0.54
0.10
0.18
-0.16
0.08
-0.07
0.07
-0.20
0.13
0.02
Time
-11.64
1.42
-0.96
2.67
0.48
0.84
0.68
0.91
Educ. Gr.
0.30
-6.14
2.01
0.41
-1.10
0.60
0.93
-1.50
-
−3
−5
6
x 105
0
5.5
5
4.5
Table 1: Variation of the linear part of the models for each
demographic variable. We used linear regressions for continous variables (wages and time) and logistics regressions for
the categorical variables (gender, education groups and education factors). The coordinates selection (removed coordinates are values are replaced with a dash “-”) is performed
using BMA (BIC) on the first 10 coordinates of the embedding.
4
4
3.5
−3
x 10
3
2.5
5
x(2)
(1)
x
Figure 7: Time and the embedding (coordinates x(1) , x(2) ,
x(4) ). Each token’s color represents the average position of
this token in individual careers. A token’s size is proportional to its frequency in the population.
itative way. Our pilot experiment on mean trajectories for
various gender×education groups, which shows early and
strong segregation for some groups, illustrates one of the
possible insights.
While the existence of segregation with respect to the selected demographics is not a new finding (in fact, it is relatively easy to demonstrate such segregation without resorting to embedding), we show that the segregation relates to
many demographic variables in a concerted way, and that it
is discernible from individual transitions alone. Following
this finding, future studies could focus on smaller and more
homogeneous groups to discover the speed at which they advance on their career paths and the location of their niche in
the manifold.
In addition, as the reader has perhaps noted, not all manifold coordinates represent known demographic features.
Thus, finding their meaning represents an opportunity for
further research.
From a methodological perspective, we found it interesting that a method designed for clustering (see the motivation in (Meilă and Pentney 2007)) works so remarkably well
in a pure embedding task (the manifold we obtain exhibits
very little, if any, clustering). This strongly suggests that
the WC UT should be considered a competitive algorithm for
other embedding tasks.
Another feature of the WCut that benefits our task is the
explicit retrieval of a global directionality axis in the first
coordinate of the embedding. This result was previously
unknown. For the career data, this axis naturally aligned with
time. As a final point of interest on the methodology side,
we note that it can be shown (although we do not do so here)
that by rescaling the coordinate axes by λ_i, as we did in our
embedding, one can find a relation between the Euclidean
distance in the embedding and the original transition matrix
P that is similar to a diffusion distance.
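The relation between eigenvalue-rescaled coordinates and a diffusion distance has a well-known counterpart for symmetric affinities, which can be checked numerically. The sketch below is not the WCut embedding and does not handle directed graphs. It assumes a symmetric, strictly positive affinity matrix W (random here), forms P = D^{-1} W, and checks that rescaling the right eigenvectors of P by their eigenvalues gives Euclidean distances equal to the one-step diffusion distance in the sense of Lafon and Lee (2006). The function names are hypothetical.

```python
import numpy as np

def scaled_spectral_embedding(W):
    """Eigenvalue-scaled embedding for a symmetric affinity matrix W."""
    d = W.sum(axis=1)                      # node degrees
    vol = d.sum()
    P = W / d[:, None]                     # row-stochastic transition matrix
    pi = d / vol                           # stationary distribution of P

    # P is similar to the symmetric matrix S = D^{-1/2} W D^{-1/2}.
    S = W / np.sqrt(np.outer(d, d))
    lam, V = np.linalg.eigh(S)

    # Right eigenvectors of P, normalized so that sum_i pi_i * psi_i^2 = 1.
    psi = np.sqrt(vol) * V / np.sqrt(d)[:, None]
    Y = psi * lam                          # rescale each axis by its eigenvalue
    return P, pi, lam, Y

def diffusion_distance(P, pi, i, j):
    """One-step diffusion distance between nodes i and j."""
    return np.sqrt(np.sum((P[i] - P[j]) ** 2 / pi))

rng = np.random.default_rng(0)
A = rng.random((6, 6)) + 0.1
W = (A + A.T) / 2                          # assumption: symmetric affinities
P, pi, lam, Y = scaled_spectral_embedding(W)

# Largest discrepancy between the two distances over all node pairs.
max_gap = max(
    abs(np.linalg.norm(Y[i] - Y[j]) - diffusion_distance(P, pi, i, j))
    for i in range(6) for j in range(6)
)
```

With all eigenvectors retained, the two distances agree up to floating-point error. Keeping only the leading coordinates, as one would for visualization, gives an approximation whose quality depends on how quickly the eigenvalues decay.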
tions with the WCut coordinates were uniformly stronger
and agreed with the regression results of Table 1.
In short, all methods that find a smooth, informative embedding find essentially the same mapping, with small variations. Of them, the WCut has the unique advantage of
extracting the time component in the first eigenvector, which is not possible with any of the other methods. We
saw that this coordinate is significant for the career data.
8 Discussion and conclusions
Establishing a coherent embedding for categorical sequences is an inherently challenging problem given the lack
of a natural metric in this domain. This is the first time spectral embedding has been applied to career data. The embedding method used is a variation of the WCut algorithm, one
of the few existing methods that explicitly take into account
directionality in a graph's edges.
This approach to mapping has two main traits: (1) it discards explicit time information, but incorporates asymmetry
in the transition information, and (2) it maps jobs, which
are discrete tokens of the form (occupation, industry), into a
continuous, low-dimensional space.
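As a minimal illustration of the first trait, the sketch below builds a directed transition matrix from career sequences of (occupation, industry) tokens. The toy careers, the function name build_transition_matrix, and the choice to count only changes between consecutive distinct jobs are assumptions made for illustration. They are not the data or the preprocessing used in this study.

```python
from collections import Counter
import numpy as np

def build_transition_matrix(careers):
    """Return (tokens, counts, P) for a list of careers.

    Each career is a time-ordered list of (occupation, industry) tuples.
    Only the order of jobs is used; dates and durations are ignored.
    """
    pair_counts = Counter()
    for career in careers:
        for src, dst in zip(career, career[1:]):
            if src != dst:  # assumption: count job changes only
                pair_counts[(src, dst)] += 1

    tokens = sorted({job for career in careers for job in career})
    index = {job: k for k, job in enumerate(tokens)}

    counts = np.zeros((len(tokens), len(tokens)))
    for (src, dst), c in pair_counts.items():
        counts[index[src], index[dst]] = c

    # Row-normalize to transition probabilities; rows with no outgoing
    # transitions are left as zeros.
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return tokens, counts, P


# Hypothetical toy careers (not from the study's data).
careers = [
    [("cashier", "retail"), ("clerk", "retail"), ("manager", "retail")],
    [("cashier", "retail"), ("clerk", "retail"), ("clerk", "finance")],
    [("clerk", "retail"), ("manager", "retail")],
]
tokens, counts, P = build_transition_matrix(careers)

# Transitions are directional, so the count matrix is generally asymmetric.
asymmetric = not np.allclose(counts, counts.T)
```

The asymmetry of counts is what a symmetric-affinity method would discard, and what a directed method can exploit.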
Our goal was first scientific and then methodological. We
aimed to obtain a manifold that captures meaningful features of existing jobs. We demonstrated that the principal dimensions of the embedding relate to important demographic
variables, which were not input into the algorithm. In fact,
not only are these variables correlated with the embedding
coordinates, but they exhibit visible continuity along various
coordinates in spite of considerable noise.
These results suggest the possibility of using the manifold
as a tool for answering a variety of questions of interest to
social scientists and economists in a quantitative and qual-
Figure 8: Education Groups' Progression. Blue = Low Educ.
Male, Green = Low Educ. Female, Red = High Educ. Male, Yellow =
High Educ. Female. The gray scale corresponds to log10 of the
mean monthly wages in cents.
Figure 9: Education Groups' Progression over Time for First
Coordinate with Standard Error (Black). Horizontal axis: time (in years);
vertical axis: x(1). Blue = Low Educ. Male,
Green = Low Educ. Female, Red = High Educ. Male, Yellow = High
Educ. Female.
References
Abbott, A. 1995. Sequence analysis. Annual Review of Sociology 21:93–113.
Andersen, R.; Chung, F. R. K.; and Lang, K. J. 2007. Local partitioning for directed graphs using PageRank. In WAW, 166–178.
Belkin, M., and Niyogi, P. 2002. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Dietterich, T. G.; Becker, S.; and Ghahramani, Z., eds., Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press.
Durbin, R.; Eddy, S.; Krogh, A.; and Mitchison, G. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. New York: Cambridge University Press.
Lafon, S., and Lee, A. B. 2006. Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parametrization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(9):1393–1403.
Meilă, M., and Pentney, W. 2007. Clustering by weighted cuts in directed graphs. In SIAM Conference on Data Mining.
Meilă, M., and Shi, J. 2001. A random walks view of spectral segmentation. In Jaakkola, T., and Richardson, T., eds., Artificial Intelligence and Statistics AISTATS.
Pentney, W., and Meilă, M. 2005. Spectral clustering of biological sequence data. In Veloso, M., and Kambhampati, S., eds., Proceedings of Twentieth National Conference on Artificial Intelligence (AAAI-05), 845–850. Menlo Park, California: The AAAI Press.
Scott, M. forthcoming. Affinity models for career sequences. Journal of the Royal Statistical Society - Series C.
Shi, J., and Malik, J. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Zhou, D.; Schölkopf, B.; and Hofmann, T. 2005. Semi-supervised learning on directed graphs. In Saul, L. K.; Weiss, Y.; and Bottou, L., eds., Advances in Neural Information Processing Systems, number 17. MIT Press.