In an early survey, Bruce Alberts described assemblies of proteins as the protein machines of cells. We now know that protein assemblies carry out almost all of the biochemical, signalling, and functional processes in cells. Although they might seem like arbitrary sets of individual proteins coming together to perform arbitrary functions, protein assemblies can be highly specific and enormously complicated. For example, the spliceosome is composed of five small nuclear RNAs and more than 50 proteins, and is thought to catalyse an ordered sequence of more than 10 RNA rearrangements as it removes an intron from an RNA transcript. The discovery of this intron-splicing process won Phillip A. Sharp and Richard J. Roberts the 1993 Nobel Prize in Physiology or Medicine.
Even the simplest eukaryotic cells are known to contain on the order of hundreds of protein assemblies. However, our knowledge of these assemblies is still fragmentary, as is our conception of how they work together to constitute the ‘higher order’ functional organisation of cells. Therefore, a faithful reconstruction and characterization of all protein assemblies is crucial to understanding the functioning of the cellular machinery.
It is now known that proteins seldom act in isolation (estimates suggest that over 80% of human proteins do not function alone); instead, they interact to function as macromolecular assemblies. Based on the nature of these interactions, protein assemblies can be classified into three main kinds: protein complexes, functional modules, and signalling and metabolic pathways.
Protein complexes are stoichiometrically stable structures, formed when two or more proteins interact at a specific time and place in the cell. They form the basic building blocks of larger functional structures within cells. Complexes can be either permanent, i.e., once assembled they function for the entire lifetime of the cell (e.g. ribosomes), or transient, i.e., they assemble temporarily to perform a function and dissociate afterwards (e.g. kinase-cyclin complexes during the cell cycle).
Functional modules are formed when two or more complexes come together at a specific time and place in the cell, interact among themselves and with other individual proteins to perform a function, and dissociate afterwards. For example, the DNA replication machinery is formed from the assembly of DNA polymerases, DNA helicase, DNA primase, and the sliding clamp within the nucleus to ensure error-free replication of the DNA.
Pathways are formed via an ordered sequence of interactions between complexes and individual proteins to transduce signals (signalling pathways) or metabolise substrates (metabolic pathways). Unlike complexes and functional modules, however, pathways do not require all components to co-localize in time and space.
The development of high-throughput proteomics technologies, including yeast two-hybrid, protein complex co-immunoprecipitation, and affinity-purification-based screening, has revolutionized our ability to interrogate individual protein interactions within cells on a massive scale. Up to 70% of the interactions in some model organisms, including yeast, fruit fly, and worm, have now been mapped, and the mapping of interactions in higher organisms, including mouse and human, is rapidly underway (catalogued in several public databases; Table 1).
Table 1: Publicly available resources for protein-protein interactions
The binary interactions inferred from these experimental techniques are assembled into a protein-protein interaction (PPI) network. The PPI network provides a global view of the set of all interactions (the interactome) of an organism, and provides a mathematical framework to analyse these interactions. The computational problem of identifying protein complexes from the PPI network assumes that complexes are embedded as modular structures within the network. Topologically, this modularity refers to densely connected sets of proteins separated by less-dense regions. Biologically, it refers to division of labour, and to the robustness of the network against internal (e.g. mutations) and external (e.g. chemical) attacks. Computational methods to detect protein complexes therefore mine the PPI network for modular subnetworks.
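The topological notion of modularity described above can be illustrated with a small Python sketch. It is not any of the published methods; it is a minimal greedy illustration under stated assumptions. The function names (`build_ppi_network`, `density`, `grow_dense_cluster`), the toy protein identifiers, and the density threshold of 0.8 are all hypothetical choices made for this example.

```python
from itertools import combinations


def build_ppi_network(interactions):
    """Build an undirected adjacency map from binary protein-protein interactions."""
    network = {}
    for a, b in interactions:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def density(network, proteins):
    """Fraction of all possible protein pairs in `proteins` that interact."""
    proteins = set(proteins)
    n = len(proteins)
    if n < 2:
        return 0.0
    edges = sum(1 for a, b in combinations(proteins, 2)
                if b in network.get(a, ()))
    return edges / (n * (n - 1) / 2)


def grow_dense_cluster(network, seed, min_density=0.8):
    """Greedily grow a cluster from `seed`, adding the neighbour that keeps the
    cluster densest, and stop when the density would fall below `min_density`.
    The threshold is an arbitrary illustrative value."""
    cluster = {seed}
    while True:
        # Proteins adjacent to the current cluster but not yet in it.
        candidates = set().union(*(network[p] for p in cluster)) - cluster
        best, best_density = None, 0.0
        for candidate in sorted(candidates):  # sorted for reproducible ties
            d = density(network, cluster | {candidate})
            if d > best_density:
                best, best_density = candidate, d
        if best is None or best_density < min_density:
            return cluster
        cluster.add(best)


# Toy network (hypothetical identifiers): two fully connected groups of four
# proteins, joined by a single interaction between A4 and B1.
group_a = ["A1", "A2", "A3", "A4"]
group_b = ["B1", "B2", "B3", "B4"]
toy_interactions = (list(combinations(group_a, 2))
                    + list(combinations(group_b, 2))
                    + [("A4", "B1")])

network = build_ppi_network(toy_interactions)
cluster = grow_dense_cluster(network, "A1")
print(sorted(cluster), density(network, cluster))
```

In this toy network, each group of four is densely connected internally, while the two groups are separated by a sparse region (a single interaction), so a cluster grown from `A1` stops at the boundary of its own group. Real PPI networks are noisy and incomplete, so a raw density threshold alone would generally not be enough in practice.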
Many methods have been proposed over the years for predicting protein complexes from the PPI network; some of them are summarized below (Table 2). Detailed surveys and descriptions of these methods can be found in the following publications:
- Li X, Wu M, Kwoh CK, Ng SK. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics 2010, 11(Suppl 1):S3.
- Srihari S, Leong HW. A survey of computational methods for protein complex prediction from protein interaction networks. J Bioinform Comp Biol 2012, 11(2):1230002.
- Srihari S, Yong CH, Patil A, Wong L. Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes. FEBS Letters 2015, 589(19A):2590-2602.
Table 2: Computational methods for protein complex prediction from protein interaction networks. The methods are classified by the kind of strategy (e.g. network clustering) and/or the kind of biological information (e.g. evolutionary conservation) used to identify protein complexes. The associated software is available as Cytoscape plug-ins (Cy), command-line programs (CL), or online (OL) web servers at the links given (source).