People detection, tracking, pose estimation and activity recognition in images and video are important and challenging tasks with many practical applications. The goal of my research is to develop solutions for these tasks that work in challenging real-world conditions, such as onboard footage from moving vehicles, YouTube videos, or images on the web. Current computer vision systems rely heavily on machine learning techniques to learn visual recognition tasks from training examples. However, existing methods still require manual design of the whole recognition pipeline. This means defining the key components of the model and its structure, and encoding prior knowledge by choosing appropriate prior distributions, independence assumptions and manually set parameters. Another goal of my work is to build complex trainable computer vision systems that do not require such extensive manual design. Toward this end, I am currently focusing on methods that can leverage large-scale datasets and can be trained end-to-end. I am also actively exploring new ways to acquire large-scale datasets by means of computer graphics, crowd-sourcing and semi-supervised learning.