Essential Data Science for Petroleum Geoscientists and Engineers

A Short Course Prepared by Analytic Signal Limited

About this course

Interest in data science and machine learning is expanding rapidly. The technology offers the promise of increased efficiency in E&P and the potential to analyse and extract value from vast amounts of under-utilised legacy data. Alongside petroleum geoscience and engineering domain knowledge, the key elements underlying its successful application are data, code, and algorithms. This course builds on public datasets, code examples written in Python, statistical graphics, and algorithms from popular data science packages to provide a practical introduction to the subject and its application in the E&P domain.

Who should attend?

This is an introductory course for reservoir geologists, reservoir geophysicists, reservoir engineers, and technical staff who want to learn the key concepts of data science. By developing your data science skills you’ll be better equipped to analyse your project data, build predictive models, and apply them in your workflows. You’ll also be able to evaluate and ask the right questions about the models created by others, be they in-house data science specialists or external partners.

Learn to use geospatial E&P data to make graphics like this interactive basemap of the UKCS. Zoom, pan, or click the legend to hide map layers.
Learn to read and analyse well log data from LAS files, make statistical graphics like this crossplot, and fit models to predict missing logs. Click the legend to hide stratigraphic formations.
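
To give a flavour of what fitting a model to predict a missing log can look like, here is a minimal sketch in plain Python. It assumes a straight-line relationship between two logs and uses made-up values. The function name `fit_line` and all sample numbers are hypothetical and are not taken from the course material, which may well use data science packages and real LAS data instead.

```python
# Minimal sketch: fill gaps in one log using a straight-line fit to another log.
# All values are synthetic and chosen only for illustration.

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical samples: bulk density (g/cm3) and sonic slowness (us/ft).
# None marks depths where the sonic log is missing.
density = [2.20, 2.30, 2.40, 2.50, 2.60]
sonic = [100.0, 90.0, None, 70.0, None]

# Train only on depths where both logs are present.
known = [(d, s) for d, s in zip(density, sonic) if s is not None]
slope, intercept = fit_line([d for d, _ in known], [s for _, s in known])

# Keep measured values; predict only where the sonic log is missing.
filled = [s if s is not None else slope * d + intercept
          for d, s in zip(density, sonic)]

for d, s in zip(density, filled):
    print(f"density={d:.2f}  sonic={s:.1f}")
```

A real workflow would normally hold back some measured samples to check how well the fitted model predicts them before trusting the filled values.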

What you'll learn

The course comprises a mix of lectures and hands-on computer workshops. You’ll gain a basic working knowledge of coding in Python and the use of a toolset for importing, visualizing and building models from data. You’ll also gain a powerful working environment for data science on your laptop, which together with code examples provided by the course will give you a jump start to applying the techniques you’ll learn to your own projects. Check out this gallery of visualization samples drawn from the course workshops.

What are the prerequisites?

The course is at an introductory level and all subject matter will be taught from scratch. No prior experience of statistics, Python coding, or machine learning is required, although knowledge of basic maths and statistics is useful. Hands-on computer workshops form a significant part of this course, and participants must come equipped with a laptop computer running Windows (7, 8, or 10) or macOS (10.10 or above) with sufficient free storage (4 GB).

Course Content

1. Introduction to Data Science
  • An overview of the data science process and how it can be applied to E&P data.
  • Hands-on experience of an open data science toolkit. The toolkit provides an ideal working environment as you continue your data science journey beyond this course.
2. Python Fundamentals and Computational Thinking
  • Fundamentals of programming in Python.
  • Hands-on experience of Python coding including variables, expressions, data structures, functions, and reading and writing data files.
  • Computational Thinking: the analytical and logical processes of decomposing a complex task and expressing it in a form that can be performed by a computer.
3. Exploratory Data Analysis
  • An introduction to exploratory data analysis, visualization tools, and descriptive statistics.
  • Hands-on experience of visualization and performing exploratory data analysis, including descriptive statistics, data cleaning, and data transformations.
  • Hands-on experience of reading and analysing many E&P data types, including wireline logs, well tops, seismic, and production data.
4. Supervised Machine Learning
  • An introduction to supervised machine learning, including algorithms for classification and regression, and their advantages and limitations.
  • Hands-on experience of machine learning, building and evaluating supervised models.
5. Unsupervised Machine Learning
  • An introduction to unsupervised machine learning, including algorithms for outlier detection and clustering, and their advantages and limitations.
  • Hands-on experience of machine learning, building and evaluating unsupervised models.
Learn to analyse production data and make treemaps like this summary of UKCS gas production. Click cells to navigate the hierarchy of area, operator, and fields.
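
The data cleaning, descriptive statistics, and outlier detection topics listed in the course content can be sketched with nothing more than the Python standard library. The example below is an illustration under stated assumptions, not the course's own code: the gamma-ray values are synthetic, the helper names (`clean`, `summarize`, `flag_outliers`) are hypothetical, and -999.25 is assumed as the null marker because it is a common convention in LAS files (a real file declares its own null value in its header).

```python
import statistics

# Assumption: -999.25 marks missing samples, a common LAS convention.
NULL_VALUE = -999.25


def clean(values, null_value=NULL_VALUE):
    """Data cleaning: drop samples equal to the null marker."""
    return [v for v in values if v != null_value]


def summarize(values):
    """Descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def flag_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Synthetic gamma-ray readings (API units) with two nulls and one spike.
raw = [45.0, 50.0, -999.25, 55.0, 48.0, 52.0, 47.0, -999.25, 53.0, 50.0, 150.0]

gamma = clean(raw)
summary = summarize(gamma)
outliers = flag_outliers(gamma)

for name, value in summary.items():
    print(f"{name}: {value}")
print("outliers:", outliers)
```

The z-score threshold of 2.0 is an arbitrary choice for this small sample. With so few points a single spike inflates the standard deviation, which is one reason robust measures such as the median are often preferred in practice.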


Dr David Psaila has over 25 years' experience in the oilfield services industry, developing technologies for reservoir geophysics, reservoir modelling, geostatistics, and exploratory data analysis. He has worked in R&D, commercial software development, and as an onsite consultant in client offices. David has a wealth of worldwide experience of reservoir studies including quantitative time-lapse interpretation, pore pressure prediction, Monte Carlo volumetrics, and stochastic inversion. He has also developed and taught courses on reservoir geostatistics and stochastic inversion. David holds a degree in Geology from the University of Oxford and a PhD in Geophysics and Geostatistics from the University of Leeds.