Faro Analytics

How to Get Free Oil and Gas Data for One of the Most Active Basins

When you start searching the internet for oil and gas data, the first things you will probably come across are sites for expensive data services. For small consulting groups and companies, these can put quite a dent in the budget. Why pay for data that is freely available when that money could go toward buying interests or covering actual operating expenses? A lot of what you are looking for can be had for free or at a very low price. In this post we will give you links to some of these sites and show you how to get at some data that is very useful right now.

State Sites

New Mexico Data

We said at the beginning of the article that we would show you how to get data for an area that is popular right now. New Mexico’s Eddy and Lea counties sit in the northwest portion of the Permian Basin and cover a large part of the Delaware Basin. We are going to show you where to get all of it at once (and we mean ALL of the wells…in the state), how to view it, and how to work with it. A small example pulls all of the effective lateral lengths with their corresponding APIs, and we give some pointers on how to clean the data before using it. Also, keep in mind that NM updates its production data quarterly.

Where to Find the New Mexico Data

To start, if you are looking for just a single well or maybe a few wells, you should use the state’s well search. Otherwise, to download all of it, go HERE and follow the link to the FTP server. From there, select the OCD Data folder. The OCD_DataDictionary.xls file will really help you figure out which data you want and which file holds it. The T_WC_VOL (production) and OCDExport (other general data) zip files contain the bulk of the information. There is a lot of good data in most of the FTP folders, but the two files mentioned are going to be key.

The Not-So-Easy Part

All of the files you will download, minus the data dictionary, are XML files. There are many formats I prefer over XML, and I like to joke that it is “literally the least you could do” to get data out there in a usable form. If you have an XML reader, great. If not, you will probably try to open the file with some other program, perhaps Notepad, a browser, or any application that converts something to text. Since a majority of these files are larger than 1 GB, you may be waiting a while for your request to complete, if your computer doesn’t just crash trying to handle it. Fortunately, there is an incredibly easy way to view, parse, and use these files in Python.
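As a rough illustration of why Python copes where a text editor struggles, here is a minimal sketch that reads only the first few characters of a file and leaves the rest on disk. The `peek` helper and the tiny stand-in file are made up for this example; with a real download you would pass the path of the extracted XML file instead.

```python
import tempfile
from pathlib import Path


def peek(path, n_chars=1000):
    """Read only the first n_chars characters; the rest of the file is never loaded."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read(n_chars)


# Tiny stand-in file so the sketch runs anywhere. The tag names are placeholders,
# not the real OCD schema.
sample = Path(tempfile.mkdtemp()) / "sample.xml"
sample.write_text(
    "<root>" + "<row><value>1</value></row>" * 1000 + "</root>",
    encoding="utf-8",
)

head = peek(sample, n_chars=60)
print(head)
```

Raising `n_chars` a little at a time is usually enough to spot the repeating record tag and the field names inside it.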

Program Description

The program requires 3 libraries: Pandas, NumPy, and ElementTree (xml.etree.ElementTree, part of Python’s standard library). If you go on to clean the data, you will probably also want to import matplotlib.pyplot to build some histograms that show where data has to be culled and where the outliers are. After the imports, it only takes 2 lines of script to view the top portion of the file and get an idea of how it is structured. You can replace the ‘xxx’ portion of the statement to view as many characters as you want until you have a good understanding of the patterns. The two pickle statements that are commented out below the main data acquisition loop allow you to save or read the final result from the loop. I usually keep those two lines in a program when something takes a while to run, so that portion is saved and I don’t have to rerun it. By the time the looping is done, you will have data for 122,470 wells in terms of IDs and perf footage values. After that, the program combines the identification values to get full API numbers and subtracts the perf values to get effective lateral lengths. The full program can be found HERE.
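For readers who want a feel for the steps before opening the full program, here is a minimal sketch of the same idea. It is not the linked program. The tag names (`well`, `state`, `county`, `well_id`, `perf_top`, `perf_bottom`) and the sample values are hypothetical, and the real names should be taken from OCD_DataDictionary.xls. The sketch assumes the ID pieces follow the standard API layout of a 2-digit state code, a 3-digit county code, and a 5-digit well number.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd

# Stand-in XML with hypothetical tag names and made-up values.
SAMPLE_XML = """<root>
  <well><state>30</state><county>15</county><well_id>1234</well_id>
        <perf_top>9500</perf_top><perf_bottom>14100</perf_bottom></well>
  <well><state>30</state><county>25</county><well_id>12345</well_id>
        <perf_top>10000</perf_top><perf_bottom>17400</perf_bottom></well>
</root>"""


def stream_wells(source, record_tag="well"):
    """Collect one dict per record while streaming, so the whole tree is never held at once."""
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            rows.append({child.tag: child.text for child in elem})
            elem.clear()  # drop the children of a finished record to free memory
    return pd.DataFrame(rows)


# A file path works here too; BytesIO just keeps the sketch self-contained.
wells = stream_wells(io.BytesIO(SAMPLE_XML.encode("utf-8")))

# Optional checkpoint so a slow parse does not have to be rerun:
# wells.to_pickle("wells.pkl")
# wells = pd.read_pickle("wells.pkl")

# Zero-pad each piece, then join them into a 10-digit API number.
wells["api10"] = (
    wells["state"].str.zfill(2)
    + wells["county"].str.zfill(3)
    + wells["well_id"].str.zfill(5)
)

# Effective lateral length = bottom perf minus top perf.
for col in ("perf_top", "perf_bottom"):
    wells[col] = pd.to_numeric(wells[col], errors="coerce")
wells["lateral_ft"] = wells["perf_bottom"] - wells["perf_top"]

print(wells[["api10", "lateral_ft"]])
```

Streaming with `iterparse` and clearing each record after use is one common way to keep memory flat on files over 1 GB. Loading the whole tree with `ET.parse` is simpler but needs far more RAM.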


I would suggest that if you use this, you do some cleaning on the data. This is just a basic example to show how to get the data and give some pointers on what to watch out for. Speaking of pointers, when you sort the data by the perf locations and distances, you will find that either companies are drilling 99,999′, 9,999′, and 9′ wells incredibly often or they are able to drill in negative feet. Different people will clean the data in different ways. Most up for debate is where to set the cutoffs for the minimum and maximum lengths of a lateral in this part of the world. A good way to tie all of this together is to use this data extract script to pull more qualitative well data, then figure out which wells are horizontals in the first place, where you have duplicate wells (and whether to take the greater or lesser of the two values), and whether the general well data you just pulled to clean this well data is, itself, clean. Data set secret: NM has a lot of wells in its database tagged as verticals that have names suspiciously ending with a number and the letter ‘H’ and were drilled in the last 3-5 years, so heads up.
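The cleaning steps above could be sketched roughly as follows. Everything specific here is an assumption for illustration: the column names, the sample rows, the choice to treat 9, 9,999, and 99,999 as placeholder values, the 1,000 ft and 20,000 ft cutoffs, and keeping the greater length for duplicate APIs. Your own cutoffs and rules may well differ.

```python
import pandas as pd

# Made-up rows with hypothetical column names.
raw = pd.DataFrame({
    "api10": ["3001500001", "3001500002", "3001500003",
              "3002500004", "3002500004", "3002500005"],
    "well_name": ["ALPHA FED 1H", "BRAVO 2", "CHARLIE 3H",
                  "DELTA 4H", "DELTA 4H", "ECHO STATE 5H"],
    "direction": ["H", "V", "H", "H", "H", "V"],
    "perf_top": [9500, 9999, 12000, 10000, 10000, 9800],
    "perf_bottom": [14100, 99999, 11000, 14500, 17400, 14700],
})

PLACEHOLDERS = [9, 9999, 99999]                # assumed filler values, not real depths
MIN_LATERAL_FT, MAX_LATERAL_FT = 1000, 20000   # debatable cutoffs, pick your own

df = raw.copy()
df["lateral_ft"] = df["perf_bottom"] - df["perf_top"]

# Drop rows where a perf or the computed length is a placeholder value.
is_placeholder = df[["perf_top", "perf_bottom", "lateral_ft"]].isin(PLACEHOLDERS).any(axis=1)
# Drop negative and implausibly short or long laterals.
in_range = df["lateral_ft"].between(MIN_LATERAL_FT, MAX_LATERAL_FT)
clean = df[~is_placeholder & in_range]

# Duplicate APIs: keep the row with the greater lateral length.
clean = clean.sort_values("lateral_ft").drop_duplicates("api10", keep="last")

# Wells tagged vertical whose names end in a number followed by 'H'.
suspect = raw[(raw["direction"] == "V") & raw["well_name"].str.contains(r"\d+H$")]

print(clean[["api10", "lateral_ft"]])
print(suspect[["api10", "well_name"]])
```

A histogram of `lateral_ft` before and after filtering is a quick way to check whether the cutoffs are removing junk or real wells.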
