In an age where organizations rely on data analytics to make crucial decisions, Databricks provides an efficient platform where data can be stored and accessed from a single place. Founded in 2013, Databricks is now considered a global leader in unified data analytics and is valued at around $6.2 billion.
Databricks is a computer software company based in California, USA. It specializes in developing a platform that unifies an organization's data in the cloud so that the data can be used for analytics.
Storing and analyzing massive amounts of data is not as simple as it sounds. This is especially true for large corporations, which handle thousands of transactions every single day. Data analytics is of paramount importance for companies moving forward, especially if they plan to develop a new product or venture into a new industry without incurring heavy financial losses. But data analytics does not begin with the analysis itself; it begins with storage.
Databricks isn't the first unified data analytics or data science platform in the industry, and it won't be the last. What sets it apart from competitors such as Snowflake and HDInsight is the platform's simplicity and efficiency. Databricks is built on Apache Spark, an open-source cluster-computing framework. According to numerous customer reviews, Databricks is 8x faster at caching, indexing, and advanced querying than other analytics platforms. It also aims to simplify the machine learning (ML) cycle by bringing most ML frameworks into one central hub, which makes data ingestion, model building, productionization, and experiment tracking easier.
It also helps that the company formed a partnership with tech giant Microsoft, resulting in a data science platform called Azure Databricks. Azure is Microsoft's cloud computing platform, and the partnership makes Databricks an integrated Azure service. With Azure Databricks, data scientists and data engineers get access to numerous Azure services such as Azure Active Directory (AAD), SQL Data Warehouse, and Power BI. AAD is an identity and access management service that secures confidential corporate data without requiring users to tinker with numerous security configurations.
Founded by professors and students from the University of California, Berkeley (UC Berkeley), Databricks has become one of the leading unified data analytics providers in the world. The company has so far raised $883 million and generated $350 million in revenue in the third quarter of 2020. After its most recent $400 million funding round, Databricks' valuation stands at $6.2 billion.
Despite the company's rapid rise in the past few years, it initially struggled to generate revenue. In a recent interview, co-founder Ali Ghodsi, then VP of Engineering and Product Management, said he had set his sights on quitting because the company was massively underperforming. But investor Ben Horowitz had other plans, suggesting that Ghodsi take over as CEO, replacing fellow co-founder Ion Stoica. Much like Stoica and the rest of the Databricks co-founders, Ghodsi had no prior experience in business, but that didn't stop him from creating a plan that would eventually turn out to be the company's saving grace.
Ghodsi immediately changed the direction of the company, going all-in on enterprise sales and hiring highly experienced salespeople. Soon after, revenue jumped, and big companies such as Comcast, Shell, HP, Expedia, and Regeneron began using Databricks. The company also caught the attention of Microsoft, resulting in a large investment and a partnership that gave birth to Azure Databricks.
Databricks is headquartered in San Francisco, California, and operates in 16 locations across 11 countries, including offices in London, Amsterdam, and Singapore. The company currently employs around 1,400 people.
Databricks' history is largely a story of academics making it big in the business world. The company was founded in 2013 by UC Berkeley professors Ali Ghodsi, Ion Stoica, and Matei Zaharia, as well as UC Berkeley PhD students and graduates Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji.
The co-founders have developed numerous open-source data projects, such as Apache Spark, Delta Lake, MLflow, and Koalas.
The company obtained its first round of funding in 2013, raising $14 million in a Series A round led by Andreessen Horowitz.
Between 2014 and 2019, Databricks raised $883 million across multiple funding rounds. Investors in these rounds included Andreessen Horowitz, Microsoft, New Enterprise Associates, Data Collective DCVC, SineWave Ventures, Green Bay Ventures, Geodesic Capital, Battery Ventures, Coatue Management, BlackRock, T. Rowe Price Associates, and Tiger Global Management.
Microsoft participated in the Series E funding in 2019 that led to the creation of Azure Databricks. In the same year, Databricks announced that it had generated revenues of over $200 million.
Databricks was recognized by research firm Gartner in the 2020 Magic Quadrant as a Leader in Data Science and Machine Learning Platforms. The company also ranked 5th in the 2020 Cloud 100.
The University of California, Berkeley (UC Berkeley) is essentially the birthplace of Databricks. Ghodsi, Stoica, Zaharia, Wendell, Xin, Konwinski, and Tavakoli-Shiraji were all involved in developing Apache Spark in the university's AMPLab before moving on to found Databricks.
Stoica is also one of the founders of AMPLab, which focuses on big data projects. Aside from Apache Spark, AMPLab also produced other open-source projects such as the cluster management software Apache Mesos and the virtual distributed file system Alluxio. Stoica is currently the Executive Chairman of Databricks.
Ghodsi is the company's CEO, while the other founders also hold significant positions. Both professors at UC Berkeley before founding Databricks, Ghodsi and Stoica still teach at the university to this day.
After generating very little revenue in its early years, Databricks' revenue skyrocketed to $100 million in 2018. Revenue increased again in 2019, with the company generating $200 million. With these numbers, annual recurring revenue doubled between 2018 and 2019 and more than tripled between 2017 and 2018. In the third quarter of 2020 alone, the company generated $350 million in revenue.
The company's $6.2 billion valuation and impressive revenue numbers have only fueled talk of a Databricks IPO. Ghodsi and the rest of the management team have not confirmed that the company is going public, but reports indicate that it will hold an IPO by 2021.
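The growth multiples quoted above can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python: the function names are illustrative, and because the text gives no 2017 revenue figure, the code derives only the upper bound implied by the phrase "more than tripled" rather than an actual 2017 number.

```python
# Revenue figures quoted in the text, in millions of US dollars.
REVENUE_2018 = 100.0
REVENUE_2019 = 200.0


def growth_multiple(earlier: float, later: float) -> float:
    """Return how many times larger `later` is than `earlier`."""
    if earlier <= 0:
        raise ValueError("earlier revenue must be positive")
    return later / earlier


def implied_ceiling(later: float, multiple: float) -> float:
    """Return the value the earlier figure must stay below if revenue
    grew by more than `multiple` times to reach `later`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return later / multiple


# 2018 -> 2019: the "doubled" claim.
multiple_2018_2019 = growth_multiple(REVENUE_2018, REVENUE_2019)

# 2017 -> 2018: "more than tripled" only bounds the unstated 2017 figure.
ceiling_2017 = implied_ceiling(REVENUE_2018, 3.0)

print(f"2018 to 2019 growth multiple: {multiple_2018_2019:.1f}x")
print(f"Implied 2017 revenue: below ${ceiling_2017:.1f} million")
```

Under these figures, the 2018 to 2019 multiple is exactly 2, and a more-than-threefold rise to $100 million implies that 2017 revenue was below roughly $33.3 million.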
Databricks faces competition not only from companies offering unified data analytics, but also from companies offering data science and machine learning platforms. Its major competitors include Snowflake, Google (BigQuery), Amazon Web Services (SageMaker), and even Microsoft (Azure HDInsight).
Much like Databricks, Snowflake is a global leader. Its client portfolio includes Adobe, Lionsgate, Capital One, DocuSign, Sony Pictures, and 2K, among many others. The speed of its data warehouse when querying data ranks among the best in the industry. It is also praised for its easy-to-use SQL query engine and for working flexibly with Amazon S3, Google Cloud, and Microsoft Azure.
Google BigQuery and Amazon Web Services SageMaker are also popular among business organizations, universities, and other private institutions. Both are praised for querying large data sets easily while keeping those data sets stored for use at any time. The majority of users also have no complaints about how BigQuery and SageMaker handle data integration tasks such as data pipeline automation and data source integration.
Azure Databricks isn't the only Microsoft data science software on the market. HDInsight, originally built in collaboration with data software company Hortonworks, functions the same way as the other platforms, making it a direct competitor to Databricks. HDInsight is integrated with Azure and can be used with open-source frameworks such as Hadoop, Spark, Hive, LLAP, and Kafka.
There are also numerous startups that have ventured into developing big data management software, including Theia, Fivetran, Ahana, Aparavi, Stytch, and Cockroach Labs. Theia allows users to manage content across different analytics tools and is recognized by Gartner as a “Cool Vendor”. Fivetran also offers a data integration platform similar to Databricks and is listed in the 2020 Cloud 100.
Things might have turned out differently if CEO Ali Ghodsi had never taken that big risk. Databricks is now one of the leaders in data science and machine learning, and as more companies look to use data analytics in their daily operations, Databricks seems to have the best solution on the market.