Import and Analyze Knowledge Graph Results on a Large Scale
If analyzing SERPs is the first step in understanding your rankings on search engines, then analyzing the knowledge graph can be thought of as step zero.
SERP positions for a certain keyword show how each page is ranked in comparison to all other eligible pages. Knowledge graph scores, on the other hand, show the ranks of the different meanings that a word can take for Google (a person, a city, a brand, etc.).
From Google's documentation: This API is not suitable for use as a production-critical service. Your product should not form a critical dependence on this API.
It's not clear whether this is from a technical reliability or a content correctness point of view, but it is what the docs mention. So please keep this in mind when using it.
Account Setup
In order to send requests, you will need to create a project, set up billing, and activate the Knowledge Graph API for your project. You will then need to create credentials (an API key). Once you have that, you can use it as your key parameter when running requests, as shown below.
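To illustrate where the key ends up, here is a minimal sketch that builds a request URL for the Knowledge Graph Search API endpoint using only the standard library. The environment variable name GOOGLE_KG_KEY and the helper build_url are assumptions made for this example, and this is not necessarily how knowledge_graph() constructs its requests.

```python
import os
from urllib.parse import urlencode

# Public endpoint of the Knowledge Graph Search API.
BASE_URL = "https://kgsearch.googleapis.com/v1/entities:search"


def build_url(key, query, language="en", limit=20):
    """Return a request URL with the API key passed as the `key` parameter."""
    params = {"query": query, "languages": language, "limit": limit, "key": key}
    return f"{BASE_URL}?{urlencode(params)}"


# GOOGLE_KG_KEY is a hypothetical variable name; reading the key from the
# environment keeps it out of source code.
key = os.environ.get("GOOGLE_KG_KEY", "YOUR_API_KEY")
url = build_url(key, "google")
print(url)
```

Nothing is sent here; the sketch only shows that the key travels as an ordinary query parameter alongside the query itself.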
How to use Google's Knowledge Graph API
What is "google"? Is it a search engine, a company, a brand, a very large number? What else is it?
And if it is all of those things, what is the relative ranking of each? What is the source of the information, its URL, images (if any)?
>>> google = knowledge_graph(key=key, query='google')
>>> google
   query   resultScore  result.@type                                 result.description                  result.name
0  google  203191       ['Corporation', 'Organization', 'Thing']     Technology company                  Google
1  google  49462        ['WebSite', 'Thing']                                                             Google Search
2  google  19142        ['WebSite', 'Thing']                         nan                                 Gmail
3  google  13251        ['Brand', 'WebSite', 'Thing']                Website                             Google Maps
4  google  7549         ['WebSite', 'SoftwareApplication', 'Thing']  Website                             Google Drive
5  google  6853         ['WebSite', 'Thing']                         Website                             Google Play
6  google  6543         ['SoftwareApplication', 'Thing']             Web browser                         Google Chrome
7  google  4312         ['Corporation', 'Organization', 'Thing']     Multinational conglomerate company  Alphabet Inc.
8  google  3395         ['SoftwareApplication', 'Thing']             nan                                 Google Account
9  google  1306         ['Thing']                                    nan                                 Google
>>> google.columns
Index(['query', 'resultScore', '@type', 'result.@type', 'result.description',
       'result.image.contentUrl', 'result.image.url',
       'result.detailedDescription.url', 'result.detailedDescription.license',
       'result.url', 'result.name', 'result.@id', 'query_time'],
      dtype='object')
The above table is a sample response from the knowledge_graph() function.
Many more columns are available, as you can see in the output of google.columns above.
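The dotted column names reflect the nesting of the API's JSON response. Below is a minimal sketch, assuming the response shape described in Google's API documentation and using pandas.json_normalize to flatten it. Whether knowledge_graph() flattens responses in exactly this way internally is an assumption, and the @id value is a placeholder.

```python
import pandas as pd

# A trimmed-down response in the shape the API returns. The score, name,
# types, and description come from the first row of the table above;
# the @id is a placeholder, not a real entity ID.
response = {
    "@type": "ItemList",
    "itemListElement": [
        {
            "@type": "EntitySearchResult",
            "resultScore": 203191,
            "result": {
                "@id": "kg:/m/PLACEHOLDER",
                "name": "Google",
                "@type": ["Corporation", "Organization", "Thing"],
                "description": "Technology company",
            },
        }
    ],
}

# json_normalize flattens nested keys into dot-separated column names,
# while list values (such as the types) are kept as lists.
df = pd.json_normalize(response["itemListElement"])
print(sorted(df.columns))
```

Each key nested under "result" becomes a column prefixed with "result.", while top-level keys such as resultScore keep their names.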
We can see that "google" is a company, with a result score of 203,191, and that it is a search engine/website with a result score of 49,462. It is then understood as an email application, a mapping application, and so on, as the entity names in the table show.
You can also see that we get the types under which this result falls, in the result.@type column. Multiple types show the type inheritance, and as you can also see, everything is a "Thing". This is the top element in the type hierarchy under which everything belongs.
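Because each entity carries a list of types, you can select entities by any type in the hierarchy. The sketch below uses plain Python lists with a few values copied from the table above. The helper names_of_type is hypothetical, and in a real DataFrame the types may be stored as lists or as strings, so adapt accordingly.

```python
# A few (name, types) pairs copied from the table above.
results = [
    ("Google", ["Corporation", "Organization", "Thing"]),
    ("Google Search", ["WebSite", "Thing"]),
    ("Google Maps", ["Brand", "WebSite", "Thing"]),
    ("Google Chrome", ["SoftwareApplication", "Thing"]),
]


def names_of_type(rows, wanted):
    """Return the names of entities whose type list contains `wanted`."""
    return [name for name, types in rows if wanted in types]


websites = names_of_type(results, "WebSite")
everything = names_of_type(results, "Thing")
print(websites)    # ['Google Search', 'Google Maps']
print(everything)  # all four names, since every entity is a "Thing"
```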
Like the Google SERP and YouTube SERP functions, this function works in the same manner: it creates the product of the arguments passed to it, sends one request per combination, and aggregates the responses.
For example, if you run:
>>> knowledge_graph(key=key, query=["google", "bing"], languages=["en", "fr", "de"])
The function will send 2 queries x 3 languages = 6 requests:
(google, en), (google, fr), (google, de), (bing, en), (bing, fr), (bing, de)
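The combinations listed above are a Cartesian product, which can be reproduced with itertools.product. This is only an illustration of the idea; the function's own implementation may differ.

```python
from itertools import product

queries = ["google", "bing"]
languages = ["en", "fr", "de"]

# Every (query, language) pair corresponds to one request.
combinations = list(product(queries, languages))
print(len(combinations))  # 6
for query, language in combinations:
    print(query, language)
```

Adding a third list, for example two types, would multiply the count again: 2 x 3 x 2 = 12 requests.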
This is actually the main value of having this function, because you usually want a large sample to evaluate certain keywords across languages or types.
Let's check what "seo" and "search engine optimization" mean in different languages.
>>> seo = knowledge_graph(
...     key=key,
...     query=["seo", "search engine optimization"],
...     languages=["en", "es", "de"],
... )
>>> seo
    query                       languages  resultScore  result.name                      result.@type                                      result.description
0   search engine optimization  de         3587         Suchmaschinenoptimierung         ['Thing']                                         nan
1   search engine optimization  de         321          Lokale Suchmaschinenoptimierung  ['Thing']                                         nan
2   search engine optimization  de         252          Suchmaschinenmarketing           ['Thing']                                         nan
4   search engine optimization  en         71756        Search engine optimization       ['Thing']                                         nan
5   search engine optimization  en         5056         Search engine marketing          ['Thing']                                         nan
6   search engine optimization  en         576          SEOP, Inc.                       ['Organization', 'Corporation', 'Thing']          Company
13  seo                         de         3313         Seoul                            ['AdministrativeArea', 'Thing', 'City', 'Place']  Hauptstadt von Südkorea
14  seo                         de         1509         Seo Yea-ji                       ['Thing', 'Person']                               Schauspielerin
15  seo                         de         584          Suchmaschinenoptimierung         ['Thing']                                         nan
33  seo                         es         1509         Seo Ye-ji                        ['Person', 'Thing']                               Actriz
34  seo                         es         584          Posicionamiento en buscadores    ['Thing']                                         nan
35  seo                         es         316          Jin                              ['Person', 'Thing']                               Cantante
53  seo                         en         8760         Search engine optimization       ['Thing']                                         nan
54  seo                         en         3313         Seoul                            ['AdministrativeArea', 'Thing', 'City', 'Place']  Capital of South Korea
55  seo                         en         1435         Sulli                            ['Thing', 'Person']                               South Korean actress
>>> seo.columns
Index(['query', 'languages', 'resultScore', '@type', 'result.name',
       'result.@type', 'result.@id', 'result.image.contentUrl',
       'result.image.url', 'result.detailedDescription.license',
       'result.detailedDescription.articleBody', 'result.description',
       'result.url', 'query_time'],
      dtype='object')
It's interesting to see how the same word can mean different things in different contexts.
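One way to make this comparison concrete is to keep only the top-scoring entity for each query and language. The sketch below rebuilds a handful of the "seo" rows shown above in a small DataFrame; the column name "name" is chosen for this example and is not necessarily the name used in the function's output.

```python
import pandas as pd

# A handful of rows copied from the output above:
# (query, language, resultScore, entity name).
rows = [
    ("seo", "en", 8760, "Search engine optimization"),
    ("seo", "en", 3313, "Seoul"),
    ("seo", "de", 3313, "Seoul"),
    ("seo", "de", 584, "Suchmaschinenoptimierung"),
    ("seo", "es", 1509, "Seo Ye-ji"),
    ("seo", "es", 584, "Posicionamiento en buscadores"),
]
sample = pd.DataFrame(rows, columns=["query", "languages", "resultScore", "name"])

# Sort by score, then keep the first (highest-scoring) row of each
# (query, language) group.
top = (
    sample.sort_values("resultScore", ascending=False)
    .groupby(["query", "languages"])
    .head(1)
    .set_index("languages")["name"]
    .to_dict()
)
print(top)
```

In this sample, "seo" resolves first to search engine optimization in English, to Seoul in German, and to an actress in Spanish.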
- knowledge_graph(key, query=None, ids=None, languages=None, types=None, prefix=None, limit=None)
Query Google's Knowledge Graph with any combination of parameters.
Note that Google's documentation states that "This API is not suitable for use as a production-critical service." So please keep this in mind.
- Parameters:
key (str) -- Your Google developer key.
query (str) -- A literal string to search for in the Knowledge Graph.
ids (list) -- A list of entity IDs to search for in the Knowledge Graph.
languages (list) -- The list of language codes (defined in ISO 639) to run the query with, for instance en.
types (str) -- Restricts returned entities to those of the specified types. For example, you can specify Person (as defined by schema.org) to restrict the results to entities representing people. If multiple types are specified, returned entities will contain one or more of these types.
prefix (bool) -- Enables prefix (initial substring) match against names and aliases of entities. For example, a prefix Jung will match entities and aliases such as Jung, Jungle, and Jung-ho Kang.
limit (int) -- Limits the number of entities to be returned. Maximum is 500. Default is 20. Requests with high limits have a higher chance of timing out.
- Returns:
kg_df (pandas.DataFrame) -- A DataFrame of all responses.
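To make the prefix parameter more tangible, here is a small local illustration of initial-substring matching using the names from its description. The matching itself happens on Google's side, so the helper prefix_match, the case-insensitive comparison, and the extra non-matching name "Carl Jung" are assumptions for this sketch only.

```python
def prefix_match(prefix, names):
    """Return the names that start with `prefix`, ignoring case."""
    folded = prefix.casefold()
    return [name for name in names if name.casefold().startswith(folded)]


# "Carl Jung" is a hypothetical extra entry: it contains the prefix
# but does not start with it, so it is not matched.
names = ["Jung", "Jungle", "Jung-ho Kang", "Carl Jung"]
matches = prefix_match("Jung", names)
print(matches)  # ['Jung', 'Jungle', 'Jung-ho Kang']
```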