Critical Thinking Ability of High School Students in Daily Life Acid-Base Concept

a Department of Chemistry Education, Universitas Negeri Yogyakarta, Kampus Karangmalang, Yogyakarta 55281, Indonesia
b Department of Chemistry, Indonesia Defense University, Kawasan IPSC Sentul, Bogor 16810, Indonesia
c Bureau of School Curriculum Analysis, Design, and Interpretation, Chemistry Alumni Forum (FoAM), Universitas Gadjah Mada
1 das_salirawati@uny.ac.id*; 2 erfan@uny.ac.id; 3 anggiyaniratnaningtyas@uny.ac.id; 4 rhmtbsq@gmail.com
* corresponding author


INTRODUCTION
In this disruptive era, teachers are urged to stimulate students' thinking at higher cognitive levels so that students become accustomed to thinking critically, creatively, and innovatively, and are able to contend with challenges (Dilekli & Tezci, 2016; Hwang & Chen, 2017). In reality, however, only a few teachers pose questions related to the application of chemistry concepts during the learning process (Danczak et al., 2020). Teachers should present more divergent questions so that students grow accustomed to using logic and reasoning. Questions of the "why", "how", and "explain" type should be raised more often than convergent questions (what, who, when, and where), which are designed only for short, simple answers and involve fewer mental processes (reasoning) (Eliasson et al., 2017; Kipper & Rüütmann, 2010).
https://jurnal.unimus.ac.id/index.php/JPKIMIA/index
Under the 2013 chemistry curriculum, teachers are mandated to create challenging learning that evokes critical, logical, creative, and innovative attitudes in students. In contrast, in the OECD Program for International Student Assessment (PISA) 2015, Indonesia ranked only 62nd of 72 countries; this was a slight improvement over 2013 (71st of 72 countries) but worse than 2009 (57th of 72 countries) (PISA, 2016). The most recent PISA results for Indonesian students, from 2018, were also low (74th of 79 countries) in the reading, mathematics, and science categories. The poor PISA 2018 results, combined with the Trends in International Mathematics and Science Study (TIMSS) 2015 report for Indonesia, indicate that Indonesian students struggle with higher-level problems that require critical thinking skills, i.e., analysis, assessing arguments, deduction and induction, decision making, and problem solving, even though they perform quite well on theoretical and memorization problems (Mullis et al., 2016). This evidence suggests that learning in Indonesia has not yet been able to encourage students to develop the critical thinking skills needed to solve problems. In the chemistry classroom, critical thinking skills are essential for developing a strong understanding of basic chemistry concepts: students should be able to relate their understanding of the symbolic level to the atomic and macroscopic levels of daily-life phenomena (Bain et al., 2014; Hernández et al., 2014).
The need to advance the critical thinking skills of high school (HS) students has driven the design and implementation of a broad range of teaching innovations, as well as the development of a range of methods for assessing the impact of these interventions. Many of these assessment methods use validated, commercially available tests. Commonly used commercial critical thinking tests include the California Critical Thinking Skills Test (CCTST) (Assessment Insight, 2013), the Watson-Glaser Critical Thinking Appraisal (WGCTA) (AssessmentDay, 2019), the Watson-Glaser Critical Thinking Appraisal Short Form (Pearson, 2018), the Cornell Critical Thinking Test Level Z (CCTT-Z) (The Critical Thinking Co, 2019), the Ennis-Weir Critical Thinking Essay Test (EWCTET) (Ennis, 1993), and the Halpern Critical Thinking Assessment (HCTA) (Halpern, 2010).
None of the tests above requires specific expertise; a general knowledge background is enough to make a reasonable attempt at them. Each test is accompanied by a manual containing specific instructions, norms, validity, reliability, and item analysis. Several empirical reports suggest that the WGCTA is the most prominent test in use (Huber & Kuncel, 2016). More recently, however, the CCTST has gained a strong reputation among researchers. Although the commercial tests are popular, some reports found their results inconsistent: some studies reported significant changes in critical thinking while others reported none (Behar-Horenstein & Niu, 2011). Earlier reviews of studies using the CCTST or the WGCTA likewise noted reports of increases, decreases, or no change in critical thinking skills over time (Carter et al., 2015). These reviews highlight the importance of experimental design when evaluating critical thinking. A review of 27 studies found that only 7 showed significant changes in critical thinking (McMillan, 1987). McMillan concluded that tests constructed by researchers to address specific critical thinking outcomes were better than tests that treat critical thinking as a broad, generalized construct.
This evidence suggests that if such assessments are used with chemistry students, they should be set in a chemistry context, so that students engage with the assessment in a familiar context and the results reflect their actual critical thinking ability (Ennis, 1993; Halpern, 1998; McMillan, 1987). Several critical thinking tests specific to chemistry students have been published. Kogut (1996) developed exercises in which students were required to note observations and the underlying assumptions of chemical phenomena and then develop hypotheses and experimental designs for particular topics, such as trends in the periodic table or the ideal gas law. Similarly, Jacob (2004) developed six questions, each consisting of a statement that requires chemical knowledge to understand. Students were expected to judge whether the conclusion was valid, possible, or invalid and to provide a short statement explaining their reasoning. Garratt et al. (2000) wrote a book dedicated to developing critical thinking in chemistry, entitled "A Question of Chemistry".
To the best of our knowledge, very few investigations have measured HS students' specific critical thinking ability in applying chemistry concepts to daily life. There is thus a need to develop chemistry critical thinking instruments that chemistry teachers, educators, and chemistry education researchers can use to evaluate the effectiveness of teaching interventions designed to develop chemistry critical thinking skills from the HS level onward. Such instruments should be topic-specific so that they accurately reflect the critical thinking skills of HS chemistry students. This study aimed to measure the critical thinking ability of HS students on a specific topic that is frequently applied in daily life, acids and bases, using newly developed critical thinking questions.

METHOD

Research Instrument
The research instrument consisted of 15 multiple-choice items, each with five answer options, representing seven critical thinking criteria. The criteria were developed from the critical thinking skills assessed by commercial critical thinking tests (Assessment Insight, 2013; AssessmentDay, 2019; Ennis, 1993; Halpern, 2010; Pearson, 2018; The Critical Thinking Co, 2019) and from the syllabus of the 2013 chemistry curriculum (Kemendikbud, 2017). The items were framed as literacy passages about everyday chemical phenomena related to the acid-base concept.
The items prioritized representing critical thinking skills rather than distributing items across cognitive levels. The content was drawn from various references and packaged into concise, contextual questions so that students had to think critically to answer them. The seven criteria and the indicators for each question item are presented in Table 1.

Table 1 (excerpt), item 15: raises the idea of overcoming a problem by applying a neutralization reaction, making proper use of the chemicals in the students' surroundings.

The instrument was validated following Yusoff (2019) by calculating the Content Validity Index (CVI) through the following content validation steps: (a) preparing the content validation form, (b) selecting the panel of experts, (c) initial instrument review, (d) revision, (e) confirmation from the experts, (f) scoring each item, and (g) calculating the CVI. The expert panel comprised a content expert (Dr. Isana Supiah YL, Department of Chemistry Education, Universitas Negeri Yogyakarta), an assessment expert (Dr. Sri Yamtinah, Department of Chemistry Education, Universitas Sebelas Maret), and a chemistry education expert (Dr. Maria Paristiowati, Department of Chemistry Education, Universitas Negeri Jakarta). After the CVI was calculated, the valid instrument was administered to a group of students. Reliability testing was performed using Cronbach's alpha coefficient, with the following decision rule (Taber, 2018): (a) if Cronbach's alpha ≥ 0.6, the construct is reliable; (b) if Cronbach's alpha < 0.6, the construct is not reliable.
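The CVI indices used in this study can be computed from a matrix of expert relevance ratings. The sketch below assumes the convention described by Yusoff (2019): a 4-point relevance scale on which a rating of 3 or 4 counts as "relevant". The function names and the example ratings are hypothetical and are not the authors' data.

```python
from typing import List, Sequence, Tuple

# Assumption: 4-point relevance scale; ratings of 3 or 4 count as "relevant".
RELEVANT_RATINGS = {3, 4}


def item_cvi(ratings: Sequence[int]) -> float:
    """I-CVI: proportion of experts who rated the item as relevant."""
    return sum(r in RELEVANT_RATINGS for r in ratings) / len(ratings)


def scale_cvi(matrix: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (S-CVI/Ave, S-CVI/UA) for a matrix of items (rows) x experts (columns)."""
    icvis: List[float] = [item_cvi(row) for row in matrix]
    s_cvi_ave = sum(icvis) / len(icvis)                   # mean of the item-level CVIs
    s_cvi_ua = sum(v == 1.0 for v in icvis) / len(icvis)  # share of items every expert rated relevant
    return s_cvi_ave, s_cvi_ua


# Hypothetical ratings from three experts for three items.
ratings = [[4, 4, 3], [4, 3, 4], [3, 4, 2]]
print([item_cvi(r) for r in ratings])
print(scale_cvi(ratings))
```

When every expert rates every item 3 or 4, both scale-level indices equal 1.00.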

Sample Size Selection
The population of this research was 2nd-grade natural science students at twelve HSs in Yogyakarta (six public and six private) in the 2018/2019 academic year. The six public HSs were SMA N 3, SMA N 4, SMA N 6, SMA N 8, SMA N 9, and SMA N 10; the six private HSs were SMA Muhammadiyah 1, SMA Muhammadiyah 2, SMA Stella Duce 1, SMA Stella Duce 2, SMA BOPKRI 1, and SMA BOPKRI 2. Samples were drawn by cluster random sampling, selecting two classes at each of these HSs. A total of 694 students took the test.

Ethics
All participants in this study were informed that their participation was voluntary and anonymous, that it would not affect their academic or professional records, and that they were free to withdraw from the study at any time.

Research Design
This research was an ex post facto study adapted from Ary et al. (2018): no treatment was applied to the subjects tested, and the research was carried out under existing circumstances. Data were collected by administering the developed critical thinking instrument, validated by three reviewers, to the sampled students. The qualitative data, namely each student's answer pattern, represent critical thinking ability in the daily-life application of the acid-base concept. The data were analyzed by counting the correct answers to each item and converting them into the percentage of correct answers (quantitative data, %) for each item, with each item representing a criterion, for each HS. A flow chart of the methodology is presented in Figure 1.
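The per-item and per-criterion percentages can be sketched as follows, assuming each student's responses have already been scored 0/1 per item. The item-to-criterion mapping below lists only the three criteria whose question numbers are stated in this paper (criteria 1, 2, and 6); averaging the item percentages to obtain a criterion percentage is our assumption about how the "average percentage per criterion" was formed.

```python
from typing import Dict, List, Sequence

# Partial mapping taken from the text; the remaining criteria are omitted here.
CRITERIA: Dict[int, List[int]] = {1: [1, 2, 3], 2: [4, 5], 6: [12, 13]}


def item_percentages(scores: Sequence[Sequence[int]]) -> List[float]:
    """Percentage of students answering each item correctly.

    scores[s][i] is 1 if student s answered item i + 1 correctly, else 0.
    """
    n_students = len(scores)
    n_items = len(scores[0])
    return [100 * sum(row[i] for row in scores) / n_students for i in range(n_items)]


def criterion_percentages(item_pct: Sequence[float],
                          criteria: Dict[int, List[int]]) -> Dict[int, float]:
    """Average the percentages of the (1-based) items belonging to each criterion."""
    return {c: sum(item_pct[q - 1] for q in items) / len(items)
            for c, items in criteria.items()}


# Hypothetical scored responses of four students to the 15 items.
scores = [[1] * 15, [0] * 15, [1] * 15, [1] + [0] * 14]
pct = item_percentages(scores)
print(criterion_percentages(pct, CRITERIA))
```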
The percentage of correct answers (quantitative data, %) was converted into qualitative categories adapted from Johnson & Christensen (2019) (Table 2). The critical thinking skill grade (CTSG) was obtained by converting the number of correct answers to a 100-point scale with the formula CTSG = 10 × (correct answers)/1.5, which is equivalent to 100 × (correct answers)/15 for the 15-item test.
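The CTSG conversion is a simple rescaling. The sketch below implements the formula as written and also applies the pass mark of 60 (Sunarti, 2014) that the discussion later uses to judge whether a student is successful; the function names are illustrative only.

```python
PASS_MARK = 60  # "unsuccessful" if below 60 (Sunarti, 2014), as cited in the discussion


def ctsg(correct: int) -> float:
    """Critical thinking skill grade on a 100-point scale: 10 x correct / 1.5.

    For the 15-item instrument this equals 100 x correct / 15.
    """
    if not 0 <= correct <= 15:
        raise ValueError("correct must be between 0 and 15")
    return 10 * correct / 1.5


def is_successful(grade: float) -> bool:
    """A student is successful if the grade is at least the pass mark."""
    return grade >= PASS_MARK


print(ctsg(9), is_successful(ctsg(9)))
```

Nine correct answers is therefore the smallest count that reaches the pass mark.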

Item Analysis
The items were of the one-best-answer type, each having a single stem and five answer options: one correct answer and four distractors. Students were required to select the correct option and mark it on a separate answer sheet. Each correct response was awarded 1 mark; blank or incorrect responses received no mark, and there was no negative marking. Thus, the maximum possible score on the 15-item test was 15 and the minimum 0.
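The number-right scoring rule above can be sketched as a comparison of each answer sheet with the answer key. The key and answers in the example are hypothetical; a blank response is represented as `None`.

```python
from typing import Optional, Sequence


def score_sheet(answers: Sequence[Optional[str]], key: Sequence[str]) -> int:
    """1 mark per correct answer; blank (None) or wrong answers score 0; no negative marking."""
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must have the same length")
    return sum(1 for given, correct in zip(answers, key)
               if given is not None and given == correct)


key = list("ABCDE") * 3   # hypothetical 15-item key
answers = list(key)
answers[0] = None         # left blank
answers[1] = "E"          # wrong answer
print(score_sheet(answers, key))
```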
The item difficulty index is the proportion of responses to an item that are correct. It is calculated using the formula p-value = R/T, where R is the number of correct responses and T is the total number of responses (both correct and incorrect). Multiplied by 100, the p-value becomes the percentage of students who answered the item correctly. The higher the p-value, the easier the item. Items with p-values between 20% and 90% are considered good and acceptable; among these, items with p-values between 40% and 60% are considered excellent, because the discrimination index (DI) is maximal in this range. Items with p-values below 20% (too difficult) or above 90% (too easy) are not acceptable and need modification. Note that the p-value is essentially a behavioral measure: rather than explaining difficulty in terms of some intrinsic characteristic of the item, it defines difficulty as the relative frequency with which test takers choose the correct response (Quaigrain & Arhin, 2017).
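The difficulty index and the acceptance ranges described above can be sketched as follows. The sketch assumes that every student's response to the item has been scored 0 or 1 (blanks counted as 0), so that the list length equals T; the category labels are our wording.

```python
from typing import Sequence


def difficulty_index(item_scores: Sequence[int]) -> float:
    """p-value as a percentage: R / T x 100 (R correct responses out of T responses)."""
    return 100 * sum(item_scores) / len(item_scores)


def difficulty_category(p: float) -> str:
    """Classify a p-value (in %) using the ranges given in the text."""
    if p < 20:
        return "too difficult (modify)"
    if p > 90:
        return "too easy (modify)"
    if 40 <= p <= 60:
        return "excellent"
    return "acceptable"


p = difficulty_index([1, 0, 1, 1])
print(p, difficulty_category(p))
```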
The item DI can be defined as the point-biserial correlation between answering the item correctly and the total score on all other items. Here it was estimated with the upper-lower group method: students were ranked by total score, and the numbers of correct responses in the upper 27% (UG) and the lower 27% (LG) were counted. The DI was calculated using the formula DI = 2(UG − LG)/n, where n is the number of students in the two groups combined (Hingorjo & Jaleel, 2012). The higher the DI, the better the item discriminates between students with higher and lower test scores. Following classical test theory guidelines, items were categorized by their discrimination indices: (1) if DI ≥ 0.40, the item is functioning satisfactorily; (2) if 0.30 ≤ DI ≤ 0.39, little or no revision is required; (3) if 0.20 ≤ DI ≤ 0.29, the item is marginal and needs revision; (4) if DI ≤ 0.19, the item should be eliminated or completely revised.
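The upper-lower DI can be sketched as below. Assumptions not stated in the paper: each group size is 27% of the students rounded to the nearest whole number, and students with tied total scores keep their original order when ranked. With groups of size g, n = 2g, so the formula reduces to (UG − LG)/g.

```python
from typing import Sequence


def discrimination_index(item_scores: Sequence[int], total_scores: Sequence[float],
                         fraction: float = 0.27) -> float:
    """Upper-lower discrimination index DI = 2(UG - LG) / n for one item."""
    n_students = len(total_scores)
    group_size = max(1, round(fraction * n_students))
    # Rank students by total score, highest first; sorted() is stable, so ties keep input order.
    ranked = sorted(range(n_students), key=lambda s: total_scores[s], reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    ug = sum(item_scores[s] for s in upper)  # correct responses in the upper group
    lg = sum(item_scores[s] for s in lower)  # correct responses in the lower group
    return 2 * (ug - lg) / (2 * group_size)  # n = students in both groups


def di_category(di: float) -> str:
    """Classical test theory categories quoted in the text."""
    if di >= 0.40:
        return "functioning satisfactorily"
    if di >= 0.30:
        return "little or no revision required"
    if di >= 0.20:
        return "marginal, needs revision"
    return "eliminate or completely revise"


totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # hypothetical total scores
item = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]      # hypothetical item responses
di = discrimination_index(item, totals)
print(di, di_category(di))
```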
Distractor analysis is based on the distribution of responses among the correct answer and the distractors. Non-functional distractors (NF-Ds) were those selected by fewer than 5% of students (Towns, 2014). Distractor efficiency (DE) ranged from 0 to 100% and was determined by the number of NF-Ds in an item: four NF-Ds, DE = 0%; three NF-Ds, DE = 25%; two NF-Ds, DE = 50%; one NF-D, DE = 75%; no NF-D, DE = 100%.
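The NF-D rule and the DE mapping can be sketched for a single item as follows. We assume the 5% threshold is computed over all students who sat the item, including blank responses (`None`); the key and response counts in the example are hypothetical. With four distractors, DE = 100 - 25 × (number of NF-Ds), which reproduces the mapping above.

```python
from typing import Optional, Sequence, Tuple


def distractor_efficiency(choices: Sequence[Optional[str]], key: str,
                          options: str = "ABCDE",
                          threshold: float = 0.05) -> Tuple[int, float]:
    """Return (number of NF-Ds, DE in %) for one item.

    A distractor is non-functional if fewer than `threshold` of all students chose it.
    """
    n_students = len(choices)
    distractors = [opt for opt in options if opt != key]
    n_nfd = sum(1 for d in distractors if choices.count(d) / n_students < threshold)
    de = 100 * (len(distractors) - n_nfd) / len(distractors)
    return n_nfd, de


# Hypothetical item: key "A"; option E chosen by only 2 of 100 students.
choices = ["A"] * 50 + ["B"] * 20 + ["C"] * 20 + ["D"] * 8 + ["E"] * 2
print(distractor_efficiency(choices, "A"))
```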

Developing Research Instrument
This study aimed to measure the level of critical thinking ability on the acid-base topic of 2nd-grade students at six public HSs, six private HSs, and both combined, in Yogyakarta in the 2018/2019 academic year, based on the students' answer patterns on the developed critical thinking questions.
A literature review was performed, drawing especially on established instruments: the CCTST (Assessment Insight, 2013), the WGCTA (AssessmentDay, 2019), the Watson-Glaser Critical Thinking Appraisal Short Form (Pearson, 2018), the CCTT-Z (The Critical Thinking Co, 2019), the EWCTET (Ennis, 1993), and the HCTA (Halpern, 2010), and focusing on daily-life phenomena related to chemistry concepts. The result was seven criteria, with their indicators, representing critical thinking ability in the daily-life acid-base concept.
The questions were written as multiple-choice items with five options, framed as literacy passages about chemical phenomena related to the acid-base topic. Before implementation, the critical thinking questions were reviewed by three experts: a content expert, an evaluation expert, and an education expert. The reviewers provided input and recommendations for improving the questions, covering the correctness of the material substance (content expert), the construction and language of the questions (evaluation expert), and the function and meaning of the questions as a test of students' critical thinking skills (chemistry education expert).
All input and recommendations from the three experts were followed up by revising the critical thinking questions accordingly, except for the education expert's input on question 10 (Table 3). Option E in question 10 was included intentionally so that students with good reasoning can relate it to the statement in the question stem; the question can therefore be answered correctly only by students who think critically. After revision, the three experts declared the critical thinking questions excellent and suitable as a research instrument. The item-level CVI calculations from the three experts are tabulated in Table 4. Based on these calculations, the item-level CVI (I-CVI), the scale-level CVI based on the averaging method (S-CVI/Ave), and the scale-level CVI based on the universal agreement method (S-CVI/UA) all reached a satisfactory level; the scale therefore achieved satisfactory content validity, and all items were acceptable (valid) for implementation. The valid instrument is presented in Table S1 of the supplementary materials.

Table 3. Reviewers' input by question number
Content expert
(question number missing): Editorial sentences.
7: The writing of Kb.
12: Use of the non-standard term "where" (because it is not an interrogative sentence).
Evaluation expert
2: Options A and C do not correlate with chemistry and should be replaced.
3: The answer has no relation to the prohibited information, so the information in the stem does not work.
6: The answer key is not related to the information; preferably it should not hinge on the "forget" element.
9: Option E is actually used by the Perusahaan Air Minum (PAM); change it so that it is false.
11: Mylanta and Promag contain the same substance, so option E is not correct.
Education expert
1: Options D and E are not related to the reading's focus on food, so students may be confused.
3: The reading states that a gas was produced, so students will pick options C and E (which contain the word "gas").
11: The reading teaches children to take matters into their own hands by treating others, which should not be encouraged; the setting of the reading should be changed while keeping the same purpose.
10: Would students even consider option E?
12: Should fructose not be written as sugar in fruits?
13: If the question asks for a tool, only options A and B are tools.

Next, the revised questions were reorganized into two versions, A and B, simply by randomizing the question order without changing the question content. This was done because the test was given to two classes at each HS in sequence; had the question order been the same, there was a concern that the results would be biased. In other words, the two orderings were intended to guard against question leakage, since the test has only 15 items.
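The paper does not state how the question order was randomized. The sketch below assumes a seeded shuffle to produce version B, and shows how answers given in a reordered version can be mapped back to the original item numbering so that a single answer key can be used for scoring. The seed value and function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def make_version(n_items: int, seed: int) -> List[int]:
    """Return a shuffled ordering of item numbers 1..n_items (item content unchanged)."""
    order = list(range(1, n_items + 1))
    random.Random(seed).shuffle(order)
    return order


def to_original_order(answers: Sequence[Optional[str]],
                      order: Sequence[int]) -> List[Optional[str]]:
    """Map answers given in a version's order back to the original item numbering."""
    original: List[Optional[str]] = [None] * len(order)
    for position, item in enumerate(order):
        original[item - 1] = answers[position]
    return original


version_a = list(range(1, 16))           # version A: original order
version_b = make_version(15, seed=2019)  # version B: randomized order (arbitrary seed)
print(version_b)
```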
The two versions were then used for data collection at the 12 HSs selected as the study sample. After confirming that the chemistry teacher at each sampled HS had taught the acid-base topic, we coordinated with the schools to schedule the class and the day of data collection (the critical thinking ability test). Each student's answer sheet was then entered into a prepared data table so that the average percentage per criterion and the overall percentage across criteria could be calculated for each sampled HS.

Measurement of Critical Thinking Ability
The internal consistency (reliability) of the instrument was satisfactory, with a Cronbach's alpha of 0.704 for all participants and for each item. Because this coefficient exceeds 0.600, the instrument is reliable. The recapitulation in Table 5 indicates that, across the criteria tested, the highest percentage of correct answers was not always achieved by students from the same school, for both public and private HSs. For example, for criterion 1 (ability to distinguish facts, non-facts, and opinions), represented by questions 1, 2, and 3, the highest percentage among public schools was achieved by SMA N 6 Yogyakarta (code C), at 55.8%, whereas the same school had the lowest percentage for criterion 6 (ability to identify cause and effect), represented by questions 12 and 13. Likewise, among private HSs, SMA Stella Duce 2 (code J) had the highest percentage for criterion 1 (51.0%) but the lowest for criterion 2 (ability to distinguish definitive and temporary conclusions), represented by questions 4 and 5 (19.6%). The same pattern holds for the other HSs: students from each school showed prominent critical thinking ability on one criterion but relatively low ability on the others.
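Cronbach's alpha and the decision rule from the Method section can be sketched as follows for a students × items matrix of 0/1 scores. The example matrix is hypothetical. Population variances are used throughout; the ratio of variances, and therefore alpha, is unchanged if sample variances are used consistently instead.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_var = sum(pvariance(column) for column in zip(*scores))  # one column per item
    total_var = pvariance([sum(row) for row in scores])           # variance of student totals
    return k / (k - 1) * (1 - item_var / total_var)


def is_reliable(alpha: float, cutoff: float = 0.6) -> bool:
    """Decision rule used in the text (Taber, 2018): alpha >= 0.6 is reliable."""
    return alpha >= cutoff


# Hypothetical scores of four students on three items.
data = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
alpha = cronbach_alpha(data)
print(round(alpha, 3), is_reliable(alpha))
```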
Among public HSs, the lowest overall percentage was recorded by SMA N 9 Yogyakarta (Code E), in the deficient category (34.0%). Among private HSs, the highest percentage was achieved by SMA Stella Duce 2 Yogyakarta (Code J), in the moderate category (41.1%), and the lowest by SMA BOPKRI 1 (Code K), in the deficient category (28.1%).
Averaged over all criteria, public HSs scored 40.2% and private HSs 34.8%, a moderate difference, with an overall average (12 HSs) of 37.5%. Converted into qualitative categories, the public HS average falls in the moderate category, while the private HS average and the overall average fall in the deficient category.
These results show that public and private HSs known as favorite schools do not always have students with better critical thinking ability than non-favorite schools. Although this critical thinking test is limited to the acid-base topic, it at least provides empirical evidence that students are still rarely accustomed to questions that reveal their critical thinking ability. When faced with such questions, they may therefore not be ready to grasp what each question is asking (Martensson & Hansson, 2018).
The results of this study are therefore expected to help teachers plan better teaching that associates subject matter with events or phenomena in daily life or the natural surroundings, so that students' ability to think critically is well practiced. This can be achieved if teachers continuously learn, broaden their horizons, and improve their quality and professionalism through learning innovations, so that students are not only fed theoretical concepts but are directed toward applying concepts in life. Low learning outcomes can result from chemistry being learned only in the classroom, focused on theory, without linking the learning content to everyday life (Cano et al., 2014).
Subject matter taught in this way helps students solve life problems, because it provides deep concepts for explaining the natural phenomena that occur around them. In line with this, Espinosa et al. (2013) reported that students learning chemistry in the classroom have many expectations, one of which is to understand the benefits of chemistry in everyday life. Through critical thinking, students can find various ways to solve problems (Moore, 2001).

Among the scores of students from public and private HSs, the highest average score was achieved by SMA N 3 Yogyakarta (Code A, 47.55) and the lowest by SMA BOPKRI 1 Yogyakarta (Code K, 28.05) (Table 6). Interestingly, some public HSs had average scores below those of private HSs; for example, the average score of SMA N 9 Yogyakarta (Code E, 33.95) was below those of SMA Stella Duce 1 (Code I, 36.10), SMA Stella Duce 2 (Code J, 40.95), and SMA BOPKRI 2 (Code L, 39.00). (Note: the averages for the six public HSs, the six private HSs, and all twelve HSs in Table 6 and Table 5 are in principle equal; the values differ slightly because of decimal rounding.)
These results suggest that critical thinking ability is not determined by which school students come from, but by how accustomed and practiced students are in using a critical attitude in learning situations. Chemistry teachers in private HSs may be more wary of the sometimes unusual questions students ask, even though such questions occur only to students who have a critical attitude (Robinson, 2011).
A student is categorized as unsuccessful if their score is below 60 (Sunarti, 2014), meaning that the student has not mastered the topic tested. Because all average scores here were below 60, the critical thinking ability of students from the 12 HSs is still relatively low (deficient). These results are not the only evidence of students' low critical thinking ability, a skill that requires higher-order thinking; previous studies report the same condition (Khasanah et al., 2017; Pursitasari et al., 2020).
Students' low critical thinking ability may have several causes. Firstly, students tend to be passive and do not respond to the questions the teacher asks, which makes it difficult for them to understand the concepts (Angelo, 1995; Basuki, 2020; Basuki et al., 2018; Fitriani et al., 2019). Teachers therefore need learning innovations to improve students' critical thinking ability (Afandi et al., 2019).
Secondly, students have a weak critical attitude. A critical attitude involves a disciplined, intellectually active process of skillfully conceptualizing, applying, analyzing, synthesizing, and/or evaluating information gathered from various sources (Moore, 2001). One characteristic of students with a critical attitude is that when they receive new concepts during learning, they quickly evaluate them against their cognitive structure, and when a concept is unclear, they immediately ask questions. Through these questions, students use critical thinking to explore problems that demand immediate answers. Many teachers, however, do not like students who ask too many questions, even though such students are exactly what we hope to have in the classroom: they make the class lively and broaden the teacher's horizons, which in turn helps other students gain knowledge beyond what is written in a book.
Thirdly, the quality of the learning process provided by teachers is low (Moore, 2001). If students feel that a teacher's presentation of the subject matter lacks depth and they ask questions, this is a sign of critical thinking, and the teacher must respond immediately so that this critical thinking continues to grow rather than slowly dying because the teacher dampens students' enthusiasm by not responding. The results of this study are therefore expected to encourage chemistry teachers to improve the quality of their teaching by designing innovative lessons that enable students to think critically. A low-quality learning process also makes students passive, with low participation in discussion (Oktaviana et al., 2016).
The overall results imply that senior HS chemistry teachers in particular must create learning that habituates and trains their students' critical thinking ability. If learning on one's own is considered too difficult, a teacher can share knowledge with colleagues in the same field, either in official forums (MGMP) or in small discussion groups, face to face or online. Teachers can also apply various methods, e.g., Problem/Project-Based Learning (Sumarni & Kadarwati, 2020; Yuliyanto & Rohaeti, 2013), Contextual Learning (Susanti et al., 2018), STEM (Mutakinati et al., 2018), and others that link chemistry topics with phenomena and events in students' real lives and can thereby improve their critical thinking ability.
Teachers must realize that the era of globalization is full of challenges and competition, so future students will need not only basic intellectual skills but also critical thinking, communication, creativity, and collaboration skills (Sipayung et al., 2018).
We hope the results of this study can be followed up through community service activities for the education community, especially senior HS chemistry teachers, in the form of training workshops, with direct practice, on preparing critical thinking questions for each subject matter in senior HS chemistry, so that the results help equip teachers to develop critical thinking questions.

Item Analysis
The test consisted of 15 items. The scores of 247 students ranged from 11 to 42 (out of 50). Students were ranked in order of merit from the highest to the lowest score; the top 27% formed the high group and the bottom 27% the low group. The p-value, DI, and DE were analyzed for each item (Table 7). The majority of the items (11 items, 73%) were of average (recommended) difficulty. Similarly, the majority of items (73%) had good DI, falling in the "functioning satisfactorily" or "little or no revision required" categories. Combining the two indices, 5 (33%) items could be called "ideal", having a p-value from 30 to 70% as well as a DI > 0.24. The total number of distractors was 60 (4 per item), of which 13 (21.7%) were NF-Ds. Ten (66.7%) items had at most two NF-Ds, while 5 (33.3%) items had only effective distractors. Items with 2 NF-Ds had a high p-value and good DI; items 1, 4, 6, and 14, with one or no NF-D, were too difficult (p-value < 20%) and had poor DI (DI ≤ 0.19), so they require revision.
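The combined "ideal item" rule and the distractor totals in this paragraph can be checked with a short script. The per-item NF-D counts in the example are hypothetical, chosen only so that they sum to 13 across 15 items with four distractors each.

```python
from typing import Sequence, Tuple


def is_ideal(p: float, di: float) -> bool:
    """'Ideal' item per the combined rule: p-value 30-70% and DI > 0.24."""
    return 30 <= p <= 70 and di > 0.24


def nfd_share(nfd_per_item: Sequence[int],
              distractors_per_item: int = 4) -> Tuple[int, int, float]:
    """Return (total NF-Ds, total distractors, NF-D percentage) across the test."""
    total = distractors_per_item * len(nfd_per_item)
    n_nfd = sum(nfd_per_item)
    return n_nfd, total, 100 * n_nfd / total


# Hypothetical NF-D counts for 15 items, totalling 13.
counts = [1] * 13 + [0] * 2
n_nfd, total, share = nfd_share(counts)
print(n_nfd, total, round(share, 1))
```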

Recommendations and Future Outlook
Several recommendations can be made: (1) for instructors, the results of this study indicate that students' critical thinking skills are relatively low, especially in chemistry; training or workshops should therefore be organized for chemistry teachers in particular, and teachers of other subjects in general, on designing learning that reveals students' critical thinking skills, so that teachers can apply it in their own schools; (2) for teachers, the results of this study are preliminary findings that should be followed up with efforts to improve the learning process, pursue self-development, increase professionalism, use the internet to enrich knowledge, and share more with peers, in order to create learning that is familiar to students and sharpens their critical thinking skills; (3) for other researchers, this study can serve as a reference and data source for research on different materials, larger populations and wider areas, and other higher-order thinking skills, such as creative, innovative, and scientific thinking; (4) for observers of chemistry education, since the results indicate that students' critical thinking skills are relatively low, research on teachers' critical thinking skills is needed, because students' low critical thinking skills may partly reflect teachers themselves not necessarily having adequate critical thinking skills.

CONCLUSION
The critical thinking questions were developed and refined based on reviews by three experts in content, evaluation, and chemistry education. After revision, the critical thinking questions were of excellent quality (CVI = 1.00) and suitable as a research instrument for measuring students' critical thinking ability. The reliability of the instrument was satisfactory, with a Cronbach's alpha of 0.704 for all participants and for each item. The level of critical thinking skills on the acid-base topic among 2nd-grade senior HS students in Yogyakarta in the 2018/2019 academic year was 40.2% (moderate) for the six public SMAs, 34.8% (deficient) for the six private SMAs, and 37.5% (deficient) for both combined. Item analysis showed that the majority of the items (11 items, 73%) were of average (recommended) difficulty and 73% of items had good DI ("functioning satisfactorily" or "little or no revision required"), with a DE of 67.5% across all distractors.